Evaluation of Admission Process: Written Communication

Kate Kaiser, BS, PA-S; Christine Reichart, BS, PA-S; Jennifer Snyder, MPAS, PA-C; Larry Vandermolen, BS, MM, PA-S; Jennifer Zorn, MS, PA-C

Today's Agenda
- Cognitive and non-cognitive factors in the admission process
- Review of the current admission process
- Introduction to automated essay scoring
- Current findings utilizing an automated essay scoring system
- Our ongoing research
- Recommendations for improving the current admission process

Cognitive and Non-Cognitive Factors in the Admission Process
- Cognitive, or quantitative, variables are known predictors of success for applicants seeking admission into, and graduating from, many healthcare programs
  - Pre-professional grade point average (GPA) and standardized test scores
- Non-cognitive abilities are less consistent in predicting success
  - Oral and written communication skills
- Notwithstanding, many admission committees believe that both cognitive and non-cognitive factors are important

Our Current Admission Process
- Individuals are required to submit an application to the Central Application Service for Physician Assistants (CASPA), which includes:
  - The candidate's demographics
  - Academic record
  - Experience in healthcare
  - A personal essay

Admission Process: The Personal Essay
- Today, our focus is on the personal essay and the program's most recent admission process

Personal Essay Evaluation
- A pool of community physician assistants was recruited to participate in the review and evaluation of the candidates' CASPA essays
- Two physician assistants evaluate each essay utilizing a program-developed Likert-scale rubric
- The program's idiosyncratic rubric defines a basis for scoring the essay in three categories:
  1. Spelling and grammar
  2. Organization and readability
  3. The applicant's ability to answer the CASPA essay topic, "describe the motivation towards becoming a PA"
- The PA evaluators independently assign scores to the three categories; the scores are then totaled, and the two evaluators' scores are averaged

The Coordination and Effort Is Significant
- In the most recent admission cycle, over 1,000 essays were evaluated, including those essays reviewed by a third evaluator

Our Current Admission Process
- The personal essay, GPA, and healthcare experience points are totaled, and candidates are ranked from highest to lowest
- Invitations for interviews, which assess oral communication skills, are offered to approximately the top 90 ranked candidates
- After completion of the interviews, the interview scores are combined with the previous subtotals, and offers of admission are extended to approximately the top 50 candidates (see the sketch below)
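To make the two-stage rank-and-cut selection above concrete, here is a minimal sketch. The field names (essay_score, gpa_points, experience_points, interview_score), the equal weighting, and the exact cutoffs are hypothetical placeholders; the program's actual point values are not described here.

```python
# Hypothetical sketch of the two-stage rank-and-cut selection described
# above. Field names and weights are illustrative placeholders only.

def pre_interview_total(candidate: dict) -> float:
    """Sum the pre-interview components: essay, GPA, and experience points."""
    return (candidate["essay_score"]
            + candidate["gpa_points"]
            + candidate["experience_points"])

def select_candidates(candidates, n_interviews=90, n_offers=50):
    # Stage 1: rank all applicants by pre-interview total, highest first,
    # and invite approximately the top 90 to interview.
    ranked = sorted(candidates, key=pre_interview_total, reverse=True)
    invited = ranked[:n_interviews]

    # Stage 2: add the interview score to each running subtotal and
    # extend offers to approximately the top 50.
    final = sorted(invited,
                   key=lambda c: pre_interview_total(c) + c["interview_score"],
                   reverse=True)
    return final[:n_offers]
```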
Limitations of Personal Essay Evaluation
- The essays are prepared in advance by the applicants and submitted to CASPA
- This raises an important question: to what extent does this system actually assess the applicant's writing ability?
- Community PAs are not trained to analyze sample essays and may not themselves be capable of accurately evaluating written work
- The process relies on a program-developed rubric
- Two evaluators must disagree on a writing sample's score by 50% of the total score before a third evaluation of the writing sample takes place
- The unlimited time applicants have to reflect on their essays is not congruent with the thinking process required of a PA in professional practice
- The process is resource depleting: a paper trail, internal delays, and a deadline crunch
- Essays are not de-identified
- Interrater reliability and validity are in question

Admission Process: Personal Essay Evaluation
- As consumers, applicants can be considerably more demanding, increasingly requesting detailed information regarding their performance in the entire admission process rather than simply accepting an "admit," "wait list," or "non-admit" decision
- Often, the applicant requests specific information to guide future direction if admission is not initially offered

And Yet We Have Been Successful...
- Our program has been successful in selecting very capable students who regularly achieve above-average national board scores and who, in the past, have received positive reviews from the physicians employing them

What Can the Program Do?

Rudimentary Computer Scoring
- By 1997, Microsoft Office® incorporated grammar checking as a tool for users of the software
- Its readability scores determine Flesch Reading Ease, Flesch-Kincaid grade level, and word count
- Grammar-checking software effectively evaluates essays of between 500 and 1,000 words covering a wide range of topics

Automated Essay Scoring (AES)
- Evaluation and scoring of written prose via computer technology, based on a set of pre-scored essays (a generic sketch of this idea follows the rubric table below)
- Perfect test-retest reliability
- Used to overcome time, cost, reliability, and generalizability issues in writing assessments
- The automated system applies the scoring criteria uniformly and mechanically, avoiding the fluctuations found in untrained graders
- Works well on short descriptive essays of between 500 and 1,000 words encompassing a wide range of topics

Vantage Learning IntelliMetric® Software
- An AES product that utilizes artificial intelligence, natural language processing, and statistical analyses to score and evaluate written prose

Vantage Learning IntelliMetric® Rubric Domains

Domain                      | Area of Evaluation
Focus and Unity             | Is there a main idea, and is it consistently supported?
Development and Elaboration | Are the supporting ideas varied, well developed, and elaborative?
Organization and Structure  | Does the essay logically transition ideas through the introduction, supporting paragraphs, and conclusion?
Sentence Structure          | Is there syntactic complexity and variety?
Mechanics and Conventions   | Does the essay follow the rules of standard American English?
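IntelliMetric®'s internal model is proprietary, so the following is only a minimal generic sketch of the "train on pre-scored essays, then score new ones" idea referenced above, using TF-IDF features and ridge regression; the toy essays and scores are fabricated.

```python
# Generic sketch of AES training on pre-scored essays. This is NOT
# IntelliMetric's actual algorithm (which is proprietary); it only
# illustrates learning a text-to-score mapping that is then applied
# uniformly and deterministically to new essays.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy training set: essays already scored by trained human raters.
train_essays = [
    "I want to become a physician assistant because ...",
    "My motivation toward the PA profession began when ...",
    "Working as an EMT showed me the value of team-based care ...",
]
train_scores = [22, 18, 25]  # fabricated holistic scores on a 5-30 scale

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    Ridge(alpha=1.0),                     # linear score predictor
)
model.fit(train_essays, train_scores)

# The same essay always receives the same score: perfect test-retest
# reliability, as noted above.
print(model.predict(["Becoming a PA has been my goal since ..."]))
```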
Limitations of AES
- There are some common criticisms of AES software like IntelliMetric®
- First, it is possible to respond to an essay question using appropriate keywords and synonyms while the essay still lacks a comprehensible answer
- Second, a great deal of effort is required to write multiple model answers to essay topics in order to "train" the software so that it properly grades writing samples
- Some critics question whether it is possible for a computer to "artificially think" in order to generate domain and holistic scores

Study Methods
- The study protocol was reviewed by Butler University's institutional review board for research involving human subjects and approved as exempt
- Of the 521 applicants in the most recent admission cycle, the top 90 were selected for interviews using the program's standard evaluation process
- A twenty-five-minute, onsite written essay was then required of each candidate as part of the interview process
- The topic chosen for the onsite essay was non-medical and pre-developed by Vantage Learning
- Two of the 90 candidates did not submit an onsite essay
- Completed onsite essays were reviewed by a faculty member to excise any identifying names or dates and were assigned random identification numbers
- To ensure uniformity, all essays were reduced to single-spaced documents

Controls: Fabricated Essays
- These fabricated essays included:
  - Two essays that were well written but responded to a different essay topic
  - One essay consisting of simple repetition of the topic: four sentences written on the essay topic and then simply repeated, in a different sequence, in subsequent paragraphs
  - One essay whose first half was a well-written response to the essay topic and whose second half simply repeated the essay topic rather than responding to it
  - One essay that responded to the topic and was considered of good quality

Study Methods (continued)
- While the IntelliMetric® license fee was reduced, the study was conducted independently of Vantage Learning, the licensor of IntelliMetric®
- For consistency, the PAs who assessed the onsite and fabricated essays were drawn from the group of community PA volunteers who had reviewed CASPA essays in the past
- Each onsite essay was evaluated by two community PA volunteers using the programmatic rubric and by two other community PAs using a hard copy of the IntelliMetric® rubric
- As a means of rudimentary comparative analysis, the onsite and fabricated essays were evaluated with Microsoft Word® version 2003 to obtain the Flesch Reading Ease, Flesch-Kincaid level, and word count (the underlying readability formulas are sketched below)
- The de-identified, randomly numbered onsite and fabricated essays were electronically submitted to Vantage Learning for automated scoring by the IntelliMetric® system
- Once results were received, data were maintained in an Excel spreadsheet, and statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 15
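For reference, both readability measures that Microsoft Word® reports are simple functions of word, sentence, and syllable counts. The following is a minimal sketch using the published Flesch (1948) Reading Ease and Flesch-Kincaid Grade Level formulas; Word's internal syllable counting differs, so treat the example counts as illustrative.

```python
# The two readability measures reported by Microsoft Word, computed from
# raw counts using the standard published formulas.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # 0-100 scale; higher scores indicate easier reading.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate U.S. school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts near the onsite-essay mean word count of 357.93:
print(flesch_reading_ease(358, 20, 520))   # ~65.8
print(flesch_kincaid_grade(358, 20, 520))  # ~8.5
```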
1. Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
2. There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
3. There is no correlation in the scores between the methods of evaluation of onsite essays
4. There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubrics for onsite essays
5. Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and by community PAs
6. There is no correlation between the candidates' totaled scores from the seven methods of essay evaluation and GPA

Descriptive Statistics for Methods of Evaluation (N = 88)

Method of Evaluation                       | Possible Range | Mean   | S.D. (+/-) | Observed Range
CASPA Essay^                               | 0 - 10         | 8.48   | 1.26       | 2 - 10
Word Count                                 | 0 - ∞          | 357.93 | 107.61     | 142 - 687
Flesch Reading Ease                        | 0 - 100        | 58.90  | 9.27       | 40.4 - 78
Flesch-Kincaid Level                       | Grade level    | 9.46   | 1.93       | 5.9 - 14.2
AES                                        | 5 - 30         | 15.44  | 4.11       | 5 - 25
Onsite Community PA, Programmatic Rubric   | 0 - 10         | 7.15   | 1.61       | 3 - 10
Onsite Community PA, IntelliMetric® Rubric | 5 - 30         | 21.57  | 3.86       | 8 - 30

^ N = 78

Null Hypothesis 1: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
- To determine whether there was a statistically significant difference between the ranked difference scores, a Wilcoxon signed-rank test was utilized (a minimal sketch of this test follows this slide's discussion)
- There was a statistically significant difference: z = -5.025, p < 0.01; therefore, the hypothesis of no difference is rejected
- Utilizing the programmatic rubric, the community PAs scored the onsite essay lower than the CASPA essay 57 out of 78 times

Discussion:
- The students may have been unable to compose a written response onsite of the quality they achieved in the essay prepared in advance for CASPA because they felt pressured or constrained by time
- As found in previously reported studies, it is unclear whether, or to what extent, applicants received help in developing the prepared essay's content, grammar, or spelling
- An onsite essay largely eliminates doubt regarding the origin of the essay and is an essential step in actually assessing the applicant's writing ability
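A minimal sketch of the paired comparison above, using SciPy rather than the SPSS procedure the study actually ran; the paired score arrays are fabricated placeholders.

```python
# Paired comparison of CASPA vs. onsite essay scores with the Wilcoxon
# signed-rank test. SPSS performed the study's analysis; this SciPy
# sketch is an equivalent illustration with made-up scores.
from scipy.stats import wilcoxon

# One entry per candidate (N = 78 in the study); values here are fabricated.
caspa_scores  = [9, 8, 10, 7, 9, 8, 9, 10, 6, 8]
onsite_scores = [7, 8,  8, 6, 7, 8, 7,  9, 5, 7]

stat, p = wilcoxon(caspa_scores, onsite_scores)
print(f"W = {stat}, p = {p:.4f}")  # study reported z = -5.025, p < 0.01
```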
Null Hypothesis 2: There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
- To evaluate the consistency of the community PA scores for each domain of the programmatic rubric for the CASPA essay, the corresponding scores were examined with agreement statistics: perfect (exact), adjacent, perfect + adjacent, and discrepant agreement percentages

Results: CASPA essay (N = 78), rater 1 versus rater 2 agreement statistics for the domain scores, programmatic rubric

Domain (Total Points)            | Exact (%) | Adjacent (%) | Perfect + Adjacent (%) | Discrepant (%) | Rater 1 Mean | S.D. (+/-) | Rater 2 Mean | S.D. (+/-)
Grammar and Spelling (3)         | 56.4      | 42.3         | 98.7                   | 1.3            | 2.67         | 0.54       | 2.59         | 0.55
Organization and Readability (3) | 43.5      | 53.8         | 97.3                   | 2.5            | 2.57         | 0.56       | 2.62         | 0.53
Motivation to Become a PA (4)    | 38.4      | 39.7         | 78.1                   | 21.8           | 3.17         | 0.88       | 3.38         | 0.80

- Further, the agreement between the corresponding domain scores for the CASPA essays was examined by intraclass correlation (ICC) at the 0.05 level of significance, using a two-way random, average-measures model with absolute agreement

Results: CASPA essay intraclass correlation, rater 1 versus rater 2 domain scores, programmatic rubric (N = 78)

Domain                       | ICC    | 95% CI Lower Bound | 95% CI Upper Bound | Significance
Grammar and Spelling         | 0.378  | 0.026              | 0.603              | 0.019*
Organization and Readability | -0.069 | -0.685             | 0.321              | 0.613
Motivation to Become a PA    | 0.166  | -0.291             | 0.464              | 0.208

*p is significant at < 0.05

Discussion:
- While the Grammar and Spelling ICC is statistically significant, too many external sources may be confounding the findings, and the low ICC value indicates that no meaningful relationship exists
- Therefore, the community PAs' evaluation of the CASPA essays produced unreliable scoring outcomes

Null Hypothesis 3: There is no correlation in the scores between the methods of evaluation of onsite essays
- The six methods of evaluation of onsite essays were normalized using Z scores
- ICC(1, 6) was calculated to compare the reliabilities of the methods (two-way random, average measures, with absolute agreement): ICC(1, 6) = 0.410, p < 0.01
- Although the result is statistically significant, so the hypothesis is rejected, the correlation is so low that no meaningful relationship exists between the methods of evaluation of onsite essays

Null Hypothesis 4: There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubrics for onsite essays
- Onsite essays were evaluated by ICC, comparing the programmatic and IntelliMetric® rubrics used by the community PA evaluators: ICC(1, 2) = 0.567, p < 0.01
- While the result is statistically significant, only a minimal meaningful relationship exists between the community PA scores under the programmatic and IntelliMetric® rubrics (the ICC computation used here is sketched below)
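The ICC model named above (two-way random, average measures, absolute agreement) can be computed from the two-way ANOVA mean squares. The following is a minimal NumPy sketch of that model, following McGraw and Wong's ICC(A,k); SPSS performed the actual analysis, and the ratings matrix below is fabricated.

```python
# Two-way random, average-measures ICC with absolute agreement,
# computed from the two-way ANOVA mean squares (McGraw & Wong ICC(A,k)).
# SPSS did this in the study; the ratings matrix here is fabricated.
import numpy as np

def icc_a_k(x: np.ndarray) -> float:
    """x: n_subjects x k_raters matrix of scores."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # error

    return (msr - mse) / (msr + (msc - mse) / n)

# Six essays scored by two raters on one rubric domain (made-up values).
ratings = np.array([[3, 2], [2, 2], [3, 3], [1, 2], [3, 2], [2, 3]])
print(icc_a_k(ratings))  # study's domain ICCs were 0.378, -0.069, 0.166
```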
Null Hypothesis 5: Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
- By the Wilcoxon signed-rank test, there was a statistically significant difference in the totaled scores between the onsite essays evaluated by the community PAs utilizing the IntelliMetric® rubric and the AES totaled outcome: z = -7.542, p < 0.01
- The community PAs' mean rating was higher for 82 of the 88 essays

Null Hypothesis 6: There is no correlation between the candidates' totaled scores from the seven methods of essay evaluation and GPA

Spearman Rank Correlation Coefficients of Essay Scores Evaluated by Different Methods and GPA (N = 88)

Correlation                         | Spearman Coefficient | Significance
CASPA Essay^                        | -0.260               | 0.022*
Community PA, Programmatic Rubric   | 0.076                | 0.479
Community PA, IntelliMetric® Rubric | 0.170                | 0.112
AES Scoring                         | 0.307                | 0.004*
Word Count                          | 0.237                | 0.026*
Flesch Reading Ease                 | -0.067               | 0.536
Flesch-Kincaid                      | 0.122                | 0.257

^ N = 78; *p is significant at < 0.05

Discussion:
- The Spearman rank correlation was used to evaluate a possible relationship between GPA and the candidates' individual totaled essay scores (a minimal sketch follows this slide)
- As previously reported, essay length matters up to a certain number of words, so that concepts and ideas may be developed; beyond that point, additional length does not improve the essay's outcome
- It seems reasonable to assume that an individual with a higher GPA is likely able to write an essay more effectively than one with a lower GPA
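A minimal sketch of the correlation above using SciPy (the study used SPSS); the paired score and GPA arrays are fabricated for illustration.

```python
# Spearman rank correlation between one essay-evaluation method's totaled
# scores and candidate GPA. SPSS performed the study's analysis; this
# SciPy sketch uses fabricated paired values.
from scipy.stats import spearmanr

aes_scores = [15, 22, 9, 18, 25, 12, 20, 17, 14, 23]        # hypothetical AES totals
gpas       = [3.2, 3.7, 3.0, 3.5, 3.9, 3.1, 3.6, 3.4, 3.3, 3.8]

rho, p = spearmanr(aes_scores, gpas)
print(f"rho = {rho:.3f}, p = {p:.4f}")  # study: rho = 0.307, p = 0.004 for AES
```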
Power Analysis, Post Hoc

Test / Comparison                                    | N  | Power (%) | Effect Size
Wilcoxon (Cohen's d):                                |    |           |
  CASPA vs. Onsite Community PA, Programmatic Rubric | 78 | 100       | 0.92
  AES vs. Community PA, IntelliMetric® Rubric        | 88 | 100       | 1.54
Spearman correlation vs. GPA (r):                    |    |           |
  CASPA Essay                                        | 78 | 100       | 0.94
  Word Count                                         | 88 | 100       | 0.92
  Flesch Reading Ease                                | 88 | 100       | 0.97
  Flesch-Kincaid Level                               | 88 | 100       | 0.91
  AES                                                | 88 | 100       | 0.90
  Onsite Community PA, Programmatic Rubric           | 88 | 100       | 0.84
  Onsite Community PA, IntelliMetric® Rubric         | 88 | 100       | 0.96

Fabricated Essays
- Five of the six fabricated essays were identified by the Vantage Learning IntelliMetric® system
- The same was not true of the community PA evaluators

Limitations of the Study
- Generalizability of the results is limited to this program
- The analysis compares the AES scoring from IntelliMetric® to a system known to be flawed
- Validation was limited

Ongoing Study
- Outcome data to determine the correlation between the onsite essay AES score and the first-semester GPA of the candidates who matriculate into our program

Future Studies
- Challenge all of the methods of evaluation for intrarater reliability by submitting the same essay twice under different identification numbers to determine whether the grading outcome would be the same
- Consider fixing raters to specific groups in the random evaluation of essays
- Consider utilizing two twenty-five-minute timed essays for reasons of reliability and construct validity
- Consider investigating students' comfort levels and test anxiety with computerized versus paper-and-pencil writing tests by age, gender, and ethnicity

Conclusion
- The purpose of this study is to show that there may be a much more effective and reliable way to evaluate the writing skills of candidates for admission to the PA program than the utilization of community PAs
- Questions exist as to whether the current, labor-intensive process of essay review by volunteer community PAs is reliable
- Not only is there uncertainty about the source of the essay itself; there is also uncertainty about the consistency and quality of the community PAs' essay review skills
- Serious consideration should be given to incorporating AES into the admission process
- Doing so would reduce the time spent waiting for community PAs to evaluate the essays, reduce the cost of postage, and potentially increase the reliability of essay scoring

References

Accreditation Review Commission on Education for the Physician Assistant. Standards of Accreditation, A2.05b. http://www.arcpa.org/Standards/standards.html. Accessed July 7, 2008.
Campbell A, Dickson C. Predicting student success: a 10-year review using integrative review and meta-analysis. J Prof Nurs. 1996; 12(1): 47-59.
Platt L, Turocy P, McGlumphy B. Preadmission criteria as predictors of academic success in entry-level athletic training and other allied health educational programs. Journal of Athletic Training. 2001; 36(2): 141-144.
Sandow P, Jones A, Peek C, Courts F, Watson R. Correlation of admission criteria with dental school performance and attrition. J Dent Educ. 2002; 66(3): 385-392.
Hardigan P, Lai L, Arneson D, Robeson A. Significance of academic merit, test scores, interviews, and the admission process: a case study. American Journal of Pharmaceutical Education. 2002; 65: 40-43.
Salvatori P. Reliability and validity of admissions tools used to select students for the health professions. Advances in Health Sciences Education. 2001; 6: 159-175.
Sadler J. Effectiveness of student admission essays in identifying attrition. Nurse Education Today. 2003; 23(8): 620-627.
Ferguson E, James D, O'Hehir F, Sanders A. Learning in practice. BMJ. 2003; 326: 429-432.
Kulatunga-Moruzi C, Norman G. Validity of admissions measures in predicting performance outcomes: the contribution of cognitive and noncognitive dimensions. Teaching and Learning in Medicine. 2002; 14(1): 34-42.
Dieter P, Carter R, Rabold J. Automating the complex school admission process to improve screening and tracking of applicants and decision making outcomes. Perspective on Physician Assistant Education. 2000; 11(1): 25-34.
Skaff K, Rapp D, Fahringer D. Predictive connections between admissions criteria and outcomes assessment. Perspective on Physician Assistant Education. 1998; 9(2): 75-78.
Hanson M, Dore K, Reiter H, Eva K. Medical school admissions: revisiting the veracity and independence of completion of an autobiographical screening tool. Acad Med. 2007; 82(10): S8-S11.
Chestnut R, Phillips C. Current practices and anticipated changes in academic and nonacademic admission sources for entry-level PharmD programs. American Journal of Pharmaceutical Education. 2000; 64: 251-259.
Central Application Service for Physician Assistants (CASPA). https://portal.caspaonline.org/#
Albanese M, Snow M, Skochelak S, Huggett K, Farrell P. Assessing personal qualities in medical school admissions. Acad Med. 2003; 78(3): 313-321.
Powers D, Fowles M. Balancing test user needs and responsible professional practice: a case study involving assessment of graduate-level writing skills. Applied Measurement in Education. 2002; 15(3): 217-247.
Bill Gates 1997 annual report letter to shareholders. http://www.microsoft.com/msft/reports/ar97/bill_letter/bill_letter.htm. Accessed July 7, 2008.
Flesch R. A new readability yardstick. J Appl Psychol. 1948; 32(3): 221-233.
Shermis M, Koch C, Page E, Keith T, Harrington S. Trait ratings for automated essay grading. Educational and Psychological Measurement. 2002; 62(5): 5-18.
Shermis M, Barrera F. Automated essay scoring for electronic portfolios. Assessment Update. 2002; 14(4): 1-4.
Rudner L, Garcia V. An evaluation of IntelliMetric® essay scoring system. Journal of Technology, Learning, and Assessment. 2006; 4(4): 1-21.
Shermis M, Burstein J, Leacock C. Applications of computers in assessment and analysis of writing (chapter 27). In: Handbook of Writing Research. Guilford Press; 2006: 403-416.
Dikli S. An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment. 2006; 5(1): 4-35.
Vantage Learning. IntelliMetric® scoring accuracy across genres and grade levels. 2006. www.vantagelearning.com. Accessed July 7, 2008.
Korbin J. Forecasting the predictive validity of the new SAT I writing section. College Board. www.collegeboard.com/prod_downloads/sat/newsat_pred_val.pdf. Accessed June 15, 2008.
Breland H, Bridgeman B, Fowles M. Writing assessment in admission to higher education: review and framework. New York: College Entrance Examination Board and Educational Testing Service; 1999.