Evaluation of Admission Process:
Written Communication
Kate Kaiser, BS, PA-S
Christine Reichart, BS, PA-S
Jennifer Snyder, MPAS, PA-C
Larry Vandermolen, BS, MM, PA-S
Jennifer Zorn, MS, PA-C
Today’s Agenda
 Cognitive and non-cognitive factors in the admission process
 Review of the current admission process
 Introduction to automated essay scoring
 Current findings utilizing an automated essay scoring system
 Our ongoing research
 Recommendations for improvement to the current admission process
Cognitive and Non-Cognitive Factors in the
Admission Process
 Cognitive or quantitative variables are known predictors of success for applicants seeking admission into and graduating from many healthcare programs
 Pre-professional grade point average (GPA) and standardized test scores
 Non-cognitive abilities are less consistent in predicting success
 Oral and written communication skills
 Notwithstanding, many admission committees believe that
both cognitive and non-cognitive factors are important
Our Current Admission Process
 Individuals are required to submit an application to the Central Application Service for Physician Assistants (CASPA)
 This includes:
 Candidate's demographics
 Academic record
 Experience in healthcare
 Personal essay
Admission Process:
The Personal Essay
Today, our focus is on the personal essay and the
program’s most recent admission process
Personal Essay Evaluation
 A pool of community physician assistants was
recruited to participate in the review and evaluation
of the candidates’ CASPA essays
 Two physician assistants evaluate each essay utilizing
a program-developed Likert scale rubric
Personal Essay Evaluation
 The program-developed rubric defines a basis for scoring the essay in three categories:
1. Spelling and grammar
2. Organization and readability
3. The ability of the applicant to answer the CASPA essay topic, "describe the motivation towards becoming a PA"
 The PA evaluators independently assign scores to the three categories; the category scores are totaled, and the two evaluators' totals are averaged
The Coordination and Effort Are Significant
 In the most recent admission cycle, over 1,000 essays were evaluated
 Including those essays reviewed by a third evaluator
Our Current Admission Process
 Together, the personal essay scores, GPA points, and healthcare experience points are totaled, and candidates are ranked from highest to lowest
 Invitations for interviews to assess oral communication
skills are offered to approximately the top 90 ranked
candidates
 After completion of the interviews, the interview scores are
combined with previous subtotals and offers of admission
are extended to approximately the top 50 candidates
Limitations of Personal Essay Evaluation
The essays are prepared in advance by the applicants and submitted to CASPA
 This raises an important question: to what extent does this system actually assess the applicant's writing ability?
Limitations of Personal Essay Evaluation
 Community PAs are not trained to analyze sample essays and may not themselves be capable of accurately evaluating written work
 Process relies on a program-developed rubric
 Two evaluators must disagree on the score for a
writing sample by 50% of the total score before a
third evaluation of the writing sample takes place
 Unlimited time to reflect on essays is not congruent
with the thinking process required of a PA in
professional practice
Limitations of Personal Essay Evaluation
 Resource Depleting
 Paper Trail
 Internal Delays
 Deadline Crunch
 Essays are not de-identified
 Interrater reliability; validity?
Admission Process: Personal Essay Evaluation
 As consumers, applicants are becoming considerably more demanding, increasingly requesting detailed information regarding their performance in the entire admissions process rather than simply accepting an "admit," "wait list," or "non-admit" decision
 Often, the applicant requests specific information to guide future direction if admission is not initially offered
And Yet We Have Been Successful…
 Our program has been successful in selecting very capable students who regularly achieve above-average national board scores and who, in the past, have received positive reviews from physicians employing them
What Can the Program Do?
Rudimentary Computer Scoring
 By 1997, Microsoft Office® incorporated grammar checking as a tool for users of the software
 Readability statistics
 Reports the Flesch Reading Ease, the Flesch-Kincaid Grade Level, and the word count
 Grammar-checking software effectively evaluates essays of 500 to 1,000 words covering a wide range of topics
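Both readability statistics that Word reports are closed-form formulas over sentence, word, and syllable counts. Below is a minimal Python sketch of those formulas; the syllable counter is a rough vowel-group heuristic, so its output will differ slightly from Word's internal counts.

```python
# Flesch Reading Ease and Flesch-Kincaid Grade Level from raw text.
# The syllable count is an approximation (vowel groups), not Word's counter.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as the number of vowel groups, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    return {
        "word_count": len(words),
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
    }

print(readability("The quick brown fox jumps over the lazy dog. It was easy."))
```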
Automated Essay Scoring (AES)
 Evaluation and scoring of written prose via computer technology
 Based on a set of pre-scored essays
 Perfect test-retest reliability
 Used to overcome time, cost, reliability, and generalizability issues in writing assessments
 The automated system applies the scoring criteria uniformly and mechanically, avoiding the fluctuations found in untrained graders
 Works well on short descriptive essays encompassing a wide range of topics
 Between 500 and 1,000 words
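IntelliMetric®'s internals are proprietary, so the toy sketch below only illustrates the general AES idea named above: fit a model to a set of pre-scored essays, then apply it uniformly to new ones. The essays, scores, and model choice (TF-IDF features with ridge regression) are illustrative assumptions, not the product's method.

```python
# Toy AES sketch: learn a scoring function from pre-scored essays.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training set: essays already scored by trained human raters.
train_essays = [
    "A focused essay with a clear main idea and varied support...",
    "An adequate essay with some organization but thin elaboration...",
    "A short, repetitive essay that never develops its topic...",
]
train_scores = [27, 18, 9]   # e.g., totals on a 5-30 scale

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(train_essays, train_scores)

# The fitted model applies identical criteria to every essay, which is why
# test-retest reliability is perfect: the same text always scores the same.
print(model.predict(["A new, unscored candidate essay..."]))
```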
Vantage Learning IntelliMetric® Software:
Automated Essay Scoring
 An AES product that utilizes artificial intelligence, natural language processing, and statistical analyses to score and evaluate written prose
Vantage Learning IntelliMetric® Rubric Domains
Domain                      | Area of Evaluation
Focus and Unity             | Is there a main idea, and is it consistently supported?
Development and Elaboration | Are the supporting ideas varied, well developed, and elaborative?
Organization and Structure  | Does the essay logically transition ideas from introduction, through supporting paragraphs, to conclusion?
Sentence Structure          | Is there syntactic complexity and variety?
Mechanics and Conventions   | Does the essay follow the rules of standard American English?
Limitations of AES
 There are some common criticisms of AES software such as IntelliMetric®
 First, it is possible to respond to an essay question using appropriate keywords and synonyms while the essay still lacks a comprehensible answer
 Second, a great deal of effort is required to write multiple model answers to essay topics in order to "train" the software to properly grade the writing samples
 Some critics also wonder whether it is possible for a computer to "artificially think" in order to generate domain and holistic scores
Study Methods
 The study protocol was reviewed by Butler University's institutional review board for research involving human subjects and approved as exempt
 Of the 521 applicants in the most recent admission cycle,
the top 90 were selected for interviews using the program’s
standard evaluation process
 A twenty-five-minute onsite written essay was then required of each candidate as part of the interview process
 The topic chosen for the onsite essay was non-medical and
pre-developed by Vantage Learning
Study Methods
 Two of the 90 candidates did not submit an onsite
essay
 Completed onsite essays were reviewed by a faculty
member to excise any identifying names or dates,
and were assigned random identification numbers
 To ensure uniformity, all essays were reduced to
single-spaced documents
Controls: Fabricated Essays
 The six fabricated essays included:
 Two essays that were well written but responded to a different essay topic
 One essay consisting of simple repetition of the topic
 One essay of four sentences written on the topic and then simply repeated, in a different sequence, in subsequent paragraphs
 One essay whose first half was a well-written response to the essay topic and whose second half was a simple repeat of the topic rather than a response to it
 One essay that responded to the topic and was considered of good quality
Study Methods
 While the IntelliMetric® license fee was reduced, the study was conducted independently of Vantage Learning, the licensor of IntelliMetric®
Study Methods
 For consistency, the PAs who assessed the onsite and
fabricated essays were from a group of community
PA volunteers who reviewed CASPA essays in the
past
 Each onsite essay was evaluated by two community
PA volunteers using the programmatic rubric and by
two other community PAs using a hard copy of the
IntelliMetric® rubric
Study Methods
 As a means of rudimentary comparative analysis, onsite and fabricated essays were evaluated with Microsoft Word® version 2003 to obtain the Flesch Reading Ease, the Flesch-Kincaid Grade Level, and the word count
 De-identified, randomly numbered onsite and fabricated essays were electronically submitted to Vantage Learning for automated scoring by the IntelliMetric® system
 Once results were received, data were maintained in an Excel spreadsheet, and statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 15
Null Hypotheses
1. Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
2. There is no rater agreement and no correlation in corresponding domain scores of the CASPA essays
3. There is no correlation in the scores between the methods of evaluation of onsite essays
4. There is no correlation in the community PA scores between the programmatic and IntelliMetric® rubrics for onsite essays
5. Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and by community PAs
6. There is no correlation between the candidates' totaled scores from the seven methods of onsite essay evaluation and GPA
Descriptive Statistics for Methods of Evaluation
N = 88
Method of Evaluation                      | Possible Range | Mean   | S.D. (+/-) | Range
CASPA Essay^                              | 0 - 10         | 8.48   | 1.26       | 2 - 10
Word Count                                | ∞              | 357.93 | 107.61     | 142 - 687
Flesch Reading Ease                       | 0 - 100        | 58.90  | 9.27       | 40.4 - 78
Flesch-Kincaid Level                      | Grade level    | 9.46   | 1.93       | 5.9 - 14.2
AES                                       | 5 - 30         | 15.44  | 4.11       | 5 - 25
Onsite Community PA Programmatic Rubric   | 0 - 10         | 7.15   | 1.61       | 3 - 10
Onsite Community PA IntelliMetric® Rubric | 5 - 30         | 21.57  | 3.86       | 8 - 30
^ N = 78
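For illustration, summary rows like those above can be reproduced from per-candidate score columns. The column names and values in this sketch are hypothetical placeholders, not the study data.

```python
# Reproduce mean / S.D. / min / max summaries from raw score columns.
import pandas as pd

scores = pd.DataFrame({
    "caspa_essay": [8, 9, 10, 7, 9],          # placeholder values
    "word_count": [320, 410, 287, 515, 390],
    "aes_total": [14, 18, 11, 22, 16],
})

summary = scores.agg(["mean", "std", "min", "max"]).T
print(summary)
```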
Null Hypothesis 1: Utilizing the programmatic rubric, there is no
difference between the scores of the CASPA and onsite written essays
 To determine whether there was a statistically significant difference between the paired scores, a Wilcoxon Signed Rank test was utilized
 There was a statistically significant difference
 z = -5.025, p < 0.01
 Therefore, the hypothesis of no difference is rejected
 Utilizing the programmatic rubric, the community PAs scored the onsite essay lower than the CASPA essay for 57 of the 78 candidates
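A minimal sketch of this paired comparison, assuming one CASPA total and one onsite total per candidate; the arrays below are placeholders, not the study's scores.

```python
# Paired, non-parametric comparison of two scores per candidate.
from scipy.stats import wilcoxon

caspa_totals  = [9, 8, 10, 7, 9, 8, 10, 6]   # programmatic-rubric totals
onsite_totals = [7, 8,  8, 5, 7, 6,  9, 6]

stat, p = wilcoxon(caspa_totals, onsite_totals)
print(f"Wilcoxon statistic = {stat}, p = {p:.4f}")
```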
Null Hypothesis 1 Discussion: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
 The candidates may have been unable to compose a written response to the onsite essay of the same quality as the essay prepared in advance for CASPA because they felt pressured or constrained by time
 As found in previously reported studies, it is unclear if, or to what extent, candidates received help in developing the prepared essay's content, grammar, or spelling
Null Hypothesis 1 Discussion: Utilizing the programmatic rubric, there is no difference between the scores of the CASPA and onsite written essays
An onsite essay largely eliminates doubt regarding the origin of the essay and is an essential step in actually assessing the applicant's writing ability
Null Hypothesis 2: There is no rater agreement and no correlation in
corresponding domain scores of the CASPA essays
 To evaluate the consistency of the community PA scores for each domain of the programmatic rubric on the CASPA essay, the corresponding scores were examined with agreement statistics: perfect, adjacent, perfect + adjacent, and discrepant agreement percentages
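A minimal sketch of these agreement percentages, assuming two raters score each essay on a small integer scale; the scores below are placeholders.

```python
# Perfect / adjacent / discrepant agreement between two raters.
import numpy as np

rater1 = np.array([3, 2, 3, 1, 2, 3, 2, 3])
rater2 = np.array([3, 3, 2, 1, 2, 1, 2, 3])

diff = np.abs(rater1 - rater2)
perfect    = np.mean(diff == 0) * 100   # identical scores
adjacent   = np.mean(diff == 1) * 100   # one point apart
discrepant = np.mean(diff >= 2) * 100   # two or more points apart

print(f"perfect {perfect:.1f}%  adjacent {adjacent:.1f}%  "
      f"perfect+adjacent {perfect + adjacent:.1f}%  discrepant {discrepant:.1f}%")
```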
Null Hypothesis 2 Results: There is no rater agreement and no correlation
in corresponding domain scores of the CASPA essays
CASPA Essay* Rater 1 Mean Scores Versus Rater 2 Mean Scores: Agreement Statistics Evaluating the Domain Scores Using the Programmatic Rubric

Domain (Total Points)          | Perfect (%) | Adjacent (%) | Perfect + Adjacent (%) | Discrepant (%) | Rater 1 Mean | S.D. (+/-) | Rater 2 Mean | S.D. (+/-)
Grammar and Spelling (3)       | 56.4        | 42.3         | 98.7                   | 1.3            | 2.67         | 0.54       | 2.59         | 0.55
Organization & Readability (3) | 43.5        | 53.8         | 97.3                   | 2.5            | 2.57         | 0.56       | 2.62         | 0.53
Motivation to Become a PA (4)  | 38.4        | 39.7         | 78.1                   | 21.8           | 3.17         | 0.88       | 3.38         | 0.80
* N = 78
Null Hypothesis 2 Results: There is no rater agreement and no correlation
in corresponding domain scores of the CASPA essays
 Further, the agreement between the corresponding domain scores for the CASPA essays was examined by intraclass correlation (two-way random, average measures, with absolute agreement) at the 0.05 level of significance
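The study ran its statistics in SPSS; as an illustrative alternative, the pingouin package reports the same estimate as ICC2k (two-way random effects, average of k raters, absolute agreement). The long-format data below are placeholders.

```python
# Two-way random, average-measures ICC with absolute agreement (ICC2k).
import pandas as pd
import pingouin as pg

data = pd.DataFrame({
    "essay": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "rater": ["A", "B"] * 5,
    "score": [3, 3, 2, 1, 3, 2, 1, 1, 2, 3],
})

icc = pg.intraclass_corr(data=data, targets="essay", raters="rater", ratings="score")
print(icc[icc["Type"] == "ICC2k"])   # average-measures, absolute agreement
```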
Null Hypothesis 2 Results: CASPA Essay Intraclass Correlation for Rater
1 Mean Scores Versus Rater 2 Mean Scores: Evaluating the Domain
Scores using the Programmatic Rubric, N = 78
Domain                       | ICC    | 95% CI Lower Bound | 95% CI Upper Bound | Significance
Grammar and Spelling         | 0.378  | 0.026              | 0.603              | 0.019*
Organization and Readability | -0.069 | -0.685             | 0.321              | 0.613
Motivation to Become a PA    | 0.166  | -0.291             | 0.464              | 0.208
*p is significant at < 0.05
Null Hypothesis 2 Discussion: There is no rater agreement and no
correlation in corresponding domain scores of the CASPA essays
 While the Grammar and Spelling result is statistically significant, too many external sources may be confounding the findings, and the low ICC value indicates that no meaningful relationship exists
 Therefore, the community PAs' evaluation of the CASPA essays produced unreliable scoring outcomes
Null Hypothesis 3: There is no correlation in the scores
between the methods of evaluation of onsite essays.
 The six methods of evaluation of onsite essays were normalized using Z scores
 The ICC (1, 6) was calculated to compare the reliabilities of the methods
 ICC (1, 6) = 0.410, p < 0.01
 Two-way random, average measures with absolute agreement
 Because the result is statistically significant, the null hypothesis is rejected
 However, the correlation is so low that no meaningful relationship exists between the methods of evaluation of onsite essays
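A minimal sketch of the normalization step, assuming each column holds one method's totals on its native scale; all values are placeholders.

```python
# Z-score each method so all six are on a common scale (mean 0, S.D. 1).
import numpy as np
from scipy.stats import zscore

# rows = essays; columns = six evaluation methods on different scales
raw = np.array([
    [8, 410, 62.1,  9.8, 16, 22],
    [6, 287, 55.4, 11.2, 11, 18],
    [9, 515, 70.3,  8.1, 23, 27],
    [7, 350, 58.0, 10.0, 14, 20],
])

z = zscore(raw, axis=0, ddof=1)   # normalize within each method (column)
print(z.round(2))
```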
Null Hypothesis 4: There is no correlation in the community PA scores
between the programmatic and IntelliMetric® rubric of onsite essays
 Onsite essays were evaluated by ICC, comparing the programmatic and IntelliMetric® rubrics used by the community PA evaluators
 ICC (1, 2) = 0.567, p < 0.01
 While the result is statistically significant, only a minimal meaningful relationship exists between the community PA scores from the programmatic and IntelliMetric® rubrics for onsite essays
Null Hypothesis 5: Utilizing the IntelliMetric® rubric, there is no difference between the scores of onsite essays evaluated by the AES system and community PAs
 By the Wilcoxon Signed Rank test, there was a statistically significant difference between the totaled scores of the onsite essays evaluated by the community PAs utilizing the IntelliMetric® rubric and the AES totaled outcome (z = -7.542, p < 0.01)
 The community PAs' mean rating was higher for 82 of the 88 essays
Null Hypothesis 6: There is no correlation between the candidates’ totaled
scores evaluated by the seven methods of onsite essay evaluation and
GPA
Spearman Rank Correlation Coefficients of Essay Scores Evaluated by Different Methods and GPA, N = 88

Correlation                        | Spearman Coefficient | Significance
CASPA Essay^                       | -0.260               | 0.022*
Community PA Programmatic Rubric   | 0.076                | 0.479
Community PA IntelliMetric® Rubric | 0.170                | 0.112
AES Scoring                        | 0.307                | 0.004*
Word Count                         | 0.237                | 0.026*
Flesch Reading Ease                | -0.067               | 0.536
Flesch-Kincaid                     | 0.122                | 0.257
^ N = 78; *p is significant at < 0.05
Hypothesis 6 Discussion
 The Spearman Rank correlation was used to evaluate a possible relationship between GPA and the candidates' individual totaled essay scores
 As previously reported, essay length matters up to a certain number of words, allowing concepts and ideas to be developed; beyond that point, additional length does not improve the essay
 It seems reasonable to assume that an individual with a higher GPA is likely able to write an essay more effectively than one with a lower GPA
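A minimal sketch of one such rank correlation, assuming one totaled essay score and one GPA per candidate; the values are placeholders.

```python
# Spearman rank correlation between essay totals and GPA.
from scipy.stats import spearmanr

aes_totals = [14, 18, 11, 22, 16, 25, 9, 20]
gpas       = [3.2, 3.6, 3.0, 3.8, 3.4, 3.9, 2.9, 3.5]

rho, p = spearmanr(aes_totals, gpas)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```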
Post Hoc Power Analysis

Wilcoxon                                          | N  | Power (%) | Effect Size (Cohen's d)
CASPA vs. Onsite Community PA Programmatic Rubric | 78 | 100       | 0.92
AES vs. Community PA IntelliMetric® Rubric        | 88 | 100       | 1.54

Spearman (vs. GPA)                                | N  | Power (%) | Correlation r
CASPA Essay                                       | 78 | 100       | 0.94
Word Count                                        | 88 | 100       | 0.92
Flesch Reading Ease                               | 88 | 100       | 0.97
Flesch-Kincaid Level                              | 88 | 100       | 0.91
AES                                               | 88 | 100       | 0.90
Onsite Community PA Programmatic Rubric           | 88 | 100       | 0.84
Onsite Community PA IntelliMetric® Rubric         | 88 | 100       | 0.96
Fabricated Essays
 Five of the six fabricated essays were identified by the Vantage Learning IntelliMetric® system
 The same was not true of the community PA evaluators
Limitations of the Study
 Generalizability of results is limited to this program
 The analysis compares AES scoring from IntelliMetric® to a system known to be flawed
 Validation is therefore limited
Ongoing Study
 Collecting outcome data to determine the correlation between the onsite essay AES score and the first-semester GPA of the candidates who matriculate into our program
Future Studies
 Challenge all of the methods of evaluation for intra-rater reliability by submitting two copies of the same essay under different identification numbers to determine whether the grading outcomes would be the same
 Consider fixing raters to specific groups in the random evaluation of essays
 Consider utilizing two twenty-five-minute timed essays for reasons of reliability and construct validity
 Consider investigating students' comfort levels and test anxiety with computerized versus paper-and-pencil writing tests by age, gender, and ethnicity
Conclusion
 The purpose of this study is to show that there may be a much more effective and reliable way to evaluate the writing skills of candidates for admission to the PA program than the utilization of community PAs
 Questions exist as to whether the current, labor-intensive process of essay review by volunteer community PAs is a reliable process
 Not only is there uncertainty about the source of the essay itself; there is also uncertainty about the consistency and quality of the essay review skills of the community PAs
 Serious consideration should be given to incorporating AES into the admission process
 This would reduce the time spent waiting for community PAs to evaluate the essays, reduce the cost of postage, and potentially increase the reliability of the essay scoring
References
 Accreditation Review Commission on Education for the Physician Assistant. Standards of Accreditation A2.05b. http://www.arc-pa.org/Standards/standards.html. Accessed July 7, 2008.
 Campbell A, Dickson C. Predicting student success: a 10-year review using integrative review and meta-analysis. J Prof Nurs. 1996; 12(1): 47-59.
 Platt L, Turocy P, McGlumphy B. Preadmission criteria as predictors of academic success in entry level athletic training and other allied health educational programs. Journal of Athletic Training. 2001; 36(2): 141-144.
 Sandow P, Jones A, Peek C, Courts F, Watson R. Correlation of admission criteria with dental school performance and attrition. J Dent Educ. 2002; 66(3): 385-392.
 Hardigan P, Lai L, Arneson D, Robeson A. Significance of academic merit, test scores, interviews, and the admission process: a case study. American Journal of Pharmaceutical Education. 2002; 65: 40-43.
References
 Salvatori P. Reliability and validity of admissions tools used to select students for the health professions. Advances in Health Sciences Education. 2001; 6: 159-175.
 Sadler J. Effectiveness of student admission essays in identifying attrition. Nurse Education Today. 2003; 23(8): 620-627.
 Ferguson E, James D, O'Hehir F, Sanders A. Learning in practice. BMJ. 2003; 326: 429-432.
 Kulatunga-Moruzi C, Norman G. Validity of admissions measures in predicting performance outcomes: the contribution of cognitive and non-cognitive dimensions. Teaching and Learning in Medicine. 2002; 14(1): 34-42.
 Dieter P, Carter R, Rabold J. Automating the complex school admission process to improve screening and tracking of applicants and decision making outcomes. Perspective on Physician Assistant Education. 2000; 11(1): 25-34.
 Skaff K, Rapp D, Fahringer D. Predictive connections between admissions criteria and outcomes assessment. Perspective on Physician Assistant Education. 1998; 9(2): 75-78.
References
 Hanson M, Dore K, Reiter H, Eva K. Medical school admissions: revisiting the veracity and independence of completion of an autobiographical screening tool. Acad Med. 2007; 82(10): S8-S11.
 Chestnut R, Phillips C. Current practices and anticipated changes in academic and nonacademic admission sources for entry-level PharmD programs. American Journal of Pharmaceutical Education. 2000; 64: 251-259.
 Central Application Service for Physician Assistants. https://portal.caspaonline.org/#
 Albanese M, Snow M, Skochelak S, Huggett K, Farrell P. Assessing personal qualities in medical school admissions. Acad Med. 2003; 78(3): 313-321.
 Powers D, Fowles M. Balancing test user needs and responsible professional practice: a case study involving assessment of graduate level writing skills. Applied Measurement in Education. 2002; 15(3): 217-247.
 Bill Gates 1997 annual report letter to shareholders. http://www.microsoft.com/msft/reports/ar97/bill_letter/bill_letter.htm. Accessed July 7, 2008.
 Flesch R. A new readability yardstick. J Appl Psychol. 1948; 32(3): 221-233.
 Shermis M, Koch C, Page E, Keith T, Harrington S. Trait ratings for automated essay grading. Educational and Psychological Measurement. 2002; 62(5): 5-18.
References
 Shermis M, Barrera F. Automated essay scoring for electronic portfolios. Assessment Update. 2002; 14(4): 1-4.
 Rudner L, Garcia V. An evaluation of IntelliMetric® essay scoring system. Journal of Technology, Learning, and Assessment. 2006; 4(4): 1-21.
 Shermis M, Burstein J, Leacock C. Applications of computers in assessment and analysis of writing (chapter 27). In: Handbook of Writing Research. Guilford Press; 2006: 403-416.
 Dikli S. An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment. 2006; 5(1): 4-35.
 Vantage Learning. IntelliMetric® scoring accuracy across genres and grade levels. 2006. www.vantagelearning.com. Accessed July 7, 2008.
 Korbin J. Forecasting the predictive validity of the new SAT I writing section. www.collegeboard.com/prod_downloads/sat/newsat_pred_val.pdf. Accessed June 15, 2008.
 Breland H, Bridgeman B, Fowles M. Writing assessment in admission to higher education: review and framework. New York: College Entrance Examination Board and Educational Testing Service; 1999.