Received 11 May 2020; Revised 14 July 2020, 17 August 2020, 10 September 2020; Accepted 10 September 2020

Computer-assisted EFL writing and evaluations based on artificial intelligence: a case from a college reading and writing course
Zhijie Wang
College of Humanities and Development, China Agricultural University,
Beijing, China
Abstract
Purpose – The aim of this study is to explore students’ expectations and perceived effectiveness of computer-assisted review tools, and the differences in reliability and validity between human evaluation and automatic evaluation, in order to find a way to improve students’ English writing ability.
Design/methodology/approach – Based on the expectancy disconfirmation theory (EDT) and Intelligent
Computer-Assisted Language Learning (ICALL) theory, an experiment is conducted through the observation
method, semistructured interview method and questionnaire survey method. In the experiment, respondents
were asked to write and submit four essays on three online automated essay evaluation (AEE) systems in total,
one essay every two weeks. Also, two teacher raters were invited to score the first and last papers of each
student. The respondents’ feedback was investigated to confirm the effectiveness of the AEE systems; the evaluation results of the AEE systems and the teacher raters were compared; and descriptive statistics were used to analyze the experimental data.
Findings – The experiment revealed that the respondents held high expectations for the computer-assisted
evaluation tools, and the effectiveness of computer scoring feedback on students was higher than that of
teacher scoring feedback. Moreover, at the end of the writing project, the students’ independent learning ability
and English writing ability were significantly improved. Besides, there was a positive correlation between
students’ initial expectations of computer-assisted learning tools and the final evaluation of learning results.
Originality/value – The innovation lies in the use of observation methods, questionnaire survey methods,
data analysis, and other methods for the experiment, and the combination of deep learning theory, EDT and
descriptive statistics, which has particular reference value for future works.
Keywords Computer-assisted writing, Learner expectation and satisfaction, Artificial intelligence in writing
and reading, AEE, AERW
Paper type Research paper

The paper is supported by the MOE (Ministry of Education in China) Industry-Academy Cooperation Program “Study on the Effect of Group Cooperation on Undergraduates’ Improvement in EFL Reading and Writing” (Grant No. 201902184036). This paper forms part of a special section “Informetrics on Social Network Mining: Research, Policy and Practice challenges – Part 2”, guest edited by Mu-Yen Chen, Chien-Hsiang Liao, Edwin David Lughofer and Erol Egrioglu.
1. Introduction
1.1 Literature review
With the irresistible trend of globalization and internationalization worldwide, the ability to
write well in English as a lingua franca for communication in diverse fields across cultures
had become an imperative in second/foreign language education (Wang et al., 2013; Yu, 2015;
Khadka, 2020). As a result, teachers undertaking EFL (English as a foreign language) writing
courses were facing increasing pressure for language output evaluating and corrective
feedback, especially when enrollments outrun handling capacity (Tang and Wu, 2011; Gao
and Ma, 2020). In order to make up for instructors’ inability to standardize essay assessment
and provide quick and valid feedback, computer-based software and online systems have
been introduced in writing evaluation. The first experiment on computer-assisted essay
evaluation was presented in 1966 (Page, 1966). Into the 21st century, more sophisticated
automated essay evaluation (AEE) systems had been applied in writing curricula, such as
Criterion online writing service (Burstein et al., 2004), IntelliMetricSM essay scoring system
(Rudner et al., 2006) and e-rater® scoring engine (Attali, 2007).
Computer-assisted scoring systems were reliable in that if they were programmed in the
same way, they would stably yield identical results free from such conditions as timing,
environment or mood of human beings (Powers et al., 2000). Tests of syntactic complexity
analyzer (SCA) and Coh-Metrix showed that the two AEE systems were both reliable for
studying syntactic complexity across genres (Polio and Yoon, 2018). It was also found that a
combination of e-rater version 2.0, an AEE system and critique writing analysis tools – a suite
of programs that detects undesirable style, errors in grammar, usage and mechanics –
provided students with reliable feedback to improve their writing skills (Burstein et al.,
2004). Research also suggests that AEE software had partly sufficient qualities from the point of view of teachers (Bas and Tekal, 2014). The Development Planning for a New Generation of Artificial
Intelligence issued by the Chinese State Council in 2017 proposed to accelerate the innovation
and application of artificial intelligence (AI) in the field of education, develop intelligent
education and meet the personalized and diverse educational needs. It is necessary to construct
a learner-centered education environment, create an intelligent learning and interactive
learning education system, as well as use intelligent technology to innovate talent training
modes and teaching methods. In 2018, the Ministry of Education of China promulgated the “Internet + Education” specific implementation plan, the Education Informatization 2.0 Action Plan. AI has thus gradually entered the implementation stage in the field of education.
The validity of AEE technology has also been put to the test in Chinese universities in
recent years. An experiment at Chongqing Technology and Business University showed that
the use of Pigai.org (an AEE system in China) could effectively enhance students’ writing
proficiency, promote their writing motivation and improve their self-efficacy (Yang and Dai,
2015). An 18-week experiment was conducted on 81 non-English major sophomores to
explore the effectiveness of blog-assisted writing for Chinese English learners. The results
indicated that during the blog-assisted process, students’ English writing ability significantly
improved. Moreover, most participants had a positive attitude toward this writing mode,
which played a decisive role in solving the problems that may arise in traditional English
writing teaching (Zhou, 2015). A tracking study of six non-English major undergraduates
found that an AEE-mediated three-phase process – analyzing, responding and revising –
beefed up the enthusiasm of the respondents involved, reinforced their perception of learning
goals and yielded positive feedback (Lu, 2016). Another investigation showed that all 37 respondents enrolled in a writing program were satisfied with feedback from Pigai.org in
that the warnings and suggestions on vocabulary and collocations contributed much to their
improvement of lexical diversity and sophistication (Huang and Zhang, 2018). Recent
research also found that by using AI technology to build language depth perception features, judged on correlation, error and overall accuracy, automatic evaluations could be as reliable
as manual scoring in dealing with deep-seated language issues (Pribadi et al., 2017; Tang and
Sun, 2019; Boulanger and Kumar, 2019).
Though many empirical studies in the field of AEE systems revealed positive effects on
overall writing improvement, some studies had shown unfavorable learner evaluations. A
study conducted in three EFL classrooms found that respondents in all three classes
perceived the use of the AEE system unfavorably (Chen and Cheng, 2008). Others were
concerned that AEE could be easily fooled to assign high scores to essays which were long,
syntactically complex and replete with sophisticated vocabulary (Wilson and Czik, 2016).
Therefore, further research was required to test the efficiency of these systems and to
determine which areas in the writing construct could be effectively improved via AEE
systems (Aluthman, 2016). In addition, integrating technological advances into intelligent
training models to help learners build conceptualized language ability was also a grand
challenge for the future development of AEE tools (Winke and Isbell, 2017).
In recent years, social network interactions were also integrated into the writing process to
provide more practical and profound implications in the use and evaluations of AEE systems.
In fact, with the growing popularity of Web 2.0 technology, the value of web-based
interactions and collaborations in EFL writing had been positively critiqued (Jin, 2014; Chu
et al., 2017). Experimental studies on EFL writing collaboration revealed that collaborative
writing in the form of peer review exercises within task groups could improve writing
outcomes (Cristina and Carrillo, 2016) and encourage students’ sense of fulfillment (Li and
Liu, 2011; Liu et al., 2018). Both qualitative and quantitative analyses showed that
collaborative writing on blogs could improve the students’ knowledge about their language
performance in writing (Amir et al., 2011), trigger motivation to learn autonomously (Yang
and Yang, 2013) and foster a sense of competence, usefulness and relatedness (Kramer and Kusurkar, 2017). In this study, all respondents were also encouraged to interact by cross-reviewing each other’s essays and responding to peer reviews, using three AEE websites as academic social-network tools.
Despite massive research efforts from multiple dimensions, there are still some questions
that need solutions. First, in what way do learners engage in AEE systems? Second, how do
learners respond to and perceive the corrective feedback from the AEE rating? Third, how to
measure learners’ expectations and outcome confirmation? Fourth, how to compare the
reliability and validity between the AEE evaluations and that of human raters? Fifth, how do
the latest technological advances, especially in AI technology, help improve on the ingrained
concerns, such as learner engagement (Kukich, 2000)? In short, the traditional English writing teaching method focuses only on the teaching of writing skills and on the final product, which is not conducive to stimulating students’ interest in learning. As network technology exerts a growing influence on the reform of English teaching modes, it has become a focus of attention for teachers and students. AI technology and deep learning theory are therefore integrated into the AEE system to optimize it and to provide effective feedback on the evaluation results.
1.2 Research questions
This study deals with three research questions. First, what are the respondents’ expectations
and subjective evaluations of AEE? An exploration of this question could provide a further
test on the existing research on learners’ engagement with AEE systems (Tian and Zhou,
2020). Second, what is the respondents’ perceived effectiveness of AEE in their English
writing improvement? Research on this question could throw light on how student beliefs
about AEE might relate to their writing performance (Wang, 2013). Third, what are the
advantages and disadvantages of AEE in comparison with human rating? A study on this question could offer more evidence on whether AEE is a good substitute for human rating (Azmi et al., 2019) or whether it plays a supplementary role (Li, 2019).
2. Method and design
2.1 Respondents
The respondents of this study were 188 students from China Agricultural University, who
were taking the advanced English reading and writing (AERW) course taught by the
teacher–researcher of this paper. As the course was open to all undergraduates, the
enrollments ranged from first-year students to fourth-year seniors. First-year
students accounted for 52% of the total, sophomores 47%, junior and senior students both
less than 2%. As there was no classroom big enough to accommodate all the students, and many
students had to attend other compulsory courses at different time slots, the respondents were
separately placed in Class A41, Class A42, Class A51 and Class A52, each of which was held
at different hours. To guarantee the validity and reliability of the analysis, each student had
to finish and submit all four essays online under the teacher’s monitoring in class within a
given amount of time. Consequently, ten respondents were eliminated for failing to meet all
requirements, and the remaining 178 respondents were counted in as qualified respondents of
the research.
2.2 Research methods
The theoretical paradigm of this study is based on two theories: expectancy disconfirmation
theory (EDT) and the theory of ICALL (Intelligent Computer-Assisted Language Learning).
The research methods adopted were observation, semistructured interview and survey.
Descriptive statistics was used to display and explain the data collected in the experiments.
2.2.1 EDT. EDT was first applied to analyze the effect of expectation–disconfirmation on
affective judgments of a product (Oliver, 1977). At the heart of EDT was that disconfirmed
expectancy might positively or negatively influence customers’ attitude and purchase
intention (Oliver, 1980). EDT had been widely used in the analysis of affective evaluations
and behavioral management, such as how an organization’s satisfaction with its supply
network’s behavior influenced its intention to open innovation with that network (Bravo et al.,
2017), and how citizen satisfaction with public services was judged by prior expectations
(Van Ryzin, 2013).
An EDT analysis model created by Van Ryzin (2013), shown in Figure 1, is used to build
analytic indicators in respondents’ subjective evaluations of AEE systems in this study.
Among the links, A indicates that high expectations result in negative disconfirmation, B shows that high performance yields positive disconfirmation, C represents a positive relation between disconfirmation and satisfaction, D indicates that expectations and performance are positively related, E shows a positive relationship between performance and satisfaction and F shows that expectations may either positively or negatively relate to satisfaction.
2.2.2 English writing teaching assisted by AI computing. Regarding the application of AI in
the field of language teaching, there is currently the theory of ICALL. ICALL is a branch
discipline developed from CALL (Computer-Assisted Language Learning). From the existing
empirical research, the current application of AI technology in language teaching mainly
includes speech recognition and semantic analysis, image recognition (such as recognizing
learners and learning content, assisting online examinations, homework correction and
review), human–computer interaction (such as intelligent evaluation, intelligent learning
feedback), adaptive learning and personalized learning (adjusting learning materials, test
methods or learning sequences according to individual learning conditions to meet
personalized learning needs) and scene teaching based on virtual reality. As an emerging
discipline, although AI-assisted language teaching is still in its infancy, the development and
application of the technology has dramatically enriched the means of English teaching and
provided strong technical support for mixed English teaching. The English writing teaching
mode based on AI computer assistance is shown in Figure 2.
2.2.3 Data collection and analysis tools. A set of data collection and analysis tools are used
in this research, including two sets of questionnaires conducted via www.wjx.cn, one of the
leading Chinese social statistics websites, separately at the beginning and ending of the
writing project. The five-point Likert-scale questionnaires had 15 items concerning
respondents’ attitudes toward the effectiveness of three AEE systems on writing
improvement. Some items were also designed for comparisons by respondents between
different scoring approaches. An estimation of Cronbach’s α showed high reliability of the
questionnaire (α = 0.82). Also, some open-ended questions were included in respondents’
evaluations of the three AEE tools to supplement quantitative analysis.
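For readers who wish to check such a reliability estimate themselves, a minimal sketch is given below. It assumes the Likert responses have been exported from www.wjx.cn into a CSV file with one column per item (the file name and column labels are hypothetical) and computes Cronbach’s α with the standard formula.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores (1-5)."""
    items = items.dropna()
    k = items.shape[1]                          # number of questionnaire items
    item_var = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical export from the survey site: one row per respondent,
# columns q1..q15 holding the five-point Likert ratings.
responses = pd.read_csv("aee_questionnaire.csv",
                        usecols=[f"q{i}" for i in range(1, 16)])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```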
2.3 Data collection process
Data collection was conducted in the spring semester of the 2018 school year. The AERW
course was given in a computer-equipped classroom, ensuring that each respondent had a
computer at hand. Before writing tasks were assigned, respondents were asked to finish a
background-surveying questionnaire. In the first two weeks of the class, the teacher
explained the scoring mechanism of Pigai.org, iWrite and Awrite, three well-known Chinese
AEE systems shown in Table 1, and then various features and functions of the three
evaluation tools were demonstrated. As the respondents registered and practiced, the teacher
circulated in the classroom to supervise the progress and respond to questions.
Figure 2. English writing teaching mode based on AI computer assistance (before class: guidance, practice and autonomous learning; in class: learning by imitation; after class: reciting to promote writing, review, revision and comment; transfer of sentence patterns and writing transfer; a combination of online and offline learning supported by the network environment and AI technology)
Table 1. A comparison of three well-known Chinese AEE systems

Software | Corporation | Technique | Main focus | Scoring | Feedback
Pigai | Beijing Ciwang Technology Incorporated | NLP + cloud computation | Grammar, syntactic complexity, style | Holistic and limited trait scores | Detailed individualized feedback
iWrite | Foreign Language Teaching and Research Press | NLP | Grammar, content, usage | Holistic and trait scores | Limited individualized feedback
Awrite | Fangzhou Education Technology Incorporated | NLP | Grammar, usage, organization | Holistic and trait scores | Detailed individualized feedback
During the period from the third week to the fifth week into the course, various modes of
essay writing (argument and persuasion, comparison and contrast, cause and effect) were
instructed, and respondents were encouraged to practice organizing a paragraph on their
own. After the instruction and practice in the beginning five weeks, the respondents were
asked to write one essay every two weeks and upload their work to the websites of Pigai.org,
iWrite and Awrite. The respondents had to write four essays in total, each finished in 30 min
under the teacher’s supervision in computer-equipped classrooms. Since qualified
respondents were 178 in total, 752 essays were collected for the study.
To guarantee the functions of the three AEE systems were fully understood and properly
used, each respondent was given 5 to 20 minutes of tutoring before writing. During the
tutoring sessions, respondents and the teacher worked together to submit a sample essay
first, and then they reviewed the feedback from the scoring systems. For anything that the
respondents felt difficult or puzzled about, the teacher offered timely advice and solutions.
After each submission, respondents were encouraged to interact with each other by cross-reviewing each other’s essays and responding to peers’ reviews via the academic social-networking features of the three AEE websites.
An e-rater scoring metric (Quinlan et al., 2009) was offered to help the respondents classify
and pinpoint rating focuses common in AEE systems, as shown in Figure 3. After each
assigned essay was finished, respondents were encouraged to check the feedback of the three
AEE systems with the e-rater metric and respond to 15 items of a five-point Likert-scale
questionnaire.
After the writing tasks were finished, the respondents’ affective feedback was collected by
asking them to answer some questions designed based on the expectation–disconfirmation
theory. Respondents were categorized according to their feedback into four groups: low
expectations and low performance, low expectations and high performance, high
expectations and low performance and high expectations and high performance.
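A minimal sketch of how such a four-way grouping might be computed is shown below. The 3.5 cut-off on the five-point scale, the column names and the sample values are illustrative assumptions rather than parameters reported in the study.

```python
import pandas as pd

# Hypothetical per-respondent averages on the five-point scale:
# 'expectation' from the pre-task questionnaire, 'performance' from the post-task ratings.
ratings = pd.DataFrame({
    "expectation": [2.1, 2.4, 4.6, 4.5],
    "performance": [2.5, 4.3, 3.1, 4.7],
})

CUTOFF = 3.5  # assumed midpoint used to split "low" from "high"

def edt_group(row) -> str:
    exp = "high" if row["expectation"] >= CUTOFF else "low"
    perf = "high" if row["performance"] >= CUTOFF else "low"
    return f"{exp} expectations, {perf} performance"

ratings["group"] = ratings.apply(edt_group, axis=1)
print(ratings["group"].value_counts())
```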
After all of the four essays were finished, the teacher retrieved the first and the last essays
of each student from the online electronic archives of the three AEE systems. The word count
and the total score of each essay were recorded. After that, two experienced teacher raters
were invited to review and evaluate the essays following the rubrics of the CET (College English Test) Band Four, a popular standardized English test in Mainland China.
And then the respondents were asked to finish another five-point Likert-scale questionnaire
with 15 items for a comparison between AEE and human scoring.
Figure 3. The organization of e-rater scoring (Quinlan et al., 2009)
3. Results
3.1 Respondents background
Analysis of the questionnaire shows that 116 of the respondents are female, accounting for
65.17% of the total, and the remaining 34.83% are male. As to age, 94%, or 168 respondents of
the total, are in the age span from 17 to 20, with the remaining 10 between 20 and 22 years of
age. As for the places where they received high school education, 103 respondents or 58%
of the total come from developed provinces and cities; 44 respondents or 25% are from
moderately developed areas; 20 respondents or 11% are from less developed places, and the
remaining 11 respondents or 6% come from poverty-stricken areas. Since all the respondents taking the AERW course are of comparatively high English proficiency and differ little from each other in language skills, the places of their high-school education are not reckoned as factors of influence. Instead, an analysis according to their years in university
was conducted in this study.
3.2 Effect of disconfirmed expectation
It is found in Table 2 that respondents of different years in university show different
expectations for AEE systems. More first-year students than respondents of other years
focus on the ability to enlarge their vocabulary, accounting for 30.86% of the total. Putting a
premium on vocabulary building was common for new university students because they faced
more difficult text content with more new words than they did in high school (Liu, 2015; Jiang,
2019). Sophomores are comparatively less concerned with vocabulary, but they instead
emphasize better organization and development, as well as better syntactic variety. Juniors
are more concerned with better syntactic variety and better organization and development
than with other English skills.
Table 2. Expectations of respondents for AEE

Year | N | Vocabulary enlargement (%) | Grammar refining (%) | Usage improvement (%) | Better syntactic variety (%) | Improvement in organization and development (%)
First-year students | 81 | 25 (30.86) | 15 (18.52) | 11 (13.58) | 13 (18.05) | 15 (18.52)
Sophomores | 72 | 14 (19.44) | 17 (23.61) | 4 (5.55) | 19 (26.39) | 15 (20.83)
Juniors | 12 | 2 (16.67) | 3 (25.00) | 1 (8.33) | 3 (25.00) | 3 (25.00)
Seniors | 13 | 2 (15.38) | 3 (23.07) | 1 (7.69) | 3 (23.07) | 3 (23.07)
Table 3. Comparing expectations and performance ratings (E indicates expectations, P indicates performance)

Group | Vocabulary E / P | Grammar E / P | Usage E / P | Fluency E / P | Organization and development E / P
Low expectations, low performance | 2.23 / 3.12 | 2.26 / 2.89 | 2.32 / 3.15 | 2.54 / 2.46 | 2.67 / 2.86
Low expectations, high performance | 2.15 / 4.54 | 2.34 / 4.24 | 3.01 / 4.45 | 2.28 / 4.10 | 2.76 / 4.08
High expectations, low performance | 4.24 / 3.56 | 4.34 / 3.23 | 4.67 / 3.78 | 4.52 / 3.43 | 4.69 / 2.89
High expectations, high performance | 4.56 / 4.58 | 4.67 / 4.69 | 4.71 / 4.70 | 4.47 / 4.40 | 4.42 / 4.27
The effect of expectation–disconfirmation is revealed in Tables 3 and 4, with respondents
grouped in four categories according to their ratings for AEE features at different stages: the
first is the low expectations and low performance, the second low expectations and high
performance, the third high expectations and low performance, the fourth high expectations
and high performance.
The relationship between expectations and performance, as is shown in Table 3, is
positive when levels of expectations and performance are both high or low. The relationship
can also be negative. For example, when expectations are low, the performance can be high.
The ratings of AEE performance in vocabulary, grammar and usage are markedly higher
than those of fluency and organization/development. When expectations are high, but
performance is low, the ratings of AEE performance in fluency and organization/
development are markedly lower than that of vocabulary, grammar and usage.
From the ratings of satisfaction, as is shown in Table 4, it can be found that performance
and satisfaction are positively related, whereas the relationship between expectations and
satisfaction is comparatively complex. Unless the levels of expectations and performance are
both high or low, satisfaction levels are not in a positive relationship with expectations.
When expectations are low, but performance is high, satisfaction levels stay high,
especially concerning vocabulary, grammar and usage. When expectations are high, but
performance is low, satisfaction with vocabulary, grammar and usage remains at relatively
high levels, though affective evaluations of fluency and organization/development are low.
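The relationships described above can be checked with simple correlations. The sketch below assumes that per-respondent mean ratings for expectations, performance and satisfaction are available as columns of a data frame; this layout and the sample values are illustrative, not the study’s actual data file.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical per-respondent mean ratings on the five-point scale.
df = pd.DataFrame({
    "expectation":  [2.2, 2.3, 4.5, 4.6, 3.1],
    "performance":  [2.6, 4.2, 3.2, 4.7, 3.8],
    "satisfaction": [3.0, 4.5, 3.4, 4.8, 3.9],
})

for x, y in [("performance", "satisfaction"),
             ("expectation", "satisfaction"),
             ("expectation", "performance")]:
    r, p = pearsonr(df[x], df[y])
    print(f"{x} vs {y}: r = {r:.2f}, p = {p:.3f}")
```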
3.3 Deep learning analysis based on AI technology
In recent years, as deep learning has been applied to word vector processing, more word vector representations have been trained, such as Word2Vec embeddings. The text is trained through a neural network model, but Word2Vec is mainly used to predict the words surrounding a given word, so it shows weakness where global statistical information is not used. English text classification can markedly improve the user experience of an English learning system and provide a sound design basis for the system’s user preference analysis and article recommendation. In this way, more relevant English articles can be
provided to users in a targeted manner. In the text classification training model applied here, convolution is carried out with several different convolution window sizes. Compared with a single convolution window, this captures the correlation between words and the text better. By
adopting an improved averaging method for the convolution kernels in the pooling layer, the accuracy of the training results can also be increased. The weight-sharing
characteristics, or local weight-sharing characteristics, of the network layers of the convolutional neural network (CNN) give CNNs considerable advantages over traditional methods in speech recognition and two-dimensional image processing. The
weight-sharing characteristic also reduces the complexity of the neural network model. When
the quantity of data input is relatively large, the excessive complexity of the model is avoided
to a certain extent.
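As an illustration of the word-vector step discussed above, the following is a minimal Word2Vec training sketch, assuming the gensim library (version 4.x) and a toy corpus. The 50-dimensional setting mirrors the configuration described below, while the corpus itself is invented for demonstration.

```python
from gensim.models import Word2Vec

# Toy corpus: each essay is pre-tokenized into a list of lowercase words.
corpus = [
    ["students", "submit", "essays", "to", "the", "online", "system"],
    ["the", "system", "returns", "feedback", "on", "grammar", "and", "usage"],
    ["feedback", "helps", "students", "revise", "their", "essays"],
]

# Skip-gram Word2Vec predicts the words surrounding a target word,
# which is the local-context behaviour (and limitation) noted above.
model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                 min_count=1, sg=1, epochs=10)

print(model.wv["feedback"].shape)               # (50,) word vector
print(model.wv.most_similar("essays", topn=3))  # nearest neighbours in vector space
```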
Table 4. Comparing satisfaction ratings (satisfaction with various features)

Group | Vocabulary | Grammar | Usage | Fluency | Organization/development
Low expectations, low performance | 3.88 | 3.98 | 3.76 | 3.01 | 2.98
Low expectations, high performance | 4.68 | 4.76 | 4.58 | 3.78 | 3.69
High expectations, low performance | 3.87 | 3.76 | 3.68 | 3.15 | 3.24
High expectations, high performance | 4.77 | 4.56 | 4.50 | 3.56 | 3.27
Figure 4. The overall design of the neural network: input data pass through text preprocessing into three parallel branches, each consisting of a convolution layer, a tanh activation layer, a max pooling layer and a flattening convolution; the branches feed a full connection layer, an activation layer and the classified output.
English texts are pre-classified into several possible groups to avoid the classification errors caused by assigning some texts to a single category. The method of controlling variables is adopted in the training process to measure the effect of text preprocessing on the results. The dimension of the word vectors is fixed at 50, the number of training iterations is 10, the number of training loops is 1,000 and the depth of the convolutional part is three convolutional layers. The overall design of the neural network is shown in Figure 4.
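A minimal sketch of such a multi-window convolutional text classifier is shown below, assuming PyTorch. The 50-dimensional embeddings and the three parallel convolution branches follow the configuration described above and in Figure 4, whereas the window sizes, channel count, vocabulary size and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiWindowTextCNN(nn.Module):
    """Text classifier with three parallel convolution branches (cf. Figure 4)."""

    def __init__(self, vocab_size=5000, embed_dim=50, num_classes=4,
                 windows=(3, 4, 5), channels=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(embed_dim, channels, kernel_size=w),  # convolution layer
                nn.Tanh(),                                      # tanh activation layer
                nn.AdaptiveMaxPool1d(1),                        # max-pooling layer
                nn.Flatten(),                                   # flattening
            )
            for w in windows
        ])
        self.classifier = nn.Linear(channels * len(windows), num_classes)  # full-connection layer

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                    # (batch, embed_dim, seq_len) for Conv1d
        features = torch.cat([b(x) for b in self.branches], dim=1)
        return self.classifier(features)         # raw class scores (logits)

model = MultiWindowTextCNN()
dummy_batch = torch.randint(0, 5000, (8, 120))   # 8 essays, 120 tokens each
print(model(dummy_batch).shape)                  # torch.Size([8, 4])
```

Using several window sizes lets the network combine short- and longer-range word patterns before the fully connected classification layer, which is the multi-window advantage described above.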
Under the background of AI, the mixed teaching mode in English writing teaching is
designed with students at the center. It combines online and offline learning. A variety of
teaching methods such as interest-driven, problem-oriented, and flipped classrooms are
integrated. At the same time, diverse teaching evaluation methods are used to allow students to truly integrate into the classroom and make them the protagonists of learning. The use of AI cloud classes helps students develop autonomous learning ability and critical thinking. Also, teaching resources have been shared more effectively. In the context of “AI + Education”, the use of modern technology can accelerate the reform of the
talent training mode and effectively improve the quality of curriculum teaching.
3.4 Comparison between AEE and human rating
In this part, a set of five-point Likert-scale questionnaires were used to gauge respondents’
perceived comparisons between automatic essay scoring and human rating in the AERW course. On the five-point scale, 1 means “totally disagree”, 2 indicates “disagree”, 3 shows “slightly agree”, 4 means “agree” and 5 indicates “totally agree”. The questions were first entered on www.wjx.cn, the above-mentioned social statistics service website, and the responses were later exported to SPSS (version 22.0) for variable and reliability analysis.
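The per-item statistics reported in Table 5 (mean, SD and a two-tailed significance value) can be illustrated with the sketch below. It assumes the exported responses sit in a CSV with one column per item, and it treats the significance column as a one-sample t-test against the scale midpoint of 3; that test choice is an assumption for illustration, not a documented detail of the study.

```python
import pandas as pd
from scipy.stats import ttest_1samp

# Hypothetical export: one row per respondent, one column per Likert item.
responses = pd.read_csv("aee_vs_human_items.csv")

MIDPOINT = 3  # neutral point of the five-point scale (assumed test value)

for item in responses.columns:
    scores = responses[item].dropna()
    t, p = ttest_1samp(scores, MIDPOINT)          # two-tailed by default
    print(f"{item}: N={len(scores)}, mean={scores.mean():.2f}, "
          f"SD={scores.std(ddof=1):.2f}, p(two-tailed)={p:.3f}")
```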
Table 5 shows eight favorable traits of AEE in comparison with human rating, including
being timely, more detailed, more individualized and more understandable. It can also be
found from Table 5 that respondents give high positive evaluations of more independent
learning assisted by AEE. There are also positive responses regarding the reliability of the
system scoring method, as well as the timeliness and detail of the feedback content. Overall,
the data shows that most respondents accepted the AEE traits as more effective than human
rating (total average of means = 4.07). It is not negligible, however, that some evaluation
scores are comparatively low, especially for the effectiveness of AEE in vocabulary building
and sentence learning. Interviews were arranged with the respondents; specific analysis is
provided in the discussion part of this paper.
To further fathom the respondents’ attitude toward using AEE systems, an additional
five-point Likert-scale questionnaire was designed. As the data in Table 6 show, most
respondents in each class of the AERW course expressed a positive attitude. In Class A41, 78% of respondents considered AEE “greatly helpful” or “moderately helpful”; the percentages in Class A42, Class A51 and Class A52 are 73%, 86% and 84%, respectively. Class A51 and Class A52 show a comparatively higher ratio of positive attitude, which merits further exploration.
Table 5. Respondents’ perceived effective traits of AEE

Item | Mean | N | SD | Sig. (two-tailed)
Feedback is timely | 4.69 | 178 | 0.536 | 0.000
Feedback is more detailed | 4.57 | 178 | 0.537 | 0.000
Feedback is more individualized | 3.68 | 178 | 0.734 | 0.000
Feedback is more clearly understandable | 4.08 | 178 | 0.625 | 0.000
Scoring measure is more reliable | 4.62 | 178 | 0.484 | 0.000
More convenient in vocabulary building | 3.53 | 178 | 0.879 | 0.000
More convenient in sentence learning | 3.64 | 178 | 0.954 | 0.000
Learning is more independent | 3.34 | 178 | 0.756 | 0.000
Note(s): Total average = 4.07; Cronbach’s α = 0.816
Table 6. Overall perceived effectiveness of using AEE systems

To what extent do you believe AEE helps you in writing? | Greatly helpful | Moderately helpful | Slightly helpful | Not helpful | Undecided
Class A41 (N = 46) | 11 (24%) | 25 (54%) | 3 (7%) | 3 (7%) | 4 (9%)
Class A42 (N = 45) | 9 (20%) | 24 (53%) | 6 (13%) | 4 (9%) | 2 (4%)
Class A51 (N = 43) | 7 (16%) | 30 (70%) | 3 (7%) | 3 (7%) | 0 (0%)
Class A52 (N = 44) | 8 (18%) | 29 (66%) | 5 (11%) | 2 (5%) | 0 (0%)
Table 7. Respondents’ overall assessment of AEE reliability in writing improvement

Feature | Pigai.org | iWrite | Awrite | Mean | N | SD | Sig. (two-tailed)
Grammar | 4.58 | 4.50 | 4.45 | 4.51 | 178 | 0.576 | 0.000
Usage | 4.55 | 4.28 | 4.50 | 4.44 | 178 | 0.549 | 0.000
Mechanics | 4.50 | 4.38 | 4.12 | 4.33 | 178 | 0.658 | 0.000
Style | 4.12 | 4.40 | 4.08 | 4.21 | 178 | 0.785 | 0.000
Organization | 3.88 | 3.90 | 3.78 | 3.85 | 178 | 0.804 | 0.000
Development | 3.28 | 3.45 | 3.48 | 3.40 | 178 | 0.746 | 0.000
Syntactic complexity | 4.60 | 4.54 | 4.38 | 4.51 | 178 | 0.634 | 0.000
Note(s): Total average = 4.07; Cronbach’s α = 0.825
3.5 Effects of AEE on students’ writing improvement
In order to examine the perceived effects of AEE on writing performance, another five-point
Likert-scale type questionnaire was designed to gauge the respondents’ assessment of the
functions of three AEE systems separately. As seen in Table 7, most participants expressed
positive attitudes toward the features of all three systems, especially concerning grammar,
usage, mechanics and syntactic complexity. The scores for organization and development are
comparatively low. Interviews with respondents revealed that they were satisfied with the
content analysis of the AEE systems, but they expected more feedback on discourse
elements.
The respondents’ writing improved over the course, as can be seen in Table 8, which shows the word counts and AEE scores of the first and fourth essays. For instance, the word count in the first essays of Class A41 ranges from 128 to 246 words; in the fourth essays, the range goes up to 141–308 words. Besides the increase in
word count, the AEE scores for Class A41 also rise to the range of 5 to 9 from the original
range of 4 to 9. A significant improvement is seen in the minimum score, which advances from
4 to 5. The same tendency is found in all other classes, of which Class A42 registers the
highest leap of the minimum score from 3 to 5.
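A sketch of how per-class word-count and score ranges such as those in Table 8 might be tabulated is given below; the data frame layout (one row per essay with class, essay number, word count and AEE score) and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-essay records retrieved from the AEE archives.
essays = pd.DataFrame({
    "class_id":   ["A41", "A41", "A42", "A42"],
    "essay_no":   [1, 4, 1, 4],
    "word_count": [128, 308, 120, 303],
    "aee_score":  [4, 9, 3, 10],
})

# Min/max word count and score per class for the first and fourth essays.
summary = (essays[essays["essay_no"].isin([1, 4])]
           .groupby(["class_id", "essay_no"])[["word_count", "aee_score"]]
           .agg(["min", "max"]))
print(summary)
```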
Table 8. Words/scores of first and last essays

Class | N | Word count, 1st essay (Min–Max) | Word count, 4th essay (Min–Max) | Scores, 1st essay (Min–Max) | Scores, 4th essay (Min–Max)
A41 | 46 | 128–246 | 141–308 | 4–9 | 5–9
A42 | 45 | 120–208 | 152–303 | 3–10 | 5–10
A51 | 43 | 119–234 | 150–356 | 4–10 | 6–10
A52 | 44 | 121–216 | 155–328 | 4–9 | 5–10

Table 9. Improvements in the mean scores (N = 178)

Class | Scores of the first essay (Mean, SD) | Scores of the fourth essay (Mean, SD) | t | p | Cohen’s d effect size
A41 | 7.14, 0.53 | 8.34, 0.54 | 4.15 | 0.000*** | 0.56
A42 | 7.27, 0.48 | 8.61, 0.52 | 5.48 | 0.000*** | 0.61
A51 | 7.94, 0.54 | 9.26, 0.49 | 5.37 | 0.000*** | 0.79
A52 | 7.68, 0.60 | 9.12, 0.56 | 4.48 | 0.000*** | 0.51
Note(s): *** means p < 0.001

Table 10. Scores of the first and fourth essays rated by the teacher raters

 | Scores of the first essay (Mean, SD) | Scores of the fourth essay (Mean, SD) | t | p | Cohen’s d effect size
All respondents (N = 178) | 7.28, 1.05 | 8.76, 1.18 | 4.012 | 0.005** | 0.56
Note(s): ** means p < 0.01
An analysis of the significant difference in the scores of the first essay and the fourth essay
also shows positive results. As it is seen in Table 9, Cohen’s d effect sizes underlying these
significance levels range from 0.51 to 0.79, showing medium to large effects. These
results demonstrate a marked improvement of respondents’ scores from the first essays to the
fourth essays.
Two experienced teacher raters were also invited to assess the first and fourth essays on
the same scale of one to ten. Their evaluations of the essays show marked similarity with those
of the three AEE systems. The score difference between the first essay and the fourth essay
shows significant improvement in respondents’ writing skills. Cohen’s d effect size shows
that the significant difference is statistically moderate (Table 10).
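The significance tests and Cohen’s d effect sizes reported in Tables 9 and 10 can be reproduced with a short script such as the sketch below. The score arrays are illustrative stand-ins for paired first- and fourth-essay scores, not the study’s raw data, and Cohen’s d is computed here in its paired-samples form (mean difference divided by the SD of the differences).

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustrative paired scores (same respondents, first vs fourth essay).
first  = np.array([7, 6, 8, 7, 7, 8, 6, 7, 8, 7], dtype=float)
fourth = np.array([8, 7, 9, 8, 9, 9, 7, 8, 9, 8], dtype=float)

t, p = ttest_rel(fourth, first)          # paired two-tailed t-test

# Cohen's d for paired samples: mean difference / SD of the differences.
diff = fourth - first
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"t = {t:.3f}, p = {p:.4f}, Cohen's d = {cohens_d:.2f}")
```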
4. Discussion
This paper discusses the issue of AEE systems and their application in EFL writing at China
Agricultural University. Four dimensions are covered in the study: respondents’ expectations,
comparison between AEE and human rating, perceived effectiveness of AEE in English
writing and mediation of AI-based deep learning. Questionnaires and interviews are conducted
to examine the expectations of the respondents enrolled in the writing project. Besides, to draw
a comparison between AEE and human rating and study the perceived effects of AEE on
respondents’ writing improvement, all respondents are asked to write four essays throughout
the project. At the end of the project, two teacher raters are invited to evaluate the first and last
essays of the respondents. Furthermore, to provide a better understanding of the AEE-mediated learning context, a model is established based on deep learning concepts.
As for the first research question, it is found that respondents of different college years
have different expectations. For first-year students, vocabulary enlargement is most
important, while for sophomores, syntactic variety is a priority. While respondents in their
junior and senior year focus on better organization and development of the essay, they take
grammar refining as similarly necessary. The EFL teaching curriculum can explain such results not only at China Agricultural University but also at other colleges in Asia, where the learning mode usually transforms from a test-intensive one in high school to a self-regulated one at college (Chou, 2015; Kim et al., 2016). First, grammar and syntax are supposedly
taught in high schools, but text content is rendered simple enough to exclude words and
expressions outside of the list issued by educational authorities. College English content,
however, is not subject to a guided wordlist, and thus first-year students are challenged to learn as many words as quickly as possible to catch up with the teaching schedule (Zheng, 2014). Second, none of the respondents are English majors, and they will not be able to obtain writing tutorship
afterward, so peer collaboration or AEE rating are alternatives to keep their writing skills
honed. The EDT-based analysis of students’ affective evaluations of AEE systems also reflects a high level of satisfaction with AEE systems, even where there is a disparity between high expectations and low performance.
For the second research question, this study shows that the respondents confirmed their
preference for AEE functions in many ways, such as timely, more detailed and more clearly understandable feedback and a reliable scoring scale. Similar research explained that
instantaneous feedback and consistent grading helped students draft better essays (Azmi
et al., 2019). It seems contradictory that the respondents place a high value on the “detailed”
feature but meanwhile give a markedly lower score to the “individualized” feature.
Moreover, respondents give a high score on “clearly understandable” feedback of AEE, but
meanwhile offer a pronouncedly low mark to “more convenient in vocabulary and sentence
learning.” Also unusual is the low score they give to “learning more independently.”
Interviews with some respondents show that they believe the AEE feedback does not give as
individualized comments as they can get when they have their essays reviewed by the
teacher, with whom they find it easy to develop an affinity during the tutoring process. Furthermore, though AEE is reliable in most cases, it sometimes gives inaccurate or confusing
feedback. For instance, the three AEE systems wrongly code proper nouns like “iPhone” as a
capitalization error and fail to distinguish “a long” from “along” in “Ocean water can lead to
great damage to buildings along coastal areas,” which indicates AEE is oblivious to the logic
errors of transitional words. Moreover, it fails to check the missing subject and predicate in
sentences like “When Robby got up early, which was a routine for twenty years,” while the
teacher-rater identifies them accurately. In addition, self-learning via AEE requires high
motivation and well-informed interactions, which many respondents found too challenging to
handle.
In terms of the third question, upon the comparison between AEE systems and human
rating, the respondents view the former favorably, but interviews also find that teacher
raters’ role is indispensable. On the one hand, the effect of three AEE systems on the
respondents’ writing is positive. Respondents confirm AEE’s better effectiveness in giving
feedback on grammar, usage, mechanics and syntactic complexity. Other research also
shows that human evaluation was somewhat inconsistent, and it would be advisable to use
automatic evaluation (Azmi et al., 2019) in cases where punctuation, morphology, lexis,
syntax and coherence are to be evaluated (Svatava et al., 2019). However, on the other hand,
interviews with respondents show human raters are indispensable where there is no
consistent standard, such as logic consistency, organization and development. Research also
shows that most AEE systems failed to give accurate evaluations of the structure and content
of an essay (Bai and W, 2019). Moreover, the automatic rating feature had a disadvantage
when it was necessary to evaluate written texts while considering their specific context
(Patout and Cordy, 2019). Simply put, this study reveals that AEEs effectively help students
improve their writing skills, but the low satisfaction levels with AEE evaluations of development and organization indicate that human rating remains indispensable.
5. Conclusion and implication
This study explores students’ disconfirmed expectations of and motivation in using AEE
systems by creating a dynamic and longitudinal context. The results reveal marked
relationships: first, if the relationship between expectations and performance is positive, then
there is a positive relationship between expectation and satisfaction. Low expectations, for
example, combined with low performance, lead to low levels of satisfaction, and vice versa.
Second, in the cases when expectations are low but performance is high, satisfaction levels
are high. Third, if expectations are high but performance is low, satisfaction levels stay in the
middle, with respondents’ overall ratings of the AEE systems at high levels.
The study provides several significant implications: first, the results reveal a discrepancy
between expectations and perceived performance, which not only wields an impact on
student satisfaction but also acts as a moderating agent between expectations and
satisfaction. Therefore, it is advisable for teacher–researchers to explain the strengths of
AEE at the beginning, to inform respondents of the latest improvement of AEE systems
during the process of writing and to better integrate AEE systems with EFL (English as a
foreign language) writing curriculum. Second, grasping the affective level of student
expectations is challenging in that respondents may appear optimistic at the initial use of the
AEE system features. However, their satisfaction can diminish if they cannot experience the
same level of perceived performance as expected, thus resulting in a negative effect on their
motivation to use valid AEE features. Third, although there is a positive relationship between
performance and satisfaction, the relationship between expectations and disconfirmation is
not linear and is mediated by factors such as teacher’s guidance and collaboration with peer
reviewers, which provides insight for pedagogical reforms in College English curriculum to
foster students’ learning motivations. Wilson conducted a one-semester experiment with
respondents from fourth grade to junior high school from 28 schools. It was found that
automatic feedback could actively assist teachers in dealing with various problems
encountered in the writing and reviewing process, such as diagnosis and response (Wilson,
2014). In this way, the teachers will have more time and energy to improve students’ writing
ability and other aspects of feedback. This is consistent with the results of this study, confirming
the importance of educational informatization for teaching innovation.
In summary, the AEE system can adequately compensate for the lack of timely corrective feedback in traditional teaching, help students improve their English writing performance and promote the development of their writing ability. Moreover, it stimulates students’ motivation and self-efficacy, cultivates students’ interest and confidence in writing and helps build up the ability to learn independently. Furthermore, the evolution of
AI will transform the writing teachers’ role from merely giving corrective feedback to
constructing a learner-friendly social engagement context. Investing in intelligent computing
technology and facilitating its use in classrooms has significant policy implications. In order
to make up for current limitations, future research may focus on how to enhance students’
writing proficiency and satisfaction levels by helping them construct conceptual thoughts via
deep engagement with AI technologies.
References
Aluthman, E.S. (2016), “The effect of using automated essay evaluation on ESL undergraduate
students’ writing skill”, International Journal of English Linguistics, Vol. 6 No. 6, pp. 54-67.
Amir, Z., Ismail, K. and Hussin, S. (2011), “Blogs in language learning: maximizing students’
collaborative”, Procedia Social and Behavioral Sciences, Vol. 18 No. 2011, pp. 537-543.
Attali, Y. (2007), Construct Validity of E-Rater in Scoring TOEFL Essays (Research Report No. RR-07-21), Educational Testing Service, available at: https://www.ets.org/research/policy_research_reports/publications/report/2007/hsmn.
Azmi, A.M., Al-Jouie, M.F. and Hussain, M. (2019), “AAEE–automated evaluation of students’ essays
in Arabic language”, Information Processing and Management, Vol. 56 No. 5, pp. 1736-1752.
Bai, L.F. and W, J. (2019), “An overview of automatic essay evaluation in the past two decades”,
Foreign Language Research, No. 1, pp. 65-71.
Bas, F.C. and Tekal, M. (2014), “Evaluation of computer based foreign language learning software by
teachers and students”, The Turkish Online Journal of Educational Technology, Vol. 13 No. 2,
pp. 71-78.
Boulanger, D. and Kumar, V. (2019), “Deep learning in automated essay scoring”, Presented at 14th
International Conference on Intelligent Tutoring Systems (ITS), Montreal. doi: 10.1007/978-3-319-91464-0_30.
Bravo, M.I.R., Montes, F.J.L. and Moreno, A.R. (2017), “Open innovation in supply networks: an
expectation disconfirmation”, Journal of Business and Industrial Marketing, Vol. 32 No. 3,
pp. 432-444.
Burstein, J., Chodorow, M. and Leacock, C. (2004), “Automated essay evaluation: the criterion online
writing service”, AI Magazine, Vol. 25 No. 3, pp. 27-36.
Chen, C.F.E. and Cheng, W.Y.E. (2008), “Beyond the design of automated writing evaluation:
pedagogical practices and perceived earning effectiveness in EFL writing classes”, Language
Learning and Technology, Vol. 12 No. 2, pp. 94-112.
Chou, M.H. (2015), “Impacts of the test of English listening comprehension on students’ English
learning expectations in Taiwan”, Language and Curriculum, Vol. 28 No. 2, pp. 191-208.
Chu, S.K.W., Capio, C.M., van Aalst, J.C.W. and Cheng, E.W.L. (2017), “Evaluating the use of a social
media tool for collaborative group writing of secondary school students in Hong Kong”,
Computers and Education, Vol. 110 No. 2017, pp. 170-180.
Cristina, P.B. and Carrillo, C.A. (2016), “L2 collaborative E-writing”, Procedia-Social and Behavioral
Sciences, Vol. 228 No. 2016, pp. 601-607.
Gao, J.W. and Ma, S. (2020), “Instructor feedback on free writing and automated corrective feedback in
drills: intensity and efficacy”, Language Teaching Research, advance online publication, doi: 10.1177/1362168820915337.
Huang, A.Q. and Zhang, W.X. (2018), “The effect of automated writing evaluation feedback on
students’ vocabulary revision -taking Pigai.org for example”, Modern Educational Technology,
Vol. 2 No. 8, pp. 71-78, (In Chinese).
Jiang, Y.L.B. (2019), “The development of vocabulary and morphological awareness: a longitudinal
study with college EFL students”, Applied Psycholinguistics, Vol. 40 No. 4, pp. 877-903.
Jin, Y.Q. (2014), “An evidence-based research on the role of Web 2.0 in building up college students’
collaborative learning capabilities”, China Educational Technology, Vol. 12 No. 335, pp. 139-145.
Khadka, S. (2020), “Meaning and teaching of writing in higher education in Nepal”, in Bista, K.,
Sharma, S. and Raby, R.L. (Eds), Higher Education in Nepal: Policies and Perspectives,
Routledge, Oxford, pp. 201-213.
Kim, H.N. and Jung, A.M. (2016), “Korean EFL freshman students’ English learning experiences in
high school and in college”, The Journal of Studies in Language, Vol. 32 No. 1, pp. 1-23.
Kramer, I.M. and Kusurkar, R.A. (2017), “Science-writing in the blogosphere as a tool to promote autonomous
motivation in education”, The Internet and Higher Education, Vol. 35 No. 2017, pp. 48-62.
Kukich, K. (2000), “Beyond automated essay scoring, the debate on automated essay grading”, IEEE
Intelligent Systems, Vol. 15 No. 5, pp. 22-27.
Li, G.F. (2019), “The impact of the integrated feedback on students’ writing revision based on the
AWE”, Foreign Language Education, Vol. 40 No. 4, pp. 72-76, (In Chinese).
Li, H. and Liu, R.D. (2011), “Study on the effect of web-based collaborative writing”, Web-Assisted
Education, Vol. 219 No. 7, pp. 67-72.
Liu, H. (2015), “Freshmen’s new words consciousness and its influence on language output
competence measured on a corpus basis”, Higher Education Exploration, Vol. 2015 No. 7,
pp. 91-96, (In Chinese).
Liu, M., Liu, L.P. and Liu, L. (2018), “Group awareness increases student engagement in online
collaborative writing”, The Internet and Higher Education, Vol. 38 No. 2018, pp. 1-8.
Lu, L. (2016), “A study of the second writing process based on an automated essay evaluation tools”,
Foreign Language World, Vol. 2016 No. 2, pp. 88-96, (In Chinese).
Oliver, R.L. (1977), “Effect of expectation and disconfirmation on postexposure product evaluations–
an alternative interpretation”, Journal of Applied Psychology, Vol. 62 No. 4, pp. 480-486.
Oliver, R.L. (1980), “A cognitive model of the antecedents and consequences of satisfaction decisions”,
Journal of Marketing Research, Vol. 17 No. 4, pp. 460-469.
Page, E.B. (1966), “The imminence of grading essays by computer”, Phi Delta Kappan, Vol. 47 No. 5,
pp. 238-243.
Patout, P.A. and Cordy, M. (2019), “Towards context-aware automated writing evaluation systems”,
Presented at Proceedings of The 1st ACM SIGSOFT International Workshop on Education through
Advanced Software Engineering and Artificial Intelligence, Tallinn. doi: 10.1145/3340435.3342722.
Polio, C. and Yoon, H.J. (2018), “The reliability and validity of automated tools for examining variation
in syntactic complexity across genres”, International Journal of Applied Linguistics, Vol. 128
No. 1, pp. 165-188.
Powers, D.E., Burstein, C., Chodorow, M., Fowles, M.E. and Kukich, K. (2000), Comparing the Validity
of Automated and Human Scoring of Essays, Educational Testing Service, Princeton, NJ.
Pribadi, F.S., Utomo, A.B. and Mulwinda, A. (2017), “Automated short essay scoring system using
normalized simpson methods”, Presented at Engineering International Conference (EIC2017),
Semarang. doi: 10.1063/1.5028081.
Quinlan, T., Higgins, D. and Wolff, S. (2009), Evaluating the Construct-Coverage of the E-Rater®
Scoring Engine, ETS, Princeton, NJ.
Rudner, L.M., Garcia, V. and Welch, C. (2006), “An evaluation of the IntelliMetricSM essay scoring
system”, The Journal of Technology, Learning, and Assessment, Vol. 4 No. 4, pp. 1-22.
Svatava, S., Katerina, R. and Magdalena, R. (2019), “Comparison of automatic and human evaluation
of L2 texts in Czech”, Issledovanija po slavjanskim jazykam, Vol. 24 No. 1, pp. 93-101.
Tang, D. and Sun, Y. (2019), “Automatic scoring method of English composition based on language
depth perception”, Presented at 2019 4th International Seminar on Computer Technology,
Mechanical and Electrical Engineering (ISCME 2019), Chengdu. doi: 10.1088/1742-6596/1486/4/
042045.
Tang, J.L. and Wu, Y. (2011), “Using automated writing evaluation in classroom assessment: a
critical review”, Foreign Language Teaching and Research, Vol. 2011 No. 2, pp. 273-282, (in
Chinese).
Tian, L.L. and Zhou, Y. (2020), “Learner engagement with automated feedback, peer feedback and
teacher feedback in an online EFL writing context”, System, Vol. 91 No. 102247, doi: 10.1016/j.
system.2020.102247.
Van Ryzin, G.G. (2013), “An experimental test of the expectancy-disconfirmation theory of citizen
satisfaction”, Journal of Policy Analysis and Management, Vol. 32 No. 3, pp. 597-614.
Wang, P.L. (2013), “Can automated writing evaluation programs help students improve their English
writing?”, International Journal of Applied Linguistics and English Literature, Vol. 2
No. 1, pp. 6-12.
Wang, Y.J., Shang, H.F. and Briody, P. (2013), “Exploring the impact of using automated writing
evaluation in English as a foreign language university students’ writing”, Computer Assisted
Language Learning, Vol. 26 No. 3, pp. 234-257.
Wilson, J. (2014), “Does automated feedback improve writing quality?”, Learning Disabilities: A
Contemporary Journal, Vol. 12 No. 1, pp. 93-118.
Wilson, J. and Czik, A. (2016), “Automated essay evaluation software in English language arts
classrooms: effects on teacher feedback, student motivation, and writing quality”, Computers
and Education, Vol. 100 No. 2016, pp. 94-106.
Winke, P.M. and Isbell, D.R. (2017), “Computer-assisted language assessment”, in Thorne, S. and May, S.
(Eds), Language, Education and Technology. Encyclopedia of Language and Education, 3rd ed.,
Springer, Cham, pp. 1-13.
Yang, X.Y. and Dai, Y.C. (2015), “An empirical study on college English autonomous writing teaching
model based on www.pigai.org”, Computer-Assisted Foreign Language Education, Vol. 2015
No. 2, pp. 17-23, (In Chinese).
Yang, Y.J. and Yang, Y. (2013), “Blog-assisted process writing for English majors and
pedagogic implications”, Computer-Assisted Foreign Language Education, Vol. 153
No. 2013, pp. 46-51.
Yu, B.B. (2015), “Incorporation of automated writing evaluation software in language education: a case
of evening university”, International Journal of Information and Education Technology, Vol. 5
No. 11, pp. 808-813.
Zheng, R.N. (2014), “A discussion on the connection of English teaching between high school and
college”, Education Exploration, Vol. 2014 No. 12, pp. 35-36.
Zhou, H. (2015), “An empirical study of blog-assisted EFL process writing: evidence from
Chinese non-English majors”, Journal of Language Teaching and Research, Vol. 6 No. 1,
pp. 189-195.
About the author
Dr. Zhijie Wang is an associate professor at the College of Humanities and Development at
China Agricultural University. His research interests include computer-assisted language learning,
second language writing and second language education. Zhijie Wang can be contacted
at: lynx17505@cau.edu.cn