Assessing Writing 63 (2025) 100909
Contents lists available at ScienceDirect
Assessing Writing
journal homepage: www.elsevier.com/locate/asw

A meta-analysis of relationships between syntactic features and writing performance and how the relationships vary by student characteristics and measurement features

Jiali Wang *, Young-Suk G. Kim, Joseph Hin Yan Lam, Molly Ann Leachman
University of California, Irvine, CA 92697, United States

* Correspondence to: University of California, 401 E. Peltason Dr., Ste. 3200 Education, Irvine, CA 92697-5500, United States.
E-mail addresses: jialiw8@uci.edu (J. Wang), youngsk7@uci.edu (Y.-S.G. Kim), jhylam@uci.edu (J.H.Y. Lam), leachmam@uci.edu (M.A. Leachman).
https://doi.org/10.1016/j.asw.2024.100909
Received 31 March 2024; Received in revised form 6 December 2024; Accepted 11 December 2024. Available online 3 February 2025.
1075-2935/© 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

ARTICLE INFO
Keywords: Meta-analysis; Writing; Syntax; Measurement; Genre; Linguistic features

ABSTRACT
Students’ proficiency in constructing sentences impacts the writing process and writing products. Linguistic demands in writing differ in terms of both student characteristics and measurement features. To identify various syntactic demands considering these features, we conducted a meta-analysis examining the relationships between syntactic features (complexity and accuracy) and writing performance (quality, productivity, and fluency) and moderating effects of both student characteristics and measurement features. A total of 109 studies (effect sizes: 871; the total number of participants: 24,628) met the inclusion criteria. Results showed that writing performance was weakly related to both syntactic accuracy (r = .25) and syntactic complexity (r = .16). Writers’ characteristics (grade level and language proficiency) and measurement features (writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures) were significant moderators for certain syntactic features. The findings highlighted the importance of writer and measurement factors when considering the relationships between linguistic features in writing and writing performance. Implications were discussed regarding the selection of syntactic features in assessing language use in writing, gaps in the literature, and significance for writing instruction and assessment.

1. Introduction
Writing requires individuals to master a range of cognitive, linguistic, and literacy skills to express themselves appropriately across various settings and contexts (Graham, 2018; Kim & Graham, 2022; Kim et al., 2014; Wagner et al., 2011). Analyzing language use in student writing can inform educators of the specific linguistic demands in writing. The measures and approaches to assessing syntactic features in writing have received much attention recently (e.g., Jagaiah et al., 2020; Kyle & Crossley, 2018; Troia et al., 2019; Wang et al., 2024). Two key syntactic features frequently examined in written composition are syntactic complexity and syntactic accuracy. Syntactic complexity, defined as the variety and degree of sophistication of the syntactic structures at the phrasal and clausal levels (Ortega, 2003), has been widely studied in student writing. Syntactic complexity in written composition reflects one aspect of writers’ language use in writing, and it varies depending on the purpose, genre, topic, and audience of the text. Writers adapt sentence structures to meet the communicative needs of specific registers (Biber & Gray, 2011). Thus, research on syntactic complexity can reveal how students employ diverse grammatical structures in their writing to address the differential demands of different writing genres.
Syntactic accuracy, another critical feature in writing, pertains to the extent to which a text is free from grammatical errors. While extensive research has investigated the relationship between syntactic complexity and writing quality, relatively few studies have explicitly examined the relationship between syntactic accuracy and writing quality (e.g., Troia et al., 2019; Wang et al., 2024). Nevertheless, syntactic accuracy has been shown to be an important component of writing quality for both monolingual and bilingual writers (Troia et al., 2019). This meta-analysis systematically examined the relationships between syntactic features (i.e., syntactic complexity and syntactic accuracy) in writing and writing performance (i.e., writing quality, productivity, and fluency). Furthermore, it examined how these relationships are moderated by student characteristics (grade level and language proficiency in the target language) and measurement features (writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures). 1.1. Theoretical framework The role of language skills in writing has been recognized in several theoretical models. According to the cognitive model of writing (Hayes, 1996) and the writer(s)-within-community model of writing (Graham, 2018), linguistic knowledge is stored in students’ long-term memory and is accessed during the writing process, particularly when writers translate their ideas into sentences (Berninger et al., 2002). In the current study, we used the Direct and Indirect Effects Model of Writing (DIEW; Kim & Graham, 2022) as the primary theoretical framework. This model explicitly recognizes the role of various language skills and discourse knowledge in writing. 
Additionally, DIEW specifies the nature of relationships among component skills and writing, such as dynamic relationships between component skills (e.g., language skills) and writing outcomes by student characteristics and measurement features. Grammatical knowledge, as a critical component of language skills, enables students to express their ideas accurately, effectively, and richly. High-quality writing usually requires the flexible use of a range of syntactic features from simple to complex, depending on the goal and audience of the writing task (Beers & Nagy, 2009). Limited proficiency in constructing complex sentences may inhibit students from translating thoughts into sentences, whereas students with more grammatical knowledge can allocate more cognitive resources to other crucial aspects of the writing process, such as ideation, which ultimately improves the overall quality of their writing (Graham, 2018). Discourse knowledge is also crucial for writing quality because different genres impose specific demands on text structure and language use (Kim & Graham, 2022). Writers acquire knowledge of these genre-specific expectations as they learn to write. Certain syntactic features serve distinct functions for particular purposes within specific genres (Biber & Gray, 2011; Schleppegrell, 1998). Thus, to navigate writing in a particular genre, writers need knowledge of text structure and linguistic features associated with that genre. 1.2. Syntactic features and writing quality Limited attention has been paid to explicitly examining the relationship between syntactic accuracy in written composition and writing performance. Prior studies have found that syntactic accuracy, operationalized as the percentage of grammatical sentences, was associated with narrative writing quality among upper elementary school students (Troia et al., 2019; Wang et al., 2024). 
Similarly, an analysis of TOEFL essays written by English as a Foreign Language (EFL) learners revealed that students with higher holistic scores exhibited greater syntactic accuracy (Cumming et al., 2005). So far research has shown no significant differences in syntactic accuracy across genres or writing tasks (Cumming et al., 2005; Scott & Windsor, 2000; Wang et al., 2024). The relationship between syntactic complexity in written composition and writing performance has been more extensively explored. Early research on syntactic complexity aimed to assess students’ grammatical knowledge, particularly among those with limited proficiency in the target language. T-unit length, one of the most commonly examined measures of syntactic complexity, was found to strongly correlate with grammatical knowledge (Loban, 1976). Studies investigating the relationship between syntactic complexity and writing quality have predominantly focused on young monolingual children or bilingual writers with limited language proficiency in the target language. These studies hold the potential for identifying students’ language needs more efficiently. However, questions remain regarding the relationship between syntactic complexity and writing quality primarily for two reasons. First, findings across studies have been inconsistent. Some studies have reported a positive relationship between the two constructs, whereas others found no such association (e.g., Beers & Nagy, 2009; Grobe, 1981; Ortega, 2003; Stewart & Grobe, 1979). Second, the relationship varies depending on several factors, such as grade level, language proficiency in the target language, writing genres, and the type of syntactic complexity measures employed. In the sections that follow, we introduce potential moderators related to student characteristics and measurement features that may influence the relationship between syntactic features and writing outcomes. 1.3. Student characteristics 1.3.1. 
Grade level as a proxy for developmental phase As foundational language acquisition progresses throughout childhood, students’ ability to construct and formulate sentences develops (Ortega, 2003). Thus, grade level is an important factor influencing syntactic complexity in written compositions. In the U.S. context, studies found that students of higher grade levels use more complex syntax in their writing (Beers & Nagy, 2011; Crowhurst, 1980; Crowhurst & Piche, 1979; Crossley et al., 2011; Ferris, 1994; Wagner et al., 2011). As students advance through school, they write longer T-units, produce more clauses, and demonstrate greater diversity in clause types (e.g., Beers & Nagy, 2011; Crossley et al., 2011; Ferris, 1994; Wagner et al., 2011; Loban, 1976). This developmental pattern extends to adult writers as well (e.g., MacArthur et al., 2019). DIEW hypothesizes differential relationships between language skills and writing quality depending on developmental stages (dynamic relationships as a function of development; Kim & Graham, 2022). As transcription skills improve and become more automatic, other skills and knowledge, including syntactic complexity and accuracy, are expected to play a greater role in influencing writing quality. However, this does not imply that syntactic skills are unimportant in earlier phases of development. Instead, their impact is often constrained by transcription skills, such that variations in syntactic skills are not fully reflected in written composition during the early stages of writing development. The current study focuses on syntactic features in written compositions, where transcription skills are already accounted for. Therefore, the relationship between syntactic features and writing quality is not expected to be weaker in earlier phases. 
In fact, this relationship might be stronger during earlier phases of development, as there is likely to be greater variation in syntactic features in written composition during an earlier phase as children rapidly develop syntactic skills. 1.3.2. Language proficiency of the target language According to DIEW, language skills are crucial for producing high-quality writing (Kim & Graham, 2022). The relationship between language skills and writing ability has been found to be prominent in EFL literature (Lu, 2017). The relative importance of language skills in writing is likely to be greater in the context where individuals have limited and developing language skills in the language they are writing in. For example, in the context of foreign language learning, when bilinguals write in a language they have limited proficiency in, they may have difficulties in using various syntactic structures flexibly and appropriately. Research in EFL contexts has reported more consistent results regarding the relationship between syntactic complexity and writing quality compared to English as a second language or monolingual-dominant contexts (Lu, 2017). In studies observing writing performance of learners writing in a language that they have limited proficiency in, it was found that higher-scoring essays exhibited longer and more varied syntactic structures (Li, 2015; Lu, 2011; Ortega, 2003), greater clausal subordination (Biber et al., 2016; Grant & Ginther, 2000; Li, 2015), more complex clause structure (Crossley & McNamara, 2014; Li, 2015), more complex phrasal structures (Crossley & McNamara, 2014; Guo et al., 2013; Kyle & Crossley, 2018; Taguchi et al., 2013), and a higher incidence of passive structures (Biber et al., 2016; Ferris, 1994). These findings were derived using a variety of syntactic complexity measures at the phrasal, clausal, T-unit, and sentence levels, all of which were generated by software. 
Notably, most of these studies focused on adult learners, and fewer studies have examined younger learners. This highlights a critical gap in the literature that warrants further exploration to better understand how language proficiency influences the relationship between syntactic features and writing quality in younger populations. 1.4. Measurement features When examining relationships between skills, it is essential to consider the role of measurement and how the relationships may differ as a function of measurement methods (see dynamic relationships as a function of measurement; Kim & Graham, 2022). This study focuses on the measurement of writing in terms of writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures employed, acknowledging that differences in these constructs may lead to varying findings. 1.4.1. Writing genres Empirical evidence suggests that the syntactic demands of writing vary across genres (e.g., Beers & Nagy, 2009, 2011; Olinghouse & Wilson, 2013; Scott & Windsor, 2000; Wang et al., 2024). A literature review found that argumentative writing exhibited the highest values for T-unit length and clauses per T-unit (Jagaiah et al., 2020). Narrative writing, in contrast, tends to contain less complex syntax (e.g., Crowhurst & Piche, 1979). Argumentative writing often requires more complex syntactic structures to efficiently convey meaning and integrate competing or complementary ideas within a single T-unit (Jagaiah et al., 2020). For example, Beers and Nagy (2011) found that argumentative essays contained more clauses per T-unit than narrative, compare/contrast, and expository texts. Prior studies have also found that the expository genre may place higher demands on syntactic complexity. For example, Beers and Nagy (2011) found that expository texts had more words per clause than argumentative and narrative texts, though this was only observed among Grade 7 students. 
Similarly, Scott and Windsor (2000) reported that expository writing demonstrated greater syntactic complexity (measured by T-unit length) than narrative writing in students aged 9–11. Additionally, Wang et al. (2024) found that syntactic complexity, measured by T-unit length, clauses per T-unit, and frequency of adverbial and relative clauses, was higher in the opinion genre than in the narrative genre among fourth-grade students. Taken together, prior research suggests that informational genres (e.g., argumentative, expository, opinion) tend to place higher demands on syntactic complexity than the narrative genre, especially when syntactic complexity is measured by T-unit length and clauses per T-unit. Research also shows that syntactic features are associated with writing quality differently across genres. For example, Wang et al. (2024) found that syntactic accuracy was associated with writing quality in the narrative task, but syntactic complexity (measured by clauses per T-unit) was associated with writing quality in the opinion task among fourth-grade students. Qin and Uccelli (2016) found that words per clause was related to writing quality in argumentative writing but not narrative writing for Chinese secondary EFL learners. Additionally, Grobe (1981) analyzed T-unit length, words per clause, and clauses per T-unit in narrative writing by students in Grades 5, 8, and 11 and found no significant relationship between syntactic complexity and writing quality in the narrative genre. Beers and Nagy (2009) further explored the three different measures of syntactic complexity in narrative and argumentative genres for Grade 7 and 8 students. They found that T-unit length was positively correlated with writing quality in the narrative task but was negatively correlated with writing quality in the argumentative task. 
The authors attributed the negative correlation to the repetitive use of subordinate clauses (e.g., “I think X because Y”) in argumentative writing, which does not necessarily enhance elaboration or argumentation quality. In summary, prior research indicates that the relationship between syntactic complexity and writing quality varies by genre. Informational genres may be associated with higher demands in syntactic complexity. However, findings remain inconsistent across studies, underscoring the need for further investigation to understand the nuanced ways in which genre-specific syntactic demands influence writing quality. 1.4.2. Writing outcomes Previous studies have shown that writing quality, writing productivity (e.g., text length), and writing fluency are interrelated but dissociable skills (Kim et al., 2014; Puranik et al., 2008; Wagner et al., 2011). Importantly, the contributions of language and literacy skills differ depending on specific writing outcomes being assessed (see the dynamic relationships as a function of measurement of DIEW, Kim & Graham, 2022). For example, language skills were found to play a greater role in writing quality than writing productivity, whereas transcription skills were more important to writing productivity (Kim & Graham, 2022). By investigating how syntactic features relate to various writing outcomes, we can better understand how syntactic demands may limit various aspects of writing performance. 1.4.3. Whether the writing task is text-based The measurement of constructs also includes the nature of the writing tasks (Kim & Graham, 2022). For example, when writing tasks involve reading source materials, reading comprehension skills become particularly important to the writing process and outcomes. 
Language demands in text-based and non-text-based tasks may differ in the following aspects: (1) text-based tasks require writers to summarize, manipulate, and reorganize source content, whereas non-text-based tasks primarily depend on production skills; (2) text-based tasks demand both comprehension and production, while non-text-based tasks only demand production (Cumming et al., 2005; Kim & Crossley, 2018). Because of differential linguistic demands associated with the two types of tasks, it is possible that the relationships between syntactic features and writing outcomes differ by whether the writing task is text-based or not. Prior studies have found that linguistic features predict writing quality differently in text-based writing tasks and non-text-based writing tasks (Guo et al., 2013; Kim & Crossley, 2018). However, relatively few studies have focused on linguistic features in text-based argumentative writing (Cumming et al., 2016). For example, Guo et al. (2013) found that noun phrase complexity was significantly associated with writing quality in both source-based and non-source-based tasks, whereas subordination was negatively correlated with writing quality only in the non-text-based writing task. Kim and Crossley (2018) found that words per clause was related to writing quality in both tasks, words per sentence and words per T-unit were only related to the text-based task, and complex nominals per clause and coordinate phrases per clause were only related to the non-text-based task. Despite these findings, the limited number of studies on source-based writing tasks prevents definitive conclusions about how syntactic skills may differentially contribute to the writing of different task types. 1.4.4. Syntactic complexity measurement Researchers have utilized numerous syntactic measures to explore the relationship between syntactic complexity and writing, making cross-study comparisons challenging. 
Traditional syntactic complexity measures, such as T-unit length, clause length, and the number of clauses per T-unit, are commonly used. However, these measures operate at relatively larger grain sizes (e.g., T-unit or clause level), limiting their ability to capture specific syntactic complexity features of a text. Consequently, interpreting these measures can be challenging (Norris & Ortega, 2009). The reliance on traditional measures may partially explain the inconsistent findings in the literature examining the relationship between syntactic complexity and writing quality. Kyle and Crossley (2018) used a variety of syntactic complexity indices at the phrase, clause, and sentence levels. They found that the phrase- and clause-level measures accounted for more variance in holistic writing quality scores in TOEFL independent argumentative essays than a traditional measure (words per clause). Their findings suggest that fine-grained measures may explain greater variance in writing quality and exhibit stronger relationships with writing quality. Given the potential limitations of traditional syntactic complexity measures, studies in second-language writing began to examine syntactic complexity at the clause and phrase levels (e.g., Biber et al., 2016; Crossley & McNamara, 2014). Moreover, given that distinct syntactic complexity measures tap into different aspects of syntactic complexity, multiple studies (Beers & Nagy, 2009, 2011) have suggested that different syntactic complexity measures perform differentially when evaluating the same writing prompt. As a result, the relationship between syntactic complexity measures and writing quality may differ by the type of syntactic complexity measures used, highlighting the need to investigate how using different measures may influence this relationship. 1.5. 
The current study The study employs a meta-analysis approach to systematically examine the relationships between syntactic features and writing quality, as well as the potential moderating effects of student characteristics and measurement features. While previous review articles (Crossley, 2020; Crowhurst, 1983; Lu, 2017; Jagaiah et al., 2020) have synthesized research on this topic, they have not quantitatively estimated the magnitude of the relationships between syntactic features and writing outcomes and have only discussed moderating effects to a limited extent. To provide a more comprehensive understanding of the relationships between syntactic features and writing performance, this study aims to estimate the magnitude of these relationships and investigate factors that may influence them. By identifying which syntactic features are most critical for various writing tasks and student populations, this study can inform educators about the language skills needed to support proficient writing across different genres and contexts and how to assess language use in writing with considerations of student characteristics and measurement features. The following research questions guide the present study: 1. What is the magnitude of the relationships between syntactic features (accuracy and complexity) and writing performance? 2. Do the relationships between syntactic features and writing vary as a function of student characteristics (grade level and language proficiency) and measurement features (writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures)? 2. Methods 2.1. 
Search procedures and inclusion criteria The primary literature search was conducted using the following electronic databases: Educational Resources Information Center (ERIC), APA PsycInfo, Linguistics and Language Behavior Abstracts (LLBA), Dissertations & Theses Global, ProQuest Dissertations & Theses A & I, and Sociological Abstracts, all accessed via ProQuest. The search covered studies published from January 1, 1960, to December 31, 2021, as research on syntactic features in writing began in the 1960s (Crowhurst, 1983). No restrictions were placed on participant age or publication types. The Boolean search terms were as follows: “ab(("gramma* complex*" OR "complex* gramma*" OR "synta* complex*" OR "complex* synta*" OR "MLU" OR "text complex*" OR "sentence complex*" OR "t-unit*" OR "synta* density" OR "claus* density" OR "synta* accur*" OR "gramma* accur*")) AND ab((writ*))”. The initial search yielded 2951 articles. Inclusion criteria were as follows: (a) both syntactic features and writing quality were measured; (b) participants of various levels of language proficiency and disability statuses (e.g., ADHD, developmental language disorder, dyslexia) were included, but studies focused on populations diagnosed with severe disabilities and sensory impairments (e.g., traumatic brain injuries, aphasia, or Down syndrome) were excluded; (c) studies had sample sizes exceeding four participants; (d) studies reported sufficient data to calculate effect sizes (zero-order correlations); otherwise, authors were contacted; (e) studies were published in English; and (f) data were not impacted by interventions. 2.2. Study selection and exclusion The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA; Page et al., 2021) chart (Fig. 1) summarizes the study selection process. After removing 772 duplicates from the initial 2951 articles, 2179 studies underwent title/abstract screening using Rayyan (Ouzzani et al., 2016). 
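The truncated search terms listed in Section 2.1 can be illustrated with a small sketch that approximates the abstract-field query using regular expressions. This is not the database's own matching engine: the helper names (`term_to_regex`, `abstract_matches`) are hypothetical, and the sketch assumes that `*` stands for zero or more trailing word characters, which may be looser than the truncation rules the actual search platform applies.

```python
import re

# Phrase terms copied from the abstract-field query reported in Section 2.1.
SYNTAX_TERMS = [
    "gramma* complex*", "complex* gramma*", "synta* complex*", "complex* synta*",
    "MLU", "text complex*", "sentence complex*", "t-unit*",
    "synta* density", "claus* density", "synta* accur*", "gramma* accur*",
]
WRITING_TERM = "writ*"


def term_to_regex(term: str) -> re.Pattern:
    """Compile one truncated phrase into a case-insensitive regex.

    Assumption (not from the paper): '*' means zero or more trailing
    word characters; real database truncation may be stricter.
    """
    parts = []
    for word in term.split():
        stem = re.escape(word.rstrip("*"))
        parts.append(stem + (r"\w*" if word.endswith("*") else ""))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


SYNTAX_PATTERNS = [term_to_regex(t) for t in SYNTAX_TERMS]
WRITING_PATTERN = term_to_regex(WRITING_TERM)


def abstract_matches(abstract: str) -> bool:
    """True if the abstract hits at least one syntax term AND the writing term."""
    has_syntax = any(p.search(abstract) for p in SYNTAX_PATTERNS)
    has_writing = WRITING_PATTERN.search(abstract) is not None
    return has_syntax and has_writing
```

Under these assumptions, an abstract mentioning "syntactic complexity in student writing" would be retrieved, whereas one that pairs "grammatical accuracy" only with "wrote" would not, because "wrote" does not share the `writ` stem.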
During title/abstract screening, 622 additional articles were excluded based on the following criteria: (1) did not include any writing task (n = 283); (2) were methodological or theoretical articles (n = 109); (3) did not include syntactic features (n = 56); (4) were of incorrect publication type, such as book reviews and letters to the editor (n = 51); (5) conducted secondary analysis using a public corpus (n = 35); (6) were qualitative studies (n = 25); (7) were review studies (n = 31); (8) had four or fewer participants (n = 17); and (9) focused on a clinical population with specific needs/conditions (n = 15). An interrater agreement of 95 % was achieved in the title/abstract screening stage using 4 % of all articles (n = 120). Discrepancies were resolved through discussion before coders began independent coding. Studies included during the title/abstract screening were screened via full-text review, with interrater reliability of 95.45 % on 10 % of the included articles (n = 154). Zero-order correlation values were calculated for articles that reported raw scores (n = 7). In addition, for all articles published within the past ten years (starting from 2012), authors of the articles that clearly measured our target skills but did not report correlations were contacted for information on zero-order correlations. Beyond electronic database searches, we also searched for articles from other sources. First, the reference lists of literature review articles on the topic were checked (Crossley, 2020; Crowhurst, 1983; Jagaiah et al., 2020; Lu, 2017; Ortega, 2003). We also manually searched three journals likely to publish research articles on the topic, including Journal of Second Language Writing, Reading & Writing, and Journal of Speech, Language, and Hearing Research. A total of 121 studies were included through database searching, reference chasing, and journal searching. 2.3. 
Coding procedures All eligible studies were coded for effect sizes, sample size, and sample characteristics, including age, language proficiency (language learning context and language proficiency status), socio-economic status, gender, race, primary language, and disability status. We also coded types of syntactic complexity measures, writing genres, writing outcomes, language of writing samples, and reliability information of the syntactic and writing performance measures. Two PhD students with backgrounds in linguistics double-coded all studies. Interrater reliability (exact agreement) using 20 % of the randomly selected studies (n = 24) was 90.8 %. Discrepancies were discussed and resolved by the same raters. For the analysis, measures appearing in fewer than three studies (e.g., relative clauses per T-unit, adverbial clauses per T-unit) or those operationalized as mixed syntactic complexity features were excluded. Twelve studies were removed during this stage, leaving 109 studies for data analysis. Measures of words per clause and words per T-unit were combined into a single category, “words per unit”, because both measured length and their effect sizes were very similar.

Fig. 1. PRISMA flow diagram showing the searching and screening processes.

2.3.1. Student characteristics Grade level was coded both as a categorical variable and a continuous variable. We used a categorical variable because the moderating effect of grade level may not be linear. Grades were divided into five categories: primary grades (Grades K-2), upper elementary (Grades 3–5), middle school (Grades 6–8), high school (Grades 9–12), and adults (undergraduate level and beyond). We also used a continuous variable to test linear growth and as a sensitivity analysis. For the continuous variable, the weighted average grade level of the sample was used. 
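The two grade-level codings described above can be sketched as follows. The numeric scale is an assumption made for illustration only (kindergarten coded as 0 and postsecondary writers as 13 or higher), as are the function names and the handling of non-integer grades; the article does not report these details.

```python
from typing import Dict


def grade_band(grade: float) -> str:
    """Map a numeric grade to one of the five bands named in the text.

    Assumptions (hypothetical, not from the paper): kindergarten is coded 0,
    postsecondary writers are coded 13 or higher, and a non-integer grade
    falls into the band of the grade it has reached.
    """
    if grade < 3:
        return "primary"           # Grades K-2
    if grade < 6:
        return "upper elementary"  # Grades 3-5
    if grade < 9:
        return "middle school"     # Grades 6-8
    if grade < 13:
        return "high school"       # Grades 9-12
    return "adult"                 # undergraduate level and beyond


def weighted_mean_grade(students_per_grade: Dict[int, int]) -> float:
    """Average grade of a sample, weighted by the number of students per grade."""
    total = sum(students_per_grade.values())
    if total == 0:
        raise ValueError("sample has no students")
    return sum(g * n for g, n in students_per_grade.items()) / total


# Hypothetical sample: 40 third graders and 60 fourth graders.
mean_grade = weighted_mean_grade({3: 40, 4: 60})
band = grade_band(mean_grade)
```

For this hypothetical sample the continuous moderator would be the weighted mean grade, and the categorical moderator would be the band containing that value.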
Language proficiency in the target language was operationalized in two ways. First, the language learning context of the sample was recorded. Foreign language context refers to writers learning in a setting where the target language of the writing task was not the official language of the country (e.g., students taking an English course in China). Second language context refers to writers in a setting where the target language was the official language of the location, but students were identified as second language learners with limited proficiency in the target language (e.g., English as a Second Language courses in the U.S. context). Contexts with monolingual and bilingual students (e.g., U.S. classrooms with both monolingual and bilingual learners) were coded as mixed, and contexts with only monolingual speakers were coded as monolingual. 2.3.2. Measurement features Multiple measurement features were examined, including writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity features. Writing genres were coded into the following categories: narrative, argumentative, expository, compare and contrast, problem-solution, informational (the study did not specify a subgenre or used multiple subgenres of informational tasks), mixed, and others (e.g., poem, diary). Due to limited effect sizes, compare and contrast and problem-solution were merged with the informational category. Writing outcomes were coded into writing quality, productivity, and fluency. Writing quality includes either holistic or analytic scoring (commonly examined analytic aspects include content, organization, coherence, and language use; writing conventions, such as punctuation and capitalization, were not counted toward writing quality). Writing productivity measures included measures of different grain sizes, such as the number of clauses/T-units/sentences. 
Writing fluency refers to productivity during a certain amount of time. Writing productivity measures with a time limit of five minutes or less were coded as writing fluency. Types of syntactic complexity measures were coded (see Supplementary material 1). Noun phrase complexity included words per noun phrase, complex nominal phrase/T-unit, complex nominal phrase/clause, incidence of prepositions, and number of modifiers per noun phrase. Subordination measures included dependent clauses per clause, dependent clauses per T-unit, number of clauses per T-unit, subordinating conjunctions, and percentage of sentences containing embedded clauses.
2.4. Data analysis strategies
To address the research questions, effect sizes were standardized using Fisher’s z scores, calculated based on the sample size and reported correlation values. This standardization ensured that no sample was over- or underrepresented due to variations in sample size. Additionally, sampling variances were calculated and used instead of raw sample sizes in all analyses to account for differences in study precision. Analyses were conducted using R with the metafor (Viechtbauer, 2010) and robumeta (Hedges et al., 2010) packages. The robumeta package was employed to account for the nested nature of the data, as many studies included multiple syntactic complexity measures evaluated within the same sample. This robust variance estimation technique reduces the risk of Type I errors by accounting for dependencies within the dataset. Unlike traditional meta-analytic methods that assume independence among effect sizes, robumeta is designed to handle data structures with interrelated measures. Publication bias was assessed using funnel plots and Egger’s regression test (Sterne & Egger, 2005), which are designed to identify asymmetries that may indicate selective reporting of significant findings. To answer the first research question, weighted effect sizes were calculated for each type of syntactic measure.
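To make the Fisher's z step concrete, the sketch below converts correlations to z scores, attaches the usual sampling variance 1/(n − 3), pools with inverse-variance weights, and back-transforms to r. It is a simplified fixed-effect illustration in Python, not the robust variance estimation that robumeta performs in R, and the function names and the two example studies are hypothetical.

```python
import math
from typing import List, Tuple


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def z_variance(n: int) -> float:
    """Sampling variance of Fisher's z for a sample of size n."""
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    return 1.0 / (n - 3)


def z_to_r(z: float) -> float:
    """Back-transform a Fisher's z value to a correlation."""
    return math.tanh(z)


def pooled_r(studies: List[Tuple[float, int]]) -> float:
    """Inverse-variance weighted mean correlation.

    `studies` is a list of (r, n) pairs. This fixed-effect pooling ignores
    dependence among effect sizes, which the article handles with robust
    variance estimation instead.
    """
    weights = [1.0 / z_variance(n) for _, n in studies]
    z_values = [fisher_z(r) for r, _ in studies]
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return z_to_r(z_bar)


# Hypothetical effect sizes: (correlation, sample size).
example_studies = [(0.10, 53), (0.30, 53)]
example_pooled = pooled_r(example_studies)
```

Because the weights are the reciprocals of the sampling variances, larger samples pull the pooled estimate toward their own correlation, which is the sense in which variances replace raw sample sizes.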
When a study reported multiple syntactic measures, the mean effect size was computed for those measures using the robumeta package. For the second research question, meta-regressions were run where moderators were tested. The analyses were conducted separately for each syntactic complexity measure to account for variability in how these measures relate to writing performance. In meta-regression analyses, we designated each category of the moderator as the reference group in turn to cover all possible comparisons. For n categories, we conducted n - 1 separate analyses, rotating which category served as the reference group each time. Categories or analyses with insufficient effect sizes (df < 4) were excluded from moderation analyses due to the limited statistical power. Left embeddedness was not included in the moderation analysis due to a small number of subsamples available (k = 7).
3. Results
3.1. Characteristics of included studies
This meta-analysis included 109 studies (see Supplementary material 2 for a list of the included articles) with 871 effect sizes clustered within 162 unique samples and a total of 24,628 participants (individual study sample sizes ranged from 5 to 2916). The majority of studies were conducted in the U.S. (n = 68), followed by China (n = 7), Japan (n = 5), and the U.K. (n = 3). The publication years of the included studies spanned from 1977 to 2022. Of the 162 unique samples, 81 were from journal articles, 55 from dissertations, 11 from book chapters, nine from research reports or conference presentations, and six from other types of studies. Among these samples, 101 reported students’ gender distribution, indicating a balanced composition of male and female students, and 147 reported students’ grade levels. Seven samples provided information about the number of students with learning disabilities, and 25 samples excluded students with learning disabilities.
The overall socioeconomic status (SES) data were available for 29 samples, with nine categorized as middle SES, six as medium and low SES, five as medium and high SES, five as mixed SES, and four as low SES. Of the 162 unique samples, 90 samples provided information about language learning context: 37 in a Foreign Language context, 14 in a Second Language context, 19 in mixed contexts, and 20 in monolingual contexts. Language proficiency was reported for 84 samples, 44 of which predominantly included students with limited language proficiency in the target language. Of the 156 unique samples that described writing prompt formats, 26 utilized text-based prompts requiring students to read materials before writing. Writing genre data indicated that 38 samples used argumentative tasks, 37 used narrative tasks, 32 used expository tasks, 15 used informational tasks (without specifying subgenres or including multiple subgenres), 15 involved mixed writing tasks, six used compare and contrast, five used other genres, one used a problem-solution task, and 16 did not report the genre of the writing task. Of the 147 samples that provided information on writing prompts, only 22 used norm-referenced tasks. Reliability data for writing outcomes were reported in 90 out of 162 samples, while only 49 provided reliability information for syntactic features. Most samples (n = 147; 90.7 %) utilized English writing tasks. Only 21 samples specified that a digital writing format was used. Of the 38 samples that provided information about whether spelling was corrected, eight corrected spelling before generating writing outcome measures. 3.2. Research question 1: magnitude of relationships between syntactic features and writing outcomes The magnitudes of zero-order correlation for each syntactic measure were reported in Table 1. 
Syntactic accuracy (r = .25, SE = .07, p = .001) and overall syntactic complexity (r = .16, SE = .02, p < .001) had a weak but significant relationship to writing outcomes. Among individual syntactic complexity measures, all except left embeddedness showed weak but significant relationships to writing outcomes, including noun phrase complexity (r = .24, SE = .04, p < .001), words per unit (r = .19, SE = .03, p < .001), subordination (r = .14, SE = .03, p < .001), and words per sentence (r = .10, SE = .04, p = .02).
3.3. Research question 2: factors that moderate the relationships between syntactic features and writing outcomes
3.3.1. Moderation effects for syntactic accuracy
3.3.1.1. Student characteristics. When grade level was operationalized as a categorical variable, there was a stronger correlation between syntactic accuracy and writing performance among primary grade students (r = .35, SE = .07, p = .03; .17 plus .18, see Table 2) compared to adults (r = .18, SE = .07, p = .02; see intercept in the first panel of Table 2). In addition, the correlation was stronger for primary grade students (r = .35, SE = .10, p = .04) than upper elementary grade students (r = .07, SE = .10, p = .52). However, when grade level was operationalized as a continuous variable, no difference in the relationship’s strength was detected. As for the moderating effects of language proficiency, the relationship between syntactic accuracy and writing outcome was weaker for monolingual contexts (r = .03, SE = .17, p = .03) compared to foreign language (FL) contexts (r = .47, SE = .13, p = .008). Language proficiency status did not moderate the relationship.
3.3.1.2. Measurement features. Neither writing genres nor writing outcomes moderated the relationship between syntactic accuracy and writing outcomes.
3.3.2. Moderation effects for noun phrase complexity
Due to limited effect sizes, fewer moderators were examined for noun phrase complexity (see Table 3).
Among the moderators examined, all were found to be significant.
3.3.2.1. Student characteristics. As for language proficiency, the relationship was weaker for samples with predominantly students who are proficient in the target language (r = −.01, SE = .06, p = .005) compared to samples with predominantly students with limited language proficiency in the target language (r = .26, SE = .04, p < .001).
3.3.2.2. Measurement features. Writing genre significantly moderated the relationship between noun phrase complexity and writing outcomes. The relationship was stronger for expository texts (r = .43, SE = .12, p = .05) compared to argumentative texts (r = .15, SE = .07, p = .08). Additionally, the relationship between noun phrase complexity and writing performance was also moderated by how writing outcome was measured: the relationship was weaker when writing productivity was the outcome (r = .11, SE = .08, p = .03) compared to writing quality (r = .29, SE = .06, p < .001).
3.3.3. Moderation effects for words per unit
3.3.3.1. Student characteristics. Grade level did not moderate the relationship between words per unit and writing outcomes (see Table 4). As for language learning context, the relationship was stronger for students in mixed context (r = .25, SE = .07, p = .03) than monolingual context (r = .09, SE = .04, p = .06).
3.3.3.2. Measurement features. Writing genre significantly moderated the relationship between words per unit and writing outcomes. The relationship was weaker for argumentative texts (r = .07, SE = .08, p = .008) and informational texts (r = .05, SE = .09, p = .02) than expository texts (r = .28, SE = .07, p < .001).
3.3.4. Moderation effects for words per sentence
No significant moderation effects were found for the relationship between words per sentence and writing outcomes (see Table 5).
3.3.5. Moderation effects for subordination
3.3.5.1.
Student characteristics. The relationship between subordination and writing outcomes was weaker in mixed contexts (r = .09, SE = .09, p = .03) compared to FL contexts (r = .14, SE = .07, p = .06; see Table 6).
3.3.5.2. Measurement features. Whether the writing task is text-based or not was a significant moderator: writing tasks that included source materials had a stronger relationship between subordination and writing outcomes (r = .24, SE = .06, p = .03) compared to tasks without source materials (r = .08, SE = .03, p = .01).
3.3.6. Average correlations by student characteristics and measurement features
Average correlations by student characteristics and measurement features can be found in Tables 7–11.
3.3.7. Measurement of syntactic complexity measures
Moderation effects of the type of syntactic complexity measures are reported in Table 12. The relationships between syntactic complexity and writing outcomes differed for words per sentence, whereas no difference was found across other measures. The relationship between writing outcome and words per sentence (r = .09, SE = .04, p = .03) was weaker than the relationship between writing outcome and words per unit (r = .19, SE = .03, p < .001).
3.4. Sensitivity analysis
Sensitivity analysis was conducted to understand (1) whether the relationships between syntactic features and writing outcomes differ by the language of writing, and (2) what the potential reasons are for nonsignificant moderation effects. Given that most studies included involved writing tasks conducted in English, the effect sizes were insufficient to test the language of writing tasks as a moderator. Nonetheless, we conducted a sensitivity analysis to explore whether the relationship between syntactic features and writing outcomes differed by language.
Interpretation of this sensitivity analysis should be taken with caution, as all languages other than English were grouped into a single category despite differences in syntactic features and functions across languages. In addition, most of the moderation effects for words per unit, subordination, and words per sentence were not statistically significant, which may be attributed to excess variability in effect sizes. We conducted sensitivity analyses based on study characteristics (publication year, reporting of reliability, and publication type) for key moderators (grade level, language proficiency status, writing genres, and writing outcomes) to understand potential sources of variations that may have masked significant moderators. These sensitivity analyses were restricted to words per unit and subordination measures due to the limited effect sizes for other syntactic measures.
3.4.1. Language of the writing task
The overall weighted effect size estimates for the relationship between syntactic complexity and writing performance were nearly identical for studies including non-English writing tasks (r = .16, SE = .02, p < .001) and those excluding non-English tasks (r = .15, SE = .06, p = .03). For syntactic accuracy, there were insufficient effect sizes to estimate the relationship when non-English writing tasks were excluded.
Table 1
The relationships of various syntactic features with writing outcomes.
Variable                  r (df)      SE    CI.LB   CI.UB   p
Syntactic accuracy        .25 (30)    .07   .11     .40     .001**
Syntactic complexity      .16 (139)   .02   .13     .20     < .001***
Noun phrase complexity    .24 (11)    .04   .15     .33     < .001***
Words per unit            .19 (96)    .03   .13     .24     < .001***
Subordination             .14 (59)    .03   .07     .20     < .001***
Words per sentence        .10 (12)    .04   .02     .18     .02*
Left embeddedness         .10 (4)     .04   −.02    .22     .08
* p < .05. ** p < .01. *** p < .001.
Table 2 Multilevel random effects model: meta-regression of moderators of syntactic accuracy.
Moderator Intercept(SE) Grade level (categorical) Reference: Adults Intercept High school Middle school Upper elementary grades Primary grades Reference: High school Intercept Adults Middle school Upper elementary grades Primary grades Reference: Middle school Intercept Adults High school Upper elementary grades Primary grades Reference: Upper elementary Intercept Adults High school Middle school Primary grades Grade level (continuous) β(SE) df p CI.LL CI.UL .25(.17) .07(.18) − .11(.12) .17*(.07) 12 14 18 10 12 .02 .20 .71 .39 .03 .03 − .14 − .32 − .38 .02 .33 .64 .46 .16 .31 .25(.18) .18(.24) .36(.20) .08(.17) 7 14 14 11 7 .04 .20 .47 .10 .66 − − − − .02 .63 .69 .79 .48 .83 .14 .34 .07 .32 − .07(.18) .18(.24) − .18(.20) .10(.17) 8 17 14 11 8 .18 .71 .47 .38 .58 − − − − − .15 .46 .34 .62 .30 .64 .32 .69 .26 .49 .52 .39 .10 .38 .04 .25 .40 − − − − .01(.01) 5 10 11 11 5 11 15 .19 .16 .07 .26 .02 − .13 − .02 .33 .38 .79 .62 .54 .44 .04 .11(.23) − .44*(.17) 8 8 9 .008 .65 .03 .16 − .41 − .82 .78 .62 − .07 − .14(.22) 10 13 .01 .54 .11 − .62 .69 .34 .04(.14) .01(.13) 13 10 5 .12 .79 .76 − .05 − .28 − .55 .43 .35 .43 − .04(.14) − .10(.18) 6 10 6 .05 .79 .60 .00 − .35 − .53 .44 .28 .33 − .17(.10) 23 33 < .001 .10 .14 − .37 .45 .03 .18*(.07) . 43(.17)* − − − − .25(.17) .07(.10) .11(.12) .36(.20) .18(.20) .28*(.10) .16(.13) Language learning context Reference: Foreign language Intercept Second language Monolingual Language proficiency in the target language Reference: Limited Intercept Proficient Measurement features: writing genres Reference: Narrative Intercept Expository Argumentative Reference: Expository Intercept Narrative Argumentative Measurement features: writing outcomes Reference: Quality Intercept Productivity .47**(.13) .40*(.13) .19(.11) .22(.09) .30***(.08) Note. Analyses or results were not included if df <4; * p < .05 ** p < .01 *** p < .001. 3.4.2. 
Publication year
The development of automated tools for generating linguistic features, coupled with the increasing adoption of digital writing tasks, has likely contributed to improvements in the precision and reliability of syntactic feature measurements. Since this meta-analysis includes articles dating back to the 1970s, it is likely that advancements in technology and methodologies over time have enhanced the accuracy and consistency of syntactic analyses. To examine whether these advancements influenced the findings, we restricted the analysis to studies published between 2009 and 2022, a period marked by the widespread use of automated tools such as Coh-Metrix, L2SCA, and TAASSC. However, applying this restriction did not alter the results of moderation effects, suggesting that publication year is unlikely to account for the lack of significant moderation effects observed in the analyses.
3.4.3. Reporting of reliability
Studies that did not report reliability may have low reliability with their measures. To evaluate whether the lack of reported reliability influenced the observed moderation effects, we conducted a sensitivity analysis only including studies that reported reliability information.
Table 3
Multilevel random effects model: meta-regression of moderators of noun phrase complexity.
Moderator                                  Intercept(SE)   β(SE)          df   p         CI.LL   CI.UL
Language proficiency
Reference: limited
  Intercept                                .26***(.04)                    10   < .001    .17     .35
  Proficient                                               −.27**(.06)    6    .005      −.42    −.11
Measurement features: writing genres
Reference: Argumentative
  Intercept                                .15(.07)                       5    .08       −.02    .32
  Narrative                                                −.05(.10)      6    .64       −.29    .19
  Expository                                               .28*(.12)      8    .05       .00     .57
Measurement features: writing outcomes
Reference: Quality
  Intercept                                .29***(.06)                    15   < .001    .15     .43
  Productivity                                             −.18*(.08)     14   .03       −.34    −.02
Note. Analyses or results were not included if df < 4. * p < .05. ** p < .01. *** p < .001.
When limiting the analysis to studies that have reported reliability for syntactic measures (32 studies), a significant moderation effect of genre was observed. Specifically, the relationship between words per unit and writing outcomes was weaker in informational genre (r = −.06, SE = .06, p = .009) compared to narrative genre (r = .16, SE = .05, p = .01). Other results remained the same. When limiting the analysis to studies that reported reliability for writing outcomes (65 studies), a significant moderation effect of grade level was identified. The relationship between words per unit and writing outcomes was weaker for primary grade students (r = .03, SE = .05, p = .004) than for adults (r = .19, SE = .05, p = .001). In addition, a significant moderation effect was found for the type of writing outcome. The relationship between words per unit and writing outcomes was weaker for writing fluency (r = .10, SE = .03, p < .001) compared to writing quality (r = .21, SE = .03, p < .001). Other results remained the same.
3.4.4. Publication type
We also conducted a sensitivity analysis by only including peer-reviewed journal articles to assess whether publication type influenced the results. By limiting the analyses to journal articles, a significant moderating effect of genre emerged. The relationship between words per unit and writing outcomes was stronger for expository texts (r = .48, SE = .05, p = .009) compared to narrative texts (r = .18, SE = .10, p = .004). Additionally, a significant moderation effect was found for the relationship between subordination and writing outcomes. Specifically, subordination showed a stronger relationship with productivity (r = .11, SE = .04, p = .05) than with quality (r = .00, SE = .05, p = .97). These results suggest that some variability in findings from non-peer-reviewed studies may have masked significant moderation effects.
However, overall, traditional syntactic complexity measures did not exhibit sensitivity to changes in student characteristics. To further evaluate potential publication bias, we conducted Egger’s test and created a funnel plot (see Fig. 2). The funnel plot was symmetrical, and Egger’s test was not statistically significant (b = .20, p = .51), indicating no evidence of publication bias. 4. Discussion The primary purpose of this meta-analysis was to examine the relationships between syntactic features in writing and writing outcomes and to investigate how the relationships are moderated by student characteristics (grade level and language proficiency) and measurement features (writing genres, and writing outcomes, whether the writing task is text-based or not, and type of syntactic measures). 4.1. Overall relationships between syntactic features in written composition and writing performance There was a weak relationship between syntactic accuracy and the majority of syntactic complexity measures and writing outcomes. Although the relationship is weak, this result confirmed the critical role of syntactic skills in writing for several reasons (Berninger et al., 2002; Graham, 2018; Kim & Graham, 2022). First, the current study focused on text-based syntactic features, which reflect only one aspect of language skills. Writing proficiency also relies on other aspects of language skills, such as vocabulary use, rhetorical strategies, and cohesive devices. Second, as highlighted in previous research, writing is a complex activity that integrates multiple cognitive, linguistic, and literacy skills (e.g., Graham, 2018; Wagner et al., 2011). A skill demonstrating a weak relationship with writing outcomes should not be interpreted as not important. Considering the aforementioned reasons, despite the modest correlations observed, the results highlight the importance of syntactic proficiency as one critical component of writing performance. 
The present study highlights the importance of addressing syntactic demands in writing to improve overall writing performance. 11 Assessing Writing 63 (2025) 100909 J. Wang et al. Table 4 Multilevel random effects model: meta-regression of moderators of words per unit. Moderator Grade level (categorical) Reference: Adults Intercept High school Middle school Upper elementary grades Reference: High school Intercept Adults Middle school Upper elementary grades Reference: Middle school Intercept Adults High school Upper elementary grades Language learning context Reference: Foreign language Intercept Second language Monolingual Mixed Reference: Second language Intercept Foreign language Monolingual Mixed Reference: Monolingual Intercept Foreign language Second language Mixed Language proficiency in the target language Reference: Proficient Intercept Limited Measurement features: writing genres Reference: Narrative Intercept Argumentative Expository Informational Reference: Expository Intercept Narrative Argumentative Informational Reference: Argumentative Intercept Narrative Expository Informational Measurement features: writing outcomes Reference: Quality Intercept Productivity Measurement features: text-based Reference: Not text-based Intercept Text-based Intercept(SE) β(SE) df p CI.LL CI.UL .15(.07) .07(.06) .08(.07) 36 41 48 37 < .001 .05 .22 .22 .03 .00 − .05 − .05 .18 .30 .19 .22 − .15(.07) − .08(.08) − .07(.08) 20 41 42 38 < .001 .05 .33 .40 .13 − .30 − .24 − .24 .39 .00 .08 .10 − .07(.06) .08(.08) .01(.07) 22 48 42 38 < .001 .22 .33 .92 .09 − .19 − .08 − .14 .28 .05 .24 .15 .05(.12) − .07(.06) .09(.07) 22 15 32 30 < .001 .70 .21 .18 .09 − .20 − .19 − .05 .24 .29 .04 .23 − .05(.12) − .12(.12) .05(.12) 8 14 16 17 .10 .70 .33 .72 − − − − .05 .29 .37 .22 .46 .20 .13 .31 .07(.06) .12(.12) .16*(.07) 15 32 16 29 .06 .21 .33 .03 .00 − .04 − .13 .02 .18 .19 .37 .31 − .01(.06) 30 60 < .001 .81 .12 − .10 .26 .13 − .11(.05) .11(.08) − .12(.07) 23 42 46 28 < .001 .06 .17 
.09 .10 − .21 − .05 − .27 .25 .00 .26 .02 − .11(.08) − .21**(.08) − .23*(.09) 25 46 43 28 < .001 .17 .008 .02 .14 − .26 − .37 − .41 .42 .05 − .06 − .05 .11(.05) .21**(.08) − .02(.07) 21 42 43 28 .08 .06 .008 .81 .00 .00 .06 − .16 .15 .21 .37 .13 − .07(.05) 91 88 < .001 .15 .16 − .16 .27 .03 . 14(.08) 81 22 < .001 .09 .10 − .02 .19 .30 .11***(.04) .26***(.06) .18***(.05) .16***(.04) .21(.11) .09(.04) .19***(.04) .17***(.04) .28***(.07) .07(.04) .21***(.03) .15***(.02) Note. Analyses or results were not included if df < 4. * p < .05. ** p < .01. *** p < .001. 12 Assessing Writing 63 (2025) 100909 J. Wang et al. Table 5 Multilevel random effects model: meta-regression of moderators of words per sentence. Moderator Grade level (categorical) Reference: Adults Intercept High school Language learning context Reference: Foreign language Intercept Second language Language proficiency in the target language Reference: Limited Intercept Proficient Measurement features: writing genres Reference: Narrative Intercept Argumentative Expository Informational Reference: Argumentative Intercept Narrative Expository Informational Reference: Informational Intercept Narrative Argumentative Expository Measurement features: writing outcomes Reference: Quality Intercept Productivity Intercept(SE) .14***(.04) .16***(.04) .15***(.04) .10(.09) .13(.08) .04(.07) .14***(.04) β(SE) df p CI.LL CI.UL − .10(.08) 21 8 .001 .24 .06 − .27 .22 .08 − .09(.09) 16 6 .001 .36 .07 − .30 .25 .13 .04(.15) 16 5 .004 .81 .06 − .36 .25 .45 .03(.12) .05(.10) − .06(.12) 8 12 8 10 .33 .80 .63 .61 − − − − .12 .23 .18 .32 .32 .30 .27 .20 − .03(.12) .02(.08) − .09(.10) 6 12 9 11 .14 .80 .84 .38 − − − − .06 .30 .17 .32 .32 .23 .20 .13 .06(.12) .09(.10) .11(.07) 5 10 11 9 .60 .61 .38 .16 − − − − .13 .20 .13 .05 .21 .32 .32 .27 − .07(.07) 26 14 .001 .33 .06 − .23 .22 .08 Note. Analyses or results were not included if df <4. *** p < .001. 4.2. 
Moderating effects of student characteristics Moderation effects of student characteristics were observed for certain syntactic features. The moderation effect of grade level (a proxy for developmental phase) was significant only for syntactic accuracy. Primary-grade students (Kindergarten to Grade 2) demonstrated a stronger relationship between syntactic accuracy and writing performance compared to high school students and adults, indicating that syntactic accuracy is particularly critical in the early stages of writing development. Children in primary grades are rapidly developing foundational syntactic knowledge and are thus more prone to grammatical errors (Datchuk et al., 2021), which may impact the overall writing performance (Wang et al., 2024). Previous research has suggested that syntactic complexity measures may be sensitive to students’ syntactic development over time (Beers & Nagy, 2011; Bulté & Housen, 2014; Crossley et al., 2011; Crossley & McNamara, 2014; Crowhurst, 1980; Crowhurst & Piche, 1979; Hunt, 1970; Jagaiah et al., 2020; Wagner et al., 2011). These studies predominantly relied on cross-sectional data. In the current study, however, no moderation effects of grade level have been found, potentially due to substantial variability across studies. Given the large number of studies investigating the relationship between syntactic complexity and writing performance, these variations may stem from diverse factors, such as differences in writing tasks and characteristics of the student populations. Language proficiency in the target language moderated the relationship between syntactic features and writing performance. The relationship between noun phrase complexity and writing performance was stronger for students with limited language proficiency in the language they are writing in. Similarly, certain syntactic features play a more important role in contexts where there are bilingual learners than in contexts with only monolingual learners. 
Words per unit had a stronger relationship with writing outcomes in a mixed context of monolingual and bilingual learners than in a context of only monolingual learners. Subordination had a stronger relationship with writing outcomes in a foreign language learning context than in a mixed context. These findings are likely attributable to the fact that some bilingual students are still developing their language skills in the target language. Therefore, their language skills in this language are constrained by the linguistic knowledge they have obtained thus far, which may consequently influence their overall writing quality. While students are still developing syntactic skills, syntactic accuracy, noun phrase complexity, words per unit, and subordination measures can be considered when assessing their ability to employ these structures in their writing.
Table 6 Multilevel random effects model: meta-regression of moderators of subordination.
Moderator Intercept(SE) Grade level (categorical) Reference: Adults Intercept High school Middle school Upper elementary grades Reference: High school Intercept Adults Middle school Upper elementary grades Reference: Middle school Intercept Adults High school Upper elementary grades Grade level (continuous) Language learning context Reference: Foreign language Intercept Second language Monolingual Mixed Reference: Second language Intercept Foreign language Monolingual Mixed Reference: Monolingual Intercept Foreign language Second language Mixed Language proficiency in the target language Reference: Limited Intercept Proficient Measurement features: writing genres Reference: Narrative Intercept Argumentative Expository Informational Measurement features: text-based Reference: Not text-based Intercept Text-based β(SE) df p CI.LL CI.UL .09(.11) − .02(.06) .06(.07) 25 26 29 23 .02 .44 .79 .41 .01 − .15 − .14 − .09 .15 .33 .11 .21 − .09(.12) − .11(.12) − .03(.13) 13 26 26 24 .14 .44 .39 .81 − .067 − .33 − .36 − .29 .410 .15 .14 .23 .23 .79 .39 .36 .03 .42 − − − − .00(.01) 14 29 26 24 23 32 .05 .11 .14 .09 .02 − .02 .18 .14 .36 .25 .28 .01 − .02(.12) − .06(.09) − .23***(.09) 19 10 27 6 .06 .85 .50 .03 .00 − .28 − .24 − .43 .29 .23 .12 − .03 .02(.12) − .04(.10) − .21(.10) 6 10 12 9 .24 .85 .74 .07 − − − − .11 .23 .26 .44 .35 .28 .19 .02 .06(.09) .04(.10) − .17***(.07) 12 27 12 7 .12 .50 .74 .04 − − − − .02 .12 .19 .33 .19 .24 .26 − .02 − .03(.08) 24 40 .03 .68 .02 − .20 .28 .13 − .05(.08) .03(.07) .10(.12) 17 31 20 24 .14 .53 .72 .41 − − − − .03 .21 .12 .15 .19 .11 .18 .35 .16***(.06) 56 7 .01 .03 .02 .02 .15 .30 .08***(.03) .17(.11) .06(.05) .02(.06) .11(.12) .08(.08) .15***(.06) .14(.07) .12(.09) .08(.05) .15***(.06) .08(.05) .08(.03) Note. Analyses or results were not included if df <4. *** p < .001. 4.3. 
Moderating effects of measurement features
Another important goal of the study was to investigate how measurement features (writing genres, writing outcomes, whether the writing task is text-based or not, and type of syntactic complexity measures) might influence the relationship between syntactic features in writing and writing performance. As noted earlier, DIEW posits that the way constructs are measured can change the relationships being examined (Kim & Graham, 2022). The relationship between syntactic features and writing performance varied by writing genre, which aligns with prior studies (Jagaiah et al., 2020). Specifically, the relationship between noun phrase complexity and writing performance was stronger in expository than argumentative texts. This finding highlights the greater syntactic demands of expository writing. Noun phrase complexity has been linked to academic style and elaboration in writing (Biber & Gray, 2011; Staple et al., 2016), suggesting that expository genre, being more aligned with academic contexts, requires greater elaboration and sophistication to achieve high-quality writing. In addition, the relationship between words per unit and writing performance was stronger in expository genre than argumentative and informational genres, further confirming the finding that expository genre, as a subgenre of informational genre, may pose greater demands on syntactic skills than other informational texts. This is a unique finding since comparisons between expository
Table 7 Correlation (df) matrix between syntactic features and writing performance by grade level.
Grade levels Syntactic measure Syntactic accuracy Noun phrase complexity Words per unit Words per sentence Subordination – .06 (5) .25 (8) .43* (7) .18* (12) – – – .19** (4) .18*** (4) .26*** (4) – – – – .14* (11) .06 (14) .17 (13) .08* (25) Primary grades Upper elementary grades Middle school High school Adults – .31 (4) .23*** (10) – .05 (5) .14** (21) Note. Missing values are marked as “–” and are due to insufficient samples to run correlational analyses (df < 3). * p < .05. ** p < .01. *** p < .001. Table 8 Correlation (df) matrix between syntactic features and writing performance by language proficiency status. Language proficiency status Syntactic measure Syntactic accuracy Noun phrase complexity Words per unit Words per sentence Subordination .40* (10) .26 (6) .26 (10) .00 (4) .19*** (30) .20*** (4) .15** (16) – .15* (24) .12* (19) Predominantly Limited proficiency Predominantly proficient in the target language Note. Missing values are marked as “–” and are due to insufficient samples to run correlational analyses (df < 3). * p < .05. ** p < .01. *** p < .001. Table 9 Correlation (df) matrix between syntactic features and writing performance by language learning context. Language learning context Foreign language Second language Monolingual Mixed Syntactic measure Syntactic accuracy Noun phrase complexity Words per unit Words per sentence Subordination .47** (8) .57* (4) .02 (4) – .24*** (10) – .16*** (4) .21 (8) .09 (15) .25*** (14) .16** (16) .07 (4) – .14 (19) .12 (6) .08 (12) − .08 (4) .03 (4) – – Note. Missing values are marked as “–” and are due to insufficient samples to run correlational analyses (df < 3). * p < .05. ** p < .01. *** p < .001. and other informational genres are not common in the literature. In terms of writing outcomes, the relationship between noun phrase complexity and productivity was weaker compared to its relationship with writing quality. 
The use of complex noun phrases may signal a higher level of formality (Biber & Gray, 2011). In addition, Biber and Gray (2011) noted that expansions introduced by noun modifiers are more likely to be accompanied by a corresponding enrichment of meaning. This suggests that noun phrase complexity contributes more to elaborating on key content and signaling a formal tone, which is critical for writing quality but less relevant for productivity measures.

Whether the writing task is text-based or not also moderated the relationship between syntactic features and writing performance. Specifically, subordination had a stronger relationship with writing outcomes in text-based tasks. This is reasonable since text-based writing tasks often provide content for students to elaborate on, which can encourage the use of more subordinating structures. Additionally, reorganizing or citing information from sources – a common feature of text-based tasks – often requires subordination structures. Notably, many studies examining the relationship between syntactic features and writing performance used writing tasks that are not text-based. Thus, the moderation test for whether the writing task is text-based or not was only conducted for words per unit and subordination. It remains unclear whether the relationship between other syntactic features and writing outcomes would be moderated by the use of source materials.

Table 10
Correlation (df) matrix between syntactic features and writing performance by writing genres.
Writing genre Syntactic measure Syntactic accuracy Noun phrase complexity Words per unit Words per sentence Subordination
.18 (13) – .10 (4) – .22 (6) – .43* (5) – .17*** (23) .07 (21) .28*** (25) .05 (13) .10 (8) .13 (6) .15** (4) .04 (5) .08 (17) .03 (15) .11 (9) .18 (11)
Narrative Argumentative Expository Informational
Note. Missing values are marked as “–” and are due to insufficient samples to run correlational analyses (df < 3). * p < .05. ** p < .01. *** p < .001.

Table 11
Correlation (df) matrix between syntactic features and writing performance by whether the writing task is text-based or not.
Writing task Syntactic measure Syntactic accuracy Noun phrase complexity Words per unit Words per sentence Subordination
.23** (29) – .26*** (16) – .15*** (81) .29** (16) .26*** (16) – .08* (56) .25 (7)
Not text-based Text-based
Note. Missing values are marked as “–” and are due to insufficient samples to run correlational analyses (df < 3). * p < .05. ** p < .01. *** p < .001.

Table 12
Multilevel random effects model: meta-regression of moderators controlling for syntactic complexity measures.
Variable: Words per unit (Intercept); Subordination; Noun phrase complexity; Left embeddedness; Words per sentence
Model 1: .19(.03)*** −.04(.05) .05(.05) −.09(.05) −.10(.04)*
Model 2: .19(.03)*** −.05(.05) .05(.05) −.10(.06)
Model 3: .19(.03)*** −.06(.05) .05(.05)
Model 4: .19(.03)*** −.06(.05)
* p < .05. *** p < .001.

The findings showed that the relationship between syntactic features and writing outcomes differed depending on the type of syntactic complexity measures: the relationship was weaker for words per sentence compared to other measures. This may be because words per sentence encompasses coordination structures (i.e., two independent clauses connected by a coordinating conjunction), which can increase sentence length without necessarily leading to more complex syntax. Consequently, writing samples with higher values in words per sentence may rely more on coordination structures, which, as noted by Norris and Ortega (2009), are not complex syntactic features and do not facilitate the embedding of additional ideas or details within sentence structures.
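The contrast between sentence length and syntactic complexity described here can be made concrete with a small sketch. The Python snippet below is illustrative only: the function names, the short marker word lists, and the two example sentences are hypothetical, and keyword matching is a crude stand-in for the manual coding or syntactic parsing that studies of this kind typically rely on. It compares one sentence built through coordination with one built through subordination.

```python
import re

# Hypothetical, deliberately small marker lists (assumption: real studies
# identify clauses with manual coding or a syntactic parser, not keywords).
COORDINATORS = {"and", "but", "or"}
SUBORDINATORS = {"because", "although", "when", "if", "while"}


def tokenize(text):
    """Return lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def words_per_sentence(text):
    """Mean number of words per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(tokenize(text)) / len(sentences)


def marker_counts(text):
    """Count coordinating and subordinating markers (a crude proxy)."""
    tokens = tokenize(text)
    return {
        "coordinators": sum(t in COORDINATORS for t in tokens),
        "subordinators": sum(t in SUBORDINATORS for t in tokens),
    }


# Two made-up sentences of equal length but different clause linkage.
coordinated = "The dog barked and the cat ran and the bird flew away."
subordinated = "The dog barked because the cat ran when the bird flew away."

for label, text in [("coordinated", coordinated), ("subordinated", subordinated)]:
    print(label, words_per_sentence(text), marker_counts(text))
```

In this toy case both sentences receive the same words-per-sentence value, yet only the second embeds clauses. This is consistent with the interpretation that sentence length alone may not distinguish coordination from more complex structures.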
Overall, these results emphasize the critical role of measurement features in moderating the relationship between syntactic features and writing outcomes. They underscore the dynamic relationships between language and writing as a function of measurement features (Kim & Graham, 2022; Steiss et al., 2024).

Fig. 2. Funnel plot.

One notable finding was that most moderators for traditional syntactic complexity features (i.e., words per unit, subordination measures, words per sentence) were not significant, despite prior evidence suggesting differences across grade levels, language proficiency, and measurement (Beers & Nagy, 2009, 2011; Jagaiah et al., 2020; Kim & Graham, 2022; Stewart & Grobe, 1979). Among the syntactic measures examined, noun phrase complexity was the only feature for which all moderators were statistically significant. Several explanations may account for this pattern. First, our sensitivity analysis, which included only studies that reported reliability for syntactic or writing measures, revealed significant moderation effects. These effects suggest that the lack of significant moderation effects in some analyses may be attributed to the absence of reliability reporting in prior studies, potentially compromising the robustness of their results. A second potential explanation is the critical role of complex noun phrase structures in writing development, especially within educational contexts (Biber & Gray, 2011; Staples et al., 2016). Given that noun phrases allow writers to pack dense information effectively, using such structures may be increasingly critical in school settings. A third possible reason may be the smaller variance introduced by more recent studies. Research on noun phrase complexity is relatively recent, whereas traditional measures have been investigated by researchers since the 1960s.
Over the decades, methodological differences – such as how measures are generated, the composition of student samples, and how writing samples are collected – may have introduced greater variability across studies that focus on traditional syntactic complexity measures. A fourth explanation pertains to publication type, as evidenced by our findings from the sensitivity analysis. Although Egger’s test and the funnel plot indicated no publication bias, the sensitivity analysis suggested that published articles are more likely to detect certain moderation effects. This suggests that methodological rigor associated with peer-reviewed studies may increase the likelihood of identifying moderation effects in traditional syntactic complexity measures. Lastly, prior research suggests that noun phrase complexity measures may be more sensitive to developmental differences than traditional subordination measures. Studies have shown that traditional subordination measures often fail to capture meaningful variations in writing development (Bulté & Housen, 2014; Casal & Lee, 2019; Crossley & McNamara, 2014). In contrast, noun phrase complexity measures have been linked to variations in writing quality (Casal & Lee, 2019; Guo et al., 2013; Kyle & Crossley, 2018) and age (Ansarifar et al., 2018; Bulté & Housen, 2014). These findings are consistent with Norris and Ortega’s (2009) three-stage trajectory for second-language syntactic development, which posits that coordination, subordination, and phrasal-level complexity develop sequentially. Syntactic structures such as coordination and subordination, which emerge in earlier stages, may lack the sensitivity needed to assess more advanced syntactic abilities, further emphasizing the relevance of noun phrase complexity in evaluating writing development.

5. Limitations and future directions

One limitation of this meta-analysis is the limited number of effect sizes for some zero-order correlations.
Because effect sizes were limited, moderation analyses were conducted only for relationships with a sufficient number of effect sizes, to avoid inaccurate findings. However, findings based on fewer effect sizes (those with smaller degrees of freedom) should be interpreted with caution. In addition, because moderation analyses were conducted only with some moderators for some syntactic features, we cannot assume that the significant moderators are the only factors that matter for the relationship between syntactic features and writing performance. Future research should focus on identifying valid and reliable measures of syntactic complexity that extend beyond traditional measures (e.g., T-unit length, clauses per T-unit). The findings of this study highlight a reliance on traditional measures, which may lack sensitivity for certain assessment purposes. It is also critical to understand which measures are more sensitive to students’ writing development, with considerations of writing genres and tasks (Casal & Lee, 2019). The current study highlights the potential of noun phrase complexity measures in assessing language and writing development, which should be validated by future studies. It is also worth noting that studies examining syntactic features in writing generally did not report students’ language proficiency consistently. Many studies did not report the language proficiency level of their participants, and even those that did often lacked a standardized approach to reporting. This inconsistency makes cross-study comparisons challenging. Future studies should explicitly and comprehensively document students’ language abilities so that cross-study comparisons are possible. Another significant gap in the current literature is the underrepresentation of students with learning disabilities. Most studies either omitted information about students’ special education status or excluded these students from their sample entirely.
Future research should prioritize examining the syntactic features of writing in students with learning disabilities, as their inclusion is essential for developing equitable educational practices. Additionally, the moderation effect of grade level was not found in most cases, potentially because of large variation across studies. Future research should consider employing longitudinal designs to better understand the role of developmental phases in the relationship between syntactic skills and writing quality. The current study also underscores the importance of considering the reliability of both syntactic and writing measures when evaluating relationships. Many studies included in the current meta-analysis did not report reliability of either syntactic measures or writing measures, or both. Future studies should ensure the reporting of reliability information to strengthen the validity of the findings.

6. Implications for practice

This study indicates the crucial role of syntactic abilities in achieving high writing performance. Supporting students in developing syntactic accuracy and expanding their repertoire of syntactic structures can enhance their ability to perform diverse writing functions effectively. Additionally, the findings highlight the specific syntactic demands that students may encounter at different developmental stages and across various writing genres. For example, the results indicate that syntactic accuracy is particularly critical for students in primary grades and that the expository genre may demand more complex syntactic structures. For students with limited language proficiency in the language they are writing in, acquiring certain syntactic features, such as complex noun phrases and subordination, and improving syntactic accuracy may be even more crucial.
Although there are mixed results regarding whether direct grammar instruction can improve writing development (Andrews et al., 2006), evidence suggests that grammar instruction that is integrated and contextualized within writing instruction is beneficial for learners (Jones et al., 2013). For example, based on our findings, subordinate clauses can help writers introduce alternate perspectives or cite evidence in source-based tasks. Future studies could explore whether explicitly teaching such features and their functions improves students’ writing skills. The study also has important implications for assessing writing. For instance, given the importance of noun phrase complexity in expository writing to pack detailed information into sentences, educators should focus on evaluating students’ ability to construct complex noun phrases in such tasks. Similarly, in text-based writing tasks, assessing students’ ability to use subordination structures can provide insights into their capacity to handle the linguistic demands of these tasks. Moreover, the findings suggest that proficiency in using linguistic resources within one writing task or genre does not necessarily translate to broader linguistic literacy – the ability to employ specific linguistic features tailored to the distinct demands and purposes of various tasks or genres (Ravid & Tolchinsky, 2002). This underscores the importance of assessing linguistic skills across multiple genres and writing tasks. Specifically, the findings indicate that the expository genre might have higher syntactic demands than other informational genres, so it might be especially important to monitor students’ use of syntactic features in this genre.
Finally, this study indicates that for bilingual learners who are still developing their language proficiency in the target language, measures such as noun phrase complexity, subordination, syntactic accuracy, and words per unit are particularly valuable for assessing their writing abilities. These measures not only provide insights into potential linguistic challenges these learners face but also highlight areas where targeted instructional support may be more effective.

CRediT authorship contribution statement

Molly Ann Leachman: Writing – review & editing, Data curation. Young-Suk Grace Kim: Writing – review & editing, Supervision, Conceptualization. Joseph Hin Yan Lam: Writing – review & editing, Data curation. Jiali Wang: Writing – original draft, Methodology, Formal analysis, Data curation, Conceptualization.

Declaration of Competing Interest

All authors declare no conflicts of interest.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.asw.2024.100909.

Data Availability

Data will be made available on request.

References

References marked with an asterisk indicate studies included in the meta-analysis.

Andrews, R., Torgerson, C., Beverton, S., Freeman, A., Locke, T., Low, G., Zhu, D., et al. (2006). The effect of grammar teaching on writing development. British Educational Research Journal, 32(1), 39–55. https://doi.org/10.1080/01411920500401997
Ansarifar, A., Shahriari, H., & Pishghadam, R. (2018). Phrasal complexity in academic writing: A comparison of abstracts written by graduate students and expert writers in applied linguistics. Journal of English for Academic Purposes, 31, 58–71. https://doi.org/10.1016/j.jeap.2017.12.008
* Beers, S. F., & Nagy, W. E. (2009). Syntactic complexity as a predictor of adolescent writing quality: Which measures? Which genre? Reading and Writing, 22(2), 185–200.
https://doi.org/10.1007/s11145-007-9107-5
Beers, S. F., & Nagy, W. E. (2011). Writing development in four genres from grades three to seven: Syntactic complexity and genre differentiation. Reading and Writing, 24, 183–202. https://doi.org/10.1007/s11145-010-9264-9
Berninger, V. W., Vaughan, K., Abbott, R. D., Begay, K., Coleman, K. B., Curtin, G., Graham, S., et al. (2002). Teaching spelling and composition alone and together: Implications for the simple view of writing. Journal of Educational Psychology, 94(2), 291. https://doi.org/10.1037/0022-0663.94.2.291
Biber, D., & Gray, B. (2011). Grammatical change in the noun phrase: The influence of written language use. English Language and Linguistics, 15(2), 223–250. https://doi.org/10.1017/S1360674311000025
Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 639–668. https://doi.org/10.1093/applin/amu059
Bulté, B., & Housen, A. (2014). Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65. https://doi.org/10.1016/j.jslw.2014.09.005
Casal, J. E., & Lee, J. J. (2019). Syntactic complexity and writing quality in assessed first-year L2 writing. Journal of Second Language Writing, 44, 51–62. https://doi.org/10.1016/j.jslw.2019.03.005
Crossley, S. A. (2020). Linguistic features in writing quality and development: An overview. Journal of Writing Research, 11(3), 415–443. https://doi.org/10.17239/jowr-2020.11.03.01
Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66–79. https://doi.org/10.1016/j.jslw.2014.09.006
Crossley, S. A., Weston, J. L., McLain Sullivan, S. T., & McNamara, D. S. (2011). The development of writing proficiency as a function of grade level: A linguistic analysis. Written Communication, 28(3), 282–311. https://doi.org/10.1177/0741088311410188
Crowhurst, M. (1980). Syntactic complexity and teachers’ quality ratings of narrations and arguments. Research in the Teaching of English, 14(3), 223–231.
Crowhurst, M. (1983). Syntactic complexity and writing quality: A review. Canadian Journal of Education/Revue canadienne de l’éducation, 1–16. https://doi.org/10.2307/1494403
Crowhurst, M., & Piche, G. L. (1979). Audience and mode of discourse effects on syntactic complexity in writing at two grade levels. Research in the Teaching of English, 13(2), 101–109. https://doi.org/10.58680/rte197917847
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5–43. https://doi.org/10.1016/j.asw.2005.02.001
Cumming, A., Lai, C., & Cho, H. (2016). Students’ writing from sources for academic purposes: A synthesis of recent research. Journal of English for Academic Purposes, 23, 47–58. https://doi.org/10.1016/j.jeap.2016.06.002
Datchuk, S. M., Hier, B. O., & Watts, E. A. (2021). Accounting for levels of language in narrative and expository writing: A skills analysis of second-grade student writing. The Elementary School Journal, 121(4). https://doi.org/10.1086/714051
Ferris, D. R. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28(2), 414–420. https://doi.org/10.2307/3587446
Graham, S. (2018). A revised writer(s)-within-community model of writing. Educational Psychologist, 53(4), 258–279. https://doi.org/10.1080/00461520.2018.1481406
Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9(2), 123–145. https://doi.org/10.1016/S1060-3743(00)00019-9
Grobe, C. (1981). Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 75–85. https://doi.org/10.58680/rte198115785
* Guo, L., Crossley, S. A., & McNamara, D. S. (2013). Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18(3), 218–238. https://doi.org/10.1016/j.asw.2013.05.002
Hayes, J. R. (1996). A new framework for understanding cognition and affect in the writing process. In C. M. Levy, & S. Ransdell (Eds.), The sciences of writing (pp. 3–30). Erlbaum.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65. https://doi.org/10.1002/jrsm.5
Hunt, K. W. (1970). Syntactic maturity in school children and adults. Monographs of the Society for Research in Child Development, 35(1), iii–67.
Jagaiah, T., Olinghouse, N. G., & Kearns, D. M. (2020). Syntactic complexity measures: variation by genre, grade-level, students’ writing abilities, and writing quality. Reading and Writing, 33(10), 2577–2638. https://doi.org/10.1007/s11145-020-10057-x
Jones, S., Myhill, D., & Bailey, T. (2013). Grammar for writing? An investigation of the effects of contextualised grammar teaching on students’ writing. Reading and Writing, 26, 1241–1263. https://doi.org/10.1007/s11145-012-9416-1
Kim, M., & Crossley, S. A. (2018). Modeling second language writing quality: A structural equation investigation of lexical, syntactic, and cohesive features in source-based and independent writing. Assessing Writing, 37, 39–56. https://doi.org/10.1016/j.asw.2018.03.002
* Kim, Y.-S. G., Al Otaiba, S., Folsom, J. S., Greulich, L., & Puranik, C. (2014). Evaluating the dimensionality of first-grade written composition. Journal of Speech, Language, and Hearing Research, 57(1), 199–211. https://doi.org/10.1044/1092-4388(2013/12-0152)
Kim, Y.-S. G., & Graham, S. (2022). Expanding the direct and indirect effects model of writing (DIEW): Dynamic relations of component skills to various writing outcomes. Journal of Educational Psychology, 114(2), 215–238. https://doi.org/10.1037/edu0000564
* Kyle, K., & Crossley, S. A. (2018). Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. The Modern Language Journal, 102(2), 333–349. https://doi.org/10.1111/modl.12468
Li, H. (2015). Relationship between measures of syntactic complexity and judgments of EFL writing quality. In Proceedings of 2015 youth academic forum on linguistics, literature, translation and culture (pp. 216–222). American Scholars Press.
Loban, W. (1976). Language development: Kindergarten through grade twelve. NCTE Committee on Research Report No. 18.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45(1), 36–62. https://doi.org/10.5054/tq.2011.240859
Lu, X. (2017). Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Language Testing, 34(4), 493–511. https://doi.org/10.1177/0265532217710675
MacArthur, C. A., Jennings, A., & Philippakos, Z. A. (2019). Which linguistic features predict quality of argumentative writing for college basic writers, and how do those features change with instruction? Reading and Writing, 32(6), 1553–1574. https://doi.org/10.1007/s11145-018-9853-6
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. https://doi.org/10.1093/applin/amp044
Olinghouse, N. G., & Wilson, J. (2013). The relationship between vocabulary and writing quality in three genres. Reading and Writing, 26(1), 45–65. https://doi.org/10.1007/s11145-012-9392-5
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518. https://doi.org/10.1093/applin/24.4.492
Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—A web and mobile app for systematic reviews. Systematic Reviews, 5(1), 1–10. https://doi.org/10.1186/s13643-016-0384-4
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Moher, D., et al. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372. https://doi.org/10.1136/bmj.n71
Puranik, C. S., Lombardino, L. J., & Altmann, L. J. (2008). Assessing the microstructure of written language using a retelling paradigm. doi:10.1044/1058-0360.
Qin, W., & Uccelli, P. (2016). Same language, different functions: A cross-genre analysis of Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33, 3–17. https://doi.org/10.1016/j.jslw.2016.06.001
Ravid, D., & Tolchinsky, L. (2002). Developing linguistic literacy: A comprehensive model. Journal of Child Language, 29(2), 417–447. https://doi.org/10.1017/S0305000902005111
Schleppegrell, M. J. (1998). Grammar as resource: Writing a description. Research in the Teaching of English, 32(2), 182–211. https://doi.org/10.58680/rte19983904
Scott, C. M., & Windsor, J. (2000). General language performance measures in spoken and written narrative and expository discourse of school-age children with language learning disabilities. Journal of Speech, Language, and Hearing Research, 43(2), 324–339. https://doi.org/10.1044/jslhr.4302.324
Staples, S., Egbert, J., Biber, D., & Gray, B. (2016). Academic writing development at the university level: Phrasal and clausal complexity across level of study, discipline, and genre. Written Communication, 33(2), 149–183. https://doi.org/10.1177/0741088316631527
Steiss, J., Wang, J., Kim, Y. S. G., & Booth Olson, C. (2024). US secondary students’ source-based argument writing in history. Written Communication, 41(4), 693–725. https://doi.org/10.1177/07410883241263549
Sterne, J. A., & Egger, M. (2005). Regression methods to detect publication and other bias in meta-analysis. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 99–110). Wiley. https://doi.org/10.1002/0470870168.ch6
* Stewart, M. F., & Grobe, C. H. (1979). Syntactic maturity, mechanics of writing, and teachers’ quality ratings. Research in the Teaching of English, 13(3), 207–215. https://doi.org/10.58680/rte201117858
Taguchi, N., Crawford, W., & Wetzel, D. Z. (2013). What linguistic features are indicative of writing quality? A case of argumentative essays in a college composition program. TESOL Quarterly, 47(2), 420–430. https://doi.org/10.1002/tesq.91
* Troia, G. A., Shen, M., & Brandon, D. L. (2019). Multidimensional levels of language writing measures in grades four to six. Written Communication, 36(2), 231–266. https://doi.org/10.1177/0741088318819473
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36, 1–48. https://doi.org/10.18637/jss.v036.i03
* Wagner, R. K., Puranik, C. S., Foorman, B., Foster, E., Wilson, L. G., Tschinkel, E., & Kantor, P. T. (2011). Modeling the development of written language. Reading and Writing, 24(2), 203–220. https://doi.org/10.1007/s11145-010-9266-7
Wang, J., Kim, Y. S. G., & Cho, M. (2024). Linguistic features in narrative and opinion genres and their relations to writing quality in fourth grade writing. Journal of Research in Reading, 47(2), 220–239. https://doi.org/10.1111/1467-9817.12453

Jiali Wang is currently a postdoctoral research associate at Texas A&M University. Her research interests include language and literacy development and interventions for bilingual and developmental language disorder populations, especially writing assessment and developmental language disorder.

Young-Suk Grace Kim is a professor and the senior associate dean at the School of Education at University of California, Irvine. Her research focuses on development and effective instruction of language, cognition, reading, and writing skills for children from diverse linguistic, cultural, and socioeconomic backgrounds.

Joseph Hin Yan Lam is a Ph.D. candidate at University of California, Irvine. His research interests include bilingualism, developmental language disorder, assessment and intervention, language and literacy development, and the relationship between math and language.

Molly Ann Leachman is a Ph.D. candidate at University of California, Irvine. Her research interests include bilingual cognitive and linguistic development for children in early childhood.