Introduction
Background information and statement of the problem
Over the last two decades, L2 writing scholars in the field of instructed second-language acquisition (ISLA) have investigated the degree of elaboration of linguistic features in L2 learners' written output to better understand learners' writing competence and L2 growth from a linguistic and psycholinguistic perspective 929. Syntactic complexity refers to the variety and sophistication of syntactic features exhibited in L2 output, including the length of production units (e.g., the number of words per clause), the amount of subordination, and the frequency of distinct clause or phrase structures used 35. As such, analyzing L2 output in terms of indices measuring these syntactic features is a common means of gauging L2 growth or writing quality. The default hypothesis has been that as L2 learners' interlanguage develops, they begin to produce less frequent and more diverse syntactic features 527. In other words, L2 learners' language-using ability or writing quality is demonstrated through their control over complex and sophisticated levels of syntactic knowledge (Li & Yang, 2023) 34.
On this premise, investigating the relationship between L2 writing quality (i.e., human ratings of writing) and syntactic complexity indices has been the focus of much L2 writing research 223. This focus has been motivated by at least three reasons. Conceptually, researchers are interested in identifying features that both adequately reflect the multidimensional nature of syntactic complexity and strongly correlate with quality ratings of L2 writing 5. From the assessment perspective, this line of research has informed the empirical development of writing proficiency scales and facilitated automated scoring 15. Pedagogically, a detailed understanding of this relationship can offer useful insights into the aspects of syntactic complexity to focus on for different learner groups, genres, and writing tasks (Bulté & Housen, 2019). Additionally, this focus has been facilitated by the advent of computational tools such as the L2 Syntactic Complexity Analyzer (L2-SCA) 21, which have made it possible to examine this relationship on a larger scale quickly and accurately 42430.
Despite recent advances, existing L2 research on syntactic complexity has several theoretical, conceptual, and methodological flaws 3 (Housen, De Clercq, Kuiken, & Vedder, 2019). As an illustration, syntactic complexity has traditionally been conceived as the basic internal formal structure of the text 6 and measured using only large-grained complexity indices, such as clausal subordination and the T-unit (one main clause plus any embedded clauses; Hunt, 1965), which have high predictive power 35. In recent years, however, the use of these absolute, large-grained complexity indices has come under criticism for failing to account for syntactic variation, that is, the specific subtypes of clausal and phrasal elaboration that emerge as L2 learners advance (Housen et al., 2019), and for being difficult to interpret 5. Recent studies 30 have indicated that fine-grained indices that capture specific subtypes of phrasal or clausal structures (e.g., clause complements per clause) are better indicators of holistic scores of TOEFL independent essay quality 19.
Another criticism is that the traditional absolute complexity indices do not account for the relative frequency of syntactic forms or the entangled link between syntactic forms and lexical items 17. That is, absolute complexity indices do not align well with some L2 acquisition theories, such as usage-based approaches, which posit that linguistic constructions emerge from language use, such that features related to the frequency, saliency, and contingency of syntactic constructions, not absolute complexity, are the main indicators of L2 acquisition 1113. In line with the usage-based approach, some scholars, most notably Kyle (2016), proposed features related to the use of Verb Argument Constructions (VACs, which consist of a verb slot and its related arguments) as a reliable means of measuring relative complexity or syntactic sophistication 24 (Li & Yang, 2023) 30. These studies have confirmed that the newly proposed VAC sophistication indices are better indicators of L2 writing quality. However, the majority of L2 writing studies in this line of inquiry have focused primarily on argumentative and narrative essays written by advanced or college-level L2 learners in large-scale standardized tests (e.g., TOEFL) in English-dominant contexts 1936. Thus, it is unclear which absolute and fine-grained complexity features, along with VAC sophistication features, capture the writing quality and syntactic developmental trajectories of Ethiopian EFL learners. Moreover, little is known about the link between syntactic features and EFL course instructors' holistic ratings of genres commonly assessed in EAP classroom contexts. The present study therefore examined which specific syntactic features predict EFL teachers' holistic rating scores of argumentative essays produced by undergraduate EFL learners in their classroom context, using a computational tool, namely TAASSC 17.
As such, exploring the relationship between L2 writing quality and syntactic features of argumentative essays produced by this group of learners enables us to obtain useful information regarding their linguistic competence and repertoires, helping to create a more comprehensive picture of the writing performance of EFL learners at different stages of L2 learning.
This study is organized into five sections, including this introduction. Section two presents a brief overview of syntactic complexity as a multidimensional construct by laying out its theoretical dimensions and looking at how syntactic complexity is evaluated in L2 acquisition. The third section describes the methodology used for this study. The fourth section presents the analysis and results of the study. Finally, the conclusion gives a summary as well as the implications of the findings for future studies.
Syntactic complexity as a multidimensional construct
Recognizing the multidimensional nature of syntactic complexity, Ortega (2015) defined it as the "expansion of the capacity to use the additional language in ever more mature and skillful ways, tapping the full range of linguistic resources offered by the given grammar to successfully fulfill various communicative goals" 29. Her definition highlights that the term "syntactic complexity" refers to the ability to generate longer and more complex structures (i.e., absolute complexity) as well as knowledge of how to skillfully leverage the complexity and subtleties of language 617. This implies that these two dimensions are critical in comprehending and analyzing syntactic complexity as a multidimensional construct 9, and this viewpoint is taken into account for the current study.
Syntactic Complexity
The first construct, "syntactic complexity," as defined in this study, pertains to the hierarchical organization and nesting of the basic formal structures of language units (e.g., phrases and clauses) within a sentence (Housen et al., 2019). Measuring this dimension helps to demonstrate the extent to which EFL/ESL learners can manipulate and combine diverse grammatical patterns to build complex linguistic systems as their interlanguages develop. Accordingly, various indices have been proposed in the ISLA literature to quantify syntactic complexity as a multidimensional entity 22; these can be grouped into two approaches: absolute complexity indices 21 and fine-grained indices 17. The former assess the internal complexity of the text in a largely holistic way, without differentiating specific subtypes of the structures concerned 5. Several L2 writing studies 82137 assess this dimension of complexity using the L2 Syntactic Complexity Analyzer. The latter, fine-grained indices, take into account specific subtypes of phrasal structures (e.g., prepositions per direct object) or clausal structures (e.g., clause complements per clause, which taps into the use of a particular type of dependent clause). Several fine-grained indices, and computational tools that measure them, have been proposed and developed by different scholars 317 and employed in L2 writing studies. Of these, the fine-grained syntactic complexity indices operationalized by Kyle (2016) and computed in TAASSC, which have had an impact on recent discussions of clausal and phrasal complexity, are considered in the current study 19 (Li & Yang, 2023) 30.
Syntactic Sophistication
The second construct, "syntactic sophistication," relates to the acquisitional sequence of specific sets of linguistic constructions, including syntactic constructions, and the mental processes required for producing and parsing them 717. The basic premise is that syntactic constructions (e.g., VACs) learned later are more sophisticated 18. It takes its theoretical cues from the usage-based approach, which maintains that L2 acquisition is a process of contingency learning in which L2 learners create form-meaning pairings through repeated exposure to the target language constructions (e.g., words) using cognitive processes like entrenchment, categorization, association, and generalization 12. This suggests that grammatical constructions that emerge from the interaction of language input and cognitive processes carry meaning independent of the specific lexical items within the sentence. For instance, the ditransitive construction (i.e., subject–verb–indirect object–direct object) consistently carries the meaning of transferring something from one entity to another, as in the sentence “She got/gave/kicked me a ball,” regardless of the verb (Park & Sung, 2022). These sentence- or clause-level pairings between form and meaning are referred to as Verb Argument Constructions (VACs) and are analyzed as independent syntactic constructions in English and other languages 1417.
According to the usage-based view, the acquisition of VACs as a type of construction is subject to multiple psycholinguistic factors 12, among which input frequency is the most important driving factor. Specifically, VACs with relatively high frequency in the input have a better chance of becoming entrenched in the learner's memory; they are more likely to be acquired earlier and are deemed less complex. VACs with low input frequency, by contrast, are considered to be acquired at later stages and are thus deemed more complex 1333.
Another key factor predicting VAC acquisition is the contingency of the mapping between a main verb and the VAC in which it occurs 12. Contingency refers to how reliably a given cue (a particular verb) predicts a certain outcome (a VAC) or, in the reverse direction, how likely a verb is to occur given a certain VAC 18. Previous research has shown that L2 learners employ verbs that are less strongly associated with a target construction as their proficiency increases 189. The findings of these studies lend support to the claim that closely associated verb-VAC combinations are acquired early, and are thus regarded as less complex, while the more unusual combinations are deemed more difficult to acquire and thus more complex. For example, Kyle and Crossley (2017) found that higher-scoring L2 argumentative essays contained less frequent and more strongly associated verb-VAC combinations (Li & Yang, 2023). In keeping with the taxonomy in recent studies 181922, we measured syntactic complexity using traditional absolute complexity indices and fine-grained complexity indices that differentiate structural subtypes of clauses and phrases, and syntactic sophistication using VAC-related features, i.e., frequency and contingency features (see Table 1, Table 2, Table 3, Table 4 in the methodology section). In what follows, we provide a systematic review of previous studies that have examined the relationship of L2 writing quality to these different dimensions of syntactic complexity features.
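As an illustration of the two directions of verb-VAC contingency described in this section, the following minimal Python sketch computes the directional association measure ΔP from a 2×2 co-occurrence table. It is a sketch under stated assumptions: ΔP is one commonly used contingency measure, the counts are invented, the function names are hypothetical, and the code is not claimed to reproduce the computation performed by TAASSC.

```python
def delta_p_vac_given_verb(a: int, b: int, c: int, d: int) -> float:
    """Delta P with the verb as cue and the VAC as outcome.

    a: target verb inside the target VAC
    b: target verb inside any other VAC
    c: any other verb inside the target VAC
    d: any other verb inside any other VAC
    """
    # P(VAC | verb) - P(VAC | other verbs)
    return a / (a + b) - c / (c + d)


def delta_p_verb_given_vac(a: int, b: int, c: int, d: int) -> float:
    """Delta P in the reverse direction: the VAC as cue, the verb as outcome."""
    # P(verb | VAC) - P(verb | other VACs)
    return a / (a + c) - b / (b + d)


# Invented counts for a hypothetical verb and a hypothetical target VAC.
a, b, c, d = 80, 20, 120, 780

verb_cues_vac = delta_p_vac_given_verb(a, b, c, d)
vac_cues_verb = delta_p_verb_given_vac(a, b, c, d)

print(f"Delta P (VAC | verb): {verb_cues_vac:.3f}")
print(f"Delta P (verb | VAC): {vac_cues_verb:.3f}")
```

Values near 1 indicate that the cue strongly predicts the outcome, values near 0 indicate no association, and negative values indicate that the cue makes the outcome less likely. The two directions generally differ, which is why contingency is described from both the verb's and the VAC's point of view.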
Syntactic complexity and writing quality
A number of cross-sectional L2 studies have investigated the relationship between writing quality and different measures of syntactic complexity 1836. Most of these studies have relied heavily on various large-grained indices that quantify the average length of a certain linguistic unit (e.g., mean length of sentence (MLS), mean length of T-unit (MLT), mean length of clause (MLC), etc.) and the amount of subordination, such as dependent clauses per clause (DC/C) and clauses per T-unit (C/T) 9. This reliance rests on the notion that academic writing derives its complexity from the elaborate use of clausal constructions, with many subordinate clauses (Hyland, 2002).
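To make the large-grained indices concrete, the following minimal Python sketch shows how the length and subordination ratios named above are derived from unit counts. The counts are invented for illustration and the class and function names are hypothetical; in practice, tools such as L2-SCA derive the counts from parsed text rather than from hand-entered values.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Counts of production units in one essay (hypothetical container)."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    dependent_clauses: int


def large_grained_indices(counts: UnitCounts) -> dict:
    """Compute length-based and subordination-based ratios from raw counts."""
    return {
        "MLS": counts.words / counts.sentences,            # mean length of sentence
        "MLT": counts.words / counts.t_units,              # mean length of T-unit
        "MLC": counts.words / counts.clauses,              # mean length of clause
        "C/T": counts.clauses / counts.t_units,            # clauses per T-unit
        "DC/C": counts.dependent_clauses / counts.clauses, # dependent clauses per clause
    }


# Invented counts for a single 300-word essay.
essay = UnitCounts(words=300, sentences=15, t_units=20,
                   clauses=30, dependent_clauses=12)
indices = large_grained_indices(essay)

for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Because every index is a ratio of two counts, two essays with very different structures can receive the same value, which is one reason these indices are described as holistic.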
In their syntheses and meta-analyses, Ortega (2003) and Wolfe-Quintero et al. (1998) noted that the majority of these investigations (23 out of 40 studies, as reported by Wolfe-Quintero et al., 1998) found a largely consistent positive link between global complexity features and overall assessments of composition quality. To illustrate, among the earliest studies, Homburg (1984) investigated the relationship between a holistic evaluation of ESL writing quality based on the Michigan Test of English Language Proficiency (MTELP) grading scheme and 10 measures of syntactic complexity, and found a significant relationship between MLS, MLC, finite clausal subordination (DC/C), and writing score. More recently, to identify key linguistic features in the writing performance of EFL learners, Chuenchaichon (2022) analyzed a collection of opinion essays written by Thai EFL university students from the CU-TEP (Chulalongkorn University Test of English Proficiency) corpus and found significant correlations between MLT, MLC, and MLS and argumentative essay scores. That is, essays with longer T-units and higher amounts of subordination received higher scores and were considered higher-quality texts. This finding is also corroborated by some longitudinal research. For instance, Bulté and Housen (2014) reported longitudinal growth in MLT scores throughout a semester-long English for Academic Purposes (EAP) L2 writing course 20.
In addition, in search of more global indices, Lu (2011) included 14 measures of syntactic complexity recommended by Ortega (2003) and Wolfe-Quintero et al. (1998) and developed an automated analysis tool, the L2-SCA, based on large-scale data gathered from the Written English Corpus of Chinese Learners (WECCL). He found that seven of the 14 measures progressed linearly across levels of L2 writing quality: MLT, MLC, MLS, coordinate phrases per clause (CP/C) and per T-unit (CP/T), and complex nominals per clause (CN/C) and per T-unit (CN/T). Moreover, Lu found that, in addition to the proficiency level of L2 learners, contextual factors such as institution (i.e., the universities the learners attended), timing condition (timed versus untimed), and genre (argumentative versus narrative) have an impact on the linguistic characteristics of written products. According to his findings (2011), argumentative writing has longer units and is more complex at the phrasal level than narrative essays 37 or application letters 38. Subsequent research has tested the reliability of the 14 absolute, large-grained complexity indices provided by L2-SCA against a wide range of variables that affect the production of complexity, such as topic 36 and instructional setting 28.
Yang et al. (2015) used seven indices from L2-SCA to analyze a corpus of argumentative essays on two topics written by ESL graduate students, scored using the TOEFL iBT independent writing rating rubric, and found that MLT significantly predicted essay scores across topics. In another study, using 10 indices from L2-SCA, Ai and Lu (2013) examined the differences in syntactic complexity between the writing of native speakers (NS) and non-native speakers (NNS) of English, and showed that NNS produced shorter clauses, sentences, and T-units, less subordination, and fewer noun phrases than NS did. This suggests that EFL writers are characterized by syntactic structures distinct from those of L1 or L2 learners 23. Taken together, the aforementioned L2 writing studies have indicated a positive relationship between large-grained indices of syntactic complexity and writing quality.
Despite the ubiquity and usefulness of such structure-based measures (e.g., MLT and DC/C), a number of scholars have recently questioned their use to investigate syntactic complexity in L2 writing 518. The first limitation is that absolute complexity measures mostly capture the degree of elaboration but give less attention to the degree of variation, in other words, syntactic diversity 27. Second, 3 argue, with corpus-based evidence, that clausal elaboration as assessed by T-unit indices is more characteristic of conversation, whereas academic writing is characterized syntactically by complex phrasal constituents, particularly complex noun phrases; they therefore recommend phrase-level structures as more reliable measures for L2 writing in academic settings.
Subsequent cross-sectional and longitudinal research (e.g., Biber et al., 2016) 3438 has indicated that phrasal complexity is a better predictor of writing quality scores than clausal subordination. For instance, Kyle and Crossley (2018) reported that fine-grained phrasal complexity indices (related to nominal subject, direct object, and prepositional object modifiers) accounted for a larger proportion of variance in essay scores than both holistic syntactic complexity indices and fine-grained clausal complexity indices. Similar results were reported by Zhang and Lu (2022) in their analysis of a collection of application letters and argumentative essays produced by college-level Chinese EFL learners. Likewise, Taguchi et al. (2013) analyzed a collection of argumentative essays written by non-native speakers of English at a university in the United States using ten syntactic complexity indices provided by the Biber tagger and found that noun phrase modification features, such as attributive adjectives and post-modifying prepositional phrases, contributed to essay quality. This demonstrates that as the academic level increased, so did the use of phrasal complexity features in writing. This finding is also corroborated by some longitudinal research: Crossley and McNamara (2014) found that high-quality descriptive texts by university L2 learners included more complex noun phrases and fewer embedded clauses.
Although the studies reviewed above have revealed a positive relationship between different dimensions of syntactic complexity and L2 writing quality, this approach considers syntactic structure only and is difficult to link to the frequency of occurrence of form-meaning mappings emphasized in usage-based perspectives on language learning. To address this gap, Kyle (2016) developed and validated TAASSC and proposed a range of usage-based VAC indices related to frequency and contingency as a new way of representing syntactic sophistication (Abdi Tabari et al., 2023). This proposal inspired a new wave of research comparing absolute and relative complexity measures in terms of their power to predict L2 writing quality 18 (Li & Yang, 2023) 30. This line of research has generated evidence for the greater predictive power of VAC-based complexity measures over absolute ones. For instance, using a corpus of TOEFL independent and integrated essays as well as descriptive essays selected from the Michigan State University corpus, Mostafa and Crossley (2020) found stronger predictive power for usage-based VAC indices, as these measures explained more variance in L2 writing, particularly in the argumentative task, than in the descriptive and integrated (i.e., source-based writing) tasks.
Similarly, Abdi Tabari et al. (2023) examined the extent to which a number of VAC-based syntactic sophistication indices could predict ESL learners’ argumentative writing quality in different writing task conditions and found that verb-VAC combinations (contingency-related features) were the most significant predictor of syntactic sophistication for all groups. In another study, Li and Yang (2023) compared indices of absolute complexity and VAC sophistication in terms of their ability to predict the writing quality of the whole text as well as the major sections of English research articles (RAs) written by Chinese Ph.D. students. The findings indicated that VAC-based measures are more useful for indexing the writing quality of RAs at the whole-text level, while absolute measures have stronger predictive power at the part-genre level, suggesting that holistic and fine-grained indices usefully complement each other in capturing the syntactic complexity of L2 production.
Three important observations can be made about the body of research reviewed above. First, the majority of studies of the relationship between syntactic complexity and L2 writing quality have focused primarily on writing by advanced and college-level L2 learners in large-scale standardized tests (e.g., TOEFL) in English-dominant contexts, without taking into account relevant contextual factors, in particular texts written by different EFL learner groups in different socio-cultural contexts. Second, with a few exceptions 31, few studies on L2 writing have systematically examined the writing produced by these students within the context of their writing classes. Third, the recent conceptualization of syntactic complexity as a multidimensional construct 27 calls for the use of multiple measures that tap into multiple dimensions of complexity; yet, as the foregoing review highlighted, no studies have explored syntactic complexity using a large set of multiple measures in a single study 30. Specifically, to the best of our knowledge, no study in the Ethiopian context has so far systematically investigated the link between writing quality and syntactic complexity features using a large set of measures, including global measures, fine-grained clausal and phrasal measures, and syntactic sophistication measures.
Despite the paucity of research comparing the relationship of different types of syntactic indices to the L2 writing quality of university-level students in the Ethiopian context, there are at least three reasons for studying it in the classroom context. First, Ethiopian students appear to face several difficulties in L2 writing, in that expressing complex ideas in words and sentences to form coherent writing is a challenging task (Dawit, 2013; Eskinder, 2018). In particular, the researchers and university instructors have observed that the texts written by university students are generally short, not discipline-specific, and less reliant on reliable evidence or sources to defend their arguments, as evidenced by their tests, examinations, classwork, assignments, and senior essay papers. Second, a writing sample (i.e., an argumentative essay) produced in a classroom context may exhibit different syntactic features from writing samples produced in high-stakes assessment settings. Third, the assessment of students' writing samples is generally perceived to be very demanding and time-consuming for writing teachers. It is therefore necessary to investigate the relationship between argumentative writing quality and certain syntactic features to inform teachers’ manual rating, as well as to lay a foundation for the development of automated programs to supplement manual assessment.
The present study
Motivated by the aforementioned research insights and the gaps in the literature discussed above, the present study aims to investigate the extent to which a number of syntactic complexity and VAC-based syntactic sophistication indices could predict EFL learners’ argumentative writing quality. To this end, we focus on two specific questions:
What is the relationship between syntactic complexity (with its different dimensions) and the quality of EFL students' writing?
What is the predictive power of syntactic complexity (with its different dimensions) on EFL students’ writing quality?