1 Introduction

It has been proposed that some grammatical phenomena are structured by micro-cues. Micro-cues can be described as structural manifestations that are associated with a syntactic phenomenon and make its organization and acquisition accessible to users and learners (Lightfoot 1999; Westergaard 2009; 2014; see also Clark & Roberts 1993). The implication is that micro-cues should show qualitative and quantitative behaviour similar to that of the phenomenon they are associated with, including through variation and change. The purpose of this paper is to test empirically whether such a correlation holds between a phenomenon and its presumed micro-cues. The test is conducted with reference to verb-second word order (Woods & Wolfe 2020). Verb-second word order (V2) sees the finite verb in a high syntactic position, preceded by some initial projection (XP). V2 is here investigated in Medieval French, where the rule has long been seen to be operative (Le Coultre 1875; Thurneysen 1892; Foulet 1928; Adams 1987; Vance 1997; Labelle 2007; Roberts 2012; Mathieu 2012; Wolfe 2018), only to be lost to SVO. Manifestations generally believed to be associated with V2 in French comprise the following: subject-verb inversion (VS), the si particle, OV word order (preceding both finite and non-finite verbs), and enclitic placement of complement clitics. Whether the quantitative behaviour of V2 is reflected by that of such manifestations is the question that we seek to answer here, using a treebank method supported by statistical approaches.

The paper is organized as follows. First, the micro-cue model is sketched out, and the identity of the micro-cues of V2 in Medieval French is discussed based on existing literature. Then, a statistical approach to treebank data is proposed to test the correlation between postverbal pronominal subjects, particles, preverbal objects and enclisis. The results of the investigation are subsequently presented, showing that the micro-cues are statistically correlated. Particularly strong and independent of a purely shared diachronic development is the relation between VS, si and OV. The consequences of these results are considered in the conclusion. The results support the notion of a connection between micro-cues of V2, lending plausibility to the micro-cue model. Such multiple connections further imply an analysis of V2 as a complex set of related cues (Poletto 2002) and not as a unitary phenomenon whose grammatical definition can be encapsulated within a single parameter. Perspectives for future research are then suggested.

2 Micro-cues

In generative models, languages are structured by grammatical options, usually termed “parameters”. Differences between languages are due to the parametric option selected by each, the options emanating from principles that are part of human genetic inheritance and thus account for commonalities across languages. Learning a language can thus be thought of as inferring the right option from the accessible input. The question is then how this inference process converges toward the right option. One answer that has been proposed is that selecting one option has an impact on other options. Such an approach is pursued in Ledgeway (2020a; 2020b).1 Based on the shape of sentences in Romance languages, Ledgeway considers several factors. Among these figure the differential marking of the direct object (as in Spanish, where objects mostly designating a human being are preceded by the preposition a ‘to’, but see Ledgeway 2012) or the absence of such a marking (as in French); preverbal negation without a postverbal negative marker (as in Spanish no sé ‘I don’t know’) or with such a postverbal negative marker (as in French je ne sais pas ‘I don’t know’); the use of a single auxiliary (Spanish haber llegado ‘have arrived’, haber dormido ‘have slept’) or two auxiliaries (French être arrivé ‘have arrived’, avoir dormi ‘have slept’). The choice of the first or the second value for each parameter is related to the high or low position of the verb in the sentence. The verb position parameter would therefore be correlated to other grammatical factors, and this correlation would presumably make them easier to acquire for new learners.

A model that puts correlation at the heart of grammatical organization is that of micro-cues. Accessible cues found in the input guide the acquisition of a phenomenon by leading to the target (“adult-like”) setting of the value for the relevant parameter (Lightfoot 1995; 1999; 2006). Concurring micro-cues are investigated by Marit Westergaard, who focuses on the acquisition of V2 in Norwegian (e.g. Westergaard 2014). V2 word order is made clear by the fact that when the initial constituent is not the subject, the subject must occur in postverbal position (see (1) vs. (2)). Since no more than one phrase is generally allowed before the verb in Germanic languages, examples like (3) are ungrammatical.

    (1) Ich  las   diesen  Roman  schon    letztes  Jahr.   (German)
        I    read  this    novel  already  last     year

    (2) Diesen  Roman  las   ich  schon    letztes  Jahr.
        This    novel  read  I    already  last     year    (Roberts 2012: 5)
        ‘I already read this novel last year.’

    (3) *Ich  diesen  Roman  las   schon    letztes  Jahr.
         I    this    novel  read  already  last     year

The post-verbal subject would therefore be an indicator for learners of a language like German, who on this basis unconsciously infer that the grammar is governed by a V2 rule (Lightfoot 1995: 40; 1999: 152; 2006: 86; Yang 2000: 113; Lightfoot & Westergaard 2007: 409). A schematic rendition of this cue is in (4) (in line with Westergaard 2014):

    (4) Micro-cue for V2 in declaratives:
        DeclP[XP Decl V Subject …]

Thus, the V2 rule can be inferred from one of its salient manifestations.

Further grammatical correlates are similarly proposed in the literature on Medieval French V2, which recurrently identifies, besides post-verbal subjects (VS), also particles, preverbal objects (OV), and clitic placement. These are discussed in turn.

In the context of V2, subject inversion designates the occurrence of the subject after a finite verb, itself preceded by some XP, typically in an assertive main clause, in a language where the expressed subject is normally preverbal (5).2

    (5) Aprés  li          ceint  Lancelot  l’   espee.   (Old French)
        after  3sg.dat.cl  girds  Lancelot  the  sword
        ‘Then Lancelot girds him with the sword.’ (1220, Queste del saint Graal)

Such a configuration “is standardly considered to be one of the most salient and robust acquisitional cues in the instantiation of a V2 grammar”, according to Ledgeway (2021: 28).3 The connection between Medieval French VS and V2 is explicitly stated by a number of authors, among whom Adams (1987); Vance (1997); Labelle (2007); Roberts (2012); Zaring (2018); Wolfe (2018; 2021) and Klævik-Pettersen (2019); but see Kaiser (1996; 2002), Rinke & Meisel (2009) and Kaiser & Zimmermann (2011).

The second micro-cue is universally recognized as closely tied to V2. It concerns the particle si, illustrated by the following example.4

    (6) Icil  champion  si  est  li       anemis.   (Old French)
        this  champion  si  is   the.nom  enemy
        ‘This champion is indeed the enemy.’ (1220, Queste del saint Graal)

The difficulty in characterising the meaning of the particle has attracted considerable attention. It has been proposed to be an assertion marker (Marchello-Nizia 1985), a resumptive element (Meklenborg 2020), or a topic continuity marker (Fleischman 1991; van Reenen & Schøsler 2000). Yet others have insisted on the role of si as satisfying V2 requirements (Fleischman 1991; Ferraresi & Goldbach 2002; Ledgeway 2008; Wolfe 2017), although this is probably not its sole function since si can cooccur with other preverbal XPs (cf. (6); see Ingham and Larrivée 2015).

A third micro-cue put forward in the literature is the OV word order, where the direct object precedes rather than follows the inflected verb, as in the following illustration.

    (7) escu    vos         envoiera      Diex.   (Old French)
        shield  2pl.dat.cl  send.fut.3sg  God.nom
        ‘God will send you a shield.’ (1220, Queste del saint Graal)

As compared to VS, OV seems less frequently envisaged in relation to V2. This may be because OV does not seem to be limited to main-clause finite verbs (Zaring 2010), unlike V2. Yet, for Ledgeway (2012), the roll-up movement of the object before the verb in Latin is a precursor of the XP movement for V2 in the relevant Romance languages (see also Roberts 2021b; Wolfe 2021). The connection between OV and V2 in Medieval French is nonetheless explicitly made by some authors, such as de Andrade (2018) and Labelle & Hirschbühler (2018). Poletto (2006; 2014) extends the relation between V2 and OV to cases of “scrambling” of the direct object before non-finite verbs. Her proposal concerns Old Italo-Romance varieties, and most notably Old Florentine. Nonetheless, the hypothetical relation between “scrambling” OV and V2 can be extended to other Medieval Romance languages showing both V2 and relatively frequent cases of OV concerning non-finite verbs, such as Medieval French.5 For this reason, we include such a micro-cue in the quantitative analysis.

Other cues have been discussed in the literature on Old Romance varieties in relation to V2. One such cue is enclisis of complement clitics to the finite verb, as shown in (8) with an Old Venetian example (illustrating what is generally known as the Tobler-Mussafia law; see Benincà 1983/84; 1995; 2006 for Italo-Romance; Wanner 1991; Fontana 1993; Fischer 2002; 2003 for Ibero-Romance).

    (8) … e    mis=me              man   en  cavo.   (Old Venetian)
        … and  put.pst=1sg.dat.cl  hand  on  head
        ‘… and he put his hand on my head.’ (1311, Lio Mazor)

Notably, Old French had the same phenomenon, but only up to the end of the 13th century (Ramsden 1963; see Labelle & Hirschbühler 2005 for the proposal of an earlier decline in the 12th century). This is clearly confirmed by our corpus, which presents occurrences like (9), especially in the early texts.

    (9) Cunquerrat       li          les  teres  d’    ici   qu’   en  Orient.   (Old French)
        conquer.pst.3sg  3sg.dat.cl  the  lands  from  here  that  in  east
        ‘He conquered the lands for him from here to the East.’ (1100, Roland)

Furthermore, this cue was found to correlate with V2 in Venetian (Larrivée et al. 2024).

What we mean by cue then is a correlate of a rule, either a salient instantiation of a structure it generates, like XVS, or a cooccurring element, such as particle si, that emanates from (an aspect of) that rule. The expectation is that such correlates have a distribution and a rate of use that bear some defined relation to the target structure(s) generated by that rule. Whereas cues might have been envisaged as single elements that thus would have been indistinguishable from the triggers leading to parameter setting (Roberts 2021a: 361), what we envisage here is a cluster of cues that together make a rule’s organization and acquisition accessible to users and learners (for psycholinguistic support to a multiple cue approach, see Meisezahl et al. 2025). Such a view makes the strong prediction that the four potential micro-cues in the literature on Medieval French V2 entertain a quantitative connection between them. Such a connection should be maintained through change, making the loss of Medieval French V2 a particularly fruitful issue to explore.

Some quantitative explorations have been proposed on the interconnectedness of subject inversion and si. Based on a previous study establishing the rate of main clause subject inversion in three legal Anglo-Norman texts of the last half of the 13th century differentiated by register, Larrivée (2025) determines the number of si in the same portion of texts. The numbers are in Table 1.

Table 1

Ratio of XVS to si in three Anglo-Norman 13th century texts.

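|       | ANYBC (1270–1279) | PROME (1290–1300) | Fet asaver (1263) |
|-------|-------------------|-------------------|-------------------|
| XVS   | 5.1% (50/982)     | 8.1% (50/619)     | 16.7% (50/300)    |
| si    | 1.3% (13/982)     | 1.6% (10/619)     | 4.3% (13/300)     |
| ratio | 4.4               | 5                 | 3.9               |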

The ratio between one cue and the other is found to be relatively stable, hovering between 3.9 times and 5 times more VS than si. The relative stability is striking given that it is achieved across register variation. However, whether a comparable connection holds in a diachronic sample remains to be investigated. The method of such an investigation is presented in the next section.

3 Testing correlations

The purpose of this first investigation is to assess the interconnectedness of four presumed micro-cues of Medieval French V2 in a large, syntactically annotated corpus. The MCVF plus Penn-BFM Parsed Corpus of Historical French (PPCHF) comprises 63 texts belonging to different forms (verse and prose), types and regions, representing 1.6 million words, from the earliest available in the 9th century to 1585. Their syntactic relations are annotated using the UPenn system (see Santorini 2016; 2021 for the most updated version) and can be extracted via the CorpusSearch 2 query language (Randall et al. 2004). Based on the literature presented in the previous section, we check the correlation between four micro-cues: postverbal subjects (VS), preverbal adverbial particles such as si and ainsi (SI), configurations where the direct object precedes the verb (OV, of 5 different types), and enclisis to the finite verb (enclisis). We report here a general description of the target of the queries. The extraction queries are described in more detail in the appendix and are available in an OSF repository, along with the material necessary for reproducing our results.6

  • VS: An assertive main clause where the pronominal subject, either referential or impersonal, follows the finite verb. The reason for selecting pronominal subjects is that their postverbal position is less likely to be influenced by the type of verb (cf. the “Romance” inversion characterising NP subjects with unaccusative verbs; see e.g. Leonetti 2018). Main clauses were selected as the locus where V2 primarily manifests itself.

  • SI_VJ: An assertive main clause where a si-type adverb precedes the finite verb.

  • OV: An object that precedes its verb. The configurations in which the object precedes the verb were explored through five sub-types, following the type of verb and structure showing a preverbal direct object. This is needed as (i) a single query could not catch the whole range of occurrences at once, (ii) there might be relevant differences among the groups.7 For each subtype, all clausal environments were selected, as there is no evidence that OV is restricted to main clauses.

    • OVJ_aux: Clauses where a direct object precedes a finite auxiliary.

    • OVJ_lex: Clauses where a direct object precedes a finite lexical verb.

    • OVJ_mdj: Clauses where a direct object, whether included or not in an infinitival clause, precedes a finite modal verb.

    • OV_VPP: Clauses where a direct object precedes a past participle verb but follows the finite verb.

    • OV_INF: Clauses where a direct object precedes an infinitival verb within an infinitival clause introduced by a modal verb.

  • enclisis: An assertive clause where a complement clitic follows the finite verb.

All micro-cues have been extracted from each text in the corpus as raw numbers via the queries just described. We then relativized these raw numbers to a total for each text and for each micro-cue. This means that we did not relativize the raw numbers to a single total for each text (e.g., the total number of tokens or sentences in the text), but to a relative total for each micro-cue. This is because finding, for example, n occurrences of OV in a text where direct objects have an extremely low frequency is intuitively different from finding the same number of occurrences of OV in a text where direct objects are extremely frequent, independently of the total number of tokens/sentences in the text. The relative totals are defined as follows.

  • VS_TOT: An assertive main clause with a finite verb either preceded or followed by a pronominal subject, either referential or impersonal.

  • SI_VJ_TOT: An assertive main clause with a finite verb.

  • OV_TOT: A total is calculated for each subtype.

    • OVJ_aux_TOT: clauses with both a direct object and a finite auxiliary.

    • OVJ_lex_TOT: clauses with both a direct object and a finite lexical verb.

    • OVJ_mdj_TOT: clauses with both a direct object, whether included or not in an infinitival clause, and a finite modal verb (=OV_INF_TOT).

    • OV_VPP_TOT: clauses with a direct object, a past participle verb and a finite verb.

    • OV_INF_TOT: clauses with both a direct object, whether included or not in an infinitival clause, and a finite modal verb (=OVJ_mdj_TOT).

  • Enclisis_TOT: the sum of the assertive clauses where a complement clitic directly follows the finite verb and the ones where a complement clitic directly precedes the verb.

The raw numbers resulting from the queries for the selected context and for their relative total have been reported in a spreadsheet, where each line refers to a given text and the columns to the raw numbers. Two further columns reporting the year of composition of the text and the text type have been added. Text type is defined following the criteria adopted by the corpus we investigated: P (prose texts), M (mixed prose text), R (rhymed verse) and V (verse assonant), whose definition can be found in the documentation of the MCVF and PPCHF corpora.

To investigate the interconnectedness of various micro-cues, one might simply look for correlations between their raw counts. However, larger texts naturally offer more opportunities for these cues to appear, leading to higher counts that merely reflect text size. To overcome this, we focus on proportions. By dividing each cue’s count by its relative total, we can analyse the relationships between their frequencies (proportions or probabilities). If these cues relate to a single underlying phenomenon, we anticipate they will all be positively correlated with each other. This means that the more frequent one micro-cue is in a text, the more frequent the others will be as well.
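As an illustration, the relativization step can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the scripts used for the study: the two texts and all their counts are invented, and the helper names `proportion` and `add_proportions` are hypothetical. Only the column labels (VS, VS_TOT, SI_VJ, SI_VJ_TOT) follow the definitions given above.

```python
# Hypothetical per-text counts; the labels mirror those defined above,
# but the numbers are invented for illustration only.
texts = [
    {"text": "A", "year": 1150, "VS": 40, "VS_TOT": 200, "SI_VJ": 30, "SI_VJ_TOT": 600},
    {"text": "B", "year": 1400, "VS": 5, "VS_TOT": 250, "SI_VJ": 4, "SI_VJ_TOT": 800},
]


def proportion(count, total):
    """Return count / total, or None when the relative total is zero."""
    if total == 0:
        return None
    return count / total


def add_proportions(rows, cues):
    """Return new rows in which each cue count is divided by its own relative total."""
    out = []
    for row in rows:
        new_row = {"text": row["text"], "year": row["year"]}
        for cue in cues:
            new_row[cue] = proportion(row[cue], row[cue + "_TOT"])
        out.append(new_row)
    return out


props = add_proportions(texts, ["VS", "SI_VJ"])
for row in props:
    print(row)
```

Each cue is divided by its own relative total rather than by a single text-level total, so a text with few direct objects and a text with many are compared on the same footing.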

Consider Kroch (1989), who examines grammatical changes where an older rule is being replaced by a newer one. The core idea of the paper is that the rate of replacement should be the same across the different linguistic contexts in which the rule applies (Constant Rate Hypothesis). The rate of replacement is measured for each context by applying a logistic function to the proportion of the new rule over time. Hence, the paper focuses on modelling the rate of change, a diachronic measure. Similarly to what we investigate here, Kroch (1989) examines the loss of the V2 rule in Old French (among other rules) and observes that the set of phenomena generally thought to depend on it – the loss of null subjects, NP-subject inversion and pronoun-subject inversion – proceeds in parallel through time. Building on that, Zimmermann (2023) uses mixed-effects logistic regression to model the probability of a new rule across its different contexts of application, while also controlling for random variability from texts. This adds further checks to the methodology, making it more reliable. At their core, both papers track the change of one rule across its different contexts of application, asking if the progression is parallel over time in all contexts. The question that we are asking in this paper is different. We ask whether texts that show a high proportion of a given cue (i.e., context, in the terms of Kroch 1989 and Zimmermann 2023) will also display high rates of the other cues, rather than checking whether the rate of change is parallel across cues over time. That is why we are using pairwise correlation tests to examine all associations. The correlation coefficient quantifies the strength of the relationship between each pair of variables. A significant test result indicates that the observed association is unlikely to be due to random chance. Each micro-cue is represented by a variable containing its proportions across different texts. 
Since proportions are bounded within the [0,1] interval, these variables do not follow a normal distribution. Unlike normal distributions, where values can be any real number and are symmetrically distributed around the mean, our proportions are, as expected, heavily concentrated at lower values. For this reason, we adopted the Spearman correlation test, which is more robust to outliers and does not assume linearity or normality in the data distribution, contrary to the Pearson correlation test. We defined a weight for each text, as very short texts with few occurrences can skew the data by presenting extremely large or small proportions with respect to larger texts. The weight is based on the total number of sentences in the text with a cap at 1000 sentences, beyond which all texts have equal weight; this is done to avoid having very large texts weigh too much and therefore exert an overwhelming influence on the correlation. Finally, we restricted the analysis to prose texts (P and M), which gives us a total of 42 texts.8 In what follows, we refer to the correlation analysis as defined here, that is, a weighted Spearman correlation test including only prose texts. All weighted correlations have been performed in R (R version 4.4.1) via the wtd.cor function of the package weights (version 1.0.4). This function performs Pearson correlation tests for all pairs of variables. For each variable, we first calculated the ranks of its values, then applied the function to this ranked matrix to obtain the Spearman test results (as the Spearman test is essentially the Pearson test applied to ranked data). Since we are making a relatively large number of tests (15 tests for 6 variables), we needed to control the rate of false positives. All p-values resulting from our tests have been corrected via the Holm-Bonferroni correction.
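The weighted rank correlation described above can be sketched as follows. This is an illustrative Python approximation of the procedure (rank each variable, then compute a weighted Pearson coefficient on the ranks), not the R code run with wtd.cor. The proportions and sentence counts are invented, all function names are hypothetical, and the sketch returns only the coefficient, without the significance test.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def weighted_pearson(x, y, w):
    """Pearson correlation coefficient with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def text_weights(sentence_counts, cap=1000):
    """Weight each text by its number of sentences, capped at `cap`."""
    return [min(n, cap) for n in sentence_counts]


def weighted_spearman(x, y, w):
    """Weighted Pearson correlation computed on the ranks of x and y."""
    return weighted_pearson(average_ranks(x), average_ranks(y), w)


# Invented proportions and sentence counts for five hypothetical texts.
vs = [0.30, 0.22, 0.10, 0.05, 0.02]
si = [0.12, 0.15, 0.04, 0.03, 0.01]
sentences = [400, 2500, 1200, 90, 5000]

weights = text_weights(sentences)
rho = weighted_spearman(vs, si, weights)
print(weights, round(rho, 3))
```

In this sketch the ranks themselves are unweighted and the weights enter only in the correlation step, which mirrors the order of operations described above. The cap means that the texts with 2500 and 5000 sentences receive the same weight of 1000.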

As a preliminary step of the analysis, we investigated the five OV contexts, to see whether the contexts needed to be kept separate or whether they could be unified. Figure 1 displays the scatter plots, correlation coefficients, and significance levels (post-Holm-Bonferroni correction) for the test results. It also includes the density curves of the proportions for each context.

Figure 1

Correlation between five OV subtypes (after Holm-Bonferroni correction).

The density curves clearly illustrate that the variables do not follow a normal distribution. A normal distribution’s density curve is bell-shaped, symmetric, and peaks at the mean. In the scatter plots, the y-axis represents the proportions of the variable corresponding to the current row, while the x-axis represents the proportions of the variable corresponding to the current column. The presence of points in the top-right corner, while the majority are clustered in the bottom-left, happens because smaller texts can exhibit very high proportions. The observed upward trends in the scatter plots suggest positive correlations, which are confirmed by the positive coefficients, meaning the proportions for each pair tend to increase or decrease together across texts. The higher the absolute value of the correlation coefficient, the stronger the relationship.

As shown by the scatter plots, significant levels of correlation between the categories are found. All finite contexts (OVJ_aux, OVJ_mdj and OVJ_lex) strongly correlate with each other, as do all non-finite contexts (OV_VPP, OV_INF). The overlap among finite and non-finite contexts is extensive; the only missing correlation is between OV in front of past participles and OV in front of auxiliaries.9 In light of these results, we opted for merging all finite and all non-finite contexts into two categories, which we labelled OVJ and OV, respectively. OVJ then corresponds to what is generally defined as V2-driven OV (but consider that the numbers include both main and subordinate clauses), while OV corresponds to “scrambling” cases, where the direct object precedes the non-finite verb.

After this simplification of the OV categories, we now test the correlations among five different micro-cues, to which we add the numeric variable year, to check for the direction of the diachronic evolution. The correlation test has therefore 6 variables: year, postverbal pronominal subjects (VS), preverbal adverb si (SI_VJ), enclisis, OV with finite verbs (OVJ), and OV with non-finite verbs (OV). The scatter plots and results of the tests are reported in Figure 2.

Figure 2

Correlation between five cues of V2 (after Holm-Bonferroni correction).

Two things are noticeable from the scatter plots: the downward trend of the relation between each variable and year (i.e., all variables decrease through time) and the upward trend of the relation between all variables.10 This is consistent with the results of the correlation tests, which show statistically significant correlation coefficients after the application of the Holm-Bonferroni correction.

All variables correlate with each other. The micro-cues are all positively correlated with each other, while they are all negatively correlated with the variable year. The correlation is overall particularly strong. They are all significant both before and after applying Holm-Bonferroni correction. This means that there is a strong connection between the different micro-cues, which in turn decrease diachronically together.
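For readers unfamiliar with the correction, the Holm-Bonferroni step-down adjustment can be sketched as below. This is a generic Python illustration of the standard procedure, not the code used for the study: the function name `holm_bonferroni` is hypothetical and the three raw p-values are invented. The variable labels are those of the six variables listed above.

```python
from itertools import combinations


def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Six variables yield 15 unordered pairs, hence 15 correlation tests.
variables = ["year", "VS", "SI_VJ", "enclisis", "OVJ", "OV"]
pairs = list(combinations(variables, 2))
print(len(pairs))

# Invented raw p-values for three tests, for illustration only.
raw_p = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw_p)
print(adjusted)
```

A result that stays below the chosen threshold after this adjustment is significant even when all tests in the family are taken into account, which is the sense in which the correlations above are said to hold both before and after correction.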

While all micro-cue variables exhibit positive correlations with each other, they are also all negatively correlated with year. In other words, year is likely a confounding variable, as it could be the sole relevant connection between these micro-cues. Put differently, the correlation between the variables could be driven by a shared diachronic trend – potentially triggered by independent factors – and not by a shared grammatical link. To ascertain the genuine association between the micro-cues, it is crucial to control for the time variable. To this end, we regressed (with weights) the ranks of year out of the ranks of each micro-cue variable to remove the variance explained by year.11 We then computed pairwise weighted correlation tests on these residuals. If the correlation between the variables is not just a consequence of the shared diachronic trend, the residuals obtained by “filtering out” the variance explained by year should correlate with each other. This approach ensures that any subsequent observed association is entirely independent of time. The results are in Figure 3.

Figure 3

Correlation between residuals (after Holm-Bonferroni correction).
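The residual analysis can be sketched in Python as follows. This is a minimal sketch under assumptions: it uses a simple weighted least-squares regression of each cue's ranks on the ranks of year, which is one way to implement the step described above and is not necessarily identical to the model fitted for the study. All data are invented and all function names are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def wmean(v, w):
    """Weighted mean of v."""
    return sum(wi * vi for wi, vi in zip(w, v)) / sum(w)


def wcov(a, b, w):
    """Weighted sum of cross-products of a and b around their weighted means."""
    ma, mb = wmean(a, w), wmean(b, w)
    return sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b))


def weighted_residuals(y, x, w):
    """Residuals of a weighted least-squares regression of y on x with intercept."""
    slope = wcov(x, y, w) / wcov(x, x, w)
    intercept = wmean(y, w) - slope * wmean(x, w)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def wcorr(a, b, w):
    """Weighted Pearson correlation coefficient."""
    return wcov(a, b, w) / math.sqrt(wcov(a, a, w) * wcov(b, b, w))


def residual_rank_correlation(cue_a, cue_b, year, w):
    """Correlate two cues after removing the part of their ranks explained by year."""
    year_ranks = ranks(year)
    res_a = weighted_residuals(ranks(cue_a), year_ranks, w)
    res_b = weighted_residuals(ranks(cue_b), year_ranks, w)
    return wcorr(res_a, res_b, w)


# Invented data for six hypothetical texts.
year = [1100, 1180, 1250, 1330, 1400, 1500]
vs = [0.30, 0.20, 0.25, 0.08, 0.10, 0.02]
si = [0.12, 0.11, 0.08, 0.03, 0.05, 0.01]
w = [400, 1000, 1000, 90, 1000, 700]

r = residual_rank_correlation(vs, si, year, w)
print(round(r, 3))
```

By construction the residuals have no weighted linear association with the ranks of year, so any correlation that remains between two sets of residuals cannot be attributed to a shared linear trend over time in the ranks.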

These plots display the relationships between variables after regressing their ranks on year’s ranks, accounting for the influence of time. Independently of time, and after correcting for the family-wise error rate, we observe a significant positive correlation between VS and si, and almost significant positive correlations between VS and OV and between OV and si (significant before applying the Holm-Bonferroni correction). This finding strongly supports the robustness of the associations between VS, si and OV. They are not due to a shared diachronic trend (as shown by the residual analysis), nor are they chance findings from multiple testing (as shown by the Holm-Bonferroni correction). A core grammatical relation is thus indicated between having pronominal subjects after the finite verb (VS), the presence of the si particle, and the positioning of direct objects before non-finite verbs (OV). On the other hand, the correlation between VS and enclisis becomes almost significantly negative. This suggests that the positive correlation found in the previous test was only due to a shared diachronic path. This finding is in line with the observation that most enclisis cases in our corpus cooccur with a null subject, either due to coordination or not. A negative correlation between pronominal VS and enclisis is hence expected: null subjects increase the proportion of enclisis and (trivially) decrease the proportion of VS. No other relevant correlation is found for enclisis, either positive or negative. Finally, the rate of objects before finite verbs (OVJ) no longer correlates with any of the other variables, showing that in this case too the correlation effect found in the earlier test could be just a consequence of a shared diachronic trend and not of a core grammatical connection with the other variables.

With respect to enclisis and OVJ, it is also noticeable that their rates show the highest negative correlation with year (–0.76; –0.87). Since year explains most of the variation in enclisis and OVJ, little is left in the residuals. These small residuals are likely mainly noise and unlikely to show shared variation with the other cues’ residuals. So, the lack of correlation between the residuals of enclisis and OVJ and the other residuals is in this sense expected. As lack of an effect is not in itself an effect, this does not directly tell us that there is no connection between the share of OVJ/enclisis and the rest of the cues. What might be the case is that such a connection is present but blurred by additional factors that will have to be peeled away (considering OVJ, for example, an analysis by type of object might indeed be necessary to have more reliable results in this respect). This is left for future research.

The initial analysis showed that all the cues were positively correlated with each other, and all were negatively correlated with time. However, this did not tell us whether the shared link is a common decreasing diachronic trend or a shared grammatical rule triggering them (such as V2). The second analysis showed that VS, si and OV correlate beyond the shared diachronic trend, supporting the idea that there is a common grammatical rule triggering them. When this rule fades away from the system, VS, si and OV fade away too. The same is not proved for enclisis and OVJ, whose correlation with the other cues is not supported beyond a shared diachronic trend. Whether such a connection is nonetheless present and only hidden by the presence of additional variables to be factored out is still open for debate.

4 Concluding discussion

The purpose of this paper is to test the correlation between V2 word order and its presumed micro-cues. Using a statistical approach to treebank data of Medieval French, it is demonstrated that there is a consistent relation between subject inversion, particles, and preverbal objects with non-finite verbs. This relation holds through time and is not the result of a shared diachronic trend. The correlation with enclisis and preverbal objects with finite verbs is however not proved to go beyond a shared diachronic trend, which means that the link with the other cues might be accidental (i.e., two independent changes happening at roughly the same time). The strong correlation between VS, si and OV with non-finite verbs shows how V2 word order correlates with a cluster of micro-cues and is therefore consistent with an approach by which such micro-cues shape the acquisition of given linguistic phenomena (Westergaard 2014), as implied by recent psycholinguistic research (Meisezahl et al. 2025).

The consequences for syntactic theory are potentially considerable. Two dimensions are particularly salient. The first is the consistency in the correlation across the micro-cues which we already highlighted above. This means that these phenomena form a cluster, which can be said to depend on a coherent V2 structural property. The second is the graduality of the loss. As the micro-cues for V2 gradually decrease through time, so does the V2 rule. How this information regarding rates of use is attended to by learners is then the question that arises.

This interpretation of the data is therefore compatible with the following definition of V2: Any language where the inflected verb of an assertive clause is moved to the C domain and at least one constituent is visible on its left can be considered a V2 language. Assuming a split left periphery implies that there exist several types of V2, one for each CP projection targeted by movement.12 Gradual loss of movement through the left periphery is therefore expected, and is reflected in the graduality of the correlated decrease in the attestation of the micro-cues. This view has an advantage over a proposal based on “Competing Grammars” in the sense of Kroch (1989), according to which language change occurs via a competition between distinct grammars, in our case a V2 and a non-V2 grammar. While the application of the “double base hypothesis” to our situation would still be in line with the observation about the graduality of the change, the advantage of the current proposal is that it allows us to highlight further sub-regularities within one linguistic system, such as the variable restrictions on the number and typology of XPs preceding the finite verb (for a recent overview, see Wolfe 2019 on Old Romance; Samo 2019 on Germanic).13

The complexity of V2 is also highlighted by the micro-cues included in the cluster in a given language. For Medieval French, this study highlighted a cluster of micro-cues related to V2 which includes postverbal pronominal subjects, presence of preverbal si ‘so’, and OV with non-finite verbs. Crosslinguistic variability in the presence and frequency of such micro-cues is however expected, as they are contingent on independent properties of each language. Careful investigations are needed to map the possible cues, and what their presence or absence says about the type of V2 involved.

The two analyses presented here could be supplemented by further research. One further approach would be to augment the treebank method with a qualitative assessment of the configurations: while it was assumed that their stability throughout the period does not disrupt the expected correlation (as confirmed by our results), some VS configurations included in the count are not plausibly generated by a V2 grammar. As for OV, an interesting additional exploration would be to take into account different types of objects; as is well known, bare quantifiers like tout ‘all’ can still be OV in modern French, and have been claimed to occupy a different sentential position in the aspectual field (Cinque 1999), potentially influencing their evolution. Another topic for future research would be to explore other word order changes, and whether micro-cues apply only to V2, a word order ambiguous with SVO (Meisezahl et al. 2025), or whether they constrain other word orders, and indeed other syntactic phenomena.

Abbreviations

We followed the Leipzig Glossing Rules, adding the following abbreviation:

cl = clitic pronoun

Supplementary files

Supplementary materials (OSF repository): DOI 10.17605/OSF.IO/TC9DK

Acknowledgements

We would like to warmly thank our three anonymous reviewers for their constructive criticism, as well as the editor, Gabriel Martínez Vera, for his support throughout the publication process. All errors remain our own.

Funding information

The research reported in this paper was made possible by the substantial support of the Agence Nationale de la Recherche and the Deutsche Forschungsgemeinschaft to project MICLE (ANR-20-FRAL-0001-01 / DFG-449439301).

Competing interests

The authors have no competing interests to declare.

Author contributions

Pierre Larrivée and Cecilia Poletto are responsible for the initial conception of the idea. Francesco Pinzin discussed and complemented this initial idea. Natasha Romanova and Francesco Pinzin are responsible for the extraction, categorisation and checking of the data. Francesco Pinzin and Papa Hamatt Touré are responsible for the development and implementation of the statistical analyses performed. Sections 1, 2 and 4 were drafted by Pierre Larrivée; Section 3 was drafted by Francesco Pinzin and complemented by Papa Hamatt Touré. All sections were then revised by all authors.

Notes

  1. See also Biberauer & Roberts (2009) for a similar approach invoking parametric hierarchies. An alternative view is proposed by Guardiano & Longobardi (2005; 2017), which models variation in terms of interconnectedness between different parameters instead of hierarchies.
  2. Note that normally, clitics are not considered for counting the position of the verb; thus, (5) is an instance of strict V2, with only one XP – Aprés – before the verb ceint.
  3. This cue is less straightforward in pro-drop Romance languages, which exhibit a further type of postverbal subjects whose low position is linked to their status as internal arguments of the verb (Theme/Patient). Subjects of unaccusatives, passives, etc. fall in this category (Burzio 1986; Belletti 1988). As this low position is generally constrained to indefinite nominal subjects, we decided to focus only on pronominal subjects in our analysis (see below). Furthermore, Old French (like Old Venetian, see Pinzin & Poletto 2025) does not present cases of focalized postverbal subjects with transitive and unergative verbs of the type described in Belletti (2004). Note that although Salvesen & Bech (2014) seem to propose that postverbal subjects with new information status are found in Old French, these do not correspond to focalization in the sense of evoking alternatives found in Italo-Romance; a detailed comparison would be a useful contribution from future research.
  4. A reviewer suggests that the use of si is generally illustrated by examples where it is the sole preverbal element, such as Si respont as dames que ‘si responds to the ladies’, from the Queste del saint Graal. The reviewer further points out that icil champion ‘this champion’ in (6) might be dislocated to the left of si (see Donaldson 2012 and references therein) rather than a subject, an issue that is orthogonal to our pursuit and that we leave to future research.
  5. For an analysis of OV orders in Medieval French involving non-finite verbs, see Marchello-Nizia (1985); Zaring (2010; 2018); see also Wolfe (2021) for a recent overview.
  6. See Supplementary materials.
  7. Not all these configurations are indeed plausibly generated by a V2 grammar (e.g., OV with infinitives), hence the importance of separating the various sub-cases.
  8. The correlation analysis of the poetry texts (V and R) has been performed too, but separately. We do not elaborate on the difference between the two for reasons of space and scope of the article. We limit ourselves to the consideration that poetry texts show a significantly higher frequency of OV in all contexts. The fact that poetry texts behave differently from prose texts with respect to the variables we investigate further justifies the choice to separate the two groups.
  9. On the same set of data but using a different statistical methodology, Pinzin et al. (in prep.) show how the loss of OV in all these contexts, finite or not, proceeds in parallel. This further strengthens the correlation.
  10. Three outliers are visible in the scatter plots. Two are very short texts, the Strasbourg Oaths (842) and the Psautier de l’Orne (1150). Their peculiarities (high rate of OV for the Strasbourg Oaths, of enclisis for the Psautier de l’Orne) can be explained considering their shortness, which can give extremely high rates for some phenomena. Their weight on the correlations, however, is small, given the precautions we took in building the test. The third outlier is the Clari chronicle (1205). This text is longer, and the main reason it is an outlier is its high share of adverbial si (>40% of main clauses have a preverbal si). This is not surprising, given the frequent use of si to connect sentences in chronicle narration. We kept it in our sample since (i) it is coherent with the other chronicles and (ii) there is no independent reason to think that the use of si is influenced by anything other than the style of the text. In line with the analysis, we expect this text to have higher shares of the other V2-related micro-cues too.
  11. The five regressions (one for each micro-cue) were performed via a GAM model (Generalised Additive Model), an extension of the Generalised Linear Models (GLM) that can better handle non-linear relationships, such as those connecting proportions (package gam, version 1.22-5, Hastie 2004). The residuals’ scatter plots have been checked for patterns which could reveal issues with the regression. No pattern was identified (the interested reader can check the plots in the OSF folder). We discarded linear regression models (GLM), as the relationship between the ranks of year and the ranks of some of the micro-cues is not linear. This is particularly visible for enclisis, where the residuals obtained with linear regression show a pattern indicating that a correlation with time was still present. The regressions have been performed on the ranks, and not on the real values, as this methodology is more resistant to outliers. An alternative methodology using the real values can be found in the shared .rmd file. The results are similar for the positive correlations between VS & SI_VJ, VS & OV, OV & SI_VJ, and the negative correlation between enclisis and VS. Differences emerge considering OVJ and enclisis. A moderate negative correlation (–0.34) is found between OVJ and SI_VJ residuals (preverbal objects and si are possibly competing structures, as they both fill the preverbal position). An almost significant negative correlation is found between enclisis and OV. As this method is less resistant to outliers, we opted for presenting the results from the first method, also considering the similarity of the crucial outcomes.
  12. While it goes beyond the objective of this paper, the question of which CP projections can be and are targeted by movement is an important one; elements of an answer are provided by Samo (2019).
  13. Note that the scope of our claim is on the application of the “Competing Grammar” hypothesis to the present scenario of the loss of V2 through Old/Middle French. There are no direct implications regarding the general validity of the hypothesis itself and therefore its application to other cases.

References

Adams, Marianne P. 1987. From Old French to the theory of pro-drop. Natural Language and Linguistic Theory 5(1). 1–32. DOI:  http://doi.org/10.1007/BF00161866

Belletti, Adriana. 1988. The case of unaccusatives. Linguistic Inquiry 19(1). 1–34.

Belletti, Adriana. 2004. Aspects of the low IP area. The structure of CP and IP. In Rizzi, Luigi (ed.), The cartography of syntactic structures 2, 16–51. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780195159486.003.0002

Benincà, Paola. 1983/1984. Un’ipotesi sulla sintassi delle lingue romanze medievali. Quaderni Patavini di Linguistica 4. 3–19.

Biberauer, Theresa & Roberts, Ian. 2009. The return of the subset principle. In Crisma, Paola & Longobardi, Giuseppe (eds.), Historical syntax and linguistic theory, 58–74. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199560547.003.0004

Burzio, Luigi. 1986. Italian syntax: A government-binding approach. Dordrecht: Springer. DOI:  http://doi.org/10.1007/978-94-009-4522-7

Clark, Robin & Roberts, Ian. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24(2). 299–345.

de Andrade, A. 2018. Aboutness topics in Old and Middle French: A corpus-based study on the fate of V2. Canadian Journal of Linguistics 63(2). 194–220. DOI:  http://doi.org/10.1017/cnj.2017.45

Donaldson, Bryan. 2012. Initial subordinate clauses in Old French: Syntactic variation and the clausal left periphery. Lingua 122. 1022–1046. DOI:  http://doi.org/10.1016/j.lingua.2012.04.003

Ferraresi, Gisella & Goldbach, Maria. 2002. V2 syntax and topicalisation in Old French. Linguistische Berichte 189. 3–25. DOI:  http://doi.org/10.46771/9783967696875_1

Fischer, Susann. 2002. The Catalan clitic system: A diachronic perspective on its syntax and phonology. Berlin / New York: Mouton de Gruyter. DOI:  http://doi.org/10.1515/9783110892505

Fischer, Susann. 2003. Rethinking the Tobler-Mussafia law. Data from Old Catalan. Diachronica 20(2). 259–288. DOI:  http://doi.org/10.1075/dia.20.2.03fis

Fleischman, Suzanne. 1991. Discourse pragmatics and the grammar of Old French: A functional reinterpretation of “si” and the personal pronouns. Romance Philology 44(3). 251–283.

Fontana, Josep M. 1993. Phrase structure and the syntax of clitics in the history of Spanish. Philadelphia, PA: University of Pennsylvania dissertation.

Foulet, Lucien. 1928. Petite syntaxe de l’ancien français. 3e édition. Paris: Champion.

Guardiano, Cristina & Longobardi, Giuseppe. 2005. Parametric comparison and language taxonomy. In Batllori, Montserrat & Hernanz, Maria-Lluïsa & Picallo, Carme & Roca, Francesc (eds.), Grammaticalization and parametric variation, 149–174. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199272129.003.0010

Guardiano, Cristina & Longobardi, Giuseppe. 2017. Parameter theory and parametric comparison. In Roberts, Ian (ed.), The Oxford handbook of Universal Grammar, 377–398. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780199573776.013.16

Hastie, Trevor. 2004. gam: Generalized Additive Models. R package version 1.22-5, https://cran.r-project.org/web/packages/gam. Accessed 18 Jun. 2025.

Ingham, Richard & Larrivée, Pierre. 2015. La structure de l’information et la sémantique de la phrase à la fin de l’ancien français. L’Information Grammaticale 145. 32–37.

Kaiser, Georg A. 1996. V2 or not V2? Subject-verb inversion in Old and Modern French interrogatives. In Brandner, Ellen & Ferraresi, Gisella (eds.), Language change and generative grammar, 168–190. Opladen: Westdeutscher Verlag. DOI:  http://doi.org/10.1007/978-3-322-90776-9_7

Kaiser, Georg A. & Zimmermann, Michael. 2011. On the decrease in subject-verb inversion in French declaratives. In Rinke, Esther & Kupisch, Tanja (eds.), The development of grammar: Language acquisition and diachronic change. In honour of Jürgen M. Meisel, 355–381. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/hsm.11.19kai

Klævik-Pettersen, Espen. 2019. Inversion, V-to-C, and verb-second: An investigation into the syntax and word order of Old French and Late Latin. Oslo: University of Oslo dissertation.

Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1(3). 199–244. DOI:  http://doi.org/10.1017/S0954394500000168

Labelle, Marie. 2007. Clausal architecture in early Old French. Lingua 117(1). 289–316. DOI:  http://doi.org/10.1016/j.lingua.2006.01.004

Labelle, Marie & Hirschbühler, Paul. 2005. Changes in clausal organization and the position of clitics in Old French. In Batllori, Montserrat & Hernanz, Maria-Lluïsa & Picallo, Carme & Roca, Francesc (eds.), Grammaticalization and parametric variation, 60–71. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199272129.003.0004

Labelle, Marie & Hirschbühler, Paul. 2018. Topic and focus in Old French V1 and V2 structures. Canadian Journal of Linguistics 63(2). 264–287. DOI:  http://doi.org/10.1017/cnj.2017.52

Larrivée, Pierre. 2025. Avancées dans la modélisation de l’évolution grammaticale. In Séginger, Gisèle & Yvonnet, Julien (eds.), Evolution, 235–246. Paris: Éditions Matériologiques.

Larrivée, Pierre & Poletto, Cecilia & Pinzin, Francesco & Goux, Mathieu. 2024. Asymmetry as a general cue for V2 (loss). Isogloss 10(7). 1–22. DOI:  http://doi.org/10.5565/rev/isogloss.410

Le Coultre, Julien. 1875. De l’ordre des mots dans Crestien de Troye. Leipzig.

Ledgeway, Adam. 2008. Satisfying V2 in early Romance: Merge vs. Move. Journal of Linguistics 44(2). 437–470. DOI:  http://doi.org/10.1017/S0022226708005173

Ledgeway, Adam. 2012. From Latin to Romance: Morphosyntactic typology and change. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199584376.001.0001

Ledgeway, Adam. 2020a. The north-south divide: Parameters of variation in the clausal domain. L’Italia Dialettale 81. 29–78.

Ledgeway, Adam. 2020b. Variation in the Gallo-Romance left-periphery: V2, complementizers, and the Gascon enunciative system. In Maiden, Martin & Wolfe, Sam (eds.), Variation and change in Gallo-Romance grammar, 71–99. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198840176.003.0004

Ledgeway, Adam. 2021. V2 beyond borders: the Histoire ancienne jusqu’à César. Journal of Historical Syntax 5(29). 1–61.

Leonetti, Manuel. 2018. Two types of postverbal subject. Italian Journal of Linguistics 30(2). 11–36.

Lightfoot, David. 1995. Grammars for people. Journal of Linguistics 31(2). 393–399. DOI:  http://doi.org/10.1017/S0022226700015656

Lightfoot, David. 1999. The development of language: Acquisition, change and evolution. Oxford: Blackwell.

Lightfoot, David. 2006. How new languages emerge. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511616204

Lightfoot, David & Westergaard, Marit. 2007. Language acquisition and language change: Inter-relationships. Language and Linguistics Compass 1(5). 396–415. DOI:  http://doi.org/10.1111/j.1749-818X.2007.00023.x

Marchello-Nizia, Christiane. 1985. Dire le vrai: l’adverbe “si” en français médiéval: essai de linguistique historique. Geneva: Droz.

Mathieu, Eric. 2012. The left-periphery in Old French. In Arteaga, Deborah L. (ed.), Research on Old French: The state of the art, 327–350. Dordrecht: Springer. DOI:  http://doi.org/10.1007/978-94-007-4768-5_17

Meisezahl, Marc & Kirby, Simon & Culbertson, Jennifer. 2025. Variability and learning in language change: The case of V2. Journal of Historical Syntax 9(2–10). 1–38.

Meklenborg Salvesen, Christine. 2020. Adverbial resumptive particles and verb second. In Woods, Rebecca & Wolfe, Sam (eds.), Rethinking verb second, 90–125. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198844303.003.0005

Pinzin, Francesco & Poletto, Cecilia. 2025. On postverbal subjects in Old Venetian. Linguistic Variation. DOI:  http://doi.org/10.1075/lv.24062.pin

Pinzin, Francesco & Poletto, Cecilia & Larrivée, Pierre & Romanova, Natalia & Touré, P. Hamatt. In preparation. Parallel phases reloaded: Statistical evidence from Old French.

Poletto, Cecilia. 2002. The left-periphery of V2-Rhaetoromance dialects: A new view on V2 and V3. In Barbiers, Sjef & Cornips, Leonie & van der Kleij, Susanne (eds.), Syntactic Microvariation, 214–242. Amsterdam: Meertens Institute.

Poletto, Cecilia. 2006. Parallel phases: A study on the high and low left periphery of Old Italian. In Frascarelli, Mara (ed.), Phases of interpretation, 261–294. Berlin: de Gruyter Brill. DOI:  http://doi.org/10.1515/9783110197723.4.261

Poletto, Cecilia. 2014. Word order in Old Italian. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199660247.001.0001

Ramsden, Herbert. 1963. Weak-pronoun position in the early Romance languages. Manchester: Manchester University Press.

Randall, Beth & Kroch, Anthony & Santorini, Beatrice. 2004. CorpusSearch 2 users guide. https://github.com/beatrice57/CorpusSearch

Rinke, Esther & Meisel, Jürgen M. 2009. Subject-inversion in Old French: Syntax and information structure. In Kaiser, Georg A. & Remberger, Eva-Maria (eds.), Proceedings of the workshop “null-subjects, expletives, and locatives in Romance”. Arbeitspapier 123. 93–130. Konstanz Fachbereich Sprachwissenschaft.

Roberts, Ian. 2012. Verbs and diachronic syntax. Dordrecht: Kluwer.

Roberts, Ian. 2021a. Diachronic syntax. Oxford: Oxford University Press.

Roberts, Ian. 2021b. Second positions: A synchronic analysis and some diachronic consequences. In Wolfe, Sam & Meklenborg, Christine (eds.), Continuity and variation in Germanic and Romance, 297–328. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198841166.003.0012

Salvesen, Christine & Bech, Kristin. 2014. Postverbal subjects in Old English and Old French. Oslo Studies in Language 6(1). 201–228. DOI:  http://doi.org/10.5617/osla.725

Samo, Giuseppe. 2019. A criterial approach to the cartography of V2. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/la.257

Santorini, Beatrice. 2016. Annotation manual for historical English (Middle to Modern). https://www.ling.upenn.edu/ppche/ppche-release-2016/annotation.

Santorini, Beatrice. 2021. Syntactic annotation manual for historical French. https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation-french.

Thurneysen, Rudolf. 1892. Zur Stellung des Verbums im Altfranzösischen. Zeitschrift für Romanische Philologie 16. 289–307. DOI:  http://doi.org/10.1515/zrph.1892.16.1-4.289

Vance, Barbara. 1997. Syntactic change in Medieval French. Dordrecht: Kluwer. DOI:  http://doi.org/10.1007/978-94-015-8843-0

van Reenen, Pieter & Schøsler, Lene. 2000. The pragmatic functions of the Old French particles ainz, apres, donc, lors, or, puis, and si. In Herring, Susan C. & van Reenen, Pieter & Schøsler, Lene (eds.), Textual parameters in older languages, 59–109. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/cilt.195.05ree

Wanner, Dieter. 1991. The Tobler-Mussafia law in Old Spanish. In Campos, Héctor & Martínez-Gil, Fernando (eds.), Current studies in Spanish linguistics, 313–378. Washington, DC: Georgetown University Press.

Westergaard, Marit. 2009. The acquisition of word order: Micro-cues, information structure and economy. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/la.145

Westergaard, Marit. 2014. Linguistic variation and micro-cues in first language acquisition. Linguistic Variation 14(1). 26–45. DOI:  http://doi.org/10.1075/lv.14.1.02wes

Wolfe, Sam. 2017. Old French si, grammaticalisation, and the interconnectedness of change. In Drinka, Bridget (ed.), Historical Linguistics 2017 (Selected papers from the 23rd International Conference on Historical Linguistics, San Antonio, Texas, 31 July–4 August 2017), 353–272. Amsterdam: John Benjamins.

Wolfe, Sam. 2018. Verb second in Medieval Romance. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198804673.001.0001

Wolfe, Sam. 2019. Redefining the typology of V2 languages: The view from Medieval Romance and beyond. Linguistic Variation 19(1). 16–46. DOI:  http://doi.org/10.1075/lv.15026.wol

Wolfe, Sam. 2021. Syntactic change in French. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198864318.001.0001

Woods, Rebecca & Wolfe, Sam. 2020. Rethinking verb second. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oso/9780198844303.001.0001

Zaring, Laurie. 2010. Changing from OV to VO: More evidence from Old French. Ianua. Revista Philologica Romanica 10. 1–18.

Zaring, Laurie. 2018. The nature of V2 in Old French: Evidence from subject inversion in embedded clauses. Canadian Journal of Linguistics 63(2). 288–308. DOI:  http://doi.org/10.1017/cnj.2017.50

Zimmermann, Richard. 2023. An improved test of the constant rate hypothesis: Late Modern American English possessive have. Corpus Linguistics and Linguistic Theory 19(3). 323–352. DOI:  http://doi.org/10.1515/cllt-2021-0038