1 Introduction

The role played by second-language (L2) speakers in language change has attracted increasing interest in recent years, uniting research traditions in historical linguistics (Weerman 1993), sociolinguistic typology (Trudgill 2004; 2010; 2011), language complexity (Kusters 2003; 2008; McWhorter 2011) and second-language acquisition (Sorace & Serratrice 2009; Berdicevskis & Semenuks 2022). If L2 learners generally struggle to acquire certain linguistic features in a target-like fashion, and if L2 learners are sufficiently prevalent in a speech community in a particular historical situation, then features of this kind, namely L2-difficult features, may be expected to be lost from the language over extended time. This mechanism—change triggered by L2-difficulty—is supported both by empirical studies in L2 acquisition and artificial language learning (e.g. Margaza & Bel 2006; Berdicevskis & Semenuks 2022) and by large-scale typological studies which suggest a negative correlation between the proportion of L2 speakers and language complexity (Lupyan & Dale 2010; Bentz & Winter 2013; Sinnemäki & Di Garbo 2018; Sinnemäki 2020).

Although quantitative laboratory and typological studies are essential in supplying the empirical content of this hypothesis, many open questions remain about the large-scale population dynamics of this proposed mechanism of contact-induced change. For instance, just how many L2 speakers does it take to tip the balance in a contact situation? Clearly, a small minority of L2 learners is generally insufficient to lead to wholesale change. Under what conditions, then, do those who speak the language as a first language (L1) also adopt the new (non-L2-difficult) variety? Such questions can be tackled in a mathematical modelling approach, in which predictions can be generated and the range of possible behaviours of dynamical systems understood in full detail.

This paper presents a first sketch of such an approach, beginning from simple and general assumptions about language learning and social interaction. Building on the variational model of language acquisition and change (Yang 2002), I propose an extension of this model which subsumes both L1 and L2 learning under the same general mechanism. The extended model assumes that L1 and L2 learning are fundamentally alike processes, in the sense that they both depend on general mechanisms of reinforcement (Bush & Mosteller 1955). In the extended model, however, adult L2 learning is potentially disadvantaged in the sense that these learners face a difficulty in the case of certain linguistic features acquired unproblematically by children. To study the population-dynamic consequences of L1 and L2 learners interacting in the same population, the extended model considers a mixed speech community. This yields testable predictions about the eventual fates of linguistic features exposed to contact situations of different kinds, including predictions about the stable states attained, conditioned on model parameters such as the proportion of L2 speakers in the population and the relative strength of the L2-difficulty of the linguistic variable involved.

The model allows a number of qualitatively different evolutionary outcomes: complete extinction of the L2-difficult feature, temporary loss of the L2-difficult feature followed by its reacquisition, or stable variation between two grammars, one of which carries the L2-difficult feature and one which does not. These theoretically predicted outcomes correspond to different patterns observed—to varying degrees of detail—in the empirical record. To test the model, I apply it to two empirical case studies: the deflexion of the verbal paradigm in Afrikaans (Trudgill 2011) and the erosion of null subjects in Afro-Peruvian Spanish (Sessarego & Gutiérrez-Rexach 2018). Model parameters are estimated from empirical data where possible, and intuitive arguments for plausible orders of magnitude are provided where direct empirical calibration is impossible. The model is found to correctly characterize the broad outlines of both empirical developments, although challenges arise both from the lack of relevant empirical data and from a number of idealizing assumptions that must be made in order to simplify the model’s mathematics in the interest of retaining analytical tractability.

It should be emphasized that the model presented and analysed in this paper is concerned with only one potential external (“environmental”) predictor of contact-induced change, namely the proportion of L2 speakers in a speech community. Other possible external factors conditioning language change have been considered in the literature, ranging from population size (Lupyan & Dale 2010; Nettle 2012; Koplenig 2019) to social network geometries (Ke & Gong & Wang 2008; Fagyal et al. 2010; Kauhanen 2017; Josserand et al. 2021) or combinations of such parameters (Trudgill 2004). Although the present model sets such effects aside, this is not to deny their importance. Indeed, future work may introduce further variables into the modelling framework whose broad outlines are laid down in this paper. It is quite likely that in a complex process such as language change, multiple factors interact in intricate ways. In fact, this is already the case for the factors considered in the present paper, L2 speaker proportion (a demographic parameter), strength of L2-difficulty (a psychological parameter) and parsing advantage (a linguistic parameter), as will be discussed at length below.

The paper is structured as follows. Section 2 provides a brief description of the two empirical test cases. Language learning, both L1 and L2, is discussed in Section 3, with the goal of characterizing the terminal state of a population of learners after a long learning period in a stationary random learning environment. This terminal state is then utilized to define inter-generational population dynamics in Section 4, in which the equilibria of the resulting nonlinear system of equations are studied. Relevant linguistic and demographic parameters are calibrated in Section 5 using available empirical data, followed by application of the mathematical model. Section 6 concludes, discusses some of the limitations of the approach, and provides a few directions for future research. To streamline discussion throughout the paper, most mathematical derivations are collected in a separate Appendix, available as a supplementary file.

2 The explananda

2.1 Afrikaans verbal deflexion

Germanic languages typically exhibit complex verbal morphology, with numerous forms found throughout the verbal paradigms, often without a transparent one-to-one mapping between form and meaning:

    (1) German laufen ‘to run’

                    1st      2nd      3rd
        singular    laufe    läufst   läuft
        plural      laufen   lauft    laufen

    (2) Dutch lopen ‘to run/walk’

                    1st      2nd      3rd
        singular    loop     loopt    loopt
        plural      lopen    lopen    lopen

In this respect Afrikaans, which began as a contact variety of Dutch in the Cape Colony in the 17th and 18th centuries, behaves rather differently. Modern Afrikaans has only one form throughout the paradigm (corresponding to the stem of the Dutch verb):

    (3) Afrikaans loop ‘to walk’

                    1st      2nd      3rd
        singular    loop     loop     loop
        plural      loop     loop     loop

This deflexion of verbal paradigms is just one manifestation of a larger-scale morphological regularization that separates Afrikaans from its ancestor (see Ponelis 1993).

The specific and rather unique setting in which Afrikaans arose has prompted scholars to discuss its origin extensively. There are three major competing theories: the superstratist hypothesis, according to which the structural features of Afrikaans arose from Dutch in a process of internal development; the interlectalist hypothesis, according to which the structure of Afrikaans is explained by competition between multiple different dialects of Dutch that were brought to the Cape; and the creolist hypothesis, according to which Afrikaans is a creole or semicreole that arose from the interaction between the colonizers, the native Khoekhoe population, and slaves brought by the Europeans from other parts of Africa and parts of Asia (Roberge 2002). It is clear, however, that contact must have played some role in the formation of the language; the specific classification of Afrikaans as a creole, semicreole or other kind of contact variety is less relevant here.

If the three main population constituents of the Dutch Cape Colony were in extensive linguistic contact, as seems likely (Ponelis 1993; Roberge 2002), then an amount of L2 learning will have taken place: both the native Khoekhoe and the imported slaves would have had to learn at least a limited amount of Dutch to communicate with the colonizers. The (adult) L2 Dutch spoken in the colony would form part of the input of the following generations of L1 learners, and through this mechanism of nativization of L2 output, changes could stabilize as part of the language that was in the process of formation (Trudgill 2011).1 Complex meaning-to-form mappings are thought to be difficult for L2 learners generally (DeKeyser 2005), but experimental evidence lends an additional degree of credibility to this general idea in the specific case of the Dutch verb. Blom (2006) compared child L1, child L2 and adult L2 learners of Dutch for knowledge of Dutch verbal morphology by way of a sentence completion task; the L2 learners had either Turkish or Moroccan backgrounds and spoke Turkish, Moroccan Arabic or Tarifit (a Zenati Berber language of Morocco) as their L1. Compared to Dutch L1 learners, who exhibited an accuracy of 96% standard use of the verbal paradigm, child L2 learners showed 83% (Turkish) or 85% (Moroccan) accuracy. The adult L2 learners, however, attested only 57% (Turkish) or 56% (Moroccan) overall accuracy. This suggests that verbal morphology is an L2-difficult feature to acquire (irrespective of the learner’s L1) and may thus favour paradigmatic levelling (regularization) when a significant number of adult L2 learners are involved in the contact situation.

2.2 Erosion of null subjects in Afro-Peruvian Spanish

Consistent null subject (NS) languages such as standard forms of Spanish and Italian exhibit the omission of non-emphatic, non-contrastive referential subject pronouns in finite clauses (see Roberts & Holmberg 2010), illustrated here for Spanish:

    (4) Hablo   español.
        speak   Spanish
        ‘I speak Spanish.’

In some contexts, such as with expletive subjects, a null pronoun is strictly obligatory:

    (5) ∅  Llueve.
           rains
        ‘It’s raining.’

    (6) *Ello  llueve.
         it    rains
        ‘It’s raining.’

An overt referential subject is, however, required under certain circumstances, and the use of null vs. overt pronouns in general is typically conditioned by complex semantic and pragmatic considerations. Consider the following examples (via Montrul 2004: 226):

    (7) Nadie_i  dice  que   ∅_i/j  ganará    el   premio.
        no-one   says  that         will win  the  prize
        ‘No-one says that he will win the prize.’

    (8) Nadie_i  dice  que   él_*i/j  ganará    el   premio.
        no-one   says  that  he       will win  the  prize
        ‘No-one says that he will win the prize.’

With a null pronoun, the understood subject of the embedded clause in this example may or may not be coreferential with the subject of the main clause (7), whereas with an overt pronoun the referents of the subjects must differ: the interpretation in which the pronoun is bound is strictly ungrammatical if the pronoun is overt (8). Evidence now exists that features of this kind—ones that hinge on the syntax–semantics and syntax–pragmatics interfaces—are difficult for L2 learners to acquire (Sorace & Serratrice 2009; see also Walkden & Breitbarth 2019 for a theoretical account of the diachrony of null subjects as loss/gain of uninterpretable features).

Rates of subject expression differ across varieties of Spanish, with American varieties exhibiting significantly higher rates of overt subjects compared to Peninsular Spanish: Martínez-Sanz (2011: 195) reviews the available literature and shows that these rates range from as low as 12% in Valladolid, Spain, to 60% in San Juan, Puerto Rico. In particular, it has been suggested that a number of Afro-Hispanic Languages of the Americas (AHLAs)—languages that emerged from contact between Spanish and African languages in the Americas in colonial settings—attest mixed2 pro-drop systems (Sessarego & Gutiérrez-Rexach 2018). These languages not only employ overt subjects where Peninsular Spanish uses null subjects (9), they may also exhibit features of both NS and non-NS languages simultaneously (10):

    (9) Afro-Peruvian Spanish (Sessarego 2014: 384)
        Paco  fue   a   casa.  Él  se       tomó  una  botella  de  cerveza
        Paco  went  to  home   he  himself  took  a    bottle   of  beer
        y    después     él  se       fue   al      bar  de  fiesta.
        and  afterwards  he  himself  went  to the  bar  of  party
        ‘Paco went home. He drunk a bottle of beer and then went to the bar to have fun.’

    (10) Dominican Spanish (Toribio 2000: 319)
        Yo  no   lo   vi,  él  estaba  en  Massachusetts,  acababa   de  llegar.
        I   not  him  saw  he  was     in  Massachusetts   finished  of  arrive
        ‘I did not see him, he was in Massachusetts, he had just arrived.’

One way of making sense of such data is to hypothesize that speakers have access to more than one grammar simultaneously (Kroch 1989; Yang 2002)—in this case, to both a NS and a non-NS version of Spanish—and employ these competing grammars variably and probabilistically (Toribio 2000; Sessarego & Gutiérrez-Rexach 2018; for an extensive variationist account of subject pronoun expression in Spanish–English bilinguals in New Mexico, see Torres Cacoullos & Travis 2018).

Empirical learning data are again relevant in suggesting explanations for the observed patterns. Margaza & Bel (2006) tested Greek adult L2 learners of Spanish on the expression of NS pronouns; the results demonstrate that L2 learners tend to overuse overt pronouns compared to native speakers, even when the L1 is a NS language. Specifically, in a cloze task in which participants had to either express or omit a subject pronoun, intermediate L2 learners employed null subjects 52% of the time in matrix clauses and 81.66% of the time in subordinate clauses, compared to 85.50% and 98.13% in advanced learners, and 96.00% and 100% in native controls (Margaza & Bel 2006: 92). These findings suggest that L2 learners—with the possible exception of those at an advanced proficiency level—struggle with the precise pragmatic conditioning of the null/overt contrast in a NS language (for converging evidence from different language pairings, see Bini 1993; Pérez-Leroux & Glass 1999). They thus support the notion that a contact situation involving (imperfect) L2 learning followed by L1 nativization may help to explain the diachronic erosion of null subjects.

In particular, such an explanation is plausible in the case of Afro-Peruvian Spanish, a variety spoken mainly in the coastal regions of modern Peru (Sessarego 2014; 2015; Sessarego & Gutiérrez-Rexach 2018). Afro-Peruvians—descendants of slaves who were brought from various parts of Africa and forced to work on plantations, in mines and as servants in cities from the 16th century onwards—amounted to 3.6% of Peru’s population in the most recent, 2017 census (INEI 2018: 222). Although Peru abolished slavery formally in 1854, most of the Afro-Peruvian population lived in relative poverty under a semi-feudal system until as recently as the second half of the 20th century (Sessarego 2015: 79). The probable linguistic consequences of these sociohistorical facts will be discussed in detail in Section 5.

2.3 Desiderata

Previous research thus suggests that the deflexion of the Afrikaans verbal paradigm and the erosion of null subjects in Afro-Peruvian Spanish may both be traced back to an earlier contact situation involving significant amounts of L2 learning. Apart from explicating what a “significant amount” means in this context, thereby providing a unified account of what is common to both cases, a mathematical modelling approach needs to be able to account for the differences between the two situations. In the case of Afrikaans, the loss of verbal morphology is complete: all verbs are reduced to one form throughout the paradigm. In the case of Afro-Peruvian Spanish, and several other AHLAs, the loss of null subjects is incomplete, in the sense that these languages attest features of both NS and non-NS languages. Moreover, in the case of Afro-Peruvian Spanish at least, there is evidence of the younger generations moving in the direction of the NS grammar again, suggesting that the mixed NS status may not be stable; fieldwork interviews suggest that the Afro-Peruvian variety could be lost in favour of standard Peruvian Spanish “in two generations, or maybe only one” (Sessarego 2014: 397). In the ideal situation, a modelling approach would predict in what circumstances—under what combinations of model parameters—each development is likely to unfold.

3 Two learning algorithms

At a very general level, learning can be characterized as a stochastic process of modification of a distribution of probabilities to act in specific ways. These modifications are prompted by the outcomes of previous actions in a learning environment which supplies feedback to the learner (Bush & Mosteller 1955). The variational learning (VL) framework (Yang 2002) constitutes a linguistic interpretation of this general statement. In the simplest case, the learner must make a binary choice between two options, such as between employing null and overt subject pronouns. The learner stores a probability of use of one of these options, p; learning consists of modifications to p in response to previous interactions between learner and environment by way of a set of operators, which mathematically speaking are simply functions applied to p to transform its value.

Most applications of VL to date have assumed Bush and Mosteller’s (1955) one-dimensional linear reward–penalty learning scheme (e.g. Yang 2000; Heycock & Wallenberg 2013; Ingason & Legate & Yang 2013; Danckaert 2017; Kauhanen & Walkden 2018; Simonenko & Crabbé & Prévost 2019). This learning algorithm makes use of two linear operators, f and g, defined as follows:

    $$ f(p) = p + \gamma (1 - p), \qquad g(p) = p - \gamma p, \tag{11} $$

where p is the probability of grammatical option G1, 1–p is the probability of grammatical option G2, and 0 < ɣ < 1 is a learning rate parameter. Operator f is applied if the learner chooses G1 and this choice manages to parse the input received by the learner at that learning step. From the form of the operator, it is easy to see that application of this operator constitutes a reward to G1: the next time the learner has to make a choice between G1 and G2, they are more likely to choose G1 than before. On the other hand, if G1 is selected but does not parse the input received by the learner, operator g is applied, disfavouring this grammatical option, meaning that the value of p is decreased. Should the learner choose G2, the logic of application of the operators is flipped (with only two options, rewarding G1 is tantamount to punishing G2, and vice versa).
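The update logic just described can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the names `lrp_step`, `fail1` and `fail2` are hypothetical, with `fail1` and `fail2` standing for the probabilities that the chosen grammar G1 or G2 fails to parse the incoming input.

```python
import random

def f(p, gamma):
    """Reward G1: move p a fraction gamma of the way towards 1."""
    return p + gamma * (1 - p)

def g(p, gamma):
    """Penalize G1: shrink p by a fraction gamma."""
    return p - gamma * p

def lrp_step(p, gamma, fail1, fail2, rng=random):
    """One linear reward-penalty step for a two-grammar learner.

    fail1, fail2 (hypothetical names): probabilities that G1, G2
    fail to parse the input received at this step.
    """
    if rng.random() < p:
        # Learner selects G1: reward it on success, penalize it on failure.
        return g(p, gamma) if rng.random() < fail1 else f(p, gamma)
    # Learner selects G2: the logic is flipped, so a failure of G2
    # rewards G1 and a success of G2 penalizes G1.
    return f(p, gamma) if rng.random() < fail2 else g(p, gamma)
```

With two options only, a single number p suffices to track the learner's state, which is why both branches return an updated value of p.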

Linear reward–penalty learning has a number of attractive mathematical properties, the most important of which have to do with the expected eventual state attained by the learner. Since learning is a stochastic process, it is impossible to predict the evolution of p over time exactly (unless we have full knowledge of the sequence of the learner’s choices and the learning environment’s responses at each learning step—but this is impossible except in a strict laboratory setting). However, both the expected (mean) value of p and its variance admit explicit solutions (see Bush & Mosteller 1955 and the Appendix). With increasing learning iterations the expectation, denoted by ⟨p⟩, eventually tends to the asymptotic value

    $$ \langle p \rangle \to \frac{\pi_2}{\pi_1 + \pi_2} \tag{12} $$

in the limit of infinite learning iterations. Here, π1 and π2 are the probabilities with which the environment punishes G1 and G2; these can be estimated in normal circumstances, as will be discussed in more detail below. Furthermore, the variance of p at an infinity of learning iterations can be made arbitrarily small by assuming that learning is sufficiently slow, i.e. if the learning rate ɣ has a small value. Thus, in the limit of long learning periods and slow learning rates, a population of learners are all expected to behave the same, converging to similar values of p at the end of learning. In effect, the behaviour of an entire generation of learners (subject to the same learning environment) can be condensed into one number, namely the limit of the expectation (12).
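The convergence of the expectation can be checked numerically. Averaging the linear reward–penalty update over the learner's choice and the environment's response gives an expected one-step change of ɣ[π2(1 – p) – π1p], whose fixed point is π2/(π1 + π2). The sketch below iterates this mean recursion; the function name `expected_p_l1` and the numerical values are illustrative assumptions, not taken from the paper.

```python
def expected_p_l1(p0, gamma, pi1, pi2, steps):
    """Iterate the mean of the linear reward-penalty process.

    The expected one-step change is gamma * (pi2 * (1 - p) - pi1 * p),
    obtained by averaging over the four choice/outcome combinations.
    """
    p = p0
    for _ in range(steps):
        p += gamma * (pi2 * (1 - p) - pi1 * p)
    return p

# Illustrative values only: G1 is penalized 20% of the time, G2 30%.
limit = expected_p_l1(p0=0.9, gamma=0.01, pi1=0.2, pi2=0.3, steps=5000)
```

Starting from any initial value, the mean approaches π2/(π1 + π2) at a geometric rate set by ɣ(π1 + π2), so a smaller learning rate slows convergence without changing the limit.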

The empirical justification for this procedure stems from the fact that language learning typically involves very numerous learning iterations. It has been estimated that people speak around 16,000 words a day on average (Mehl et al. 2007). By conversational symmetry, we would expect people, and language learners in particular, to hear a similar number of words every day. Language acquisition ordinarily takes place over several years; translating the 16,000-words-a-day estimate into the number of tokens relevant for the acquisition of any reasonably frequently occurring linguistic variable then yields an estimate in the millions.3 Although it is possible that adult L2 learners are exposed to less input than L1 learners, the high order of magnitude of this estimate suggests that it is legitimate to focus on the learner’s asymptotic behaviour at large learning iterations, instead of trying to characterize the complex stochastic dynamics of the entire learning trajectory. First pioneered by Yang (2000), this strategy has been used by numerous authors to derive population-level or diachronic consequences of sequential generations of such learners (Yang 2002; Heycock & Wallenberg 2013; Ingason & Legate & Yang 2013; Danckaert 2017; Kauhanen & Walkden 2018; Kauhanen 2019; Simonenko & Crabbé & Prévost 2019).

The majority of previous studies applying the VL framework have applied it to L1 learning. The idea of grammar competition has, however, been used in a handful of L2 acquisition studies (Zobl & Liceras 2005; Rankin 2014; 2022), and it is possible also to harness the mathematics of the linear reward–penalty algorithm for the purposes of modelling L2 trajectories formally. To accommodate the existence of L2-difficult features, we include a bias against successful L2 acquisition of one of the competing grammars. This may be done by replacing the operators f and g with the following pair of operators:

    $$ f'(p) = p + \gamma (1 - p) - \delta p, \qquad g'(p) = p - \gamma p - \delta p, \tag{13} $$

where δ is a small positive number that quantifies the difficulty faced by L2 learners in acquiring G1.4 It is important to note that with this definition, the difficulty is endemic to L2 learning, in the sense that it is independent of the learning environment’s responses: no matter how much the environment rewards the use of G1, this option will always suffer some amount of penalty, modulated by the magnitude of δ.
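The claim that G1 is penalized regardless of the environment can be made concrete with a small sketch. It reads the two operators as f'(p) = p + ɣ(1 – p) – δp and g'(p) = p – ɣp – δp, and assumes ɣ + δ ≤ 1 so that p remains a probability; this constraint and the function names are assumptions of the sketch. If the environment rewarded G1 at every step, so that only f' ever applied, p would still settle at ɣ/(ɣ + δ), strictly below 1.

```python
def f_l2(p, gamma, delta):
    """L2 reward operator: the usual reward minus the difficulty penalty."""
    return p + gamma * (1 - p) - delta * p

def g_l2(p, gamma, delta):
    """L2 penalty operator: the usual penalty plus the difficulty penalty."""
    return p - gamma * p - delta * p

def ceiling_under_constant_reward(gamma, delta, steps=10_000, p0=0.5):
    """Apply f' repeatedly, as if G1 were rewarded at every step.

    Assumes gamma + delta <= 1 so that p stays within [0, 1].
    """
    p = p0
    for _ in range(steps):
        p = f_l2(p, gamma, delta)
    return p

# Illustrative values: the ceiling is gamma / (gamma + delta) = 0.01 / 0.012.
ceiling = ceiling_under_constant_reward(gamma=0.01, delta=0.002)
```

The ceiling depends only on the ratio δ/ɣ, which is one way to see why that ratio, rather than δ itself, is the natural measure of L2-difficulty.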

In the Appendix, it is shown that an equivalent asymptotic result holds for this L2 extension of the original linear reward–penalty learning scheme. As the number of learning iterations tends to infinity, the expected value of p with operators f' and g' tends to

    $$ \langle p \rangle' \to \frac{\pi_2}{\pi_1 + \pi_2 + d} \tag{14} $$

where d = δ/ɣ represents the L2-difficulty of G1 scaled by the learning rate parameter ɣ. Moreover, the variance can again be made arbitrarily small by taking a small enough value of ɣ (and modifying δ so as to keep the ratio d constant). In other words, a population of L2 learners employing operators f' and g' can be described by a single number at the asymptote just as a population of L1 learners employing the standard linear reward–penalty scheme can.
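As a numerical check on the L2 asymptote, the mean of the process can be iterated directly. Reading the L2 operators as the L1 operators with an extra –δp term, the expected one-step change becomes ɣ[π2(1 – p) – π1p] – δp, and setting this to zero gives π2/(π1 + π2 + d) with d = δ/ɣ. This closed form is derived here from that reading of the operators and should be treated as an assumption of the sketch, as should the function name and the numerical values.

```python
def expected_p_l2(p0, gamma, delta, pi1, pi2, steps):
    """Iterate the mean of the L2 learning process.

    Expected one-step change: the L1 drift
    gamma * (pi2 * (1 - p) - pi1 * p), minus the unconditional
    difficulty penalty delta * p.
    """
    p = p0
    for _ in range(steps):
        p += gamma * (pi2 * (1 - p) - pi1 * p) - delta * p
    return p

# Illustrative values: d = delta / gamma = 0.5.
gamma, delta, pi1, pi2 = 0.01, 0.005, 0.2, 0.3
d = delta / gamma
limit_l2 = expected_p_l2(0.9, gamma, delta, pi1, pi2, steps=5000)
closed_form = pi2 / (pi1 + pi2 + d)
```

Halving both ɣ and δ leaves d, and hence the limit, unchanged while slowing the approach to it, in line with the remark about keeping the ratio d constant.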

4 Population dynamics

4.1 Motivation

It is conceptually useful to think of language change as an intertwined process of innovation and propagation. In the present model, L2 learners who acquire a lower probability of employing the L2-difficult grammar (as compared to L1 learners) are innovators. Whether and how these innovations propagate across the population depends, roughly, on how prevalent the L2 learner population is in the entire speech community. In general, interactions between individuals in a speech community are the result of a complex combination of factors involving properties of social networks, geographical distance and interaction frequency, to name but a few. Such factors can in principle be encapsulated in stochastic models of social dynamics inspired by techniques borrowed from statistical physics (Helbing 2010); however, inclusion of too many factors usually renders such models analytically intractable. Alternatively, when populations are large, stochastic fluctuations normally average out, and the system can be reduced to a simpler deterministic model, with correspondingly simpler analysis. This is the approach adopted here, developed in detail in Section 4.2. Section 4.3 provides simulation-based support for the deterministic approximation.

4.2 Deterministic approximation

Setting aside the complications arising from the full stochastic complexity of learning trajectories and interaction patterns, let us now focus on a simplified model in which learners behave according to the learning-theoretic expectations (12) and (14) and in which population size is so large that demographic noise cancels out. Technically, we consider an infinite well-mixed population of individuals in which the fraction of L2 speakers is σ and the fraction of L1 speakers is 1 – σ. Let p and q denote the probabilities of grammar G1 in the L1 and L2 speaker populations, respectively (so that the probabilities of G2 are 1–p and 1–q). Following Yang (2000), I assume a fraction α1 of the output of grammar G1 to be incompatible with G2, and similarly a fraction α2 of the output of G2 to be incompatible with G1. The parameters α1 and α2 will be called the (grammatical) advantages5 of G1 and G2 in what follows; numerical estimates will be given in Section 5. Assuming for simplicity that learners sample linguistic input from their environments at random, the penalty probabilities of the two grammars can then be expressed in the following simple forms:

    $$ \pi_1 = (1 - \sigma)\,\alpha_2 (1 - p) + \sigma\,\alpha_2 (1 - q), \qquad \pi_2 = (1 - \sigma)\,\alpha_1 p + \sigma\,\alpha_1 q. \tag{15} $$

To unpack this, note that the first term on the right hand side of the first equation, for instance, represents the event of the learner interacting with an L1 speaker (1–σ) who employs grammar G2 (1–p) and utters a sentence which falls among those not compatible with G1 (α2). Similarly, the second term on the right hand side of the second equation represents the case of the learner interacting with an L2 speaker (σ) who employs grammar G1 (q) and utters a sentence not compatible with G2 (α1), and similarly for the remaining two terms.
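The four interaction events can be written out as a small function. This is a minimal sketch that reads the second penalty as π2 = (1 – σ)α1p + σα1q, in line with the verbal description above; the function name `penalties` and the example values are hypothetical.

```python
def penalties(p, q, sigma, alpha1, alpha2):
    """Penalty probabilities of G1 and G2 in a well-mixed population.

    p, q   : probability of G1 among L1 and L2 speakers
    sigma  : fraction of L2 speakers
    alpha1 : fraction of G1 output incompatible with G2
    alpha2 : fraction of G2 output incompatible with G1
    """
    # G1 is penalized by G2 output it cannot parse, from either group.
    pi1 = (1 - sigma) * alpha2 * (1 - p) + sigma * alpha2 * (1 - q)
    # G2 is penalized by G1 output it cannot parse, from either group.
    pi2 = (1 - sigma) * alpha1 * p + sigma * alpha1 * q
    return pi1, pi2

# If everyone uses G1 exclusively, G1 is never penalized.
pi1_all_g1, pi2_all_g1 = penalties(1.0, 1.0, sigma=0.3, alpha1=0.2, alpha2=0.1)
```

Both penalties are linear in p and q, which is what later allows the two advantage parameters to be collapsed into a single ratio.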

Following Yang (2000), we can now assume that the input to the language acquisition process of the (n+1)th generation of speakers is constituted by the linguistic output of the nth generation. Assuming learners reach the learning-theoretic asymptote as argued in Section 3, this implies setting

    $$ p_{n+1} = \langle p_n \rangle, \qquad q_{n+1} = \langle q_n \rangle', \tag{16} $$

with the expectations given by (12) and (14). Expanding these with the help of the penalties (15), we arrive at the pair of equations

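    $$ p_{n+1} = \frac{\alpha_1 [(1 - \sigma) p_n + \sigma q_n]}{\alpha_1 [(1 - \sigma) p_n + \sigma q_n] + \alpha_2 [1 - (1 - \sigma) p_n - \sigma q_n]}, \qquad q_{n+1} = \frac{\alpha_1 [(1 - \sigma) p_n + \sigma q_n]}{\alpha_1 [(1 - \sigma) p_n + \sigma q_n] + \alpha_2 [1 - (1 - \sigma) p_n - \sigma q_n] + d}. \tag{17} $$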

To reduce the number of parameters in these equations, it is useful to divide both the numerators and the denominators on the right hand side by α2 (on the assumption that α2 ≠ 0). This condenses the relation of the two advantage parameters α1 and α2 into a single number, the advantage ratio α = α1/α2:

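    $$ p_{n+1} = \frac{\alpha [(1 - \sigma) p_n + \sigma q_n]}{(\alpha - 1) [(1 - \sigma) p_n + \sigma q_n] + 1}, \qquad q_{n+1} = \frac{\alpha [(1 - \sigma) p_n + \sigma q_n]}{(\alpha - 1) [(1 - \sigma) p_n + \sigma q_n] + 1 + D}. \tag{18} $$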

The quantity D = d/α2 represents the relative L2-difficulty of grammar G1 scaled by the advantage of grammar G2.

To recap, we have assumed grammatical competition between two options G1 and G2, the first of which is assumed to incur an amount of L2-difficulty, quantified by the ratio d = δ/ɣ of raw L2-difficulty δ to underlying learning rate ɣ. The two learning algorithms of Section 3 lead to a mixed population of L1 and L2 speakers whose dynamics are described by the pair of equations (18), on the assumption that speakers mix randomly. The dynamics depend on three model parameters:

  • α = α1/α2 controls the ratio of grammatical advantages, indicating how much (plain) advantage the L2-difficult grammar G1 has in relation to grammar G2;

  • D = d/α2 represents the L2-difficulty of grammar G1 in relation to the grammatical advantage of G2;

  • σ gives the proportion of L2 speakers in the population.

Intuitively, one expects the (diachronic) loss of the L2-difficult grammar G1 to be more likely for higher values of D and σ, as well as for lower values of α. These expectations are borne out by exact mathematical analysis, as will be described next.
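The inter-generational dynamics can be explored with a short script. The sketch below is not the paper's code: it assumes that L1 learners settle at π2/(π1 + π2) and L2 learners at π2/(π1 + π2 + d), with penalties built from the population mixture (1 – σ)p + σq, and then divides through by α2. The function names, the shorthand `m` for the mixture and all numerical values are illustrative assumptions.

```python
def step(p, q, alpha, D, sigma):
    """Advance the population state (p, q) by one generation.

    Assumes the L1 asymptote pi2 / (pi1 + pi2) and the L2 asymptote
    pi2 / (pi1 + pi2 + d), rescaled by alpha2 so that only
    alpha = alpha1 / alpha2 and D = d / alpha2 remain.
    """
    m = (1 - sigma) * p + sigma * q  # overall frequency of G1 in the input
    p_next = alpha * m / ((alpha - 1) * m + 1)
    q_next = alpha * m / ((alpha - 1) * m + 1 + D)
    return p_next, q_next

def iterate(p, q, alpha, D, sigma, generations=500):
    """Iterate the map and return the state after the given generations."""
    for _ in range(generations):
        p, q = step(p, q, alpha, D, sigma)
    return p, q

# No L2 speakers and an advantaged G1: L1 speakers converge on G1.
p_no_l2, q_no_l2 = iterate(0.5, 0.5, alpha=2.0, D=1.0, sigma=0.0)

# A disadvantaged G1 (alpha < 1) disappears from both groups.
p_weak, q_weak = iterate(0.5, 0.5, alpha=0.5, D=1.0, sigma=0.3)
```

Sweeping σ between 0 and 1 for fixed α and D is a simple way to probe numerically how the long-run values of p and q respond to the proportion of L2 speakers.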

In (18), the variables p and q represent the relative frequency of the L2-difficult grammar G1 in the L1 and L2 speaker populations, respectively. We would now like to understand the range of behaviours this system is capable of. Each pair of values (p,q) with 0 ≤ p,q ≤ 1 is a possible population state, or in other words, the state space of the system consists of the unit square [0,1] × [0,1] = [0,1]². Of particular interest is the eventual fate of the population under the above modelling assumptions. In general, some points (p,q) of the state space may be expected to be attractors, drawing the population state to themselves over time, while other points may repel the population state. These configurations of the state space correspond to different empirical outcomes, as will be demonstrated next.

In the Appendix, it is proved that the system (18) can display one of two eventual outcomes, referred to below as phase I and phase II, for any fixed combination of model parameters α, D and σ. These are:

  I. The L2-difficult grammar G1 is used with some probability in one or both speaker groups. Mathematically, the system has two equilibria, the origin (p,q) = (0,0) and a further, non-origin state (p*, q*) ≠ (0,0). The former, however, is unstable, while the latter is asymptotically stable (a sink). The population will be attracted to (p*, q*) over time, with the consequence that the L2-difficult feature is retained in the language, with frequency p* in the L1 speaker population and frequency q* in the L2 speaker population.

  II. The L2-difficult grammar G1 vanishes from both groups. Mathematically, only one equilibrium exists, namely the origin (p,q) = (0,0), which is asymptotically stable. Consequently both the L1 and L2 populations eventually speak G2 exclusively.

Of these outcomes, the latter corresponds to the proposed mechanism of L2 mutation followed by L1 nativization. The passage from phase I to phase II is governed by a complex interaction of the three parameters, one that however can be solved analytically. First, if α < 1, so that grammar G1 has less advantage to begin with, phase II is predicted. This is unsurprising: if both grammatical fitness in the VL sense and L2-difficulty in the Trudgillian sense conspire against a grammatical option, no force exists to sustain it, and the option is predicted to disappear.6 The same outcome holds for the special case α = 1, in which the grammatical advantages are equal.

For α > 1, the situation is more complicated. If α ≥ D + 2, phase I is predicted for any combination of parameter values. In this case, the pure grammatical advantage enjoyed by the L2-difficult grammar G1 is so high that no amount of L2 learning can suppress it entirely: the L1 speaker population will continue to speak G1 at some non-zero frequency. For 1 < α < D + 2, however, the proportion of L2 speakers σ acts as a bifurcation parameter. For values of σ below a critical threshold

$$\sigma_{\mathrm{crit}} = \frac{(\alpha - 1)(D + 1)}{\alpha D} \qquad (19)$$

phase I is predicted. In this case, the proportion of L2 learners in the population is not high enough to completely suppress G1. For values of σ exceeding this threshold, however, phase II is predicted: the non-origin equilibrium (p*,q*) vanishes and the population converges to (p,q)= (0,0), with both L1 and L2 speakers now employing grammar G2 exclusively. In other words, as the proportion of L2 learners in the population grows, the speech community experiences a phase transition from phase I to phase II (Figure 1 and Table 1).7

Figure 1

Orbit diagram of the system showing the values of p (top row) and q (bottom row) at the stable equilibrium, for various values of advantage ratio α (columns), L2-difficulty D (horizontal axis) and proportion of L2 speakers σ (vertical axis). The dashed line supplies the bifurcation boundary σcrit: the L2-difficult grammar G1 is extinct in each speaker population above this line, but coexists with grammar G2 below it.

Table 1

Eventual outcome of contact situation, for different combinations of advantage ratio α = α1/α2, normalized L2-difficulty D = d/α2 and proportion of L2 learners σ. The critical value of the latter is given in (19).

| Condition | Fate of L2-difficult grammar G1 |
|---|---|
| 0 < α ≤ 1 | lost |
| 1 < α < D + 2 and σ > σcrit | lost |
| 1 < α < D + 2 and σ < σcrit | retained |
| α ≥ D + 2 | retained |
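
The case distinction in Table 1 can be sketched as a small decision function. This is a minimal illustration rather than the Appendix's derivation: it assumes that the threshold has the form σcrit = (α − 1)(D + 1)/(αD), which agrees with the limiting values quoted in Sections 5.5 and 6, and the function names `sigma_crit` and `fate_of_g1` are hypothetical. The boundary case σ = σcrit is not covered by Table 1 and is here grouped with "retained" purely for definiteness.

```python
def sigma_crit(alpha: float, D: float) -> float:
    """Critical proportion of L2 speakers.

    ASSUMED form: (alpha - 1)(D + 1) / (alpha * D), where alpha = alpha1/alpha2
    is the advantage ratio and D = d/alpha2 the normalized L2-difficulty.
    """
    if alpha <= 0 or D <= 0:
        raise ValueError("alpha and D must be positive")
    return (alpha - 1.0) * (D + 1.0) / (alpha * D)


def fate_of_g1(alpha: float, D: float, sigma: float) -> str:
    """Return 'lost' or 'retained' for the L2-difficult grammar G1 (cf. Table 1)."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma is a proportion and must lie in [0, 1]")
    if alpha <= 1.0:
        return "lost"          # row 1: 0 < alpha <= 1
    if alpha >= D + 2.0:
        return "retained"      # row 4: alpha >= D + 2
    # rows 2 and 3: sigma acts as the bifurcation parameter
    return "lost" if sigma > sigma_crit(alpha, D) else "retained"


# Illustrative values only (not estimates from the paper):
print(sigma_crit(2.0, 4.0))          # 0.625
print(fate_of_g1(2.0, 4.0, 0.7))     # lost
print(fate_of_g1(2.0, 4.0, 0.5))     # retained
```

Keeping the threshold in its own function makes it easy to swap in a different expression should the assumed form need revising.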

4.3 Finite simulations

The above deterministic model turns on the assumption that learners are well-described by the learning-theoretic asymptotic expectations (12) and (14), as well as on the assumption that interactions between speakers are sufficiently random so that stochastic fluctuations in the population average out. These idealizing assumptions were made deliberately, to unleash the greater explanatory power of analytically soluble models, compared to simulations (see McElreath & Boyd 2007: 4–11). Nevertheless, the assumptions can be debated, and the question of how a finite, stochastic system would behave is certainly an interesting one. Although it is impossible to explore the entirety of the full stochastic model’s parameter space by way of simulations in this paper, I here report the outcome of proof-of-concept simulations which lend tentative support to the deterministic approximation analysed in Section 4.2.

The choice to abstract away from the individual-level stochastic dynamics of language acquisition was defended by way of an argument from the typically slow pace of language acquisition in Section 3; this entails assuming that learners receive a large amount of input during their learning period and also make only small adjustments at each presentation of an input token, so that the learning rate parameter ɣ has a small value. In the absence of direct empirical estimates of ɣ, it is legitimate to worry about the effects that high learning rates may have on the ensuing dynamics. To explore this, ten Variational Learners were simulated for varying values of ɣ, exposed to a learning environment in which the relative frequency of grammar G1 was 0.5 and the advantage parameters had the values α1 = 0.25 and α2 = 0.2. It was moreover assumed that grammar G1 incurs an L2-difficulty of d = 2. Of the ten learners, five were randomly assigned to be L1 learners; the other five were L2 learners subject to the L2-difficulty of G1. Each learner received 100,000 input tokens over their learning period and started from a randomly drawn initial value of p or q.
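
A single learner of this kind can be sketched as follows. The update rules are defined in Section 3 and are not reproduced here, so the code rests on stated assumptions: a linear reward-penalty scheme in the style of Bush & Mosteller (1955); an environment in which a token is parsable only by G1 with probability x·α1 and only by G2 with probability (1 − x)·α2; and an L2-difficulty that acts as an extra decay δ = d·ɣ on the weight of G1 after every token. The names `terminal_expectation` and `simulate_learner` are hypothetical, and the time average after burn-in is used only as a convenient estimator of the learner's expected terminal state.

```python
import random


def terminal_expectation(x, a1, a2, d=0.0):
    """ASSUMED asymptotic expectation of the weight of G1: c2 / (c1 + c2 + d)."""
    c1 = a2 * (1.0 - x)   # probability that a token penalizes G1
    c2 = a1 * x           # probability that a token penalizes G2
    return c2 / (c1 + c2 + d)


def simulate_learner(gamma, n_tokens, x=0.5, a1=0.25, a2=0.2, d=0.0, seed=0):
    """Return the weight of G1, averaged over tokens after a burn-in period."""
    rng = random.Random(seed)
    w = rng.random()            # randomly drawn initial value of p (or q)
    delta = d * gamma           # assumed relation d = delta / gamma
    burn_in = n_tokens // 5
    total = 0.0
    for t in range(n_tokens):
        u = rng.random()
        only_g1 = u < x * a1                              # only G1 parses the token
        only_g2 = x * a1 <= u < x * a1 + (1.0 - x) * a2   # only G2 parses the token
        uses_g1 = rng.random() < w                        # learner samples a grammar
        parsed = (not only_g2) if uses_g1 else (not only_g1)
        if uses_g1 == parsed:
            w += gamma * (1.0 - w)   # G1 rewarded or G2 punished
        else:
            w -= gamma * w           # G1 punished or G2 rewarded
        w *= 1.0 - delta             # assumed L2-difficulty decay (no-op when d = 0)
        if t >= burn_in:
            total += w
    return total / (n_tokens - burn_in)


# One L1 learner (d = 0) and one L2 learner (d = 2), 100,000 tokens each.
l1 = simulate_learner(gamma=0.01, n_tokens=100_000, d=0.0, seed=1)
l2 = simulate_learner(gamma=0.01, n_tokens=100_000, d=2.0, seed=2)
print(l1, terminal_expectation(0.5, 0.25, 0.2))
print(l2, terminal_expectation(0.5, 0.25, 0.2, d=2.0))
```

Under these assumptions the simulated averages should fall close to the analytical expectations for small ɣ, with the spread around them growing as ɣ increases.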

With these assumptions, the learning-theoretic expectations (12) and (14) predict an average probability of p = 0.56 for grammar G1 in the L1 learner population, and an average of q = 0.06 in the L2 learner population. Figure 2A presents the simulation results. As expected, the predicted averages describe the population averages. On the other hand, for higher learning rates (larger values of ɣ), a larger residual variance about this expectation is observed across the population. It turns out, however, that this residual inter-learner variance, even when large, need not affect inter-generational, diachronic developments. Consider Figure 2B, which tracks the inter-generational trajectory of p and q across 15 generations of learners, each generation consisting of 100 learners (half L1, half L2), starting from p = q = 0.99 in generation 0 and subject to the same parameters as above. To channel input from generation n to generation n+1 in a reasonably realistic way, each learner in generation n+1 was randomly assigned two “parents” from generation n; each learner only received input from its parents. The consequence is that, particularly in the intermediate stages of the diachronic development, and particularly when the learning rate parameter ɣ has a high value, there is much variation across speakers in the community. However, since the ultimate force driving the evolution of p and q is deterministic in nature (a difference in the advantages of the competing grammars, combined with L2-difficulty of one of the options), this stochastic noise is not sufficient to disturb the development. With the above parameters, equation (19) implies a critical L2 proportion threshold of σcrit = 0.3; since the actual proportion of L2 learners in the simulation is 0.5, we expect grammar G1 to be driven to extinction over inter-generational time, i.e. the values of p and q to converge toward zero. This is exactly what is observed, even for noisy (high) values of the learning rate parameter. Moreover, the deterministic trajectories computed from (18), shown in Figure 2B as the connected curves, are broadly consistent with the simulated data in each generation. In fact, with higher learning rates, the predicted equilibrium (p,q) = (0,0) is attained more quickly.

Figure 2

Finite simulations of the model. (A) Learning trajectories of 10 individual learners, half of them L1 and the other half L2 learners, for different learning rates ɣ. Only the first 10,000 iterations are shown, sampled every 100 iterations for clarity; behaviour after this initial transient period is identical. (B) Diachronic trajectory of 15 generations of learners, 100 learners in each generation (half L1, half L2). The boxplots characterize the simulated data, i.e. the distribution of p (or q) across all learners at the end of their learning period. The solid curves give the deterministic prediction using equation (18). For simulation parameters, see text.

5 Application

5.1 Interim summary

To summarize the results of the foregoing sections, our mixed population of L1 and L2 speakers is capable of two qualitatively different kinds of long-term behaviour. The parameter α = α1/α2 expresses the ratio of the advantages of the two competing grammars. If α < 1, the state (p,q) = (0,0), in which both the L1 and L2 speaker populations use G2 to the complete exclusion of the L2-difficult grammar G1, is a stable equilibrium. If α > 1, this state is either stable or unstable depending on whether σ, the proportion of L2 speakers in the population, exceeds a critical threshold σcrit. The value of σcrit, in turn, depends on the magnitudes of α and D = d/α2, the latter expressing the relative L2-difficulty of grammar G1, scaled by the grammatical advantage of G2. In this scenario, two evolutionary outcomes are possible: either total extinction of the L2-difficult grammar G1 (when σ > σcrit), or stable variation between G1 and G2 (when σ < σcrit).

The three model parameters α, σ and D are all estimable in principle from empirical data: α from the frequencies of occurrence of different types of linguistic constructions, σ from population censuses, and D from L2 learning data. Suitable data of the last kind are presently lacking,8 but I will next attempt calibration of the first two parameters in the specific cases of Afrikaans deflexion and the erosion of null subjects in Afro-Peruvian Spanish. Here it should be borne in mind that all parameter estimates can be only approximate: some degree of uncertainty necessarily pertains to corpus estimates of grammatical advantages, and historical population data are fraught with problems well known to historians and demographers. I will therefore offer the parameter estimates as an approximate starting point, with the proviso that they come with an unknown degree of statistical uncertainty. This has important consequences for what we take the very goal of modelling to be. Even though exact quantitative statements about the equilibrium state of the linguistic system may be out of reach, it is still possible to infer something about the qualitative outcome of the contact situation in each empirical test case.

5.2 Calibrating grammatical advantages

In the case of Afrikaans verbal deflexion, the relevant competition is between a grammar that has person and number distinctions in the verbal paradigm (G1, corresponding to Dutch) and a grammar that doesn’t (G2, corresponding to Afrikaans). Parsing failure occurs when the learner–listener attaches the wrong interpretation (wrong person or number) to the surface form uttered by the speaker. Since Dutch is a non-NS language, person and number can always be inferred from the nominal domain, even if verbal inflection is eroded. On the face of it, this implies that the raw grammatical advantages of the two grammars are equal, so that α = 1. In practice, it may be argued that factors such as channel noise may lend the grammar with verbal inflection (G1, i.e. Dutch) a slight advantage over the inflectionless grammar; on the other hand, redundant agreement has been suggested to present difficulty for L2 learners (Kusters 2008). However, the magnitudes of these effects are difficult to estimate outside the context of a strict laboratory study. In what follows, I will assume that such effects are negligible at the population level, and thus proceed with the maximum-parsimony assumption that α = 1, i.e. that neither grammar is more advantageous than its competitor.

The case of null subjects in Afro-Peruvian Spanish is markedly different. The NS grammar (G1) is incompatible with overt expletive subjects (20), while the non-NS grammar (G2) is incompatible with null thematic subjects (21):

    (20) Overt expletive subject pronouns (* for G1)
         Ello  llueve.
         it    rains
         ‘It’s raining.’

    (21) Null thematic subject pronouns (* for G2)
         Hablo  español.
         speak  Spanish
         ‘I speak Spanish.’

As one might expect, thematic subjects far outnumber expletive subjects in discourse. Drawing on data on their frequencies of occurrence from multiple sources, Simonenko & Crabbé & Prévost (2019: 296, 299) estimate the grammatical advantage of an NS grammar to be about α1 = 0.7 and that of the corresponding non-NS grammar to be only about α2 = 0.05; that is to say, a difference of over an order of magnitude, implying an advantage ratio of α = 0.7/0.05 = 14 in favour of the NS grammar.9

This difference between the two case studies is important. In the case of Afrikaans, the fact that the two grammars are equiadvantageous means that, in the absence of any L2 learning, the population of speakers is stable (though not asymptotically) at any value of p: if σ = 0 and α = 1, then the value of p is always at equilibrium.10 Thus the only forces that could shift the state of the speech community (again, assuming the absence of L2 learners in the population) would have to be stochastic in nature. Once L2 learners facing an L2-difficulty with one of the grammars are introduced into the population, the equilibrium will shift: any amount of L2 learning implies that the origin (p,q) = (0,0) becomes the system’s only attractor (Table 1). Were those L2 learners to be removed from the population at a later stage, the speech community would then again assume its previous quasistable nature and come to rest at whichever value of p the L2 learning situation brought it to (this will correspond to p ≈ 0 if enough time has elapsed; see Section 5.4).

The case of null subjects is radically different. Here the original, L2-difficult grammar G1 is much more advantageous than its competitor, G2. Not only does this mean that the proportion of L2 learners in the population would have to be relatively high for G1 to be completely replaced by G2; the long-term prediction is also that if those L2 learners were to be removed, the population would drift back to the equilibrium p = 1 whose stability is guaranteed by the asymmetry in grammatical advantages in the absence of any L2 learning.

5.3 Calibrating demographics

Table 2, from Giliomee & Elphick (1979), gives an overview of the population of the Dutch Cape Colony from 1670 to 1820, the period relevant for the emergence of Afrikaans. Although it is evident from these figures that slaves, freed slaves and the indigenous peoples represented a significant fraction of the population at each stage, translating these data into figures of the likely proportion of L2 learners at the different time points is non-trivial. First, no data exist for the native population before 1798. Secondly, not all of the non-European population would be L2 learners: perhaps some of them would not have needed to learn Dutch, but even more importantly, at some point part of this population will have begun to acquire Dutch as an L1, or at least as a bilingual L2 in childhood. For these reasons, I have computed fairly wide interval estimates for σ, the proportion of (adult) L2 speakers in the colony, given in the rightmost column of Table 2. The left endpoint of each interval represents the conservative estimate that half of the non-European population would have spoken Dutch as an L2; the right endpoint gives the absolute maximum, on the (quite certainly unrealistic) assumption that the entire non-European population spoke Dutch as an L2.

Table 2

Demographics of Dutch Cape Colony, 1670–1820 (Giliomee & Elphick 1979: 360), together with estimated range for the fraction of people speaking Dutch as L2, σ (see text). Column for Khoekhoe includes people of mixed background; these data are not available prior to 1798.

| Year | Europeans | Free Blacks | Slaves | Khoekhoe | Prop. L2 (σ) |
|---|---|---|---|---|---|
| 1670 | 125 | 13 | 52 | – | [0.17, 0.34] |
| 1690 | 788 | 48 | 381 | – | [0.18, 0.35] |
| 1711 | 1,693 | 63 | 1,771 | – | [0.26, 0.52] |
| 1730 | 2,540 | 221 | 4,037 | – | [0.31, 0.63] |
| 1750 | 4,511 | 349 | 5,327 | – | [0.28, 0.56] |
| 1770 | 7,736 | 352 | 8,220 | – | [0.26, 0.53] |
| 1798 | c. 20,000 | c. 1,700 | 25,754 | 14,447 | [0.34, 0.68] |
| 1820 | 42,975 | 1,932 | 31,779 | 26,975 | [0.29, 0.59] |
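
The interval estimates in the rightmost column can be recomputed from the census counts. The sketch below follows the procedure described in the text: the right endpoint treats the entire non-European population as L2 speakers, and the left endpoint treats half of it as such, in both cases relative to the total population. The function name `sigma_interval` and the `low_share` parameter are hypothetical.

```python
def sigma_interval(europeans, non_europeans, low_share=0.5):
    """Interval estimate for the proportion of L2 speakers, rounded to 2 decimals.

    europeans     -- head count of the European population
    non_europeans -- list of head counts of the remaining groups
    low_share     -- share of non-Europeans counted as L2 speakers
                     at the conservative (left) endpoint
    """
    n_other = sum(non_europeans)
    total = europeans + n_other
    upper = n_other / total        # everyone in the other groups speaks Dutch as L2
    lower = low_share * upper      # only a share of them does
    return (round(lower, 2), round(upper, 2))


print(sigma_interval(125, [13, 52]))                 # 1670: (0.17, 0.34)
print(sigma_interval(20000, [1700, 25754, 14447]))   # 1798: (0.34, 0.68)
```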

Population figures for Colonial Peru are hard to come by; however, useful data exist for Lima in the early period, specifically the first half of the 17th century. Interpretation of the demographics in Table 3, from Bowser (1974), is complicated by the fact that the various sources from which the demographic data are drawn employ different classificatory principles, sometimes pooling Spaniards and people of mixed European–American background, or people of African and people of mixed African–European background, together. Sessarego (2015: 93) argues that people of mixed background were most probably bilingual in Spanish from childhood (as one of their parents would have been Spanish). They should therefore be exempted from any estimates of the proportion of adult L2 learners in the population. The interval estimates given in Table 3 were computed with the assumption that only the Black and Indigenous groups would contribute adult L2 learners to the population; the left endpoints of these intervals again correspond to the conservative assumption that 50% of these people spoke Spanish as L2, while the right endpoints correspond to the upper bound assumption that 100% of them did.11 The resulting estimates of σ suggest a steadily, if slowly, growing proportion of L2 speakers in the population, with figures roughly similar to those found in the Dutch Cape Colony.

Table 3

Demographics of Lima, 1600–1636 (Bowser 1974: 339–341), together with estimated range for the fraction of people speaking Spanish as L2, σ (see text). Mixed AmE = mixed American–European background; Mixed AfE = mixed African–European background.

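| Year | Spanish | Mixed AmE | Black | Mixed AfE | Indigenous | Prop. L2 (σ) |
|---|---|---|---|---|---|---|
| 1600 | 7,193 | – | 6,621 | 438 | – | [0.23, 0.46] |
| 1614 | 11,867 | 192 | 10,386 | 744 | 1,978 | [0.25, 0.49] |
| 1619 | 9,706 | – | 11,997 | 1,166 | 1,406 | [0.28, 0.55] |
| 1636 | 10,758 | 377 | 13,620 | 861 | 1,426 | [0.28, 0.56] |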

I am not aware of useful demographic data for later stages of Colonial Peru; a census was carried out in 1725–1740, but it contains no information on slaves (Pearce 2001). Useful information is available, however, on the population dynamics of the slave population on individual plantations or haciendas in the later period; while such data can tell us little about the relative proportion of L2 speakers in the population, they are useful in illuminating further developments following the initial stages of colonization. Drawing upon data in Cushner (1980), Sessarego (2015: 107) reports an estimated yearly net growth of 1.4 slaves per hacienda for the period 1710–1767; yet at the same time, the yearly natural growth (estimated from records of slaves’ births and deaths, which were kept on haciendas run by Jesuits), is negative at –2.7 per hacienda. In other words, an average of 4.1 slaves must have been imported yearly, per hacienda, in this period to match the net growth rate. It follows that, as late as the second half of the 18th century, there must have been a steady influx of new slaves, and hence of potential adult L2 learners of Spanish, into these regions.

5.4 Predictions: Afrikaans

The main linguistic features of Afrikaans are estimated (based on extant written records) to have been in place by the end of the 18th century (Roberge 2002: 83). Since the Dutch founded their permanent colony in 1652 (Guelke 1979), this leaves around 150 years for the development of Afrikaans as a language separate from Dutch. Assuming one generation to correspond to roughly 30 years (Tremblay & Vézina 2000), the development is thus estimated to have occurred over about five generations of speakers. More precisely, five generations supplies an upper bound for the development; the linguistic change may have happened faster, too, but this is impossible to determine given the scant textual record in the earlier phases.

In the well-mixing, infinite-population setup adopted in Section 4.2, the fact that the grammatical advantages are in balance in this case (α = 1) means that, in theory, even one adult L2 learner would suffice to drive the L2-difficult grammar to extinction, given enough time, irrespective of the absolute size of the population. The crucial question then concerns the time scale of the development: for fixed grammatical advantage ratio α and L2-difficulty d, the proportion of L2 speakers σ directly controls the time to extinction of G1, i.e. how long it takes for the population to converge to the attractor (p,q) = (0,0) from some initial state (p0, q0) = (1, q0). (The initial probability in the L2 population, q0, is of course an unknown.) Since the strength of L2-difficulty d is also unknown, the best one can do is to compute these times-to-extinction for several combinations of the parameter values, thereby hopefully establishing at least a subset of the parameter space that predicts the empirically observed facts.

Iterating the pair of equations (18) repeatedly, one can find the number of generations it takes for the population to converge to the attractor (p, q) = (0,0) from various initial conditions (p0,q0) = (1, q0), for various selections of d and for σ = 0.2 and σ = 0.6 (roughly corresponding to the endpoints of the interval estimates in Table 2). Convergence to the attractor is here defined, somewhat arbitrarily, as the values of both p and q being below 0.001 (0.1% use of G1).
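
The convergence criterion can be sketched as a short counting loop. Equation (18) is not reproduced here, so the one-generation update is passed in as a function. The `halving` map below is a purely hypothetical stand-in used only to exercise the loop; it is not the model's dynamics and says nothing about the Afrikaans case. The names `generations_to_extinction` and `max_generations` are likewise assumptions.

```python
from typing import Callable, Optional, Tuple

State = Tuple[float, float]


def generations_to_extinction(
    step: Callable[[float, float], State],
    p0: float,
    q0: float,
    tol: float = 1e-3,
    max_generations: int = 10_000,
) -> Optional[int]:
    """Count generations until both p and q fall below tol (0.1% use of G1).

    step maps the state (p, q) of one generation to that of the next.
    Returns None if the criterion is not met within max_generations.
    """
    p, q = p0, q0
    for n in range(max_generations + 1):
        if p < tol and q < tol:
            return n
        p, q = step(p, q)
    return None


def halving(p: float, q: float) -> State:
    """Hypothetical placeholder dynamics: both frequencies halve each generation."""
    return 0.5 * p, 0.5 * q


print(generations_to_extinction(halving, p0=1.0, q0=0.5))   # 10
```

Returning None rather than a large number keeps non-convergent parameter combinations distinguishable from slow ones when scanning over d, σ and q0.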

Figure 3 plots the passage times found using this procedure. From these results, it is clear that convergence to the attractor in 5 generations is possible. For proportions of L2 speakers on the order of σ = 0.6, a strength of L2-difficulty greater than about d = 0.5 is sufficient; for lower proportions on the order of σ = 0.2, L2-difficulties in excess of about d = 5 guarantee convergence in up to 5 generations. In future work, it would be important to estimate plausible values of d using independent evidence, so that model fit can be evaluated in a more principled manner (subject to fewer researcher degrees of freedom). Having said that, the above demonstration shows that the model makes the development of Afrikaans possible (even if not necessary).

Figure 3

Time of convergence to attractor in the Afrikaans case study, for various values of L2-difficulty d, proportion of L2 speakers σ and initial probability of L2-difficult grammar in the L2 speaker population q0.

5.5 Predictions: Afro-Peruvian Spanish

In the case of Afro-Peruvian Spanish, the situation is radically different: in this case, it needs to be explained why the null subject grammar was never completely overthrown by the corresponding grammar without null subjects, which would be favoured by L2 learning. Recall that the critical value of the bifurcation parameter σ, the proportion of adult L2 learners in the population, was found in Section 4 to be

$$\sigma_{\mathrm{crit}} = \frac{(\alpha - 1)(D + 1)}{\alpha D} \qquad (22)$$

where α = α1/α2 is the ratio of the advantages of the two grammars and D = d/α2 represents the relative L2-difficulty of G1, scaled by the advantage of G2. The lower bound of σcrit occurs as D → ∞, namely

$$\lim_{D \to \infty} \sigma_{\mathrm{crit}} = \frac{\alpha - 1}{\alpha} \qquad (23)$$

On the other hand, in Section 5.2 the value α = 14 was estimated in this particular case. This corresponds to a lower bound of 13/14 ≈ 0.93. In other words, the proportion of L2 speakers in the population would need to be at least about 0.93 (and possibly higher, if D turned out to have a small value) for the grammatical advantage enjoyed by the L2-difficult grammar G1 to be overcome and for the origin (p,q) = (0,0) to be an attractor. Since, based on the available demographic data, such a high proportion of L2 speakers never obtained in Colonial Peru (Section 5.3), we infer that null subjects were never at risk of being completely lost in the Afro-Peruvian variety of Spanish.
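
The step from the threshold to its lower bound can be spelled out as follows, assuming that the threshold has the form (α − 1)(D + 1)/(αD), which reproduces the value 13/14 quoted above. The comparison figure 0.56 is the largest right endpoint in Table 3.

```latex
\sigma_{\mathrm{crit}}
  = \frac{(\alpha - 1)(D + 1)}{\alpha D}
  = \frac{\alpha - 1}{\alpha}\left(1 + \frac{1}{D}\right)
  > \frac{\alpha - 1}{\alpha}
  \quad \text{for all } D > 0,
\qquad
\left.\frac{\alpha - 1}{\alpha}\right|_{\alpha = 14}
  = \frac{13}{14} \approx 0.93 > 0.56 .
```

Since the factor 1 + 1/D decreases monotonically towards 1 as D grows, no value of the L2-difficulty can push the threshold below (α − 1)/α.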

For any given non-zero σ, however, a stable, attracting state is implied in the interior of the state space (see again Section 4.2). This would appear to correspond, qualitatively, to the linguistic classification of Afro-Peruvian Spanish as a mixed NS language (Section 2.2)—that is, as a variety sometimes employing, sometimes not employing, null subjects. It is worth pointing out, however, that as the proportion of L2 speakers decreases over time, this interior equilibrium tends to a stable rest point on the boundary of the state space, crucially with the property p = 1 (so that L1 speakers have reverted to using null subjects all the time). This may help to explain the reported instability of the mixed NS status of Afro-Peruvian Spanish—the observation that the younger speakers in these communities are turning towards the standard Spanish grammar, with full null subjects (Sessarego 2014). Although Sessarego (2014) attributes this development mostly to sociolinguistic causes—to the prestige enjoyed by standard coastal Peruvian Spanish—the above considerations suggest an alternative, or at least complementary analysis: the loss of overt subjects results from the fact that there are fewer L2 speakers of the variety in the speech community, leading to fewer constructions totally lacking null subjects in the input data based on which L1 learners acquire their variety.

6 Conclusion and outlook

This paper has presented a mathematical model of contact-induced linguistic changes in which adult L2 learning plays a leading role. Fairly lenient assumptions about language learning were used to characterize the terminal state of learners exposed to a given learning environment; this terminal state was then used to derive a model of the population dynamics of a mixed population of L1 and L2 speakers. Focusing on a deterministic approximation of the full stochastic model allowed us to derive formal results about the equilibria and bifurcations of the system in relation to three main model parameters: α, a measure of how much grammatical advantage the L2-difficult grammar G1 has over its competitor G2; D, a measure of the strength of the L2-difficulty of G1; and σ, the proportion of L2 speakers present in the population. It was shown that the system has either one or two equilibria: either the point (p,q) = (0,0), at which both the L1 and L2 populations speak G2 to the complete exclusion of G1, is asymptotically stable; or else this point is unstable and another asymptotically stable rest point exists in the state space. Passage from one of these phases to the other is controlled by the bifurcation parameter σ with critical value

$$\sigma_{\mathrm{crit}} = \frac{(\alpha - 1)(D + 1)}{\alpha D} \qquad (24)$$

whenever α < D + 2. The origin (p,q) = (0,0) is the only equilibrium when σ > σcrit.

Estimating empirical values of quantities such as grammatical advantages is now routine in variationist approaches to linguistic history (Yang 2000; Heycock & Wallenberg 2013; Danckaert 2017; Simonenko & Crabbé & Prévost 2019). Estimation of demographic parameters such as the proportion of L2 speakers is also straightforward in principle (though by no means necessarily so in practice). By contrast, estimating the L2-difficulty suffered by a grammar is less trivial. Future work could potentially attempt to infer such parameters either from longitudinal data on L2 learning trajectories or from the terminal states attained by a larger sample of learners. In many cases, estimation of all three parameters may be unnecessary, however. As D tends to infinity, σcrit tends to (α – 1)/α. This supplies a lower bound for the critical value of the bifurcation parameter. Thus, if in a given case it is possible to show that the empirical value of σ never exceeded this lower bound, then the prediction from the modelling is that the L2-difficult grammar should coexist with its competitor in the speech community. Conversely, as D approaches 0, σcrit grows without bound. However, since D = d/α2 and 0 < α2 < 1, it is often possible to argue that D > 1 as long as the unscaled L2-difficulty d = δ/ɣ can reasonably be assumed to be more than unity. This, then, provides an upper bound on the critical value of σ (evaluated at D = 1), namely 2(α – 1)/α. If it is possible to show that the empirical value of σ exceeds this upper bound, then the prediction from the modelling is that the L2-difficult grammar will be driven to complete extinction by the contact situation, given enough time.
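
The reasoning with bounds can be sketched as a decision rule that needs only α and σ. It is a minimal illustration under two assumptions: that the lower bound is (α − 1)/α, the quantity that evaluates to 13/14 for α = 14 in Section 5.5, and that D > 1, so that twice this value bounds the threshold from above. The function names and the wording of the return values are hypothetical.

```python
def sigma_crit_bounds(alpha):
    """Bounds on the critical L2 proportion when D is unknown but assumed > 1.

    Lower bound (D -> infinity): (alpha - 1) / alpha
    Upper bound (D = 1):         2 * (alpha - 1) / alpha
    """
    if alpha <= 1.0:
        raise ValueError("bounds are only informative for alpha > 1")
    lower = (alpha - 1.0) / alpha
    return lower, 2.0 * lower


def qualitative_prediction(alpha, sigma):
    """Qualitative fate of G1 from alpha and sigma alone."""
    lower, upper = sigma_crit_bounds(alpha)
    if sigma < lower:
        return "G1 retained"     # sigma is below every possible threshold
    if sigma > upper:
        return "G1 lost"         # sigma is above every threshold with D > 1
    return "undetermined without an estimate of D"


print(qualitative_prediction(14.0, 0.56))   # G1 retained
print(qualitative_prediction(1.25, 0.5))    # G1 lost
print(qualitative_prediction(1.25, 0.3))    # undetermined without an estimate of D
```

Note that for large α the upper bound exceeds 1, in which case the "G1 lost" branch can never be reached, since σ is a proportion.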

The above results provide modelling-based evidence for the feasibility of the hypothesis of contact-induced change caused by imperfect L2 learning followed by L1 nativization (Weerman 1993; Trudgill 2004; 2010; 2011): it is possible for features to be lost from a language as a consequence of L2-difficulty. However, the results simultaneously suggest that extra-linguistic, population-dynamic parameters such as σ may stand in complex relationships to structural, linguistic parameters such as α and parameters to do with the psychology of learning such as d. Thus it would be unrealistic to expect to find a simple and constant threshold of the proportion of L2 learners sufficient and necessary to cause change in the overall population. Nor is it realistic to expect all linguistic features or variables to respond identically to identical population-dynamic situations. The modelling results here suggest that such a view would be too simplistic. In fact, there may be empirical configurations of key system parameters which predict that adult L2 learning is insufficient to drive certain features to extinction, in certain population settings. This seems likely, for instance, for null subjects in cases such as that of Afro-Peruvian Spanish. On the other hand, in the case of variables where little to no advantage is enjoyed by either competing variant for purely linguistic reasons (such as verbal inflection in non-NS languages), those variables may be very sensitive to external factors such as L2 learning, with fairly low values of σ sometimes sufficing to set the language on the course to lose the feature.

This paper has discussed one potential mechanism of contact-induced change. It assumes that L2 learners introduce a “linguistic mutation” to a speech community; this mutation then spreads as the primary linguistic data of L1 learners changes. Changes are cumulative and take place, typically, over extended inter-generational timescales. It is important to note that this is, however, only one possible mechanism that may generate the types of diachronic developments here studied. An alternative view holds that changes propagate as L1 speakers accommodate to the language use of L2 speakers in an act of foreigner-talk (Valdman 1981; Atkinson & Smith & Kirby 2018). It is likely, in fact, that both mechanisms are at play at least to some extent in the real world. What is clear in any case is that the primary linguistic data of L1 learners will change regardless of the specific mechanism; it is this shift in input which, ultimately, secures propagation of innovations.

Future modelling work can, naturally, look into the effects of introducing more intricate mechanisms of skewed input. In a similar vein, it would be desirable in future work to model other aspects of L2 acquisition in more detail. In the present paper, universal L2-difficulty is the driving force of innovation. This difficulty is assumed to apply to all L2 learners in a similar way, regardless of the learner’s L1. Although evidence for such a universal psychological difficulty was cited above, the existence of transfer effects in L2 acquisition is also well-documented in the literature (Schwartz & Sprouse 1996). In extensions of modelling work of this kind, such effects may be studied alongside universal biases. A related point is that, in the current framework, L1 and L2 acquisition are identical up to the effect of L2-difficulty; there is, for instance, no way in which L2 learners might sometimes outperform L1 learners (“positive transfer”). Such extensions could be considered in future research.

At the population-dynamic level, future work should explore in detail the effects of modelling the stochastic dynamics of interacting populations explicitly, beyond the proof-of-concept simulations reported in Section 4.3. The great benefit of studying the deterministic limit of the model, in abstraction from stochastic fluctuations, is that strong analytical results become available. The bifurcation threshold identified in this paper is one such result: for any given combination of model parameters (proportion of L2 learners, extent of L2-difficulty, and advantage ratio), the model predicts either complete change or only partial change or no change at all. This increases the falsifiability of our theory, as all parameters can in principle be estimated from data and predictions compared against historical outcomes. Should it turn out that the model makes the wrong predictions, some of its assumptions can be modified. For this reason, acquisition of further empirical test cases is an important desideratum. Although much current work into the role of demographic factors in language change is typological, synchronic and correlational in nature, diachronic data are essential in evaluating mechanistic models of change.

The results of this paper also add to a growing body of literature illustrating the utility of the Variational Learning model of linguistic variation and change. In the future, it will be important to move in the direction of more direct tests of key model ingredients, such as the learning rate parameter. Equally important is the balanced consideration and development of alternative formal models and, once the empirical predictions of each model have been figured out, the rigorous statistical comparison of competing models vis-à-vis empirical data.


Abbreviations

AHLA = Afro-Hispanic Language of the Americas, L1 = first language, L2 = second language, NS = null subject, VL = variational learning

Notes


  1. On the loose-knit nature of early Dutch Cape society, Ponelis (1993: 3) writes: “there was a lack of community structure such as that which characterised the early New England settlement, where not only families but entire communities crossed from England to America. Cape society reconstituted itself beyond the purview of the VOC [Vereenigde Oostindische Compagnie, the Dutch East India Company]: new, extended families crystallised from what was at first a motley crowd of men with only a few women, and a speech community came into being where Dutch was in competition with other languages.” Three observations are particularly important for understanding the sociolinguistic situation of the early Cape Colony. First, the Dutch East India Company explicitly required Dutch (and not, for instance, Portuguese) to be used in communication with slaves (Ponelis 1993: 25). Secondly, it is suspected that childcare was in many cases entrusted to slave or Khoekhoe women, who “transferred to them their own approximate (broken) Dutch” (Ponelis 1993: 8). Finally, the Cape was not a plantation colony but rather one in which slaves were distributed relatively uniformly across the population (Ponelis 1993: 12).
  2. I use the term “mixed” rather than “partial” (Sessarego & Gutiérrez-Rexach 2018) here, as the latter term has taken on a specific theoretical meaning in the generative literature on null subjects (see Roberts & Holmberg 2010). It is unclear, and from the point of view of the present study of secondary interest, whether Afro-Peruvian Spanish is a partial NS language in the latter sense; the relevant fact is that the language lies, in some sense, between the polar extremes of being a consistent NS language and being a (consistent) non-NS language.
  3. Note that the two empirical variables considered in this paper, verbal inflection (or lack thereof) and subject expression (or lack thereof), will figure in the vast majority of utterances heard by the learner.
  4. To guarantee that p always remains in the interval [0,1], and is thus a valid probability, one must require that 0 < δ < 1 − γ. See the Appendix for details.
  5. Nomenclature is sometimes confusing in the literature, with some authors using the term “fitness” for these quantities and reserving the term “advantage” for derived notions such as differences in fitness. I here follow the original terminology put forward in Yang (2000). Ultimately (see below), the relative fitness of the competing grammatical options will be measured using the ratio $\alpha = \alpha_1/\alpha_2$, with values greater (smaller) than unity signifying that $G_1$ ($G_2$) has more advantage than its competitor.
  6. The case α < 1 is also empirically uninteresting in most situations, as it raises the question of how grammar $G_1$ could have established itself in the L1 population in the first place, if it has less grammatical advantage than its competitor $G_2$.
  7. A reviewer asks why the proportion of L2 speakers σ is chosen as the critical parameter, rather than α or D. Indeed, from a mathematical point of view the choice is arbitrary: as long as the inequality $\sigma > \sigma_{\mathrm{crit}}$ is satisfied, loss of the L2-difficult feature is predicted. So, for instance, we may hold α and D fixed and vary σ to cross from one phase to the other, but we may equally well hold α and σ fixed and vary D to bring about the same phase transition. However, from a substantive point of view, the proportion of L2 speakers σ is the only parameter that routinely changes its value: the advantage ratio α stays fixed as long as no other grammatical changes occur in the language, and δ (and, by extension, D whenever $\alpha_2$ is fixed) is presumably constant as it reflects a universal psychological bias. Hence it is natural to treat σ as the critical control parameter.
  8. What is required is access to individual-level longitudinal production data from L2 learners over a substantial stretch of their learning trajectory, to which theoretically predicted learning curves could be fitted. If learning curves with δ > 0 fit such data better than curves with δ = 0, we would have positive evidence for L2-difficulty.
  9. If learners of an NS grammar are learning a syntactic parameter that controls a cluster of surface features, as classical formulations of the null subject parameter expect (Roberts & Holmberg 2010; but see Simonenko & Crabbé & Prévost 2019 for evidence to the contrary), then there will be further cues in the learner’s input that either reward or penalize the two competing settings (such as rich vs. poor agreement inflection, or the possibility vs. impossibility of free inversion). Such complications fall outside the scope of the present paper but should be explored in future work.
  10. To see this, we take the first equation from (18) and set σ = 0 and α = 1; the resulting expression $p_{n+1} = p_n$ means that the value of p will not change from generation to generation.
  11. To compute the estimates for the year 1600, for which Bowser’s (1974) data pool people of African and mixed African–European heritage together, I have used the overall proportion of people of African heritage among the combined African and African–European population in the remaining years (0.92). This yields an estimate of 6,091 African and 530 mixed African–European in the year 1600.

Data and code accessibility

The programs required for replicating the simulations described in this paper can be obtained from https://doi.org/10.5281/zenodo.7004002.

Supplementary files

Supplementary file 1: Appendix

Mathematical derivations. DOI: https://doi.org/10.16995/glossa.8211.s1

Funding information

The major part of the research here reported was funded by the European Research Council as part of project STARFISH (851423). The work was begun during the author’s fellowship at the Zukunftskolleg of the University of Konstanz, funded by the Federal Ministry of Education and Research (BMBF) and the Baden-Württemberg Ministry of Science as part of the Excellence Strategy of the German Federal and State Governments. The Zukunftskolleg additionally provided funding for computing equipment. All this support is gratefully acknowledged.


Acknowledgements

I am indebted to Fernanda Barrientos, Sarah Einhaus, Gemma McCarley, Raquel Montero, Molly Rolf, Joel Wallenberg and George Walkden for numerous discussions which have influenced my thinking on language learning, language change and population dynamics generally, as well as for specific comments pertaining to manuscript versions of the present work. I also wish to thank four anonymous reviewers whose comments helped to sharpen various aspects of the paper. Any remaining errors are, naturally, my sole responsibility.

Competing interests

The author has no competing interests to declare.


References

Atkinson, Mark & Smith, Kenny & Kirby, Simon. 2018. Adult learning and language simplification. Cognitive Science 42(8). 2812–2854. DOI: http://doi.org/10.1111/cogs.12686

Bentz, Christian & Winter, Bodo. 2013. Languages with more second language learners tend to lose nominal case. Language Dynamics and Change 3. 1–27. DOI:  http://doi.org/10.1163/22105832-13030105

Berdicevskis, Aleksandrs & Semenuks, Arturs. 2022. Imperfect language learning reduces morphological overspecification: experimental evidence. PLoS ONE 17(1). e0262876. DOI:  http://doi.org/10.1371/journal.pone.0262876

Bini, Milena. 1993. La adquisición del italiano: más allá de las propiedades sintácticas del parámetro pro-drop. In Liceras, Juana M. (ed.), La lingüística y el análisis de los sistemas no nativos, 126–139. Ottawa: Dovehouse Editions Canada.

Blom, Elma. 2006. Agreement inflection in child L2 Dutch. In Belletti, Adriana & Bennati, Elisa & Chesi, Cristiano & Di Domenico, Elisa & Ferrari, Ida (eds.), Language acquisition and development: proceedings of GALA 2005, 49–61. Newcastle: Cambridge Scholars Press.

Bowser, Frederick P. 1974. The African slave in Colonial Peru 1524–1650. Stanford, CA: Stanford University Press.

Bush, Robert R. & Mosteller, Frederick. 1955. Stochastic models for learning. New York, NY: Wiley. DOI:  http://doi.org/10.1037/14496-000

Cushner, N. 1980. Lords of the land: sugar, wine, and the Jesuit estates of Coastal Peru. Albany, NY: State University of New York Press.

Danckaert, Lieven. 2017. The loss of Latin OV: steps towards an analysis. In Aboh, Enoch & Haeberli, Eric & Puskás, Genoveva & Schönenberger, Manuela (eds.), Elements of comparative syntax: theory and description, 401–446. Berlin: De Gruyter Mouton. DOI:  http://doi.org/10.1515/9781501504037-015

DeKeyser, Robert M. 2005. What makes learning second-language grammar difficult? A review of issues. Language Learning 55(S1). 1–25. DOI:  http://doi.org/10.1111/j.0023-8333.2005.00294.x

Fagyal, Zsuzsanna & Swarup, Samarth & Escobar, Anna María & Gasser, Les & Lakkaraju, Kiran. 2010. Centers and peripheries: network roles in language change. Lingua 120. 2061–2079. DOI:  http://doi.org/10.1016/j.lingua.2010.02.001

Giliomee, Hermann & Elphick, Richard. 1979. The structure of European domination at the Cape, 1652–1820. In Elphick, Richard & Giliomee, Hermann (eds.), The shaping of South African society, 1652–1820, 359–390. Cape Town: Longman.

Guelke, Leonard. 1979. The white settlers, 1652–1780. In Elphick, Richard & Giliomee, Hermann (eds.), The shaping of South African society, 1652–1820, 41–74. Cape Town: Longman.

Helbing, Dirk. 2010. Quantitative sociodynamics: stochastic methods and models of social interaction processes. 2nd edition. Berlin: Springer. DOI:  http://doi.org/10.1007/978-3-642-11546-2

Heycock, Caroline & Wallenberg, Joel. 2013. How variational acquisition drives syntactic change: the loss of verb movement in Scandinavian. Journal of Comparative Germanic Linguistics 16. 127–157. DOI:  http://doi.org/10.1007/s10828-013-9056-0

INEI. 2018. Perú: perfil sociodemográfico. Lima: Instituto Nacional de Estadística e Informática. https://www.inei.gob.pe/media/MenuRecursivo/publicaciones_digitales/Est/Lib1539/index.html.

Ingason, Anton Karl & Legate, Julie Anne & Yang, Charles. 2013. The evolutionary trajectory of the Icelandic New Passive. University of Pennsylvania Working Papers in Linguistics 19. 91–100. https://repository.upenn.edu/pwpl/vol19/iss2/11.

Josserand, Mathilde & Allassonnière-Tang, Marc & Pellegrino, François & Dediu, Dan. 2021. Interindividual variation refuses to go away: a Bayesian computer model of language change in communicative networks. Frontiers in Psychology 12. 626118. DOI:  http://doi.org/10.3389/fpsyg.2021.626118

Kauhanen, Henri. 2017. Neutral change. Journal of Linguistics 53(2). 327–358. DOI:  http://doi.org/10.1017/S0022226716000141

Kauhanen, Henri. 2019. Stable variation in multidimensional competition. In Breitbarth, Anne & Bouzouita, Miriam & Danckaert, Lieven & Farasyn, Melissa (eds.), The determinants of diachronic stability, 263–290. Amsterdam: Benjamins. DOI:  http://doi.org/10.1075/la.254.11kau

Kauhanen, Henri & Walkden, George. 2018. Deriving the constant rate effect. Natural Language & Linguistic Theory 36(2). 483–521. DOI:  http://doi.org/10.1007/s11049-017-9380-1

Ke, Jinyun & Gong, Tao & Wang, William S.-Y. 2008. Language change and social networks. Communications in Computational Physics 3(4). 935–949.

Koplenig, Alexander. 2019. Language structure is influenced by the number of speakers but seemingly not by the proportion of non-native speakers. Royal Society Open Science 6. 181274. DOI:  http://doi.org/10.1098/rsos.181274

Kroch, Anthony S. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1(3). 199–244. DOI:  http://doi.org/10.1017/S0954394500000168

Kusters, Wouter. 2003. Linguistic complexity: the influence of social change on verbal inflection. Utrecht: LOT.

Kusters, Wouter. 2008. Complexity in linguistic theory, language learning and language change. In Miestamo, Matti & Sinnemäki, Kaius & Karlsson, Fred (eds.), Language complexity: typology, contact, change, 3–22. Amsterdam: Benjamins. DOI:  http://doi.org/10.1075/slcs.94.03kus

Lupyan, Gary & Dale, Rick. 2010. Language structure is partly determined by social structure. PLoS ONE 5(1). e8559. DOI:  http://doi.org/10.1371/journal.pone.0008559

Margaza, Panagiota & Bel, Aurora. 2006. Null subjects at the syntax–pragmatics interface: evidence from Spanish interlanguage of Greek speakers. In Grantham O’Brien, Mary & Shea, Christine & Archibald, John (eds.), Proceedings of the 8th Generative Approaches to Second Language Acquisition Conference (GASLA 2006), 88–97. Somerville, MA: Cascadilla Proceedings Project.

Martínez-Sanz, Cristina. 2011. Null and overt subjects in a variable system: the case of Dominican Spanish. Ottawa: University of Ottawa dissertation.

McElreath, Richard & Boyd, Robert. 2007. Modeling the evolution of social behaviour: a guide for the perplexed. Chicago, IL: The University of Chicago Press.

McWhorter, John H. 2011. Linguistic simplicity and complexity: why do languages undress? Berlin: De Gruyter Mouton. DOI:  http://doi.org/10.1515/9781934078402

Mehl, Matthias R. & Vazire, Simine & Ramírez-Esparza, Nairán & Slatcher, Richard B. & Pennebaker, James W. 2007. Are women really more talkative than men? Science 317. 82. DOI:  http://doi.org/10.1126/science.1139940

Montrul, Silvina A. 2004. The acquisition of Spanish: morphosyntactic development in monolingual and bilingual L1 acquisition and adult L2 acquisition. Amsterdam: Benjamins. DOI:  http://doi.org/10.1075/lald.37

Nettle, Daniel. 2012. Social scale and structural complexity in human languages. Philosophical Transactions of the Royal Society B 367. 1829–1836. DOI:  http://doi.org/10.1098/rstb.2011.0216

Pearce, Adrian J. 2001. The Peruvian population census of 1725–1740. Latin American Research Review 36(3). 69–104.

Pérez-Leroux, Ana T. & Glass, William R. 1999. Null anaphora in Spanish second language acquisition: probabilistic versus generative approaches. Second Language Research 15(2). 220–249. DOI:  http://doi.org/10.1191/026765899676722648

Ponelis, Fritz. 1993. The development of Afrikaans. Frankfurt am Main: Peter Lang.

Rankin, Tom. 2014. Variational learning in L2: the transfer of L1 syntax and parsing strategies in the interpretation of wh-questions by L1 German learners of L2 English. Linguistic Approaches to Bilingualism 4(4). 432–461. DOI:  http://doi.org/10.1075/lab.4.4.02ran

Rankin, Tom. 2022. Input and competing grammars in L2 syntax. Second Language Research. DOI:  http://doi.org/10.1177/02676583221091389

Roberge, Paul T. 2002. Afrikaans: considering origins. In Mesthrie, Rajend (ed.), Language in South Africa, 79–103. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511486692.005

Roberts, Ian & Holmberg, Anders. 2010. Introduction: parameters in minimalist theory. In Biberauer, Theresa & Holmberg, Anders & Roberts, Ian & Sheehan, Michelle (eds.), Parametric variation: null subjects in minimalist theory, 1–57. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511770784.001

Schwartz, Bonnie D. & Sprouse, Rex A. 1996. L2 cognitive states and the Full Transfer/Full Access model. Second Language Research 12(1). 40–72. DOI:  http://doi.org/10.1177/026765839601200103

Sessarego, Sandro. 2014. Afro-Peruvian Spanish in the context of Spanish creole genesis. Spanish in Context 11(3). 381–401. DOI:  http://doi.org/10.1075/sic.11.3.04ses

Sessarego, Sandro. 2015. Afro-Peruvian Spanish: Spanish slavery and the legacy of Spanish creoles. Amsterdam: Benjamins. DOI:  http://doi.org/10.1075/cll.51

Sessarego, Sandro & Gutiérrez-Rexach, Javier. 2018. Afro-Hispanic contact varieties at the syntax/pragmatics interface: pro-drop phenomena in Chinchano Spanish. In King, Jeremy & Sessarego, Sandro (eds.), Language variation and contact-induced change: Spanish across space and time. Amsterdam: Benjamins. DOI:  http://doi.org/10.1075/cilt.340.05ses

Simonenko, Alexandra & Crabbé, Benoit & Prévost, Sophie. 2019. Agreement syncretization and the loss of null subjects: quantificational models for Medieval French. Language Variation and Change 31(3). 275–301. DOI:  http://doi.org/10.1017/S0954394519000188

Sinnemäki, Kaius. 2020. Linguistic system and sociolinguistic environment as competing factors in linguistic variation: a typological approach. Journal of Historical Sociolinguistics 6(2). 20191010. DOI:  http://doi.org/10.1515/jhsl-2019-1010

Sinnemäki, Kaius & Di Garbo, Francesca. 2018. Language structures may adapt to the sociolinguistic environment, but it matters what and how you count: a typological study of verbal and nominal complexity. Frontiers in Psychology 9. 1141. DOI:  http://doi.org/10.3389/fpsyg.2018.01141

Sorace, Antonella & Serratrice, Ludovica. 2009. Internal and external interfaces in bilingual language development: beyond structural overlap. International Journal of Bilingualism 13(2). 195–210. DOI:  http://doi.org/10.1177/1367006909339810

Toribio, Almeida Jacqueline. 2000. Setting parametric limits on dialectal variation in Spanish. Lingua 110. 315–341. DOI:  http://doi.org/10.1016/S0024-3841(99)00044-3

Torres Cacoullos, Rena & Travis, Catherine E. 2018. Bilingualism in the community: codeswitching and grammars in contact. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/9781108235259

Tremblay, Marc & Vézina, Hélène. 2000. New estimates of intergenerational time intervals for the calculation of age and origins of mutations. The American Journal of Human Genetics 66(2). 651–658. DOI:  http://doi.org/10.1086/302770

Trudgill, Peter. 2004. Linguistic and social typology: the Austronesian migrations and phoneme inventories. Linguistic Typology 8. 305–320. DOI:  http://doi.org/10.1515/lity.2004.8.3.305

Trudgill, Peter. 2010. Contact and sociolinguistic typology. In Hickey, Raymond (ed.), The handbook of language contact, 299–319. Chichester: Wiley–Blackwell. DOI:  http://doi.org/10.1002/9781444318159.ch15

Trudgill, Peter. 2011. Sociolinguistic typology: social determinants of linguistic complexity. Oxford: Oxford University Press.

Valdman, Albert. 1981. Sociolinguistic aspects of foreigner talk. International Journal of the Sociology of Language 28. 41–52. DOI:  http://doi.org/10.1515/ijsl.1981.28.41

Walkden, George & Breitbarth, Anne. 2019. Complexity as L2-difficulty: implications for syntactic change. Theoretical Linguistics 45(3–4). 183–209. DOI:  http://doi.org/10.1515/tl-2019-0012

Weerman, Fred. 1993. The diachronic consequences of first and second language acquisition: the change from OV to VO. Linguistics 31. 903–931. DOI:  http://doi.org/10.1515/ling.1993.31.5.903

Yang, Charles D. 2000. Internal and external forces in language change. Language Variation and Change 12. 231–250. DOI:  http://doi.org/10.1017/S0954394500123014

Yang, Charles D. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.

Zobl, Helmut & Liceras, Juana M. 2005. Accounting for optionality in nonnative grammars: parametric change in diachrony and L2 development as instances of internalized diglossia. In Dekydtspotter, Laurent & Sprouse, Rex A. & Liljestrand, Audrey (eds.), Proceedings of the 7th Generative Approaches to Second Language Acquisition Conference (GASLA 2004). Somerville, MA: Cascadilla Proceedings Project.