A bifurcation threshold for contact-induced language change

One proposed mechanism of language change concerns the role played by second-language (L2) learners in situations of language contact. If sufficiently many L2 speakers are present in a speech community in relation to the number of first-language (L1) speakers, then those features which present a difficulty in L2 acquisition may be prone to disappearing from the language. This paper presents a mathematical account of such contact situations based on a stochastic model of learning and nonlinear population dynamics. The equilibria of a deterministic reduction of the model, describing a mixed population of L1 and L2 speakers, are fully characterized. Whether or not the language changes in response to the introduction of L2 learners turns out to depend on three factors: the overall proportion of L2 learners in the population, the strength of the difficulty speakers face in acquiring the language as an L2, and the language-internal utilities of the competing linguistic variants. These factors are related by a mathematical formula describing a phase transition from retention of the L2-difficult feature to its loss from both speaker populations. This supplies predictions that can be tested against empirical data. Here, the model is evaluated with the help of two case studies, morphological levelling in Afrikaans and the erosion of null subjects in Afro-Peruvian Spanish; the model is found to be broadly in agreement with the historical development in both cases.


Introduction
The role played by second-language (L2) speakers in language change has attracted increasing interest in recent years, uniting research traditions in historical linguistics (Weerman 1993), sociolinguistic typology (Trudgill 2004;2010;2011), language complexity (Kusters 2003;2008; McWhorter 2011) and second-language acquisition (Sorace & Serratrice 2009;Berdicevskis & Semenuks 2022).If L2 learners generally struggle to acquire certain linguistic features in a target-like fashion, and if L2 learners are sufficiently prevalent in a speech community in a particular historical situation, then features of this kind, namely L2-difficult features, may be expected to be lost from the language over extended time.This mechanism-change triggered by L2-difficulty-is supported both by empirical studies in L2 acquisition and artificial language learning (e.g.Margaza & Bel 2006;Berdicevskis & Semenuks 2022) and by large-scale typological studies which suggest a negative correlation between the proportion of L2 speakers and language complexity (Lupyan & Dale 2010;Bentz & Winter 2013;Sinnemäki & Di Garbo 2018;Sinnemäki 2020).
Although quantitative laboratory and typological studies are essential in supplying the empirical content of this hypothesis, many open questions remain about the large-scale population dynamics of this proposed mechanism of contact-induced change.For instance, just how many L2 speakers does it take to tip the balance in a contact situation?Clearly, a small minority of L2 learners is generally insufficient to lead to wholesale change.Under what conditions, then, do those who speak the language as a first language (L1) also adopt the new (non-L2-difficult) variety?Such questions can be tackled in a mathematical modelling approach, in which predictions can be generated and the range of possible behaviours of dynamical systems understood in full detail.
This paper presents a first sketch of such an approach, beginning from simple and general assumptions about language learning and social interaction.Building on the variational model of language acquisition and change (Yang 2002), I propose an extension of this model which subsumes both L1 and L2 learning under the same general mechanism.The extended model assumes that L1 and L2 learning are fundamentally alike processes, in the sense that they both depend on general mechanisms of reinforcement (Bush & Mosteller 1955).In the extended model, however, adult L2 learning is potentially disadvantaged in the sense that these learners face a difficulty in the case of certain linguistic features acquired unproblematically by children.To study the population-dynamic consequences of L1 and L2 learners interacting in the same population, the extended model considers a mixed speech community.This yields testable predictions about the eventual fates of linguistic features exposed to contact situations of different kinds, including predictions about the stable states attained, conditioned on model parameters such as the proportion of L2 speakers in the population and the relative strength of the L2-difficulty of the linguistic variable involved.
The model allows a number of qualitatively different evolutionary outcomes: complete extinction of the L2-difficult feature, temporary loss of the L2-difficult feature followed by its reacquisition, or stable variation between two grammars, one of which carries the L2-difficult feature and one which does not.These theoretically predicted outcomes correspond to different patterns observed-to varying degrees of detail-in the empirical record.To test the model, I apply it to two empirical case studies: the deflexion of the verbal paradigm in Afrikaans (Trudgill 2011) and the erosion of null subjects in Afro-Peruvian Spanish (Sessarego & Gutiérrez-Rexach 2018).
Model parameters are estimated from empirical data where possible, and intuitive arguments for plausible orders of magnitude are provided where direct empirical calibration is impossible.
The model is found to correctly characterize the broad outlines of both empirical developments, although challenges arise both from the lack of relevant empirical data and from a number of idealizing assumptions that must be made in order to simplify the model's mathematics in the interest of retaining analytical tractability.
It should be emphasized that the model presented and analysed in this paper is concerned with only one potential external ("environmental") predictor of contact-induced change, namely the proportion of L2 speakers in a speech community.Other possible external factors conditioning language change have been considered in the literature, ranging from population size (Lupyan & Dale 2010;Nettle 2012;Koplenig 2019) to social network geometries (Ke & Gong & Wang 2008;Fagyal et al. 2010;Kauhanen 2017;Josserand et al. 2021) or combinations of such parameters (Trudgill 2004).Although the present model sets such effects aside, this is not to deny their importance.Indeed, further variables may be introduced to the modelling framework whose broad outlines are laid down in this paper, in future work.It is quite likely that in a complex process such as language change, multiple factors interact in intricate ways.In fact, this is already the case for the factors considered in the present paper, L2 speaker proportion (a demographic parameter), strength of L2-difficulty (a psychological parameter) and parsing advantage (a linguistic parameter), as will be discussed at length below.
The paper is structured as follows.Section 2 provides a brief description of the two empirical test cases.Language learning, both L1 and L2, is discussed in Section 3, with the goal of characterizing the terminal state of a population of learners after a long learning period in a stationary random learning environment.This terminal state is then utilized to define intergenerational population dynamics in Section 4, in which the equilibria of the resulting nonlinear system of equations are studied.Relevant linguistic and demographic parameters are calibrated in Section 5 using available empirical data, followed by application of the mathematical model.
Section 6 concludes, discusses some of the limitations of the approach, and provides a few directions for future research.To streamline discussion throughout the paper, most mathematical derivations are collected in a separate Appendix, available as a supplementary file.

Afrikaans verbal deflexion
Germanic languages typically exhibit complex verbal morphology, with numerous forms found throughout the verbal paradigms, often without a transparent one-to-one mapping between form and meaning: (1) This deflexion of verbal paradigms is just one manifestation of a larger-scale morphological regularization that separates Afrikaans from its ancestor (see Ponelis 1993).
The specific and rather unique setting in which Afrikaans arose has prompted scholars to discuss its origin extensively.There are three major competing theories: the superstratist hypothesis, according to which the structural features of Afrikaans arose from Dutch in a process of internal development; the interlectalist hypothesis, according to which the structure of Afrikaans is explained by competition between multiple different dialects of Dutch that were brought to the Cape; and the creolist hypothesis, according to which Afrikaans is a creole or semicreole that arose from the interaction between the colonizers, the native Khoekhoe population, and slaves brought by the Europeans from other parts of Africa and parts of Asia (Roberge 2002).It is clear, however, that contact must have played some role in the formation of the language; the specific classification of Afrikaans as a creole, semicreole or other kind of contact variety is less relevant here.
If the three main population constituents of the Dutch Cape Colony were in extensive linguistic contact, as seems likely (Ponelis 1993;Roberge 2002), then an amount of L2 learning will have taken place: both the native Khoekhoe and the imported slaves would have had to learn at least a limited amount of Dutch to communicate with the colonizers.The (adult) L2 Dutch spoken in the colony would form part of the input of the following generations of L1 learners, and through this mechanism of nativization of L2 output, changes could stabilize as part of the language that was in the process of formation (Trudgill 2011).overall accuracy.This suggests that verbal morphology is an L2-difficult feature to acquire (irrespective of the learner's L1) and may thus favour paradigmatic levelling (regularization) when a significant number of adult L2 learners are involved in the contact situation.

Erosion of null subjects in Afro-Peruvian Spanish
Consistent null subject (NS) languages such as standard forms of Spanish and Italian exhibit the omission of non-emphatic, non-contrastive referential subject pronouns in finite clauses (see Roberts & Holmberg 2010), illustrated here for Spanish: 1 On the loose-knit nature of early Dutch Cape society, Ponelis (1993: 3) (Ponelis 1993: 25).Secondly, it is suspected that childcare was in many cases left to the responsibility of slave or Khoekhoe women, who "transferred to them their own approximate (broken) Dutch" (Ponelis 1993: 8).Finally, the Cape was not a plantation colony but rather one in which slaves were distributed relatively uniformly across the population (Ponelis 1993: 12).
(4) ∅ Hablo español.speak Spanish 'I speak Spanish.' In some contexts, such as with expletive subjects, a null pronoun is strictly obligatory: ( Rates of subject expression differ across varieties of Spanish, with American varieties exhibiting significantly higher rates of overt subjects compared to Peninsular Spanish: Martínez-Sanz (2011: 195) reviews the available literature and shows that these rates range from as low as 12% in Valladolid, Spain, to 60% in San Juan, Puerto Rico.In particular, it has been suggested that a number of Afro-Hispanic Languages of the Americas (AHLAs)-languages that emerged from contact between Spanish and African languages in the Americas in colonial settings-attest mixed 2 pro-drop systems (Sessarego & Gutiérrez-Rexach 2018).These languages not only employ overt subjects where Peninsular Spanish uses null subjects (9), they may also exhibit features of both NS and non-NS languages simultaneously (10): (9) Afro-Peruvian Spanish (Sessarego 2014: 384) Paco fue a casa.Él se tomó una botella de cerveza y después él Paco went to home he himself took a bottle of beer and afterwards he se fue al bar de fiesta.himself went to the bar of party 'Paco went home.He drunk a bottle of beer and then went to the bar to have fun.' (10) Dominican Spanish (Toribio 2000: 319) Yo no lo vi, él estaba en Massachusetts, ∅ acababa de llegar.I not him saw he was in Massachusetts finished of arrive 'I did not see him, he was in Massachusetts, he had just arrived.'One way of making sense of such data is to hypothesize that speakers have access to more than one grammar simultaneously (Kroch 1989;Yang 2002)-in this case, to both a NS and a non-NS version of Spanish-and employ these competing grammars variably and probabilistically (Toribio 2000;Sessarego & Gutiérrez-Rexach 2018; for an extensive variationist account of subject pronoun expression in Spanish-English bilinguals in New Mexico, see Torres Cacoullos & Travis 2018).
Empirical learning data are again relevant in suggesting explanations for the observed patterns.Margaza & Bel (2006) tested Greek adult L2 learners of Spanish on the expression of NS pronouns; the results demonstrate that L2 learners tend to overuse overt pronouns compared to native speakers, even when the L1 is a NS language.Specifically, in a cloze task in which participants had to either express or omit a subject pronoun, intermediate L2 learners employed null subjects 52% of the time in matrix clauses and 81.66% of the time in subordinate clauses, compared to 85.50% and 98.13% in advanced learners, and 96.00% and 100% in native controls (Margaza & Bel 2006: 92).These findings suggest that L2 learners-with the possible exception of those at an advanced proficiency level-struggle with the precise pragmatic conditioning of the null/overt contrast in a NS language (for converging evidence from different language pairings, see Bini 1993;Pérez-Leroux & Glass 1999).They thus support the notion that a contact 2 I use the term mixed rather than "partial" (Sessarego & Gutiérrez-Rexach 2018) here, as the latter term has taken on a specific theoretical meaning in the generative literature on null subjects (see Roberts & Holmberg 2010).It is unclear, and from the point of view of the present study of secondary interest, whether Afro-Peruvian Spanish is a partial NS language in the latter sense; the relevant fact is that the language lies, in some sense, between the polar extremes of being a consistent NS language and being a (consistent) non-NS language.situation involving (imperfect) L2 learning followed by L1 nativization may help to explain the diachronic erosion of null subjects.
In particular, such an explanation is plausible in the case of Afro-Peruvian Spanish, a variety spoken mainly in the coastal regions of modern Peru (Sessarego 2014;2015;Sessarego & Gutiérrez-Rexach 2018).Afro-Peruvians-descendants of slaves who were brought from various parts of Africa and forced to work on plantations, in mines and as servants in cities from the 16th century onwards-amounted to 3.6% of Peru's population in the most recent, 2017 census (INEI 2018: 222).Although Peru abolished slavery formally in 1854, most of the Afro-Peruvian population lived in relative poverty under a semi-feudal system until as recently as the second half of the 20th century (Sessarego 2015: 79).The probable linguistic consequences of these sociohistorical facts will be discussed in detail in Section 5.

Desiderata
Previous research thus suggests that the deflexion of the Afrikaans verbal paradigm and the erosion of null subjects in Afro-Peruvian Spanish may both be traced back to an earlier contact situation involving significant amounts of L2 learning.Apart from explicating what a "significant amount" means in this context, thereby providing a unified account of what is common to both cases, a mathematical modelling approach needs to be able to account for the differences between the two situations.In the case of Afrikaans, the loss of verbal morphology is complete: all verbs are reduced to one form throughout the paradigm.In the case of Afro-Peruvian Spanish, and several other AHLAs, the loss of null subjects is incomplete, in the sense that these languages attest features of both NS and non-NS languages.Moreover, in the case of Afro-Peruvian Spanish at least, there is evidence of the younger generations moving in the direction of the NS grammar again, suggesting that the mixed NS status may not be stable; fieldwork interviews suggest that the Afro-Peruvian variety could be lost in favour of standard Peruvian Spanish "in two generations, or maybe only one" (Sessarego 2014: 397).In the ideal situation, a modelling approach would predict in what circumstances-under what combinations of model parameterseach development is likely to unfold.

Two learning algorithms
At a very general level, learning can be characterized as a stochastic process of modification of a distribution of probabilities to act in specific ways.These modifications are prompted by the outcomes of previous actions in a learning environment which supplies feedback to the learner (Bush & Mosteller 1955).The variational learning (VL) framework (Yang 2002) constitutes a linguistic interpretation of this general statement.In the simplest case, the learner must make a binary choice between two options, such as between employing null and overt subject pronouns.
The learner stores a probability of use of one of these options, p; learning consists of modifications to p in response to previous interactions between learner and environment by way of a set of operators, which mathematically speaking are simply functions applied to p to transform its value.
Most applications of VL to date have assumed Bush and Mosteller's (1955)  ).This learning algorithm makes use of two linear operators, f and g, defined as follows: where p is the probability of grammatical option G 1 , 1-p is the probability of grammatical option G 2 , and 0 < ɣ < 1 is a learning rate parameter.Operator f is applied if the learner chooses G 1 and this choice manages to parse the input received by the learner at that learning step.From the form of the operator, it is easy to see that application of this operator constitutes a reward to G 1 : the next time the learner has to make a choice between G 1 and G 2 , they are more likely to choose G 1 than before.On the other hand, if G 1 is selected but does not parse the input received by the learner, operator g is applied, disfavouring this grammatical option, meaning that the value of p is decreased.Should the learner choose G 2 , the logic of application of the operators is flipped (with only two options, rewarding G 1 is tantamount to punishing G 2 , and vice versa).
Linear reward-penalty learning has a number of attractive mathematical properties, the most important of which have to do with the expected eventual state attained by the learner.
Since learning is a stochastic process, it is impossible to predict the evolution of p over time exactly (unless we have full knowledge of the sequence of the learner's choices and the learning environment's responses at each learning step-but this is impossible except in a strict laboratory setting).However, both the expected (mean) value of p and its variance admit explicit solutions (see Bush & Mosteller 1955 and the Appendix).With increasing learning iterations the expectation, denoted by ⟨p⟩, eventually tends to the asymptotic value in the limit of infinite learning iterations.Here, π 1 and π 2 are the probabilities with which the environment punishes G 1 and G 2 ; these can be estimated in normal circumstances, as will be discussed in more detail below.Furthermore, the variance of p at an infinity of learning iterations can be made arbitrarily small by assuming that learning is sufficiently slow, i.e. if the learning rate ɣ has a small value.Thus, in the limit of long learning periods and slow learning rates, a population of learners are all expected to behave the same, converging to similar values of p at the end of learning.In effect, the behaviour of an entire generation of learners (subject to the same learning environment) can be condensed into one number, namely the limit of the expectation (12).
The empirical justification for this procedure stems from the fact that language learning typically involves very numerous learning iterations.It has been estimated that people speak around 16,000 words a day on average (Mehl et al. 2007).By conversational symmetry, we would expect people, and language learners in particular, to hear a similar number of words every day.
Language acquisition ordinarily takes place over several years; translating the 16,000-words-aday estimate into the number of tokens relevant for the acquisition of any reasonably frequently occurring linguistic variable then yields an estimate in the millions. 3Although it is possible that adult L2 learners are exposed to less input than L1 learners, the high order of magnitude of this estimate suggests that it is legitimate to focus on the learner's asymptotic behaviour at large learning iterations, instead of trying to characterize the complex stochastic dynamics of the entire learning trajectory.First pioneered by Yang (2000), this strategy has been used by numerous authors to derive population-level or diachronic consequences of sequential generations of such learners (Yang 2002 The majority of previous studies applying the VL framework have applied it to L1 learning. The idea of grammar competition has, however, been used in a handful of L2 acquisition studies (Zobl & Liceras 2005;Rankin 2014;2022), and it is possible also to harness the mathematics of the linear reward-penalty algorithm for the purposes of modelling L2 trajectories formally.
To accommodate the existence of L2-difficult features, we include a bias against successful L2 acquisition of one of the competing grammars.This may be done by replacing the operators f and g with the following pair of operators: where δ is a small positive number that quantifies the difficulty faced by L2 learners in acquiring G 1 . 4It is important to note that with this definition, the difficulty is endemic to L2 learning, in the sense that it is independent of the learning environment's responses: no matter how much the environment rewards the use of G 1 , this option will always suffer some amount of penalty, modulated by the magnitude of δ.
In the Appendix, it is shown that an equivalent asymptotic result holds for this L2 extension of the original linear reward-penalty learning scheme.As learning iteration tends to infinity, the expected value of p with operators f' and g' tends to where d = δ/ɣ represents the L2-difficulty of G 1 scaled by the learning rate parameter ɣ.
Moreover, the variance can again be made arbitrarily small by taking a small enough value of ɣ (and modifying δ so as to keep the ratio d constant).In other words, a population of L2 learners employing operators f' and g' can be described by a single number at the asymptote just as a population of L1 learners employing the standard linear reward-penalty scheme can.

Population dynamics 4.1 Motivation
It is conceptually useful to think of language change as an intertwined process of innovation and propagation.In the present model, L2 learners who acquire a lower probability of employing the L2-difficult grammar (as compared to L1 learners) are innovators.Whether and how these innovations propagate across the population depends, roughly, on how prevalent the L2 learner population is in the entire speech community.In general, interactions between individuals in a speech community are the result of a complex combination of factors involving properties of social networks, geographical distance and interaction frequency, to name but a few.Such factors can in principle be encapsulated in stochastic models of social dynamics inspired by techniques borrowed from statistical physics (Helbing 2010); however, inclusion of too many factors usually renders such models analytically intractable.Alternatively, when populations are large, stochastic fluctuations normally average out, and the system can be reduced to a simpler deterministic model, with correspondingly simpler analysis.This is the approach adopted here, developed in detail in Section 4.2.Section 4.3 provides simulation-based support for the deterministic approximation.

Deterministic approximation
Setting aside the complications arising from the full stochastic complexity of learning trajectories and interaction patterns, let us now focus on a simplified model in which learners behave according to the learning-theoretic expectations ( 12) and ( 14) and in which population size is so large that demographic noise cancels out.Technically, we consider an infinite well-mixed population of individuals in which the fraction of L2 speakers is σ and the fraction of L1 speakers is 1 -σ.Let p and q denote the probabilities of grammar G 1 in the L1 and L2 speaker populations, respectively (so that the probabilities of G 2 are 1-p and 1-q).Following Yang (2000), I assume a fraction α 1 of the output of grammar G 1 to be incompatible with G 2 , and similarly a fraction α 2 of the output of G 2 to be incompatible with G 1 .The parameters α 1 and α 2 will be called the (grammatical) advantages 5 of G 1 and G 2 in what follows; numerical estimates will be given in Section 5. Assuming for simplicity that learners sample linguistic input from their environments at random, the penalty probabilities of the two grammars can then be expressed in the following simple forms: (15) To unpack this, note that the first term on the right hand side of the first equation, for instance, represents the event of the learner interacting with an L1 speaker (1-σ) who employs grammar G 2 (1-p) and utters a sentence which falls among those not compatible with G 1 (α 2 ).Similarly, the second term on the right hand side of the second equation represents the case of the learner interacting with an L2 speaker (σ) who employs grammar G 1 (q) and utters a sentence not compatible with G 2 (α 1 ), and similarly for the remaining two terms.
Following Yang (2000), we can now assume that the input to the language acquisition process of the (n+1)th generation of speakers is constituted by the linguistic output of the nth generation.Assuming learners reach the learning-theoretic asymptote as argued in Section 3, this implies setting ( 16) with the expectations given by ( 12) and ( 14).Expanding these with the help of the penalties (15), we arrive at the pair of equations ( 17) To reduce the number of parameters in these equations, it is useful to divide both the numerators and the denominators on the right hand side by α 2 (on the assumption that α 2 ≠ 0).This condenses 5 Nomenclature is sometimes confusing in the literature, with some authors using the term "fitness" for these quantities and reserving the term "advantage" for derived notions such as difference in fitnesses.I here follow the original terminology put forward in Yang (2000).Ultimately (see below), the relative fitness of the competing grammatical options will be measured using the ratio α = α 1 /α 2 , with greater (smaller) than unity values signifying that G 1 (G 2 ) has more advantage than its competitor.
the relation of the two advantage parameters α 1 and α 2 into a single number, the advantage ratio The quantity D = d/α 2 represents the relative L2-difficulty of grammar G 1 scaled by the advantage of grammar G 2 .
To recap, we have assumed grammatical competition between two options G 1 and G 2 , the first of which is assumed to incur an amount of L2-difficulty, quantified by the ratio d = δ/ɣ of raw L2-difficulty δ to underlying learning rate ɣ.The two learning algorithms of Section 3 lead to a mixed population of L1 and L2 speakers whose dynamics are described by the pair of equations ( 18), on the assumption that speakers mix randomly.The dynamics depend on three model parameters: • α = α 1 /α 2 controls the ratio of grammatical advantages, indicating how much (plain) advantage the L2-difficult grammar G 1 has in relation to grammar G 2 ; • D = d/α 2 represents the L2-difficulty of grammar G 1 in relation to the grammatical advantage of G 2 ; • σ gives the proportion of L2 speakers in the population.
Intuitively, one expects the (diachronic) loss of the L2-difficult grammar G 1 to be more likely for higher values of D and σ, as well as for lower values of α.These expectations are borne out by exact mathematical analysis, as will be described next.
In (18), the variables p and q represent the relative frequency of the L2-difficult grammar G 1 in the L1 and L2 speaker populations, respectively.We would now like to understand the range of behaviours this system is capable of.Each pair of values (p,q) with 0 ≤ p,q ≤ 1 is a possible population state, or in other words, the state space of the system consists of the unit square [0,1] Of particular interest is the eventual fate of the population under the above modelling assumptions.In general, some points (p,q) of the state space may be expected to be attractors, drawing the population state to themselves over time, while other points may repel the population state.These configurations of the state space correspond to different empirical outcomes, as will be demonstrated next.
In the Appendix, it is proved that the system (18) can display one of two eventual outcomes for any fixed combination of model parameters α, D and σ.These are: I. The L2-difficult grammar G 1 is used with some probability in one or both speaker groups.
Mathematically, the system has two equilibria, the origin (p,q) = (0,0) and a further, non-origin state (p * , q * ) ≠ (0,0).The former, however, is unstable, while the latter is asymptotically stable (a sink).The population will be attracted to (p * , q * ) over time, with the consequence that the L2-difficult feature is retained in the language-with frequency p * in the L1 speaker population, and with frequency q * in the L2 speaker population.
Consequently both the L1 and L2 populations eventually speak G 2 exclusively.
Of these outcomes, the latter corresponds to the proposed mechanism of L2 mutation followed by L1 nativization.The passage from phase I to phase II is governed by a complex interaction of the three parameters, one that however can be solved analytically.First, if α < 1, so that grammar G 1 has less advantage to begin with, phase II is predicted.This is unsurprising: if both grammatical fitness in the VL sense and L2-difficulty in the Trudgillian sense conspire against a grammatical option, no force exists to sustain it, and the option is predicted to disappear. 6The same outcome holds for the special case α = 1, in which the grammatical advantages are equal.
For α > 1, the situation is more complicated.If α ≥ D + 2, phase I is predicted for any combination of parameter values.In this case, the pure grammatical advantage enjoyed by the L2-difficult grammar G 1 is so high that no amount of L2 learning can suppress it entirely-the L1 speaker population will continue to speak G 1 at some non-zero frequency.For 1 < α < D + 2, however, the proportion of L2 speakers σ acts as a bifurcation parameter.For values of σ below a critical threshold phase I is predicted.In this case, the proportion of L2 learners in the population is not high enough to completely suppress G 1 .For values of σ exceeding this threshold, however, phase II is predicted: the non-origin equilibrium (p * ,q * ) vanishes and the population converges to (p,q)= (0,0), with both L1 and L2 speakers now employing grammar G 2 exclusively.In other words, as the proportion of L2 learners in the population grows, the speech community experiences a phase transition from phase I to phase II (Figure 1 and Table 1). 7 6 The case α < 1 is also empirically uninteresting in most situations, as it begs the question how grammar G 1 could have established itself in the L1 population in the first place, if it has less grammatical advantage than its competitor G 2 . 7A reviewer asks why the proportion of L2 speakers σ is chosen as the critical parameter, rather than α or D. Indeed, from a mathematical point of view the choice is arbitrary: as long as the inequality σ > σ crit is satisfied, loss of the L2-difficult feature is predicted.So, for instance, we may hold α and D fixed and vary σ to cross from one phase to the other, but we may equally well hold α and σ fixed and vary D to undergo that phase transition.However, from a

Condition
Fate of L2-difficult grammar G 1 substantive point of view, the proportion of L2 speakers σ is the only parameter that routinely changes its value-the advantage ratio α stays fixed as long as no other grammatical changes occur in the language, and δ (and, by extension, D whenever α 2 is fixed) is presumably constant as it reflects a universal psychological bias.Hence it is natural to treat σ as the critical control parameter.

Finite simulations
The above deterministic model turns on the assumption that learners are well-described by the learning-theoretic asymptotic expectations ( 12) and ( 14), as well as on the assumption that interactions between speakers are sufficiently random so that stochastic fluctuations in the population average out.These idealizing assumptions were made deliberately, to unleash the greater explanatory power of analytically soluble models, compared to simulations (see McElreath & Boyd 2007: 4-11).Nevertheless, the assumptions can be debated, and the question of how a finite, stochastic system would behave is certainly an interesting one.Although it is impossible to explore the entirety of the full stochastic model's parameter space by way of simulations in this paper, I here report the outcome of proof-of-concept simulations which lend tentative support to the deterministic approximation analysed in Section 4.2.
The choice to abstract away from the individual-level stochastic dynamics of language acquisition was defended by way of an argument from the typically slow pace of language acquisition in Section 3; this entails assuming that learners receive a large amount of input during their learning period and also make only small adjustments at each presentation of input token, so that the learning rate parameter ɣ has a small value.In the absence of direct empirical estimates of ɣ, it is legitimate to worry about the effects that high learning rates may have on the ensuing dynamics.To explore this, ten Variational Learners were simulated for varying values of ɣ, exposed to a learning environment in which the relative frequency of grammar G 1 was 0.5 and the advantage parameters had the values α 1 = 0.25 and α 2 = 0.2.It was moreover assumed that grammar G 1 incurs an L2-difficulty of d = 2.Of the ten learners, five were randomly assigned to be L1 learners; the other five were L2 learners subject to the L2-difficulty of G 1 .Each learner received 100,000 input tokens over their learning period and started from a randomly drawn initial value of p or q.
With these assumptions, the learning-theoretic expectations ( 12) and ( 14) predict an average probability of p = 0.56 for grammar G 1 in the L1 learner population, and an average of q = 0.06 in the L2 learner population.Figure 2A presents the simulation results.As expected, the predicted averages describe the population averages.On the other hand, for higher learning rates (larger values of ɣ), a larger residual variance about this expectation is observed across the population.It turns out, however, that this residual inter-learner variance, even when large, need not affect inter-generational, diachronic developments.Consider Figure 2B, which tracks the inter-generational trajectory of p and q across 15 generations of learners, each generation consisting of 100 learners (half L1, half L2), starting from p = q = 0.99 in generation 0 and subject to the same parameters as above.To channel input from generation n to generation n+1 in a reasonably realistic way, each learner in generation n+1 was randomly drawn two "parents" from generation n; each learner only received input from its parents.The consequence is that, particularly in the intermediate stages of the diachronic development, and particularly when the learning rate parameter ɣ has a high value, there is much variation across speakers in the community.However, since the ultimate force driving the evolution of p and q is deterministic in nature (a difference in the advantages of the competing grammars, combined with L2-difficulty of one of the options), this stochastic noise is not sufficient to disturb the development.With the above parameters, equation ( 19) implies a critical L2 proportion threshold of σ crit = 0.3; since the actual proportion of L2 learners in the simulation is 0.5, we expect grammar G 1 to be driven to extinction over inter-generational time, i.e. the values of p and q to converge toward zero.This is exactly what is observed, even for noisy (high) values of the learning rate parameter.Moreover, the deterministic trajectories computed from ( 18), shown in Figure 2B as the connected curves, are broadly consistent with the simulated data in each generation.In fact, with higher learning rates, the predicted equilibrium (p,q) = (0,0) is attained quicker.).The boxplots characterize the simulated data, i.e. the distribution of p (or q) across all learners at the end of their learning period.The solid curves give the deterministic prediction using equation ( 18).For simulation parameters, see text.

Interim summary
To summarize the results of the foregoing sections, our mixed population of L1 and L2 speakers is capable of two qualitatively different kinds of long-term behaviour.The parameter α = α 1 /α 2 expresses the ratio of the advantages of the two competing grammars.If α < 1, the state (p,q) = (0,0), in which both the L1 and L2 speaker populations use G 2 to the complete exclusion of the L2-difficult grammar G 1 , is a stable equilibrium.If α > 1, this state is either stable or unstable depending on whether σ, the proportion of L2 speakers in the population, exceeds a critical threshold σ crit .The value of σ crit , in turn, depends on the magnitudes of α and D = d/α 2 , the latter expressing the relative L2-difficulty of grammar G 1 , scaled by the grammatical advantage of G 2 .
In this scenario, two evolutionary outcomes are possible: either total extinction of the L2-difficult grammar G 1 (when σ > σ crit ), or stable variation between G 1 and G 2 (when σ < σ crit ).
The three model parameters α, σ and D are all estimable in principle from empirical data: α from the frequencies of occurrence of different types of linguistic constructions, σ from population censuses, and D from L2 learning data.Suitable data of the last kind are presently lacking, 8 but I will next attempt calibration of the first two parameters in the specific cases of Afrikaans deflexion and the erosion of null subjects in Afro-Peruvian Spanish.Here it should be borne in mind that all parameter estimates can be only approximate: some degree of uncertainty necessarily pertains to corpus estimates of grammatical advantages, and historical population data are wrought with problems well known to historians and demographers.I will therefore offer the parameter estimates as an approximate starting point, with the proviso that they come with an unknown degree of statistical uncertainty.This has important consequences on what we take the very goal of modelling to be.Even though exact quantitative statements about the equilibrium state of the linguistic system may be out of reach, it is still possible to infer something about the qualitative outcome of the contact situation in each empirical test case.

Calibrating grammatical advantages
In the case of Afrikaans verbal deflexion, the relevant competition is between a grammar that has person and number distinctions in the verbal paradigm (G 1 , corresponding to Dutch) and a grammar that doesn't (G 2 , corresponding to Afrikaans).Parsing failure occurs when the learnerlistener attaches the wrong interpretation (wrong person or number) to the surface form uttered by the speaker.Since Dutch is a non-NS language, person and number can always be inferred from the nominal domain, even if verbal inflection is eroded.On the face of it, this implies that 8 What is required is access to individual-level longitudinal production data from L2 learners over a substantial stretch of their learning trajectory, against which theoretically predicted learning curves could be fit.If learning curves with δ > 0 fit such data better than curves with δ = 0, we would have positive evidence for the L2-difficulty.
the raw grammatical advantages of the two grammars are equal, so that α = 1.In practice, it may be argued that factors such as channel noise may lend the grammar with verbal inflection (G 1 , i.e.Dutch) a slight advantage over the inflectionless grammar; on the other hand, redundant agreement has been suggested to present difficulty for L2 learners (Kusters 2008).However, the magnitudes of these effects are difficult to estimate outside the context of a strict laboratory study.In what follows, I will assume that such effects are negligible at the population level, and thus proceed with the maximum-parsimony assumption that α = 1, i.e. that neither grammar is more advantageous than its competitor.
The case of null subjects in Afro-Peruvian Spanish is markedly different.The NS grammar (G 1 ) is incompatible with overt expletive subjects (20), while the non-NS grammar (G 2 ) is incompatible with null thematic subjects ( 21): (20) Overt expletive subject pronouns (* for G 1 ) Ello llueve.it rains 'It's raining.' (2019: 296, 299) estimate the grammatical advantage of a NS grammar to be about α 1 = 0.7 and that of the corresponding non-NS grammar to be only about α 2 = 0.05; that is to say, a difference of over an order of magnitude, implying an advantage ratio of α = 0.7/0.05= 14 in favour of the NS grammar. 9 This difference between the two case studies is important.In the case of Afrikaans, the fact that the two grammars are equiadvantageous means that, in the absence of any L2 learning, the population of speakers is stable (though not asymptotically) at any value of p: if σ = 0 and α = 1, then the value of p is always at equilibrium. 10Thus the only forces that could shift the state of the 9 If learners of a NS grammar are learning a syntactic parameter that controls a cluster of surface features, as classical formulations of the null subject parameter expect (Roberts & Holmberg 2010; but see Simonenko & Crabbé & Prévost 2019 for evidence to the contrary), then there will be further cues in the learner's input that either reward or penalize the two competing settings (such as rich vs. poor agreement inflection, or the possibility vs. impossibility of free inversion).Such complications fall outside the scope of the present paper but should be explored in future work. 10To see this, we take the first equation from (18) and set σ = 0 and α = 1; the resulting expression p n+1 = p n means that the value of p will not change from generation to generation.speech community (again, assuming the absence of L2 learners in the population) would have to be stochastic in nature.Once L2 learners facing an L2-difficulty with one of the grammars are introduced into the population, the equilibrium will shift: any amount of L2 learning implies that the origin (p,q) = (0,0) becomes the system's only attractor (Table 1).Were those L2 learners to be removed from the population at a later stage, the speech community would then again assume its previous quasistable nature and come to rest at whichever value of p the L2 learning situation brought it to (this will correspond to p ≈ 0 if enough time has elapsed; see Section 5.4).Column for Khoekhoe includes people of mixed background; these data are not available prior to 1798.

Year
The case of null subjects is radically different.Here the original, L2-difficult grammar G 1 is much more advantageous than its competitor, G 2 .Not only does this mean that the proportion of L2 learners in the population would have to be relatively high for G 1 to be completely replaced by G 2 ; the long-term prediction is also that if those L2 learners were to be removed, the population would drift back to the equilibrium p=1 whose stability is guaranteed by the asymmetry in grammatical advantages in the absence of any L2 learning.(Bowser 1974: 339-341), together with estimated range for the fraction of people speaking Spanish as L2, σ (see text).Mixed AmE = mixed American-European background; Mixed AfE = mixed African-European background.

Calibrating demographics
Population figures for Colonial Peru are hard to come by; however, useful data exists for Lima in the early period, specifically the first half of the 17th century.Interpretation of the demographics in Table 3, from Bowser (1974), is complicated by the fact that the various sources from which the demographic data are drawn employ different classificatory principles, sometimes pooling Spaniards and people of mixed European-American background, or people of African and people of mixed African-European background, together.Sessarego (2015: 93) argues that people of mixed background were most probably bilingual in Spanish from childhood (as one of their parents would have been Spanish).They should therefore be exempted from any estimates of the proportion of adult L2 learners in the population.The interval estimates given in Table 3 were computed with the assumption that only the Black and Indigenous groups would contribute adult L2 learners to the population; the left endpoints of these intervals again correspond to the conservative assumption that 50% of these people spoke Spanish as L2, while the right endpoints correspond to the upper bound assumption that 100% of them did. 11The resulting estimates of σ suggest a steadily, if slowly growing proportion of L2 speakers in the population, with figures roughly similar to those found in the Dutch Cape Colony.
I am not aware of useful demographic data for later stages of Colonial Peru; a census was carried out in 1725-1740, but it contains no information on slaves (Pearce 2001).Useful information is 11 To compute the estimates for the year 1600, for which Bowser's (1974) data pools people of African and mixed African-European heritage together, I have used the overall fraction of African to African-European for the remaining years (0.92).This yields an estimate of 6,091 African and 530 mixed African-European in the year 1600.
available, however, on the population dynamics of the slave population on individual plantations or haciendas in the later period; while such data can tell us little about the relative proportion of L2 speakers in the population, they are useful in illuminating further developments following the initial stages of colonization.Drawing upon data in Cushner (1980), Sessarego (2015: 107) reports an estimated yearly net growth of 1.4 slaves per hacienda for the period 1710-1767; yet at the same time, the yearly natural growth (estimated from records of slaves' births and deaths, which were kept on haciendas run by Jesuits), is negative at -2.7 per hacienda.In other words, an average of 4.1 slaves must have been imported yearly, per hacienda, in this period to match the net growth rate.It follows that, as late as the second half of the 18th century, there must have been a steady influx of new slaves, and hence of potential adult L2 learners of Spanish, into these regions.

Predictions: Afrikaans
The main linguistic features of Afrikaans are estimated (based on extant written records) to have been in place by the end of the 18th century (Roberge 2002: 83).Since the Dutch founded their permanent colony in 1652 (Guelke 1979), this leaves around 150 years for the development of Afrikaans as a language separate from Dutch.Assuming one generation to correspond to roughly 30 years (Tremblay & Vézina 2000), the development is thus estimated to have occurred over about five generations of speakers.More precisely, five generations supplies an upper bound for the development; the linguistic change may have happened faster, too, but this is impossible to determine given the scant textual record in the earlier phases.
In the well-mixing, infinite-population setup adopted in Section 4.2, the fact that the grammatical advantages are in balance in this case (α = 1) means that, in theory, even one adult L2 learner would suffice to drive the L2-difficult grammar to extinction, given enough time, irrespective of the absolute size of the population.The crucial question then concerns the time scale of the development: for fixed grammatical advantage ratio α and L2-difficulty d, the proportion of L2 speakers σ directly controls the time to extinction of G 1 , i.e. how long it takes for the population to converge to the attractor (p,q) = (0,0) from some initial state (p 0 , q 0 ) = (1, q 0 ).(The initial probability in the L2 population, q 0 , is of course an unknown.)Since the strength of L2-difficulty d is also unknown, the best one can do is to compute these times-to-extinction for several combinations of the parameter values, thereby hopefully establishing at least a subset of the parameter space that predicts the empirically observed facts.
Iterating the pair of equations ( 18) repeatedly, one can find the number of generations it takes for the population to converge to the attractor (p, q) = (0,0) from various initial conditions (p 0 ,q 0 ) = (1, q 0 ), for various selections of d and for σ = 0.2 and σ = 0.6 (roughly corresponding to the endpoints of the interval estimates in Table 2).Convergence to the attractor is here defined, somewhat arbitrarily, as the values of both p and q being below 0.001 (0.1% use of G 1 ).
Figure 3 plots the passage times found using this procedure.From these results, it is clear that convergence to the attractor in 5 generations is possible.For proportions of L2 speakers on the order of σ = 0.6, a strength of L2-difficulty greater than about d = 0.5 is sufficient; for lower proportions on the order of σ = 0.2, L2-difficulties in excess of about d = 5 guarantee convergence in up to 5 generations.In future work, it would be important to estimate plausible values of d using independent evidence, so that model fit can be evaluated in a more principled manner (subject to fewer researcher degrees of freedom).Having said that, the above demonstration shows that the model makes the development of Afrikaans possible (even if not necessary).q 0 = 0.1 q 0 = 0.5 q 0 = 0.9

Predictions: Afro-Peruvian Spanish
In the case of Afro-Peruvian Spanish, the situation is radically different: in this case, it needs to be explained why the null subject grammar was never completely overthrown by the corresponding grammar without null subjects, which would be favoured by L2 learning.Recall that the critical value of the bifurcation parameter σ, the proportion of adult L2 learners in the population, was found in Section 4 to be where α = α 1 /α 2 is the ratio of the advantages of the two grammars and D = d/α 2 represents the relative L2-difficulty of G 1 , scaled by the advantage of G 2 .The lower bound of σ crit occurs as On the other hand, in Section 5.2 the value α = 14 was estimated in this particular case.This corresponds to a σ ∞ crit of 13/14 ≈ 0.93.In other words, the proportion of L2 speakers in the population would need to be at least about 0.93-and possibly higher, if D turned out to have a small value-for the grammatical advantage enjoyed by the L2-difficult grammar G 1 to be overcome and for the origin (p,q) = (0,0) to be an attractor.Since based on the available demographic data such a high proportion of L2 speakers never obtained in Colonial Peru (Section 5.3), we infer that null subjects were never about to be completely lost in the Afro-Peruvian variety of Spanish.
For any given non-zero σ, however, a stable, attracting state is implied in the interior of the state space (see again Section 4.2).This would appear to correspond, qualitatively, to the linguistic classification of Afro-Peruvian Spanish as a mixed NS language (Section 2.2)-that is, as a variety sometimes employing, sometimes not employing, null subjects.It is worth pointing out, however, that as the proportion of L2 speakers decreases over time, this interior equilibrium tends to a stable rest point on the boundary of the state space, crucially with the property p = 1 (so that L1 speakers have reverted to using null subjects all the time).This may help to explain the reported instability of the mixed NS status of Afro-Peruvian Spanish-the observation that the younger speakers in these communities are turning towards the standard Spanish grammar, with full null subjects (Sessarego 2014).Although Sessarego (2014) attributes this development mostly to sociolinguistic causes-to the prestige enjoyed by standard coastal Peruvian Spanishthe above considerations suggest an alternative, or at least complementary analysis: the loss of overt subjects results from the fact that there are fewer L2 speakers of the variety in the speech community, leading to fewer constructions totally lacking null subjects in the input data based on which L1 learners acquire their variety.

Conclusion and outlook
This paper has presented a mathematical model of contact-induced linguistic changes in which adult L2 learning plays a leading role.Fairly lenient assumptions about language learning were used to characterize the terminal state of learners exposed to a given learning environment; this terminal state was then used to derive a model of the population dynamics of a mixed population of L1 and L2 speakers.Focusing on a deterministic approximation of the full stochastic model allowed us to derive formal results about the equilibria and bifurcations of the system in relation to three main model parameters: α, a measure of how much grammatical advantage the L2-difficult grammar G 1 has over its competitor G 2 ; D, a measure of the strength of the L2-difficulty of G 1 ; and σ, the proportion of L2 speakers present in the population.It was shown that the system has either one or two equilibria: either the point (p,q) = (0,0), at which both the L1 and L2 This seems likely, for instance, for null subjects in cases such as that of Afro-Peruvian Spanish.
On the other hand, in the case of variables where little to no advantage is enjoyed by either competing variant for purely linguistic reasons (such as verbal inflection in non-NS languages), those variables may be very sensitive to external factors such as L2 learning, with fairly low values of σ sometimes sufficing to set the language on the course to lose the feature.
This paper has discussed one potential mechanism of contact-induced change.It assumes that L2 learners introduce a "linguistic mutation" to a speech community; this mutation then spreads as the primary linguistic data of L1 learners changes.Changes are cumulative and take place, typically, over extended inter-generational timescales.It is important to note that this is, however, only one possible mechanism that may generate the types of diachronic developments here studied.An alternative view holds that changes propagate as L1 speakers accommodate to the language use of L2 speakers in an act of foreigner-talk (Valdman 1981;Atkinson & Smith & Kirby 2018).It is likely, in fact, that both mechanisms are at play at least to some extent in the real world.What is clear in any case is that the primary linguistic data of L1 learners will change regardless of the specific mechanism; it is this shift in input which, ultimately, secures propagation of innovations.
Future modelling work can, naturally, look into the effects of introducing more intricate mechanisms of skewed input.In a similar vein, it would be desirable in future work to model other aspects of L2 acquisition in more detail.In the present paper, universal L2-difficulty is the driving force of innovation.This difficulty is assumed to apply to all L2 learners in a similar way, regardless of the learner's L1.Although evidence for such a universal psychological difficulty was cited above, the existence of transfer effects in L2 acquisition is also well-documented in the literature (Schwartz & Sprouse 1996).In extensions of modelling work of this kind, such effects may be studied alongside universal biases.A related point is that, in the current framework, L1 and L2 acquisition are identical up to the effect of L2-difficulty; there is, for instance, no way in which L2 learners might sometimes outperform L1 learners ("positive transfer").Such extensions could be considered in future research.
At the population-dynamic level, future work should explore in detail the effects of modelling the stochastic dynamics of interacting populations explicitly-beyond the proof-of-concept simulations reported in Section 4.3.The great benefit of studying the deterministic limit of the model, in abstraction of stochastic fluctuations, is that strong analytical results become available.
The bifurcation threshold identified in this paper is one such result: for any given combination of model parameters-proportion of L2 learners, extent of L2-difficulty, and advantage ratio-the model predicts either complete change or only partial change or no change at all.This increases the falsifiability of our theory, as all parameters can in principle be estimated from data and prediction compared against historical outcome.Should it turn out that the model makes the wrong predictions, some of its assumptions can be modified.For this reason, acquisition of further empirical test cases is an important desideratum.Although much current work into the role of demographic factors in language change is typological, synchronic and correlational in nature, diachronic data are essential in evaluating mechanistic models of change.
The results of this paper also add to a growing body of literature illustrating the utility of the Variational Learning model of linguistic variation and change.In the future, it will be important to move in the direction of more direct tests of key model ingredients, such as the learning rate parameter.Equally important is the balanced consideration and development of alternative formal models and, once the empirical predictions of each model have been figured out, the rigorous statistical comparison of competing models vis-à-vis empirical data.

Figure 1 :
Figure 1: Orbit diagram of the system showing the values of p (top row) and q (bottom row) at the stable equilibrium, for various values of advantage ratio α (columns), L2-difficulty D (horizontal axis) and proportion of L2 speakers σ (vertical axis).The dashed line supplies the bifurcation boundary σ crit : the L2-difficult grammar G 1 is extinct in each speaker population above this line, but coexists with grammar G 2 below it.

Figure 2 :
Figure 2: Finite simulations of the model.(A) Learning trajectories of 10 individual learners, half of them L1 and the other half L2 learners, for different learning rates ɣ.First 10,000 iterations shown only, sampled every 100 iterations for increased clarity; behaviour after this initial transient period is identical.(B) Diachronic trajectory of 15 generations of learners, 100 learners in each generation (half L1, half L2).The boxplots characterize the simulated data, i.e. the distribution of p (or q) across all learners at the end of their learning period.The solid curves give the deterministic prediction using equation (18).For simulation parameters, see text.
expect, thematic subjects far outnumber expletive subjects in discourse.Drawing on data on their frequencies of occurrence from multiple sources, Simonenko & Crabbé & Prévost

Figure 3 :
Figure 3: Time of convergence to attractor in the Afrikaans case study, for various values of L2-difficulty d, proportion of L2 speakers σ and initial probability of L2-difficult grammar in the L2 speaker population q 0 .

Table 1 :
retained Eventual outcome of contact situation, for different combinations of advantage ratio α = α 1 /α 2 , normalized L2-difficulty D = d/α 2 and proportion of L2 learners σ.The critical value of the latter is given in (19).

Table 2 ,
from Giliomee & Elphick (1979), gives an overview of the population of the Dutch Cape Colony from 1670 to 1820, the period relevant for the emergence of Afrikaans.Although it is evident from these figures that slaves, freed slaves and the indigenous peoples represented a significant fraction of the population at each stage, translating these data into figures of the likely proportion of L2 learners at the different time points is non-trivial.First, no data exist for the native population before 1798.Secondly, not all of the non-European population would be L2 learners: perhaps some of them would not have needed to learn Dutch, but even more importantly, at some point part of this population will have begun to acquire Dutch as an L1, or at least as a bilingual L2 in childhood.For these reasons, I have computed fairly wide interval estimates for σ, the proportion of (adult) L2 speakers in the colony, given in the rightmost column of

Table 2 .
The left endpoint of each interval represents the conservative estimate that half of the non-European population would have spoken Dutch as an L2; the right endpoint gives the absolute maximum, on the (quite certainly unrealistic) assumption that the entire non-European population spoke Dutch as an L2.