1 Introduction

One of the challenges for a theory of phonotactics is recognizing that constraints can hold inside morphemes but be lifted at morpheme boundaries (Trubetzkoy 1939; Chomsky and Halle 1968; et seq.). This situation is quite common: for example, in English, the cluster [md] is not found inside morphemes, but it is allowed in suffixed verbs such as “hemm-ed”. Likewise, when it comes to nonlocal phonological interactions, some languages respect the relevant constraints in any phonological word, but it seems to be equally if not more common for nonlocal phonotactics to apply differently inside vs. across morphemes. These kinds of patterns present an interesting learnability problem: if a learner attends to phonological words only, the relevant constraints may be violated, so how, if at all, do speakers arrive at knowledge of the constraints that hold morpheme-internally?

Our paper investigates this question in a study of nonlocal phonological interactions in Bolivian Aymara. Within Aymara morphemes, plain-aspirate, plain-ejective and heterorganic ejective-ejective combinations are described as restricted (see (1)), though these combinations may arise across morpheme boundaries (see (2)):

(1) Aymara laryngeal phonotactics inside morphemes (MacEachern 1997)
  a. phutu ‘heat’ *tuphu
  b. k’apa ‘cartilage’ *kap’a
  c. k’ask’a ‘acid to the taste’ *t’ank’a
(2) Aymara laryngeal phonotactics across morpheme boundaries (our fieldwork data)
  a. paʎ+t’a+ɲa ‘about to choose’
    tiɲ+ʧ’uki+ɲa ‘to color carefully’
  b. ʧaʎm+thapi+ɲa ‘to finish chewing’
    qaq+thapi+ɲa ‘to finish scratching’
  c. ʧ’um+t’a+ɲa ‘about to drain’
    t’isn+ʧ’uki+ɲa ‘to thread carefully’

Impressionistic descriptions of phonological patterns are often made more nuanced by explorations of natural language corpora, experimentation with native speakers, and computational modeling. In this paper, we look at phonological generalizations in Bolivian Aymara through these three lenses.

Our examination of a morphologically parsed web corpus partially confirms traditional descriptions in the literature for the plain-ejective and plain-aspirate restrictions: while there are some tautomorphemic exceptions to these restrictions, the restricted combinations are far more frequent heteromorphemically than tautomorphemically. Ejective-ejective combinations, however, are infrequent regardless of morphological context.

Despite the exceptions in the lexicon and the overall infrequency of ejective-ejective combinations, two experiments support the synchronic status of restrictions on plain-ejective and ejective-ejective combinations. Native Aymara speakers make more repetition errors on nonce words that violate the putative restrictions than on control words, and speakers make fewer errors on nonce words when the interacting stops may be interpreted as belonging to different morphemes than when they must be parsed as tautomorphemic.

After establishing the corpus and behavioral evidence for the restrictions, we present a computational model that learns the morphologically sensitive, nonlocal phonotactic restrictions from our corpus. The modeling work shows that while certain aspects of the phonotactic restrictions are observable in an unparsed data set (cf. Martin 2007), training on a parsed corpus with morpheme boundaries is necessary to capture the full range of patterns in our experiments and the descriptive literature.

The modeling work in this paper expands on the model developed in Gouskova and Gallagher (to appear). There, we proposed a method for inductively learning nonlocal projections that capitalizes on the observation that nonlocal interactions can be observed in local phonotactics: if X and Y cannot cooccur at longer distances inside a word, they usually cannot be separated by a single segment, either (Suzuki 1998). Our learner induces nonlocal projections by attending to the properties of the language’s segment-level phonotactics. In languages with nonlocal phonological interactions, segments within a certain natural class are restricted from cooccurring across an arbitrary amount of intervening material: e.g., in Quechua pairs of ejectives may not cooccur across an intervening vowel *[k’ap’i], an intervening vowel and consonant *[k’amp’i] or across more material *[k’amip’a] (Gallagher 2016). The arbitrary nature of the intervening segmental material has supported analyses of these patterns that reference an autosegmental tier or projection where only the interacting segments are visible.1 For the Quechua case, this would mean that there is one level of representation in which all segments are visible to the grammar, and another level of representation in which only ejectives are visible; it is on this “ejective projection” that the cooccurrence restriction can be stated as a simple bigram *[+cg][+cg].

Hayes and Wilson’s (2008) inductive phonotactic learner allows the analyst to define nonlocal projections so the model can learn nonlocal phonology. We propose that nonlocal projections can be learned inductively by analyzing the constraints in a baseline grammar without projections. In languages with nonlocal phonology, the baseline grammar will sometimes include trigram constraints of the form *A-any_segment-B. A trigram of this sort is a clue to the learner that natural classes A and B interact nonlocally, and that the nature of the intervening material is irrelevant. Our original model builds projections based on these trigram constraints, and in this paper we expand the procedure to also induce projections from morpheme boundary trigrams: A-[–mb]-B, where [–mb] is the class of all segments but not the morpheme boundary symbol. Intuitively, these constraints will arise in a language where the segments A and B cannot cooccur inside a morpheme (*A-any_segment-B), but occur with some frequency at morpheme boundaries (✓A+B). Constraints of this form indicate that natural classes A and B interact nonlocally, but strictly tautomorphemically. The simulations reported in this paper show that the morphologically sensitive, nonlocal restrictions in Aymara are observable as morpheme boundary trigrams in a parsed corpus, despite the presence of exceptions. We further show that these restrictions cannot be discovered in an unparsed corpus without morphological information, suggesting that Aymara learners may acquire these phonotactic restrictions later in their learning trajectory, only after substantial morphological learning has been accomplished.

The paper is structured as follows. Section 2 summarizes the laryngeal constraints that hold of Aymara words—we cover the descriptive generalizations in the literature on the language, and present our own study of a web corpus of Aymara. Section 3 presents two experimental studies with Aymara speakers, which test their knowledge of nonlocal phonotactics that hold of morphologically simple words as opposed to complex ones in a nonce word repetition experiment. Section 4 presents our computational model and a simulation that induces nonlocal projections from the web corpus described in Section 2. Section 5 offers some general discussion, and Section 6 concludes the paper.

2 Laryngeal restrictions in Aymara

2.1 Background

The consonant inventory of Aymara contains fifteen stops, exhibiting a three-way laryngeal contrast between plain (voiceless unaspirated), ejective and aspirate at five places of articulation. The full inventory is shown in Table 1 (MacEachern 1997; Hardman 2001).

Table 1

Consonant inventory of Aymara.

           labial   dental   postalveolar   velar   uvular   glottal
plain      p        t        ʧ              k       q
ejective   p’       t’       ʧ’             k’      q’
aspirate   pʰ       tʰ       ʧʰ             kʰ      qʰ
fricative           s        ʃ                      χ        h
nasal      m        n        ɲ
liquid              l ɾ      ʎ
glide      w                 j

The distribution of ejective and aspirate stops is restricted within morphemes, both inside suffixes and in roots, which we exemplify in (3). As shown in (3a), ejectives and aspirates may appear in either initial or medial position of roots, which are primarily CV(C)CV. Both ejectives and aspirates are rare in roots with an initial plain stop, however (see (3b)). Pairs of heterorganic ejectives are also rare, though forms with identical ejectives are attested (see (3c)). Other combinations of ejectives and aspirates are attested (see (3d)), though see Section 4.4.4 below for further details. Examples are from de Lucca (1987), and these and other patterns are also discussed in detail in MacEachern (1997) and Bennett (2013).

(3) Aymara ejective and aspirate distribution
  a.   ʧhaku ‘coarse’   k’aʧa ‘voice’
      laqha ‘darkness’   hajp’u ‘evening’
  b. *paqha   *kajp’u  
  c.   p’ap’i ‘roasted fish’   *k’ap’i
  d.   phuʎʧ’u ‘bag’   thakhi ‘road’
       k’amphi ‘tip over’

The three laryngeal combinations that are underattested inside morphemes – plain-ejective, plain-aspirate and ejective-ejective – are attested in words. These combinations arise when suffixes with an ejective or aspirate consonant combine with roots with a plain or ejective stop. Some examples are given in (4), from work with a native speaker consultant in El Alto, Bolivia. These examples involve three verbal suffixes, [-t’a] ‘about to’, [-ʧ’uki] ‘carefully, continuously’, and [-tʰapi] ‘finish’. All three of these suffixes trigger syncope (deletion of the root-final vowel).2

(4) Ejectives and aspirates in morphologically complex words in Aymara
  a. paʎ+t’a+ɲa ‘about to choose’ tiɲ+ʧ’uki+ɲa ‘to color carefully’
    taw+t’a+ɲa ‘about to row’ pump+ʧ’uki+ɲa ‘to mix carefully’
  b. ʧaʎm+thapi+ɲa ‘to finish chewing’ qaq+thapi+ɲa ‘to finish scratching’
  c. ʧ’um+t’a+ɲa ‘about to drain’ t’isn+ʧ’uki+ɲa ‘to thread carefully’
    q’eχ+t’a+ɲa ‘about to whip’ k’uɲ+ʧ’uki+ɲa ‘to bend over continuously’

There are several exceptions to the restrictions on tautomorphemic ejective-ejective, plain-ejective and plain-aspirate combinations. These are given in (5). Some of these exceptions occur at the level of a trigram on the linear string – important for our model – while others occur across more intervening material and would be noticeable only on a nonlocal projection. Additionally, there are four combinations of plain-ejective and two combinations of plain-aspirate in which the two stops are adjacent in a cluster. These are reported by our consultant to be monomorphemic forms, though Hardman (2001) claims that root-internal stop codas are not found. The morphological structure of these forms is thus in question. An additional observation is that several of the exceptions end in the sequence [t’a], just like the productive suffix.

(5) Exceptions to tautomorphemic restrictions
  a. taphijala ‘earthen wall’ qhaʧqha ‘rough to the touch’
    kawkha ‘where’ ʧhapʧha ‘mediocre’
  b. pist’a ‘scarcity’ ʎupt’a ‘bribery’ lupt’a ‘when it is very hot’
    qaʧ’i ‘type of potato’ loqt’a ‘scope’ ukʧ’a ‘height’
  c. q’ewt’a ‘curve, angle’        

2.2 Descriptive lexical statistics

To assess the statistical evidence available to an inductive learner trying to acquire these restrictions, we looked at the observed combinations of all three series of stops. Our data set is a morphologically segmented word list. This list was compiled by taking an unsegmented web corpus of 88,728 forms, collected from 438 webpages by An Crúbadán (http://crubadan.org/).3 The corpus was converted to lowercase and cleaned to remove numbers, non-alphanumeric characters and English or Spanish forms. Forms were also removed if they contained stray apostrophes, hyphens or other typos that we couldn’t interpret. This list was then crossed with a list of 1,846 roots, derived from the de Lucca (1987) dictionary with the help of a native speaker consultant, and a list of 50 suffixes and their allomorphs from the Hardman (2001) grammar and the de Lucca (1987) dictionary. There were 46,164 forms that were divisible into a known root and known suffixes, and these forms comprise the corpus we use in this paper.
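The parsing step can be sketched roughly as follows. This is a simplified illustration rather than our processing script: the objects forms, roots, and suffixes stand in for the cleaned word list, the root list, and the suffix list just described, and the sketch ignores allomorphy and the syncope alternations discussed above.

```r
# Keep a form only if it decomposes into a known root followed by zero or
# more known suffixes (simplified: no allomorphy or syncope handling).
decomposes <- function(form, roots, suffixes) {
  if (form %in% roots) return(TRUE)
  for (suf in suffixes) {
    if (endsWith(form, suf) &&
        decomposes(substr(form, 1, nchar(form) - nchar(suf)), roots, suffixes))
      return(TRUE)
  }
  FALSE
}

corpus <- forms[vapply(forms, decomposes, logical(1),
                       roots = roots, suffixes = suffixes)]
```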

Before looking at combinations of stops directly, we report on the distribution of stops by position in our word corpus, comparing the number of each class of stops in root initial position, root medial position, and in a suffix. Table 2 gives the raw counts on the left (e.g., there are 9,113 plain stops which are in root-initial position in the word corpus), as well as the probability of a stop from the given class in the given position (e.g., 20% of our 1,846 roots begin with a plain stop, 7% with an aspirate and 7% with an ejective, the remaining 66% of roots begin with a vowel, fricative or sonorant consonant). These numbers show that plain stops are frequent in both roots and suffixes, while aspirates and ejectives are both much more frequent in roots than in suffixes.

Table 2

Observed occurrences and probability of stops in three positions.

root initial root medial suffix
plain 9,113 (0.20) 17,441 (0.12) 65,148 (0.23)
aspirate 3,007 (0.07) 2,871 (0.02) 817 (<0.01)
ejective 3,037 (0.07) 2,622 (0.02) 1,901 (<0.01)

Tables 3 and 4 report the observed counts for stop combinations in tautomorphemic and heteromorphemic strings, on both the baseline and a nonlocal projection containing only stops. For tautomorphemic sequences, we looked at the cooccurrence of nonadjacent stops in a baseline trigram – that is, a C1XC2 string, where X can be any segment except the morpheme boundary symbol – and at adjacent bigrams on the stop projection. For heteromorphemic sequences, we looked at trigrams whose medial gram was the morpheme boundary, on both the baseline and the stop projection. In all tables, “ejective-ejective” refers to counts made over only non-identical combinations; all other combinations represent both identical (where applicable) and non-identical combinations.
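For concreteness, the baseline trigram counts can be sketched roughly as below. The sketch assumes that each word has already been tokenized into a vector of segment symbols with “+” marking morpheme boundaries, and that predicate functions such as is_plain() and is_ejective() (hypothetical helpers) pick out the relevant stop classes; it illustrates the counting logic rather than reproducing our scripts.

```r
# Count C1-X-C2 trigrams where C1 is in class1 and C2 is in class2,
# splitting the counts by whether the middle gram X is a morpheme boundary
# (heteromorphemic) or any other single segment (tautomorphemic).
count_trigrams <- function(words, class1, class2) {
  tauto <- 0
  hetero <- 0
  for (w in words) {
    if (length(w) < 3) next
    for (i in 1:(length(w) - 2)) {
      if (class1(w[i]) && class2(w[i + 2])) {
        if (w[i + 1] == "+") hetero <- hetero + 1 else tauto <- tauto + 1
      }
    }
  }
  c(tautomorphemic = tauto, heteromorphemic = hetero)
}

# e.g., with hypothetical predicates for the stop classes:
# count_trigrams(words, is_plain, is_ejective)
```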

Table 3

Corpus counts for tautomorphemic and heteromorphemic stop combinations in a baseline trigram. Examples are schematic and ellipses represent any additional material.

tautomorph. example heteromorph. example
plain-aspirate 0 …patʰa… 149 …lip+tʰa…
plain-ejective 0 …pat’a… 659 …lip+t’a…
ejective-ejective (non-identical) 0 …p’at’a… 1 …lip’+t’a…
plain-plain 3532 …pata… 3673 …lip+ta…
aspirate-plain 683 …pʰata… 30 …lipʰ+ta…
ejective-plain 765 …p’ata… 46 …lip’+ta…
aspirate-aspirate 613 …pʰatʰa… 0 …lipʰ+tʰa…
ejective-aspirate 466 …p’atʰa… 4 …lip’+tʰa…
aspirate-ejective 38 …pʰat’a… 2 …lipʰ+t’a…
Table 4

Corpus counts for tautomorphemic and heteromorphemic stop combinations on a stop projection. Examples represent cases where stops are adjacent on a projection but non-adjacent on the baseline.

tautomorph. example heteromorph. example
plain-aspirate 61 …pastʰa… 261 …pas+tʰa…
plain-ejective 68 …past’a… 668 …pas+t’a…
ejective-ejective (non-identical) 4 …p’ast’a… 7 …p’as+t’a…
plain-plain 5389 …pasta… 23519 …pas+ta…
aspirate-plain 943 …pʰasta… 1262 …pʰas+ta…
ejective-plain 906 …p’asta… 1781 …p’as+ta…
aspirate-aspirate 712 …pʰastʰa… 1 …pʰas+tʰa…
ejective-aspirate 476 …p’astʰa… 22 …p’as+tʰa…
aspirate-ejective 38 …pʰast’a… 29 …pʰas+t’a…

Table 3 shows that, within a baseline trigram, the restricted combinations are unattested tautomorphemically. Ejective-ejective combinations are also nearly unattested in heteromorphemic contexts, but plain-aspirate and plain-ejective combinations are more frequent across a morpheme boundary. The bottom portion of the table shows that other combinations of stops are either frequent in both heteromorphemic and tautomorphemic contexts, or are more frequent tautomorphemically than heteromorphemically (note that while ejectives and aspirates may occur in suffixes, the numbers here reflect the rarity of such suffixes in the corpus as a whole). As noted in Section 2.1 above, vowels in both roots and suffixes syncopate under affixation, creating consonant clusters across morpheme boundaries. This is crucial to the distinction between tautomorphemic and heteromorphemic contexts in Table 3. The plain-aspirate, plain-ejective and ejective-ejective combinations that occur in a heteromorphemic trigram are actually adjacent in the linear string, since the morpheme boundary symbol constitutes the medial gram in the trigram. If Aymara did not have syncope, heteromorphemic combinations would only be noticeable in a tetragram: compare actual [qaq+tʰapi+ɲa] ‘to finish scratching’ to hypothetical [qaqa+tʰapi+ɲa]. We will return to this point below.

Table 4 gives the counts on a stop projection. Here, both [p’at’] and [p’ant’] would count as ejective-ejective combinations. Plain-aspirate and plain-ejective combinations are still much more frequent in heteromorphemic than tautomorphemic contexts, though there are substantially more exceptions tautomorphemically than are observable in a baseline trigram. Ejective-ejective combinations are again rare in both contexts. The bottom portion of the table shows that most combinations (plain-plain, aspirate-plain, ejective-plain) are well attested in both morphological contexts, while aspirate-ejective combinations are somewhat rare in both contexts. Aspirate-aspirate and ejective-aspirate combinations are both more frequent in tautomorphemic combinations, again due to the general rarity of aspirates in suffixes.

The numbers here show that the restrictions on plain-ejective and plain-aspirate combinations in descriptive grammars are supported in counts over a word corpus, at both the baseline trigram level and on a nonlocal projection including only stops. The restrictions on ejective-ejective combinations are more difficult to assess, due to the rarity of heteromorphemic combinations (though our consultant work shows that these are possible, if not frequent in the corpus). The other six combinations of stops are reported to be licit in all morphological contexts, and this appears to be essentially true in our corpus as well, though certain combinations are unattested or nearly unattested (see Section 4.4.4 for discussion of other restricted combinations).

The experiments presented in the next section look at how native speakers of Aymara treat forms with plain-ejective and ejective-ejective combinations, in both heteromorphemic and tautomorphemic contexts. We then present the results of our computational model in Section 4, showing how the counts presented above are reflected in an inductive phonotactic grammar.

3 Experimental work

The two experiments reported below provide behavioral evidence that speakers of Aymara have learned the laryngeal restrictions on ejectives, and that treatment of these phonotactic structures is sensitive to morphological structure.

3.1 Experiment 1: sensitivity to restrictions on ejectives

Experiment 1 presents Aymara speakers with simple disyllabic forms, which could be interpreted as pseudo-nouns, that contain plain-ejective or ejective-ejective combinations. Speakers’ errors in repeating such forms are evaluated to assess whether speakers have internalized phonotactic restrictions on these combinations.

3.1.1 Participants

The participants were 21 native Aymara speakers, all of them Aymara-Spanish bilinguals. All were college educated and resided in El Alto, Bolivia, and most were students at Universidad Pública de El Alto. There were seven male and fourteen female participants, aged 19–30. Fifteen participants reported that they had been speaking Aymara since birth, five learned Aymara between the ages of 4 and 7, and one at 12.

3.1.2 Methods

Stimuli The stimuli were disyllabic C1VC2V nonce forms. Control items had a phonotactically legal ejective in C2 and a fricative or sonorant in C1. Ejective-ejective items contained a putative phonotactic violation by having heterorganic ejectives in C1 and C2, and plain-ejective items contained a putatively restricted combination of a plain stop in C1 and an ejective in C2. There were fifteen items of each type, and an additional fifteen phonotactically legal fillers with a plain stop in C2, for a total of 60 items. The complete list of stimuli is shown in Table 5.

Table 5

Stimuli for Experiment 1.

control              ejective-ejective      plain-ejective       filler
lap’a saq’o nut’a    k’it’a p’ik’a k’ap’u   tip’i tuk’i kip’a    t’apu kupa lipu
jup’a moq’o yap’i    p’it’a t’oq’e k’ut’a   kut’a toq’e kap’i    k’api tipu napu
juk’u lip’u lik’a    q’ap’i t’ap’u k’up’i   pit’a kup’a tip’a    k’ati kipi natu
juk’a nap’u luk’a    k’ip’a k’ip’i p’uk’a   kip’u kap’a puk’i    k’upi kapu japi
seq’a nat’u maq’o    q’at’a q’op’i t’aq’o   qat’i tip’u taq’e    p’uka puki luka

The stimuli were made from recordings of a native Aymara-speaking consultant reading phonotactically legal nonce words. The stimuli were created by splicing together C1V and C2V during the closure of the second stop, e.g., [lap’a] was made by splicing [lapa] and [map’a] together during the labial closure. All stimuli were normalized for amplitude, but were otherwise unmodified.

Procedure Participants were seated in front of a laptop computer wearing AudioTechnica noise cancelling headphones. The stimuli were presented using PsyScope (http://psy.ck.sissa.it/). On each trial, the audio stimulus was played once and participants were asked to repeat what they heard as precisely as possible. Participants were told that the words they would hear were not real words of Aymara, though they would contain sounds familiar from Aymara. Once participants had repeated the item, they pressed any key on the keyboard to move on to the next trial. No orthographic representation of the stimuli was given.

Analysis The audio recordings of participants’ responses were transcribed, and coded for accuracy and type of error, if any. Errors on ejective-ejective and plain-ejective items were then further classified as repairs or non-repairs, depending on whether they removed the putative phonotactic violation or not.

3.1.3 Results

Accuracy Overall accuracy differed between control, ejective-ejective and plain-ejective items, as shown in Figure 1. Accuracy on control items was very high (97%), while accuracy on ejective-ejective and plain-ejective items was lower, consistent with a difference in phonotactic legality. Items with an ejective-ejective combination were repeated accurately more often than items with a plain-ejective combination (50% vs. 32%).

Figure 1

Accuracy on control, ejective-ejective and plain-ejective forms in Experiment 1. Open circles indicate an individual participant’s performance; boxplots show summary statistics across all participants.

All trials were coded for accuracy (correct or incorrect), and a binomial mixed-effects model was then fit with accuracy as the dependent variable, a ternary fixed effect of stimulus type, a by-participant random intercept, and a by-participant random slope for type. Ejective-ejective was set as the baseline to which the other two factor levels, plain-ejective and control, were compared. The model was fit using the lmer function in the lme4 package (Bates et al. 2014) for R (R Development Core Team 2018, https://www.r-project.org/). Both comparisons were significant. Accuracy on control is significantly higher than accuracy on ejective-ejective (β = 4.81, SE = 1.02, z = 4.72, p < 0.0001), and accuracy on plain-ejective stimuli is significantly lower than on ejective-ejective stimuli (β = –1.03, SE = 0.44, z = –2.31, p = 0.02).
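In lme4 syntax, the model just described corresponds roughly to the following sketch; the data frame d and its column names are hypothetical, and in current versions of lme4 a binomial model of this kind is fit with glmer.

```r
library(lme4)

# Hypothetical data frame d: one row per trial, with columns
#   correct (0/1), type (control, ejective-ejective, plain-ejective),
#   participant (factor).
d$type <- relevel(factor(d$type), ref = "ejective-ejective")

# Binomial mixed model: fixed effect of type, by-participant random
# intercept and random slope for type.
m1 <- glmer(correct ~ type + (1 + type | participant),
            data = d, family = binomial)
summary(m1)
```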

To allow comparison with Experiment 2 below, the results of Experiment 1 were also analyzed for an effect of place of articulation. Plain-ejective and ejective-ejective trials were coded for whether the medial ejective was dental or not (labial, velar or uvular), and a model was fit to accuracy (correct or incorrect) with type (plain-ejective or ejective-ejective), place (dental or not) and their interaction as predictors. The model had a random intercept by participant and a random slope for place (a model with random slopes for place and type failed to converge). The interaction between type and place was not significant, so it was removed from the model. In the model without the interaction, the main effect of type was again significant (accuracy on plain-ejective stimuli is lower than on ejective-ejective stimuli, β = –0.90, SE = 0.19, z = –4.83, p < 0.0001), and the model also found a main effect of place: accuracy on forms with a dental ejective in C2 is higher than on forms with a non-dental ejective (β = 0.67, SE = 0.29, z = 2.30, p = 0.02).

Errors The frequency of different errors on ejective-ejective and plain-ejective stimuli is summarized in Table 6 and Table 7, distinguishing between repair and non-repair errors.4

Table 6

Errors on ejective-ejective stimuli.

error example % of responses (n)
repair C2 de-ejectivization k’ap’u → k’apu 39% (121)
C1 and C2 de-ejectivization k’ap’u → kapu 2 % (6)
C1 deletion k’ap’u → ap’u 0.5% (1)
C2 aspiration k’ap’u → k’apʰu 1% (4)
total 42.5%
non-repair C1 de-ejectivization k’ap’u → kap’u 4% (12)
labial → dental k’ap’u → k’at’u 3.5% (11)
total 7.5%
Table 7

Errors on plain-ejective stimuli.

error example % of responses (n)
repair C2 de-ejectivization kap’u → kapu 12% (37)
ejective reassociation kap’u → k’apu 28% (86)
C1 change kap’u → hap’u 5% (16)
total 45%
non-repair ejective doubling kap’u → k’ap’u 22% (66)
labial → dental kap’u → kat’u 1% (3)
C2 aspiration kap’u → kapʰu 1% (2)
total 24%

Errors on both types of stimuli remove the putative phonotactic violation more often than not, though ejective doubling errors are quite common on plain-ejective stimuli. Looking at the distribution of errors, we can see that the difference in accuracy between ejective-ejective and plain-ejective stimuli does not stem from a difference in how often these structures are repaired; repair rates for the two stimulus types are comparable. Instead, the lower accuracy on plain-ejective forms overall is driven by the greater rate of non-repair errors; see Gallagher (2016) for further discussion of this type of error. The relevance of errors that map a labial ejective to a dental will be discussed in conjunction with the results of Experiment 2 in Section 3.3.

3.1.4 Discussion

The results of Experiment 1 support the status of both ejective-ejective and plain-ejective combinations as synchronically restricted in Aymara. Forms that violate these restrictions are repeated significantly less accurately than phonotactically legal controls.

There is also an effect of place of articulation, with higher accuracy on forms with medial dental ejectives. As described above, Aymara has a productive suffix [t’a] that may result in plain-ejective or ejective-ejective combinations at the word level. While the stimuli in Experiment 1 had the shape of bare roots – as opposed to the morphologically complex forms in Experiment 2 – the greater accuracy on forms with a dental ejective reflects the likelihood of a dental in C2 when plain-ejective and ejective-ejective combinations occur at the word level. The independent roles of place of articulation and morphological structure will be discussed further below, by comparing the results of Experiment 1 and Experiment 2.

3.2 Experiment 2: Laryngeal restrictions and morphological structure

The goal of Experiment 2 is to test whether speakers’ errors on ejective-ejective and plain-ejective combinations are influenced by morphological structure, and whether these restrictions hold across more than a single intervening vowel. Experiment 2 presents participants with the same kinds of phonotactically illegal structures – ejective-ejective and plain-ejective pairs – as in Experiment 1, but in Experiment 2 the nonce words are pseudo-verbs ending in the infinitival suffix [-ɲa]. The experiment compares performance on stimuli where the illegal ejective in C2 must be interpreted as part of the root vs. stimuli where the illegal ejective in C2 may be interpreted as part of a productive suffix [t’a]. Additionally, the pseudo-roots in Experiment 2 all contain a coda consonant, so the interacting consonants are separated by a VC sequence as opposed to the single V in Experiment 1.

3.2.1 Participants

The participants were 20 of the participants from Experiment 1 (data from one participant were accidentally not recorded). Participants were balanced as to whether they completed Experiment 1 or Experiment 2 first.

3.2.2 Methods

Stimuli The stimuli were trisyllabic pseudo-verbs, all ending in the infinitival suffix [-ɲa]. The pseudo-verb stem was C1V1CC2V2, where C1 was either a plain stop or an ejective and C2V2 was either [p’a] or [t’a]. Forms with [t’a] were plausibly polymorphemic, while forms with [p’a] were not, since [t’a] is a productive verbal suffix and there is no suffix [-p’a]. All forms had a coda consonant in the first syllable, because in real words the suffix [-t’a] triggers deletion of a root vowel and thus forms a cluster with the final root consonant.

The test forms just described fell into one of four categories, based on the laryngeal restriction that was violated and the place of articulation of C2 (as a stand-in for implied morphological complexity): plain-ejective-labial, plain-ejective-dental, ejective-ejective-labial, ejective-ejective-dental. There were ten tokens in each test category and 40 filler items, which had a plain stop in C2 and a plain stop, fricative or sonorant in C1, for a total of 80 items. The test items are given in Table 8.

Table 8

Stimuli for Experiment 2 (not including fillers).

ej-ej-labial ej-ej-dental pl-ej-labial pl-ej-dental
k’asp’a+ɲa k’as+t’a+ɲa kasp’a+ɲa kas+t’a+ɲa
k’isp’a+ɲa k’is+t’a+ɲa kisp’a+ɲa kis+t’a+ɲa
k’aʎp’a+ɲa k’aʎ+t’a+ɲa kaʎp’a+ɲa kaʎ+t’a+ɲa
k’uʎp’a+ɲa k’uʎ+t’a+ɲa kuʎp’a+ɲa kuʎ+t’a+ɲa
ʧ’imp’a+ɲa ʧ’in+t’a+ɲa ʧimp’a+ɲa ʧin+t’a+ɲa
ʧ’amp’a+ɲa ʧ’an+t’a+ɲa ʧamp’a+ɲa ʧan+t’a+ɲa
ʧ’uʎp’a+ɲa ʧ’uʎ+t’a+ɲa ʧuʎp’a+ɲa ʧuʎ+t’a+ɲa
ʧ’iʎp’a+ɲa ʧ’iʎ+t’a+ɲa ʧiʎp’a+ɲa ʧiʎ+t’a+ɲa
q’asp’a+ɲa q’as+t’a+ɲa qasp’a+ɲa qas+t’a+ɲa
q’oʎp’a+ɲa q’oʎ+t’a+ɲa qoʎp’a+ɲa qoʎ+t’a+ɲa

The stimuli were made from recordings of a native Aymara speaker producing phonotactically legal nonce words, using the same splicing method and normalization as described for Experiment 1.

Procedure    The procedure was identical to Experiment 1.

Analysis       The analysis was identical to Experiment 1.

3.2.3 Results

Results from three participants were removed from further analysis because they had a low accuracy rate on filler items (15%, 32% and 64%), showing that they struggled with the task as a whole. The following discussion reflects the results of the remaining 17 participants.

Accuracy As shown in Figure 2, repetition of forms with [t’a], those that are plausibly polymorphemic, was more accurate than repetition of forms with [p’a], where the plain-ejective or ejective-ejective combination must be interpreted as monomorphemic (74% vs. 35%). This distinction held for both ejective-ejective and plain-ejective combinations.

Figure 2

Accuracy in Experiment 2. Open circles indicate an individual participant’s performance; boxplots show summary statistics across all participants.

A binomial, linear mixed model was fit to accuracy with predictors of place of articulation (labial or dental), violation type (ejective-ejective or plain-ejective), and their interaction, along with by-participant random slopes for place and type (a model with the interaction as a random by-participant slope failed to converge) and a random intercept for participant. The model finds a main effect of place, with lower accuracy on labial forms than dental forms (β = –1.37, SE = 0.31, t = –4.39, p < 0.0001). The model also revealed a significant interaction between place and violation type (β = –1.52, SE = 0.62, t = –2.46, p = 0.02). While overall accuracy only marginally differs between ejective-ejective and plain-ejective violations (β = 1.11, SE = 0.59, t = 1.89, p = 0.06), the direction of the effect differs depending on place. For labials, accuracy on ejective-ejective is slightly higher than on plain-ejective (38.5% vs. 32%), while for dentals, plain-ejective accuracy is higher than ejective-ejective accuracy (82% vs. 67%).

Errors In Experiment 2, a high number of errors involved changing a labial ejective to a dental ejective, thereby repairing the phonotactic violation by separating the combining stops with a morpheme boundary. For example, in [k’asp’aɲa], the pair of ejectives must be interpreted as cooccurring within a root (*[k’asp’a+ɲa]), while in [k’ast’aɲa] the pair of ejectives may be interpreted as cooccurring across a morpheme boundary (*[k’ast’a+ɲa] and [k’as+t’a+ɲa] are both possible parses). The frequency of different errors is summarized in Tables 9 and 10. In each table, errors in the top section remove the phonotactic violation entirely. In the second section, errors change the plausible morphological structure of the pseudo-verb by changing a labial to a dental, and in the third section errors do not repair the violation. Virtually all errors are much more frequent for labial forms than dental forms, and place errors are only attested for labial forms.

Table 9

Errors on ejective-ejective (cooccurrence) stimuli. Top: errors that remove the phonotactic violation, middle: errors that change place and morphological structure, bottom: non-repair errors. Percentages indicate the total rate of errors out of all responses (e.g., 60.5% of labial stimuli were produced with errors, and 38.5% were produced without errors).

error example labial dental
% (n) % (n)
lar. repairs C2 de-ej. k’asp’aɲa → k’aspaɲa 18.5 (31) 7 (12)
C1 & C2 de-ej. k’asp’aɲa → kaspaɲa 4 (6) 1 (2)
C1 change k’asp’aɲa → asp’aɲa 0.5 (1) 2 (3)
pl. repairs C1 de-ej & place k’asp’aɲa → kast’aɲa 12 (20) 0 (0)
place change k’asp’aɲa → k’ast’aɲa 18.5 (31) 0 (0)
non-repairs C1 de-ej. k’asp’aɲa → kasp’aɲa 7 (12) 23 (39)
total 60.5 33
Table 10

Errors on plain-ejective (ordering) stimuli. Top: errors that remove the phonotactic violation, middle: errors that change place and morphological structure, bottom: non-repair errors. Percentages indicate the total rate of errors out of all responses (e.g., 18% of dental stimuli were produced with errors, and 82% were produced without errors).

error example labial dental
% (n) % (n)
lar. repairs C2 de-ej. kasp’aɲa → kaspaɲa 15 (28) 10 (17)
ej. reassociation kasp’aɲa → k’aspaɲa 6 (11) 0.5 (1)
C1 change kasp’aɲa → asp’aɲa 0 (0) 0.5 (2)
pl. repairs place change kasp’aɲa → kast’aɲa 40 (74) 0 (0)
ej. double & pl. kasp’aɲa → k’ast’aɲa 2 (4) 0 (0)
non-repairs ejective double kasp’aɲa → k’asp’aɲa 5 (9) 7 (11)
total 68 18

3.2.4 Discussion

Experiment 2 shows that speakers’ treatment of plain-ejective and ejective-ejective combinations is sensitive to the inferred morphological structure of the forms in which they occur. When these consonant combinations must be interpreted as being tautomorphemic, there is a higher error rate than when the combination may be interpreted as heteromorphemic.

3.3 Summary and comparison of Experiments 1 and 2

Together, Experiments 1 and 2 show that Aymara speakers have learned morphologically sensitive restrictions on ejective-ejective and plain-ejective combinations. Comparison between the two experiments supports a role for both morphological structure and phonetic category in the place of articulation effect found in both studies.

Error rates on forms with dental ejectives were lower than error rates on forms with non-dental ejectives in both experiments. This effect is at least partly an effect of phonetic category: dental ejectives are much more common as the second consonant in a plain-ejective or ejective-ejective sequence than other places of articulation (97% of such combinations in our corpus have [t’] as C2), because of the frequency of the suffix [-t’a]. We can thus conclude that speakers are sensitive to the different distribution of [t’] compared to other ejectives.

To see if there is also a contribution of morphological structure, we need to consider the increased accuracy on dental ejective forms between Experiments 1 and 2. In Experiment 1, all forms, including those with dental ejectives, have a monomorphemic structure. Some of the dental ejective stimuli in Experiment 1 cannot be decomposed into suffixes (e.g., [qat’i] – there is no suffix [t’i]). Stimuli with [t’a], such as [kat’a] and [nut’a], are unlikely to be analyzed as root-suffix because the suffix [t’a] always attaches to bases that are at least CVC (e.g., [taw+t’a+ɲa] ‘about to row’).5 The other reason for doubting that the dental ejective stimuli in Experiment 1 are morphologically decomposed is that [t’a] is an aspectual suffix that is usually followed by tense and person marking. There are only six words in the corpus where [-t’a] is the last and only suffix, and none of them have the shape CVt’a.
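Claims of this sort can be checked directly on the segmented corpus; the sketch below assumes a hypothetical file name, one segmented word per line with morphemes separated by “+”, and ejectives transcribed with a plain apostrophe.

```r
# Hypothetical corpus file: one segmented word per line, e.g. "qaq+thapi+nya".
words <- readLines("aymara_segmented.txt", encoding = "UTF-8")

# Words consisting of a root plus word-final [-t'a] as the only suffix:
sum(grepl("^[^+]+\\+t'a$", words))
```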

On the other hand, in Experiment 2, forms with a dental ejective have a polymorphemic structure, since they contain the two verbal suffixes [t’a] and [ɲa]. To test for an effect of morphological structure, an additional, post-hoc statistical model was run. Responses to ejective-ejective and plain-ejective stimuli from the two experiments were pooled and a binomial linear mixed model was fitted. The dependent variable was accuracy, and the independent variables were experiment (Experiment 1 or Experiment 2) and place (dental or not). There was a random intercept for participant and a random by-participant slope for place (a model with a random slope for experiment failed to converge). The model found a significant interaction between place and experiment (β = 1.14, SE = 0.28, t = 4.15, p < 0.0001), revealing that the effect of place differs between the two experiments. The difference between accuracy on dental and non-dental forms is larger in Experiment 2 (74% vs. 35%, a 39 point difference), where stimuli have a polymorphemic structure, than in Experiment 1 (52% vs. 37%, a 15 point difference) where stimuli have a monomorphemic structure. While the experiments were not originally designed to be compared in this way, the raw differences in accuracy and the high significance level in the statistical test are supportive of an effect of morphological structure above and beyond place of articulation.

The types of errors between the two experiments also differ, and further show the importance of morphological structure to repetition accuracy. While place of articulation errors are quite rare in Experiment 1, these errors are very frequent in Experiment 2. Errors that map a labial ejective to a dental make up just 4.5% of errors in Experiment 1 but 73% in Experiment 2. Participants’ responses in Experiment 2 are thus tracking the polymorphemic, pseudo-verb structure of the stimuli, skewing responses to create a phonotactically legal form by changing place of articulation and thus morphological structure.

In sum, the experiments here provide behavioral evidence that the laryngeal restrictions on ejectives and their interaction with morphology are part of speakers’ synchronic grammars.

4 Learning simulations

Having presented the corpus and behavioral evidence for the restrictions in Aymara, we move on to modeling the learning of these patterns from our corpus. Our model starts with a parsed corpus of word forms, notices the need for nonlocal projections, and induces a set of constraints that capture the local and nonlocal phonology of the language. We show how our model fits our experimental results as well as a broader range of restrictions in the literature on Aymara.

4.1 A model of learning projections from baseline phonology

4.1.1 A brief description of the learner

In this section, we present a brief description of our learning model, which is described in more detail in Gouskova and Gallagher (to appear). The implementation of the model is available on GitHub at https://github.com/gouskova/inductive_projection_learner.

The model builds on Hayes and Wilson’s (2008) UCLA Phonotactic Learner (UCLAPL). The first stage of the learning procedure constructs a phonotactic grammar based on a list of phonological words and features describing each segment (see (8)). The model proceeds to construct a set of constraints against unattested and underattested sequences (see (9)). These constraints are formulated in terms of natural classes, and they are given weights using a Maximum Entropy procedure, which seeks to maximize the probability of the learning data. The resulting grammar can be used to assign harmony scores to test words, so that its fit to the data can be compared against, e.g., experimental data from human speakers (see Goldwater and Johnson 2003; Daland et al. 2011; Berent et al. 2012; Hayes and White 2013; Wilson and Gallagher 2018).
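The scoring side of the model can be summarized in a minimal sketch: a form’s harmony score is the negative weighted sum of its constraint violations, and its MaxEnt probability is proportional to the exponential of that score. The weights and violation vector below are purely illustrative.

```r
# MaxEnt scoring in miniature: harmony = -(sum of weight * violations),
# probability proportional to exp(harmony).
harmony <- function(violations, weights) -sum(weights * violations)

weights    <- c(4.9, 11.9, 12.7)   # illustrative constraint weights
violations <- c(0, 0, 1)           # this form violates the third constraint once

h <- harmony(violations, weights)  # -12.7
p <- exp(h)                        # unnormalized MaxEnt probability
```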

4.1.2 Building projections from cues in the baseline grammar

The original version of the learner has the capability to posit constraints on autosegmental projections provided by the analyst: for example, Hayes and Wilson (2008) demonstrate that their learner can find constraints enforcing vowel harmony in Shona verbs when it is given a projection that includes only vowels; the learner can also capture the stress pattern of Wargamay when given the appropriate projections for segments that bear primary and secondary stress. We exploit this capability in our extension to the learner, which finds projections and/or modifies the training data automatically when certain cues are present in the baseline grammar. For the simulations reported in this paper, two such cues are instrumental:

(6) Segmental placeholder trigrams: constraints of the form *X-any_segment-Y, where X and Y are part of a natural class Z. When the learner finds such trigrams, it adds projection Z to its search space of constraints.
(7) Morpheme-boundary trigrams: constraints of the form *X-non_morpheme_boundary-Y, where X and Y are part of a natural class Z. When the learner finds such trigrams, it adds a projection Z to its search space of constraints.

Hayes and Wilson’s learner automatically adds the feature [±word boundary] to every feature set in order to capture word edge phonotactics. The non-boundary segments of each language are then automatically part of the largest natural class, [–word boundary]. Since Hayes and Wilson’s learner has a bias toward broad natural classes, the learner will identify the constraints that refer to this class relatively early compared to other trigrams, provided the language offers support for them. Intuitively, the presence of constraints whose middle segment can be any of the segments in the language, *X-any_segment-Y, is a cue to the learner that the segments to either side of “any segment” interact nonlocally. The logic for non-morpheme-boundary trigrams is similar. If the grammar includes a constraint *X-non_morpheme_boundary-Y, this tells us that X and Y are permitted across a morpheme boundary but not across any intervening segment, so *X-any_segment-Y holds tautomorphemically.

Our extension of the learner uses constraints of this type to posit a projection that includes whichever natural class is the smallest class including both X and Y. For example, if the learner finds that [l] and [r] cannot occur across any segment (as in an idealized version of Latin, Steriade 1987; Cser 2010), it will posit a projection of liquids. More specifically, in the simulation of Quechua reported in Gouskova and Gallagher (to appear), the learner’s baseline includes the constraint *[–cont,-son][–wb][+cg], or *stop-any_seg-ejective. The smallest class that includes all stops and ejectives is the natural class of stops, so the stop projection is added to the grammar and searched for constraints. This is schematically illustrated in (8)–(11).

(8) Input to the learner
  a. training data: {<#pata#>, <#p’ata#>, <#pʰata#>, <#t’ampa#>, <#map’a#>, <#lama#>, …}
  b. feature set
 
(9) Stage 1 output of the learner: baseline grammar
(10) Learner posits a projection for the smallest natural class that includes [–son, –cont] and [+cg]
(11) Stage 2 output of the learner: projection grammar
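The projection-positing step in (10) amounts to computing the smallest natural class containing the interacting classes. A rough sketch is given below, assuming a fully specified binary feature matrix feats with segments as row names and values “+” or “–” (abstracting away from the partly privative feature set in Table 11); it illustrates the idea rather than the learner’s actual code.

```r
# Smallest natural class containing a set of segments: collect the feature
# values shared by all of them, then return every segment whose values match
# that shared feature description.
smallest_class <- function(segments, feats) {
  shared <- apply(feats[segments, , drop = FALSE], 2,
                  function(col) if (length(unique(col)) == 1) col[1] else NA)
  shared <- shared[!is.na(shared)]
  matches <- apply(feats[, names(shared), drop = FALSE], 1,
                   function(row) all(row == shared))
  rownames(feats)[matches]
}

# e.g. smallest_class(c("p", "t'", "kh"), feats) should return all oral stops,
# i.e. the class [-son, -cont], given standard binary feature values.
```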

Occasionally, the learner identifies more than one placeholder constraint, in which case we allow it to search each of the resulting projections for constraints if the natural classes they entail are distinct.

If the learner is trained on morphologically parsed data, it may detect constraints of the form *X-non_morpheme_boundary-Y. In our feature sets, all segments are [–morpheme boundary], but word edges and morpheme boundaries are [+morpheme boundary]. If the learner posits a constraint whose middle gram is [–morpheme boundary], this means that segmental trigrams of the form *X-any_segment-Y are underattested, but the trigram X-morpheme_boundary-Y is attested often enough to exclude it from the formulation of the constraint. This situation will arise in a language like Aymara that has few stop-any_segment-ejective trigrams but a fair number of stop-morpheme_boundary-ejective trigrams. The phonotactic restriction is cancelled in heteromorphemic contexts – the occurrence of an ejective in close proximity to a stop is a boundary signal (Grenzsignal) in the sense of Trubetzkoy (1939). Constraints of this type also cue our learner to construct a projection. The only difference is that when the learner is looking at morphologically parsed words, the morpheme boundary symbol will also be present on all projections.6 If morpheme boundaries can be present on a projection, they will separate the segments that would otherwise form a bigram, as shown in (12). For compactness, we will write morpheme boundaries as “+” rather than [+morpheme boundary], and the non-morpheme boundary class will be [–mb].

(12) Projections with morpheme boundaries
      a. Projection          b. What is visible
      baseline/default       t a m p’ a     p a n + t’ a
      [–son, –cont]          t     p’       p     + t’
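Concretely, projecting a segmented word onto a tier as in (12) just means keeping the segments that belong to the projection’s class, plus the morpheme boundary symbol. A minimal sketch, with a hypothetical is_stop() helper, is given below.

```r
# Project a tokenized, segmented word onto a tier: keep only segments in
# the projection's class, plus the morpheme boundary symbol "+".
project <- function(segs, in_class) {
  segs[in_class(segs) | segs == "+"]
}

is_stop <- function(s) {
  s %in% c("p", "t", "ʧ", "k", "q",
           "p'", "t'", "ʧ'", "k'", "q'",
           "pʰ", "tʰ", "ʧʰ", "kʰ", "qʰ")
}

project(c("p", "a", "n", "+", "t'", "a"), is_stop)   # "p" "+" "t'"
```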

4.2 Parameters manipulated by the analyst

There are several parameters that affect the learner’s ability to find generalizations. First, the segmental features determine whether the learner can group the segments into the right natural classes. The learner is sensitive to the size of the classes, as well as their overall number (see Hayes and Wilson 2008, Gouskova and Gallagher to appear). Since we were primarily interested in laryngeal restrictions, we selected a feature set that uses mostly privative features (specifically, [plain], [cg] and [sg]). Hayes and Wilson’s learner favors constraints whose natural classes mention as few features as possible, so privative features allow certain classes to be picked out more easily; in the Aymara case, this means that plain stops can be picked out as [+plain] instead of as [–cg, –sg]. See Section 4.3 for the full list.

The other parameters have an effect on the number of constraints induced, the length of segmental strings they scope over, and how closely the learner fits the grammar to the data. We were generous in the number of constraints we allowed the learner to discover, since this learner stops when it cannot identify any constraints that pass the selection criterion. The length of constraint strings on the segmental projection ranged from 1 (as in *[+cg]) to 3 (as in *[–syllabic][–syllabic][–syllabic], “no CCC clusters”); the length of constraint strings on higher projections ranged from 2 to 3. In addition to these two fairly simple parameters, there are several parameters that affect the fit of the model in various ways.

The first parameter is gain (Della Pietra et al. 1997; Wilson and Gallagher 2018; Gouskova and Gallagher to appear), which replaces the O/E threshold criterion in the version of the learner described in Hayes and Wilson (2008). The gain of a candidate constraint C is proportional to the reduction in the Kullback-Leibler divergence between the current grammar and the grammar with C added, with the weights of all the other constraints left unchanged. Put differently, a constraint’s gain is higher when adding it would bring the probability distribution generated by the grammar closer to that of the learning data. A constraint can only be added if its gain exceeds the threshold; the higher the threshold, the harder it is to add new constraints. We have found, moreover, that the gain threshold can be set lower when the training data sets are small and each datum is relatively informative, but larger data sets yield more sensible grammars when the threshold is higher.

The second parameter we manipulated is gamma. This parameter affects how the objective function of the learner is calculated each time a constraint is added – it scales the harmony score relative to the negative log probability, with the effect of increasing the impact of constraint violations by individual candidates. Increasing gamma makes it less likely that constraints with very low weights will be learned (NB: both Della Pietra et al. 1997 and Wilson and Gallagher 2018 use ɣ to refer to the gain threshold; this is distinct from the gamma parameter). There are additional parameters that can be manipulated, such as the Laplace regularizer λ, whose function is to penalize constraints with large weights (Wilson and Gallagher 2018: 615). We set λ to a small constant 0.00001.

4.3 Learning data and features

The corpus we used was based on the Aymara wordlist on the An Crúbadán project website (http://crubadan.org), described in Section 2. We created two versions of the corpus: an unsegmented list of phonological words (transcribed on the basis of the transparent orthography), and a segmented list with morpheme boundaries. Recall that we only used those words in the An Crúbadán list that contained the roots that also occur in the de Lucca (1987) dictionary. Since morpheme boundaries add to the overall length of each string, we had to filter the segmented word list to exclude words above a certain length.7 This left us with 46,164 words each in the segmented and unsegmented lists.

The feature set we used for all simulations is shown in Table 11. In addition to the phonemes of Aymara, the feature set contains the morpheme boundary “+”, the word boundary, and a special “copy” segment X, which has just one privative feature, [+copy]. As explained in Gouskova and Gallagher (to appear), this copy notation is necessary because the learner does not implement algebraic notation in its constraint language (Berent et al. 2012). Thus, to allow the learner to distinguish between the allowed identical pairs of ejectives and the disallowed non-identical ones (recall Section 2.1), we transcribe words such as [t’ant’a] as [t’anXa].
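The recoding can be illustrated with a small sketch over tokenized words (an illustration rather than the learner’s implementation; ejectives are assumed to be transcribed with a plain apostrophe).

```r
# Replace the second of two identical ejectives in a word with the
# placeholder segment X, so that [t'ant'a] is transcribed as [t'anXa].
recode_identical_ejectives <- function(segs) {
  ej <- which(grepl("'", segs, fixed = TRUE))
  if (length(ej) >= 2 && segs[ej[2]] == segs[ej[1]]) {
    segs[ej[2]] <- "X"
  }
  segs
}

recode_identical_ejectives(c("t'", "a", "n", "t'", "a"))
# returns: "t'" "a" "n" "X" "a"
```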

Table 11

Features of Aymara segments for computational simulations.

long syll son cont cg sg plain lab dent pal vel uv rhot lat nas lo bk hi
p 0 0 0 + + 0 0 0 0 0 0 0 0 0 0
t 0 0 0 + 0 + 0 0 0 0 0 0 0 0 0
ʧ 0 0 0 + 0 0 + 0 0 0 0 0 0 0 0
k 0 0 0 + 0 0 0 + 0 0 0 0 0 0 0
q 0 0 0 + 0 0 0 0 + 0 0 0 0 0 0
p’ 0 + 0 0 + 0 0 0 0 0 0 0 0 0 0
t’ 0 + 0 0 0 + 0 0 0 0 0 0 0 0 0
ʧ’ 0 + 0 0 0 0 + 0 0 0 0 0 0 0 0
k’ 0 + 0 0 0 0 0 + 0 0 0 0 0 0 0
q’ 0 + 0 0 0 0 0 0 + 0 0 0 0 0 0
pʰ 0 0 + 0 + 0 0 0 0 0 0 0 0 0 0
tʰ 0 0 + 0 0 + 0 0 0 0 0 0 0 0 0
ʧʰ 0 0 + 0 0 0 + 0 0 0 0 0 0 0 0
kʰ 0 0 + 0 0 0 0 + 0 0 0 0 0 0 0
qʰ 0 0 + 0 0 0 0 0 + 0 0 0 0 0 0
s 0 + 0 0 0 0 + 0 0 0 0 0 0 0 0 0
ʃ 0 + 0 0 0 0 0 + 0 0 0 0 0 0 0 0
χ 0 + 0 0 0 0 0 0 0 + 0 0 0 0 0 0
h 0 + 0 + 0 0 0 0 0 0 0 0 0 0 0 0
m 0 + 0 0 0 + 0 0 0 0 0 + 0 0 0
n 0 + 0 0 0 0 + 0 0 0 0 + 0 0 0
ɲ 0 + 0 0 0 0 0 + 0 0 0 + 0 0 0
r 0 + 0 0 0 0 + 0 0 0 + 0 0 0
l 0 + + 0 0 0 0 + 0 0 0 + 0 0 0
ʎ 0 + + 0 0 0 0 0 + 0 0 + 0 0 0
j 0 + + 0 0 0 0 0 0 0 0 0 0 0 +
w 0 + + 0 0 0 0 0 0 0 0 0 0 0 + +
i + + + 0 0 0 0 0 0 0 0 0 0 0 +
u + + + 0 0 0 0 0 0 0 0 0 0 0 + +
a + + + 0 0 0 0 0 0 0 0 0 0 0 +
e + + + 0 0 0 0 0 0 0 0 0 0 0
o + + + 0 0 0 0 0 0 0 0 0 0 0 +
iː + + + + 0 0 0 0 0 0 0 0 0 0 0 +
uː + + + + 0 0 0 0 0 0 0 0 0 0 0 + +
aː + + + + 0 0 0 0 0 0 0 0 0 0 0 +
eː + + + + 0 0 0 0 0 0 0 0 0 0 0
oː + + + + 0 0 0 0 0 0 0 0 0 0 0 +

4.4 The simulations

4.4.1 Baseline grammar trained on a corpus of segmented words

We trained the learner on the corpus of morphologically segmented words, since we expected it to be able to identify the generalizations about tautomorphemic stops when it was supplied with the crucial information about morpheme boundaries. The simulation we report here had a gain of 400 and gamma of 150, and the grammar included 120 constraints. The baseline grammar contains three morphological trigram constraints (see Table 12), corresponding to the three restricted laryngeal combinations. Plain-ejective, plain-aspirate and ejective-ejective combinations are found across a morpheme boundary but are underattested across any intervening segment.

Table 12

Morpheme-boundary constraints in the baseline grammar trained on segmented words.

Constraint Weight Sequences penalized
a. *[+plain][–mb][+cg] 13.214 [p t ʧ k q]-seg-[p’ t’ ʧ’ k’ q’]
b. *[+plain][–mb][–cont, +sg] 13.039 [p t ʧ k q]-seg-[pʰ tʰ ʧʰ kʰ qʰ]
c. *[+cg][–mb][+cg] 12.886 [p’ t’ ʧ’ k’ q’]-seg-[p’ t’ ʧ’ k’ q’]

These three constraints act as cues to the creation of two projections, shown in Table 13. The oral stop projection is motivated by constraints (a) and (b), since the smallest natural class containing both plain stops and aspirates, or both plain stops and ejectives, is the class of all oral stops. The ejective projection is motivated by constraint (c).

Table 13

Projections posited from morpheme-boundary trigrams.

Projection Defining features What is visible
oral stops [–son, –cont] p t k ʧ q, p’ t’ k’ ʧ’ q’, pʰ tʰ kʰ ʧʰ qʰ, +
ejectives [+cg] p’ t’ k’ ʧ’ q’, +

4.4.2 The full grammar with projections

In the next step of the learning procedure, the same training data set is revisited with the two projections identified in the baseline simulation, in addition to the default projection. Learning in this stage starts from scratch – it does not include any of the constraints learned on the default projection in the first stage of learning.

Table 14 shows all of the constraints that the learner posited on the two nonlocal projections, grouped by projection. Within each projection, the constraints are shown in the order they were added to the grammar – following Hayes and Wilson’s heuristics, bigram constraints are considered first, since there are fewer of them than trigrams. There are several constraints on the classes of ejectives and aspirates, as well as constraints on individual ejective and aspirate segments. There are also some constraints on place of articulation combinations. The constraints that capture the restrictions on plain-ejective and ejective-ejective combinations are given in bold.

Table 14

Part of the final grammar induced from training on the full segmented corpus of Aymara: Constraints discovered on the nonlocal projections.

Projection Constraint Weight Sequences penalized
a. [+cg] *[–syll][–syll] 4.899 ejective…ejective
b. [+cg] *[–wb][+palatal][+wb] 11.938 ejective/+…[ʧ’]…#
c. [+cg] *[–wb][+uvular][+wb] 11.324 ejective/+…[q’]…#
d. [–son, –cont] *[–wb][+cg,+labial] 12.694 stop/+…[p’]
e. [–son, –cont] *[–syll][+cg,+velar] 11.629 stop…[k’]
f. [–son, –cont] *[–syll][+cg,+uvular] 11.274 stop…[q’]
g. [–son, –cont] *[+dental][+palatal] 12.616 [t t’ tʰ]…[ʧ ʧ’ ʧʰ]
h. [–son, –cont] *[+velar][+uvular] 11.673 [k k’ kʰ]…[q q’ qʰ]
i. [–son, –cont] *[+plain,+labial][+sg] 12.352 [p]…aspirate
j. [–son, –cont] *[+plain,+uvular][+sg] 11.298 [q]…aspirate
k. [–son, –cont] *[+plain,+dental][+sg] 11.768 [t]…aspirate
l. [–son, –cont] *[+uvular][+velar] 5.629 [q q’ qʰ]…[k k’ kʰ]
m. [–son, –cont] *[–wb,+mb][+sg,+palatal] 12.242 +…[ʧʰ]
n. [–son, –cont] *[–wb][–syll][+sg] 11.366 stop/+…stop…aspirate
o. [–son, –cont] *[+plain][+cg][+wb] 6.351 plain stop…ejective…#
p. [–son, –cont] *[–syll][–syll][+uvular] 11.594 stop…stop…[q q’ qʰ]
q. [–son, –cont] *[–syll][–syll][+palatal] 11.628 stop…stop…[ʧ ʧ’ ʧʰ]
r. [–son, –cont] *[+cg][–syll][–syll] 11.832 ejective…stop…stop

To assess how these constraints capture the phonological restrictions in Aymara, we test how the grammar rates the nonce words from the repetition studies in Section 3 and then go on to look at a broader set of structures.

4.4.3 Testing the model against the experimental results

The grammar makes many of the same distinctions among nonce words that Aymara speakers made in the repetition experiment. The correlations between repetition accuracy and the score assigned by the grammar are plotted in Figures 3 and 4.

Figure 3

Harmony scores for stimuli from repetition experiment 1, assigned by the final grammar trained on segmented word corpus. Each point in the plot is labeled according to stimulus type: “CT” is control, “PE” is plain-ejective, “EE” is ejective-ejective.

Figure 4

Harmony scores for stimuli from repetition experiment 2, assigned by the final grammar trained on segmented word corpus. Each point in the plot is labeled according to stimulus type: “EL” is ejective-ejective-labial, “ED” is ejective-ejective-dental, “PL” is plain-ejective-labial and “PD” is plain-ejective-dental.

Figure 3 pools accuracy averages across participants for each nonce word in Experiment 1; the regression line shows the overall correlation between repetition accuracy and harmony scores (the shaded region is the 95% confidence interval). Each data point in the plot represents an average accuracy score for a specific word in the experiment, labeled according to the type of stimulus: “CT” is control, “PE” is plain-ejective, “EE” is ejective-ejective. Like participants in the experiment, the model distinguishes control words from forms with the restricted plain-ejective and ejective-ejective combinations. The model assigns a score of –6 to all control words, compared to an average score of –22 to plain-ejective or ejective-ejective forms.

The model also reflects the distinction between dental ejectives and other ejectives. The grammar includes constraints that penalize all ejective-ejective or plain-ejective combinations, but it also includes specific constraints on stop…[p’], stop…[q’], and stop…[k’] combinations, which further penalize forms with non-dental ejectives. Plain-ejective and ejective-ejective combinations with a dental ejective receive a higher average score of –12, while these same laryngeal combinations with other medial ejectives receive a lower average score of –25.

In the experiment, participants’ accuracy was somewhat higher for ejective-ejective forms (50%) than plain-ejective forms (32%) – a distinction that is not reflected in the model (both categories have an average score of –22). This difference was small though significant in the behavioral data, and it was inconsistent across the two experiments, so we cannot draw firm conclusions about differences in grammaticality between these two types of combinations. The correlation between the model’s harmony scores and the Aymara speakers’ average accuracy in repeating the words is fairly high (Kendall’s τ = 0.68, Spearman’s ρ = 0.83). We report only non-parametric correlations because the harmony scores are not normally distributed.
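
The correlations themselves are straightforward to compute with any statistics package; the sketch below uses Python’s scipy.stats for illustration, with placeholder per-item values standing in for the experimental data.

from scipy.stats import kendalltau, spearmanr

# Placeholder per-item values, NOT the experimental data: mean repetition
# accuracy for each nonce word and the harmony score the grammar assigns it.
accuracy = [1.00, 0.95, 0.50, 0.45, 0.30, 0.25]
harmony  = [-6.0, -6.0, -18.9, -22.4, -24.7, -25.1]

tau, _ = kendalltau(accuracy, harmony)
rho, _ = spearmanr(accuracy, harmony)
print(f"Kendall's tau = {tau:.2f}, Spearman's rho = {rho:.2f}")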

Figure 4 plots the nonce words tested in Experiment 2. The stimulus types are labeled as follows: ED: ejective-ejective-dental, EL: ejective-ejective-labial, PD: plain-ejective-dental, PL: plain-ejective-labial. The model assigns higher scores to forms with a dental ejective (–12 on average for both ejective-dental and plain-dental combinations), which are polymorphemic, than to monomorphemic forms with a labial ejective (–33 for ejective-labial and –28 for plain-labial combinations), reflecting the main effect of place of articulation in the experiment (overall accuracy on forms with dental ejectives: 74%; overall accuracy on forms with labial ejectives: 35%). Again, the small, inconsistent differences between ejective-ejective and plain-ejective combinations observed in the experiments are not reflected in the model. The correlations between harmony scores assigned by the model and the Aymara speakers’ average accuracy in this experiment are τ = 0.52, ρ = 0.72.

The model captures the distinction between labial and dental forms in two ways. First, forms with labials have a tautomorphemic plain-ejective or ejective-ejective combination, so they violate the constraints *[+plain][+cg][+wb] on the stop projection or *[–syllabic][–syllabic] on the ejective projection. Forms with dental ejectives, on the other hand, have a morpheme boundary intervening between the stops and thus escape a violation of these constraints. Second, the model contains a constraint on labial ejectives that are not the first stop in the word (*[–wb][+labial, +cg]), but no such constraint on dental ejectives, resulting in lower scores for forms with a labial ejective than a dental ejective, regardless of morphological structure.
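
To make these mechanics concrete, the following is a minimal sketch in Python of projection-based constraint evaluation; it is an illustration, not the implementation we used. The feature bundles, the assumption that boundary symbols project onto every tier, and the two hypothetical forms [k’ap’u] and [k’as+t’a] are simplifications, while the two constraints and their weights are taken from Table 14 (a) and (d).

# Feature bundles are simplified and written with ASCII hyphens; '#' marks a
# word boundary and '+' a morpheme boundary (assumed here not to be [-syll]).
FEATURES = {
    "#":  {"+wb"},
    "+":  {"+mb", "-wb"},
    "k'": {"-syll", "-son", "-cont", "+cg", "+velar", "-wb"},
    "t'": {"-syll", "-son", "-cont", "+cg", "+dental", "-wb"},
    "p'": {"-syll", "-son", "-cont", "+cg", "+labial", "-wb"},
    "s":  {"-syll", "-son", "-wb"},   # fricative: excluded from the stop projection
    "a":  {"+syll", "-wb"},
    "u":  {"+syll", "-wb"},
}

def project(word, proj):
    """Keep boundary symbols (assumed to project on every tier) and segments
    bearing all of the features in `proj`."""
    return [s for s in word if s in ("#", "+") or proj <= FEATURES[s]]

def harmony(word, grammar):
    """Negative weighted sum over all constraint-matching windows on all projections."""
    total = 0.0
    for proj, constraint, weight in grammar:
        tier = project(word, proj)
        for i in range(len(tier) - len(constraint) + 1):
            if all(req <= FEATURES[seg] for seg, req in zip(tier[i:], constraint)):
                total -= weight
    return total

# Constraints (a) and (d) from Table 14, with their weights.
GRAMMAR = [
    ({"+cg"},           [{"-syll"}, {"-syll"}],          4.899),
    ({"-son", "-cont"}, [{"-wb"}, {"+cg", "+labial"}],  12.694),
]

tauto  = ["#", "k'", "a", "p'", "u", "#"]            # hypothetical tautomorphemic form
hetero = ["#", "k'", "a", "s", "+", "t'", "a", "#"]  # hypothetical k'as+t'a parse
print(harmony(tauto, GRAMMAR), harmony(hetero, GRAMMAR))  # approx. -17.593 and 0.0

The tautomorphemic form is penalized on both projections, while in the heteromorphemic form the morpheme boundary symbol intervenes on each projection, so neither constraint matches.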

As discussed in Section 3.3, the independent roles of place of articulation and morphological structure can be teased apart by comparing the results of Experiment 1 and Experiment 2. In the behavioral data, there were fewer errors on forms with dental ejectives in both experiments, but the difference between forms with dental and non-dental ejectives was larger in Experiment 2, where morphological structure was also at play. Our model shows the same qualitative pattern, though the difference is slight: the difference in scores between dental and non-dental forms is 13 points for Experiment 1 but 18 points for Experiment 2.

4.4.4 Beyond the experimental data: Evaluating the full range of stop combinations

We have focused thus far on just two underattested structures in Aymara: plain-ejective and ejective-ejective combinations. There are other underattested stop combinations in Aymara, however, and in this section we look at how our model reflects these other restrictions.

As discussed in Section 2, plain-aspirate sequences are underattested. Our model includes four constraints that penalize plain-aspirate combinations. There are constraints against three specific plain stops followed by the class of aspirates – [p]-aspirate, [t]-aspirate and [q]-aspirate – as well as the more general constraint *[–wb][–syllabic][+sg]. This latter constraint penalizes all stop-aspirate combinations, not just plain-aspirate ones, but only when they are preceded by another stop or a morpheme boundary.

To assess the grammar, we constructed a small set of targeted test words. All test words had a CaCa structure and contained two stops. Table 15 shows the scores that our model assigns to words with several different laryngeal configurations: four combinations, (a)–(d), that are described as unrestricted in the literature (ejective-aspirate and aspirate-ejective combinations are discussed below), and three, (e)–(g), that are restricted. The scores reported in this and subsequent tables are averaged over all CaCa forms that contained the relevant combination of consonants and did not violate any other restriction described in this section (to allow assessment of each restriction individually). For example, the score for “pl-pl” was averaged over 22 of the 25 hypothetical CaCa forms with two plain stops ([papa], [pata], [paʧa], [paka], [paqa], [tapa], etc.); combinations of dental-palatal ([taʧa]), uvular-velar ([qaka]) or velar-uvular ([kaqa]) were excluded since these are subject to additional restrictions, discussed below.
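
The test set is easy to reconstruct; the sketch below (in Python) generates it and shows where the averages come from. The segment lists and excluded place combinations follow the description above, and score_form stands in for the trained grammar.

from itertools import product
from statistics import mean

PLAIN    = ["p", "t", "ʧ", "k", "q"]
EJECTIVE = ["p'", "t'", "ʧ'", "k'", "q'"]
ASPIRATE = ["pʰ", "tʰ", "ʧʰ", "kʰ", "qʰ"]
PLACES   = ["labial", "dental", "palatal", "velar", "uvular"]

PLACE = {seg: place for series in (PLAIN, EJECTIVE, ASPIRATE)
         for seg, place in zip(series, PLACES)}

# Place combinations subject to their own restrictions (Table 16) are excluded
# so that each laryngeal restriction can be assessed on its own.
EXCLUDED = {("dental", "palatal"), ("uvular", "velar"), ("velar", "uvular")}

def test_items(series1, series2):
    """All CaCa forms pairing series1 with series2, minus excluded place pairs."""
    for c1, c2 in product(series1, series2):
        if (PLACE[c1], PLACE[c2]) not in EXCLUDED:
            yield f"{c1}a{c2}a"

def mean_score(series1, series2, score_form):
    return mean(score_form(w) for w in test_items(series1, series2))

print(len(list(test_items(PLAIN, PLAIN))))   # 22, as in the pl-pl cell described above
# mean_score(PLAIN, EJECTIVE, score_form) would average the "pl-ej" cell, where
# score_form(word) returns the harmony the trained grammar assigns to `word`.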

Table 15

Harmony scores assigned by our final model to a small test set of nonce words assessing laryngeal restrictions.

Lar. combo Description in lit Score Constraints violated
a. pl-pl unrestricted –6 none
b. ej-pl –6 none
c. asp-pl –6 none
d. asp-asp –8 *[-high][+sg, +palatal] (default)
e. pl-ej restricted –20 *[–wb][+cg, +labial] (stop)
*[–syll][+cg, +velar] (stop)
*[–syll][+cg, +uvular] (stop)
*[+plain][+cg][+wb] (stop)
f. pl-asp –16 *[+plain, +labial][+sg] (stop)
*[+plain, +uvular][+sg] (stop)
*[+plain, +uvular][+dental] (stop)
g. ej-ej –23 *[–syll][–syll] (ejective)
*[–wb][+cg, +labial] (stop)
*[–syll][+cg, +velar] (stop)
*[–syll][+cg, +uvular] (stop)

Table 15 shows that the grammar clearly distinguishes between unrestricted laryngeal combinations, which receive a high score of –6, and the restricted combinations, which receive lower scores (note that –6 is the highest score given to any four-segment form by the grammar, because the model includes a constraint against any segment, *[], which penalizes longer words). The plain-ejective and ejective-ejective combinations receive lower scores than plain-aspirate combinations because not all plain-aspirate combinations are penalized by the grammar. The model only penalizes forms with three of the plain stops, [p t q], followed by aspirates, but assigns a score of –6, comparable to unrestricted combinations, to forms with [ʧ]-aspirate or [k]-aspirate combinations. In this case, the model works around the exceptions to the restriction, positing several more specific but more accurate constraints on individual plain-aspirate combinations rather than a single more general but less accurate constraint covering all plain-aspirate combinations.
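
To make the arithmetic behind these scores explicit: in a MaxEnt grammar of the Hayes and Wilson (2008) type, the score of a form x is the negative sum of its weighted constraint violations,

H(x) = -\sum_{i} w_i \, C_i(x)

where C_i(x) is the number of times x violates constraint i and w_i is that constraint’s weight. On this interpretation, the –6 ceiling for four-segment forms follows if *[] carries a weight of roughly 1.5 – a value we infer from the reported scores rather than read directly off the grammar: H([papa]) = -(4 \times 1.5) = -6.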

Aymara shows place cooccurrence restrictions as well as laryngeal cooccurrence restrictions. Combinations of dorsals (uvulars and velars) and combinations of coronals (dentals and palatals) are infrequent. Our model includes the constraints *[dental][palatal], *[velar][uvular] and *[uvular][velar] on the stop projection, which penalize three of the four possible combinations. The distinction between dental-palatal and palatal-dental combinations seems reasonable: there are 84 palatal-dental sequences in our corpus compared to 24 dental-palatal sequences. For dorsals, there is just 1 uvular-velar combination and 0 velar-uvular combinations in our corpus, so both constraints are warranted. Table 16 shows that dental-palatal, velar-uvular and uvular-velar combinations receive low scores, while palatal-dental combinations receive higher scores comparable to the unrestricted laryngeal combinations.

Table 16

Harmony scores assigned by our final model to a small test set of nonce words assessing place cooccurrence restrictions.

Combination Harmony score Constraints violated
a. dental-palatal –24 *[+dental][+palatal] (stop)
b. palatal-dental –6 none
c. velar-uvular –18 *[+velar][+uvular] (stop)
d. uvular-velar –14 *[+uvular][+velar] (stop)

A set of quite complicated restrictions applies to ejective-aspirate and aspirate-ejective pairs. While both of these laryngeal combinations are attested, not all individual combinations of segments occur. The attested combinations of ejectives and aspirates in our corpus are shown in Table 17. As described in MacEachern (1997), which stop is ejective and which is aspirate is predictable based on place of articulation. If the initial consonant is a labial or a uvular, it will be aspirated (see (j)–(l)); otherwise, it is ejective (see (a)–(h)). In uvular-labial pairs, the uvular is ejective (see (i)). Any combination not shown in the table is unattested.

Table 17

Counts of ejective-aspirate and aspirate-ejective combinations.

ejective-aspirate observed ejective-aspirate observed aspirate-ejective observed
a. t’…pʰ 2 f. ʧ’…qʰ 5 j. pʰ…t’ 7
b. t’…kʰ 11 g. k’…pʰ 15 k. pʰ…ʧ’ 24
c. t’…qʰ 334 h. k’…tʰ 11 l. qʰ…t’ 7
d. ʧ’…pʰ 29 i. q’…pʰ 13
e. ʧ’…kʰ 56
Total 476 Total 38

Table 18 shows the harmony scores assigned to nonce words with attested and unattested combinations of ejectives and aspirates. The model correctly distinguishes between attested and unattested aspirate-ejective sequences via the three place-specific constraints (*[–wb][+labial, +cg], *[–wb][+uvular, +cg] and *[–wb][+velar, +cg]), which penalize non-initial labial, velar and uvular ejectives that are preceded by another stop. The grammar doesn’t include any constraints on ejective-aspirate sequences, and attested and unattested combinations are only weakly distinguished by the model. This difference arises from an orthogonal bigram constraint against [aeo][ʧʰ] sequences.

Table 18

Harmony scores assigned by our final model to nonce words for aspirate/ejective combinations broken down by place.

Combination Harmony score Constraints violated
a. aspirate-ejective, attested –6 none
b. aspirate-ejective, unattested –16 *[–wb][+labial, +cg] (stop)
*[–wb][+uvular, +cg] (stop)
*[–wb][+velar, +cg] (stop)
c. ejective-aspirate, attested –7 *[-high][+sg, +palatal] (default)
d. ejective-aspirate, unattested –10 *[-high][+sg, +palatal] (default)

Hayes and Wilson’s model has a preference for more general constraints, stated over larger natural classes. To completely match the distribution of stops in the language, the model would have to include many constraints on individual segmental combinations. While the grammar does include several constraints that refer to classes of a single segment, in other cases such specific constraints are not learned. We leave it to future experimental work to identify what generalizations Aymara speakers have learned about ejective-aspirate and aspirate-ejective combinations, and how the model might need to be modified to match speaker behavior.

Finally, pairs of segments that differ only in laryngeal features (e.g., [t’…tʰ], [p’…p], [kʰ…k], etc.) are also reported to be restricted and are nearly absent in our corpus. Our model does not include constraints on any of these combinations, again because such constraints would refer to individual segments and the learner typically does not learn such constraints.

4.5 An unsegmented corpus

To further establish the role of morphological information in the success of our model, we ran learning simulations on the same word corpus, but with morpheme boundary markers removed. Recall from Table 3 that there are many instances of plain-aspirate, plain-ejective and ejective-ejective combinations in the corpus as a whole, but they are mostly found across a morpheme boundary. We tested whether these sequences were frequent enough to obscure the restrictions if morpheme boundaries are not represented. To start, we look at the number of plain-ejective, plain-aspirate and ejective-ejective combinations that occur in a baseline trigram in the unparsed data set in Table 19; for comparison, we repeat the numbers for tautomorphemic and heteromorphemic sequences in the parsed data set from Table 3. For all three restricted combinations, there are exceptions in the unparsed data set, and plain-aspirate combinations in particular are quite frequent. It is worth noting that the heteromorphemic trigram combinations in the parsed data set are actually bigrams in the unparsed data set (for example, [ati+p+t’a+ɲ] in the parsed data set appears as [atipt’aɲ] in the unparsed data set), so these forms do not introduce exceptions at the trigram level in the unparsed data set. Instead, many of the exceptions that we see in the unparsed data are actually tetra- or penta-grams in the parsed data set that appear as trigrams once morpheme boundaries are removed. For example, the form [huk’ampst’aɲ] appears in the unparsed data set but is [huk’a+m+p+s+t’a+ɲ] in the parsed data set.
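
The effect of removing boundaries can be checked mechanically. The sketch below (in Python) tests whether a plain stop and an ejective occur in a trigram configuration, that is, with exactly one element (a segment or a boundary symbol) between them; the segment inventories are simplified, and the two example forms are the ones just discussed.

PLAIN = {"p", "t", "ʧ", "k", "q"}
EJECTIVE = {"p'", "t'", "ʧ'", "k'", "q'"}

def strip_boundaries(parsed):
    return [s for s in parsed if s != "+"]

def in_trigram_config(segs, first, second):
    """True if a member of `first` and a member of `second` occur with exactly
    one element between them (a C-X-C trigram window)."""
    return any(segs[i] in first and segs[i + 2] in second
               for i in range(len(segs) - 2))

atip   = ["a", "t", "i", "+", "p", "+", "t'", "a", "+", "ɲ"]                        # ati+p+t'a+ɲ
hukamp = ["h", "u", "k'", "a", "+", "m", "+", "p", "+", "s", "+", "t'", "a", "+", "ɲ"]  # huk'a+m+p+s+t'a+ɲ

for form in (atip, hukamp):
    print(in_trigram_config(form, PLAIN, EJECTIVE),
          in_trigram_config(strip_boundaries(form), PLAIN, EJECTIVE))
# ati+p+t'a+ɲ:       True  False (a trigram only in the parsed representation)
# huk'a+m+p+s+t'a+ɲ: False True  (a trigram only once boundaries are removed)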

Table 19

Number of observed combinations appearing in a trigram configuration in the parsed (tautomorphemic and heteromorphemic columns) and unparsed data sets, for the three restricted stop combinations.

tautomorphemic heteromorphemic unparsed
plain-aspirate 0 149 434
plain-ejective 0 659 17
ejective-ejective (non-identical) 0 1 5

We then turned to examine whether the learner found placeholder trigram constraints when trained on the unparsed corpus (we call this the “Induced Unparsed Model”, in contrast to the model we presented in Section 4.4.3).8 We tested a range of gain and gamma combinations, and compared the morpheme boundary and placeholder trigram constraints found in the baseline model when trained on the parsed and unparsed data sets. All runs of the learner were asked to find a maximum of 200 constraints. We report on a representative sample of the numerous combinations we tried in Table 20. Morpheme boundary trigram constraints corresponding to at least two of the three restricted combinations are found for the parsed data under almost all settings, and settings with a higher gain or gamma allow the model to detect all three restrictions in the baseline grammar. For the unparsed data, placeholder trigram constraints are only found with a very low gamma of 1. Even with gamma this low, the Induced Unparsed Model only finds a single placeholder trigram on ejectives; this model never finds a placeholder trigram corresponding to the plain-aspirate restriction and may miss the ejective-ejective restriction as well.
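
Deciding whether a given run produced a trigger for projection induction amounts to scanning the learned constraints for morpheme boundary or placeholder trigrams. The sketch below shows one way to automate that check in Python; the string format for constraints and the regular-expression parsing are our simplification, not the learner’s internal representation.

import re

def flags_nonlocal_restriction(constraint):
    """True for trigram constraints whose middle gram is a placeholder ([])
    or refers to the morpheme boundary feature."""
    grams = re.findall(r"\[[^\]]*\]", constraint)
    return len(grams) == 3 and (grams[1] == "[]" or "mb" in grams[1])

examples = ["*[+plain][-mb][+cg]",    # morpheme boundary trigram (parsed corpus)
            "*[+plain][][+cg]",       # placeholder trigram (unparsed corpus)
            "*[+plain][+sg]",         # ordinary bigram: not a trigger
            "*[+velar][+uvular]"]
for c in examples:
    print(c, flags_nonlocal_restriction(c))
# Only the first two are flagged as triggers for building a nonlocal projection.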

Table 20

Morpheme boundary and placeholder trigram constraints found at various settings in the parsed and the unparsed versions of the Aymara corpus.

gain gamma parsed corpus unparsed corpus
100 1 none *[+plain][][+cg]
400 1 *[+plain][–mb][–continuant,+sg] *[+plain][][+cg]
*[–sonorant, –continuant][–mb][+cg]
800 1 *[+plain][–mb][–continuant,+sg] *[–son, –cont][][+cg]
*[–sonorant, –continuant][–mb][+cg]
100 50 *[+plain][–mb][+cg] none
*[+plain][–mb][+sg, –cont]
400 50 *[+plain][–mb][+cg] none
*[+plain][–mb][+sg, –cont]
800 50 *[+cg][–mb][+cg] none
*[+plain][–mb][+cg]
*[+plain][–mb][+sg, –cont]
100 150 *[+cg][–mb][+cg] none
*[+plain][–mb][+cg]
*[+plain][–mb][+sg, –cont]
400 150 *[+cg][–mb][+cg] none
*[+plain][–mb][+cg]
*[+plain][–mb][+sg, –cont]
800 150 *[+cg][–mb][+cg] none
*[+plain][–mb][+cg]
*[+plain][–mb][+sg, –cont]

One of the grammars built after training on the unparsed corpus is investigated in more detail in Figure 5 and Table 21. This grammar was trained with a gain of 500 and a gamma of 1. The baseline grammar constraint *[+plain][][+cg] motivates the [–son, –cont] projection, on which all restrictions could be correctly stated. But the final grammar achieves a poor fit to the experimental data, as shown by the nearly vertical lines in Figure 5. The correlations for Experiment 1 are τ = 0.22 and ρ = 0.35; the correlations for Experiment 2 are τ = 0.43 and ρ = 0.60.

Figure 5

The Induced Unparsed Model: a grammar with induced projections built from the unsegmented corpus, tested on Aymara experimental data. Data points are labeled according to stimulus type; Exp. 1: “CT” is control, “PE” is plain-ejective, “EE” is ejective-ejective. Exp. 2: “ED” is ejective-ejective-dental, “EL” is ejective-ejective-labial, “PD” is plain-ejective-dental, “PL” is plain-ejective-labial.

Table 21

Constraints on the stop projection discovered after training on a corpus of unparsed words (Induced Unparsed Model).

Constraint on [–son, –cont] projection Weight Sequences penalized
a. *[–wb][+cg] 1.494 stop…[p’ t’ k’ q’ ʧ’]
b. *[+cg][+wb] 1.998 [p’ t’ k’ q’ ʧ’] … #
c. *[+plain,+uvular][+wb] 0.989 [q] … #
d. *[+palatal][+wb] 1.154 [ʧ ʧʰ ʧ’] … #
e. *[+cg,+labial][] 2.068 [p’]… stop
f. *[+velar][+uvular] 1.73 [k kʰ k’] … [q qʰ q’]
g. *[+plain][+sg,+uvular] 4.31 [p t k q ʧ] … [qʰ]
h. *[+cg,+uvular][–wb] 0.551 [q’] … stop
i. *[+wb][+cg,+dental][+palatal] 2.731 # [t’] [ʧ ʧʰ ʧ’]
j. *[+cg,+palatal][–wb] 0.997 [ʧ’] … stop
k. *[+cg,+velar][][+wb] 1.981 [k’] … stop … #

The Induced Unparsed Model’s lack of success at representing phonologically meaningful underattestations in the unparsed data becomes clear when we look at the constraints it posits on the stop projection (see Table 21). Their low weights are due in part to the gamma setting; almost all of them are violated frequently by Aymara words. While these constraints penalize some restricted combinations, they don’t capture the full extent of the restrictions nor are their weights high enough to distinguish restricted from unrestricted structures.

With this low gamma setting, the Induced Unparsed Model does not succeed in distinguishing meaningful underattestations in the data, and instead learns many low-weighted constraints with numerous exceptions. Our more successful Induced Parsed Model in Section 4.4.3 has higher gamma and gain (in addition to access to morpheme boundaries), which makes the Induced Parsed Model more selective and a better fit to the phonological distinctions supported by traditional phonological analysis and experimental work with native speakers. Given these same settings (400 gain, 150 gamma), the Induced Unparsed Model’s grammar fails to include any placeholder trigram constraints from which it could posit nonlocal projections.

We also considered whether it was possible to capture phonological distinctions on the stop projection with unparsed data, when the learner was given a higher gain and gamma and we supplied the stop projection manually. This is the Manual Unparsed Model. As shown in Figure 6, this model’s grammar achieves a better fit to Aymara speakers’ performance in the repetition experiments than the grammar in Figure 5. But there are interesting differences in the details.

Figure 6

The Manual Unparsed Model: a grammar built from the unsegmented corpus, tested on Aymara experimental data, with manually supplied projections. Data points are labeled according to stimulus type; Exp. 1: “CT” is control, “PE” is plain-ejective, “EE” is ejective-ejective. Exp. 2: “ED” is ejective-ejective-dental, “EL” is ejective-ejective-labial, “PD” is plain-ejective-dental, “PL” is plain-ejective-labial.

The Manual Unparsed Model, trained on unparsed data, captures the distinction between dental and labial ejectives via constraints on everything but dentals (in Table 22, (c), (e)–(g), (k)–(m)). But this model fails to distinguish control stimuli such as [lap’a] (100% correct) from forms such as [p’it’a] (67% correct) – both receive a harmony score of –6. This is because this grammar does not include a general constraint against ejectives in second position on the stop projection, and constraint (c) on the ejective projection is too specific. This grammar is overfitting, learning overly specific constraints to accommodate the exceptions in the data.

Table 22

Constraints on ejective and stop projections induced from a corpus without morpheme boundaries.

Projection Constraint Weight Sequences penalized
a. +cg *[–wb][+labial] 8.06 ejective…[p’]
b. +cg *[–wb][+palatal] 5.531 ejective…[ʧ’]
c. +cg *[+dental][–wb] 14.559 [t’]…ejective
d. -son-cont *[][–wb,+mb] 17.915 stop … +
e. -son-cont *[–wb][+cg,+uvular] 6.023 stop … [q’]
f. -son-cont *[–wb][+cg,+velar] 6.265 stop … [k’]
g. -son-cont *[–wb][+cg,+labial] 6.942 stop… [p’]
h. -son-cont *[+plain][+sg,+uvular] 4.974 [p t k q ʧ]… [qʰ]
i. -son-cont *[+dental][+sg,+palatal] 6.33 [t tʰ t’] … [ʧʰ]
j. -son-cont *[+dental][+cg,+palatal] 13.976 [t tʰ t’] … [ʧ’]
k. -son-cont *[+plain,+labial][+cg,+palatal] 13.507 [p]…[ʧ’]
l. -son-cont *[+palatal][+cg,+palatal] 5.633 [ʧ ʧʰ ʧ’]…[ʧ’]
m. -son-cont *[+plain,+uvular][+cg,+palatal] 12.905 [q]…[ʧ’]
n. -son-cont *[+cg,+labial][+sg] 5.42 [p’]…aspirate
o. -son-cont *[+plain][+sg][+cg] 13.224 plain…aspirate …ejective
p. -son-cont *[+plain][+sg][+sg] 13.625 plain…aspirate…aspirate
q. -son-cont *[–wb][][+cg,+palatal] 13.283 stop…stop…[ʧ’]
r. -son-cont *[+sg][+plain][+sg,+labial] 12.555 aspirate…plain…[pʰ]
s. -son-cont *[+sg][+cg][+sg] 12.723 aspirate…ejective…aspirate

When it comes to Experiment 2, however, the Manual Unparsed Model’s fit to the behavioral data is comparable to the model we reported in Section 4.4.3 (although the differences between “good” and “bad” forms are smaller in Experiment 2 – the opposite of the pattern in the parsed grammar). The reasons for this have to do with the abundance of dental ejectives in Aymara suffixes: by positing place-specific constraints against non-dental ejectives in second position on the stop projection, the model manages to approximate the same generalizations.

For a quantitative comparison, Table 23 summarizes the non-parametric correlations between the harmony scores each model assigns to the experimental stimuli and the averaged accuracy in the repetition experiments with Aymara speakers. Model (a), which uses parsed data both for inducing the projections and for the final grammar, has the best correlations with Experiment 1, and the best correlations overall. Model (b), which includes no morphological information at all, achieves the lowest correlations across the board. The third model, (c), does worse on the first experiment and slightly better on the second experiment, but its overall correlations with the behavioral data are not as good as those of the model in (a).

Table 23

Correlations between harmony scores assigned by the three models and accuracy in repetition experiments with Aymara speakers.

Experiment 1 Experiment 2 Overall
τ ρ τ ρ τ ρ
a. Induced Parsed Model (Figs. 3, 4) 0.68 0.83 0.52 0.72 0.59 0.77
b. Induced Unparsed Model (Fig. 5) 0.22 0.35 0.43 0.60 0.31 0.42
c. Manual Unparsed Model (Fig. 6) 0.49 0.67 0.55 0.74 0.52 0.69

Zooming out from laryngeal cooccurrence restrictions, the Manual Unparsed Model is not quite right in other ways. Aymara morphemes obey an exceptionless constraint against CCC clusters (0 of them in the corpus), but such clusters are created by syncope at morpheme boundaries (recall Section 2.1). The right constraint to capture this would be *[–syll][–syll][–syll]. The Induced Parsed Model (a) contains such a constraint, and gives it a high weight of 14.506. The Manual Unparsed Model cannot motivate such a constraint – there are 3322 words in the Aymara corpus that have such clusters. What this model does instead is posit many specific constraints, sometimes with rather low weights, against various CCC clusters that it sees few examples of – e.g., *[–son, –cont][+labial][–cont], with a weight of 6.778. This is just one example of a morpheme structure constraint that a morphology-agnostic learner cannot capture.
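
The asymmetry is easy to verify mechanically; the sketch below (in Python) checks a parsed form for morpheme-internal CCC clusters and for surface CCC clusters once boundaries are removed. The vowel set is a simplification, and the example form is the one from Section 4.5.

VOWELS = {"a", "e", "i", "o", "u"}

def has_ccc(segs):
    """True if three consonants are adjacent in a boundary-free segment list."""
    return any(all(s not in VOWELS for s in segs[i:i + 3])
               for i in range(len(segs) - 2))

def morphemes(parsed):
    """Split a parsed form (with '+' markers) into its component morphemes."""
    out = [[]]
    for s in parsed:
        if s == "+":
            out.append([])
        else:
            out[-1].append(s)
    return out

word = ["h", "u", "k'", "a", "+", "m", "+", "p", "+", "s", "+", "t'", "a", "+", "ɲ"]
print(has_ccc([s for s in word if s != "+"]))    # True: the surface form contains [m p s]
print(any(has_ccc(m) for m in morphemes(word)))  # False: no morpheme-internal CCC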

5 Discussion

5.1 Morpheme boundaries and place of articulation

The experimental and modeling work show that the distinction between tautomorphemic and heteromorphemic stop-ejective sequences is likely both a direct effect of morphology and an effect of place of articulation. Participants in Experiment 1 made slightly fewer errors on stimuli with dental ejectives than with other ejectives, reflecting the frequency of dental ejectives in non-initial position. This effect was exaggerated in Experiment 2, where the structure of nonce words favored a polymorphemic parse. Similarly, the constraints in the grammar are sensitive to both morphological structure and place of articulation. The model includes constraints on tautomorphemic but not heteromorphemic laryngeal combinations, and it also includes more specific constraints on individual segments; due to their presence in suffixes, [t’] and [ʧ’] are more frequent than other ejectives in non-initial position, and the constraints in the grammar reflect this asymmetry.

One element of the experimental results not directly captured by the model is that while participants were more accurate on dental ejectives than labial ejectives in Experiment 2, they still made more errors on dental ejective forms than on filler items. In contrast, our model predicts forms like [k’ast’aɲa] to be fully grammatical. Errors on dental ejectives likely reflect the two available morphological parses for these forms. Since these were nonce words that weren’t associated with any meaning, speakers did not know for certain that [t’a] in these forms was the verbal suffix. There are also roots in the language that end in [t’a]. Errors in the repetition task are predicted if participants occasionally parse a form like [k’ast’aɲa] as [k’ast’a+ɲa], while accurate repetitions are expected if the [k’as+t’a+ɲa] parse is hypothesized.

5.2 Phonotactic learning and segmentation

Our modeling simulations showed that while some pieces of the restrictions can be detected in an unsegmented corpus, morpheme boundaries are necessary to fully capture the patterns. This means that infants and children learning Aymara cannot have a complete phonotactic grammar until they have learned enough morphology to segment the speech stream. The prediction is that the trajectory of phonotactic awareness of laryngeal restrictions should be different in Aymara learners than in learners of a language where laryngeal restrictions are categorical and hold at the word level, like Quechua.

Even though languages do not mark every morpheme boundary phonotactically, speakers of languages such as Finnish and Dutch have been shown to use phonotactic knowledge to segment speech in experimental settings (Suomi et al. 1997; McQueen 1998). The model we presented makes a simplifying assumption that at some point, the learner examines a fully parsed corpus with boundaries. A more realistic approach would use phonotactics to deduce where boundaries are located (as in the StaGe model of Adriaans and Kager 2010). StaGe uses bigram probabilities to posit word boundaries. Aymara would lend itself to such an approach: ejectives and aspirates are most common root-/word-initially (recall Table 2), so the distribution of plain-ejective bigrams would be a clue to boundaries even for a learner that does not yet have detailed morphological segmentation information. We leave for future work an implementation of a more complete model that deduces both where morpheme/word boundaries are and whether they lead to nonlocal projections.
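
As a schematic illustration of this general idea (not the StaGe model itself), the sketch below estimates conditional bigram probabilities from unsegmented forms and posits a boundary wherever a transition is sufficiently improbable; the probability estimate, the threshold, and the example form are placeholder choices.

from collections import Counter

def bigram_probs(forms):
    """Estimate P(b | a) from a list of unsegmented forms (segment lists)."""
    pair_counts, first_counts = Counter(), Counter()
    for segs in forms:
        for a, b in zip(segs, segs[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(segs, probs, threshold=0.05):
    """Insert '+' before any transition whose conditional probability is low."""
    out = [segs[0]]
    for a, b in zip(segs, segs[1:]):
        if probs.get((a, b), 0.0) < threshold:
            out.append("+")
        out.append(b)
    return out

# e.g. probs = bigram_probs(corpus)
#      segment(["k'", "a", "s", "t'", "a", "ɲ", "a"], probs)

In Aymara, a plain stop followed by an ejective would be exactly this kind of low-probability transition, so a learner tracking such statistics would tend to posit boundaries in the relevant forms.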

5.3 Typological considerations: Syncope and morpheme boundary projections

In our baseline grammar for Aymara, the nonlocal restrictions on stop combinations are reflected in morpheme boundary trigrams like *[+plain][–mb][+sg, –continuant]. The pattern of syncope in Aymara is crucial to these constraints being found, because syncope brings stops close together, separated only by a morpheme boundary.9 Without syncope, a hypothetical form like /lipa+t’a/ would be realized as [lipa+t’a] (as opposed to [lip+t’a]), with the interacting stops only appearing in a tetragram; syncope creates a consonant–morpheme boundary–consonant trigram. In a language without syncope, the restrictions on stop combinations may still be observable in a baseline trigram, but they would be reflected in a placeholder trigram (in which the medial gram is ‘any segment’, e.g., *[+plain][][+sg, –continuant]) as opposed to a morpheme boundary trigram. Under our proposal about how nonlocal projections are induced, both placeholder trigrams and morpheme boundary trigrams trigger the learner to add a nonlocal projection to its search space of constraints, so syncope should not be crucial to the learning of a nonlocal restriction.

A learner does not know in advance whether including morpheme boundaries on projections will lead to better generalizations. We showed in Gouskova and Gallagher (to appear) that in languages like Quechua, for example, it is possible to discover the nonlocal interactions between stops from phonological words alone, and presumably Quechua learners acquire this knowledge before they are morphologically aware, since the restrictions are categorical within words. It is important in Quechua that the nonlocal projection include all stops but not include morpheme boundaries, since the relevant restrictions hold both within morphemes and across morpheme boundaries.

We hypothesize that in both Quechua-type languages and Aymara-type languages, learning starts on unparsed words, and if any projections are discovered, they include word but not morpheme boundaries. When words are morphologically segmented, the phonotactic grammar is reassessed, in case any generalizations were missed in the unparsed grammar. In a language like Aymara, we showed that the nonlocal laryngeal restrictions are not noticeable from the unparsed data, so a learner of Aymara would not be able to learn these restrictions until they had acquired some morphological structure, at which point nonlocal projections with the morpheme boundary symbol and constraints on this projection would be added to the grammar. In a language like Quechua, where similar laryngeal restrictions are not morphologically sensitive, the laryngeal restrictions should be learned earlier and represented on a nonlocal projection that does not include morpheme boundaries. When the Quechua learner returns to phonotactic learning with morphological information, the learner should not uncover any new placeholder trigrams on stops, since the distribution of stop combinations is already fully accounted for on the nonlocal stop projection without the morpheme boundary symbol. We verified that this is in fact how things work for Quechua. We first trained a baseline model on an unparsed corpus (about 10k words, described in Gouskova and Gallagher to appear), from which the learner built a nonlocal stop projection. We then trained a model with this projection on the parsed data, and indeed, the learner found constraints on the nonlocal stop projection but did not learn any new placeholder trigrams on the default projection that would motivate adding the morpheme boundary symbol to the stop projection. We leave it to future work to examine in more detail how morphologically insensitive generalizations can be incorporated into a later stage of learning where morphological structure is represented.

5.4 Morphologically sensitive phonotactics and the subset problem

Patterns such as those of Aymara present two types of subset problem (Baker 1979; Bowerman 1988). First, phonotactic learning in general requires the learner to err on the side of assuming more restrictive grammars and to construct its own negative evidence. In order to posit these more restrictive grammars, an Optimality-Theoretic learner with innate constraints requires a bias to keep faithfulness constraints ranked low; the negative evidence comes from the theory’s Gen component (Hayes 2004; Prince and Tesar 2004). If the learner induces constraints from data instead, it constructs its own negative evidence by comparing the attested data to plausible phonotactic distributions generated at random (Hayes and Wilson 2008). But even such a learner will not notice nonlocal interactions – it must either be given the nonlocal representations a priori (as in Hayes and Wilson’s proposal), or it must be nudged in the direction of looking for nonlocal representations. In our proposal, the learner does a second pass of phonotactic learning in response to generalizations that it may have missed on the first pass, signaled by placeholder trigram constraints. This kind of bias is designed to alert the learner to the need for more restrictive constraints.

Second, if the learner assumes that the sequences allowed in words are also allowed in morphemes, then this morphologically agnostic learner will learn the superset grammar. For a language such as English, the superset learner hears words with [md] clusters and assumes, incorrectly, that such clusters are allowed anywhere, not just at morpheme boundaries. Our experiments suggest that Aymara speakers make the more conservative assumption. We suggest that in order to learn the right level of generalization, an Aymara learner has to revisit phonotactic learning once morphological information is available. We implemented this by supplying morpheme boundaries and using sequences at boundaries as a clue that nonlocal interactions are present.

The alternative we did not discuss is to split the learning data into morphemes, and learn phonotactics over these morphemes. For Aymara, a plausible learning data set that would reveal the right regularities would be a corpus of roots. This is the learning data we used in Gouskova and Gallagher (to appear). The benefit of using roots as learning data is that the learner may use just one simple cue, segmental placeholder trigrams, without attending to morpheme boundaries or reifying them to the representational level of segments with feature values. This would essentially use a sublexicon to learn morpheme-level phonotactics that hold over just a subset of the language’s forms (Gouskova and Becker 2013; Becker and Gouskova 2016). The main reason we did not use a sublexicon model here is that it is not clear how to define phonotactics over bound roots; verbal roots in Aymara are obligatorily suffixed. The application of phonotactic learning to words is straightforward to implement, whether they are embedded in connected speech or taken as a lexicon-like list. On the other hand, bound roots and suffixes are not complete, pronounceable phonological objects, and learning over such entities would be one level of abstraction removed from a realistic learning scenario.

An anonymous reviewer suggests an alternative to using morpheme boundaries: marking segments of the root with a [±root] feature. This would effectively allow the learner to capture the root-internal cooccurrence restriction by including [+root] in the constraint. One problem with this move is computational implementation: the addition of this binary feature would mean a larger set of natural classes (and therefore many more constraints) for the learner to analyze. As for capturing the right level of generalization, this would be fitting for a language like Quechua, where affixes do not have laryngeals at all (motivating *[–root, +cg] and *[–root, +sg]). But in the case of Aymara, the restrictions do hold both inside roots and inside suffixes, so this would probably not be sufficient – Aymara requires something more along the lines of McCarthy’s (1989) planar separation for morphemes, so they dwell on different planes and escape cooccurrence restrictions.
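
As a back-of-the-envelope estimate of that computational cost: if each gram in a constraint may specify each of F binary features as +, –, or leave it unspecified, the space of candidate grams is on the order of 3^F, so adding one feature roughly triples the space of grams and multiplies the space of candidate bigrams by a factor of about nine, since 3^{2(F+1)} / 3^{2F} = 9. In practice the learner only considers grams that correspond to natural classes of the segment inventory, so the real increase is smaller, but the search space still grows multiplicatively with every added feature.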

5.5 Cue-based learning

Our approach uses properties of the learning data to detect other properties of the language, and as such, it can be considered to be an example of cue-based learning (Dresher and Kaye 1990; Gibson and Wexler 1994; Dresher 1999). A critique of cue-based learning is that it assumes a lot of learning machinery specific to language (Nazarov and Jarosz 2017). We do believe that the problem of boundary-sensitive phonotactics is a fairly language-specific one, and it is not immediately clear how one would approach it without recognizing morphemes as separate pieces with boundaries.

The logic of inducing nonlocal interactions from trigrams is not strictly phonological: one could argue for a domain-general status of the deduction that if A and B cannot cooccur nearby in a configuration A-X-B, it is worthwhile to check whether A and B can cooccur at longer distances. Our learner uses an independent criterion to check whether A and B can interact at all – they must be part of a natural class, as defined by the language’s phonological contrasts and alternation system. This requirement, along with the requirement that constraints in the grammar must receive robust statistical support, minimizes the learner’s ability to notice nonlocal interactions between unrelated segments, which most linguists would describe as accidental.

6 Conclusion

This paper has examined a set of morphologically sensitive, nonlocal restrictions in Aymara in a corpus study, behavioral experiments, and a computational model. Our corpus results showed that the restrictions are not exceptionless, even tautomorphemically, though there is an asymmetry between tautomorphemic and heteromorphemic combinations. We show that both the nonlocality and the morphological sensitivity of the restrictions are observable from trigram constraints in a grammar trained on just the linear string of segments. By building projections based on these morpheme boundary trigram constraints, our model captures the range of restrictions reported in the language. The induced grammar succeeds through a combination of general constraints on relatively large classes, like the class of stops or the class of ejectives, and more specific constraints on individual segments.

Our experimental work supports the traditional description of the phonotactics of the language, and our modeling work shows that nonlocal restrictions are learnable inductively, by attending to properties of the phonotactics of the linear string. By comparing a phonotactic grammar trained on parsed vs. unparsed data, we saw that the patterns are largely obscured by the exceptions found across morpheme boundaries – only one of the restricted combinations is observable as a baseline trigram in the unparsed corpus, and only at an extremely low gamma. With these settings, the model does not reliably distinguish between phonologically meaningful gaps and accidental gaps, and it achieves a poor fit to the data. We hope that future work will incorporate the learning of morphological boundaries and phonotactics into a single model.

Notes

  1. The difference between “tier” and “projection” has to do with representational assumptions: the term tier has historically meant a level of a structured autosegmental representation, whereas a projection is merely a representation that includes all and only the members of some class, e.g., all the vowels in a word on a [+syllabic] projection. We adopt the latter term following Hayes and Wilson (2008), see also Clements (1976); Goldsmith (1976); McCarthy (1979); Archangeli (1985); McCarthy (1989) and many others.
  2. Mid vowels here and throughout are allophonic, triggered by the presence of a preceding or following uvular consonant. While these examples show a suffix with an ejective or affricate attaching directly to a root, these suffixes may attach after other suffixes as well (indeed, this is more frequent than attachment to a root in our corpus).
  3. The corpus posted on the website is cut off at 50,000 forms. The full version of the corpus was obtained via personal communication with the developers.
  4. An anonymous reviewer reports that in Peruvian Aymara, forms with two identical ejectives can variably be produced with just an initial ejective, e.g., [t’ant’a] ~ [t’anta]. If this is also true in Bolivian Aymara, knowledge of this alternation may influence participants to choose C2 de-ejectivization as a strategy to repair pairs of non-identical ejectives.
  5. There are several roots that do not undergo syncope (recall 2.1) when suffixed with [-t’a] in our corpus: [hawi], [ana], [qoʎa], and [wajka].
  6. Morpheme boundary symbols are the simplest implementation for this, but there are of course alternatives. For a discussion of theoretical and learnability issues, see Pyle (1972); McCarthy (1989); Beckman (1997); Adriaans and Kager (2010); Becker and Allen (submitted); Kastner and Adriaans (2018); and others.
  7. The Java learner has a technical limitation: it cannot handle strings longer than about 40 characters. The maximum string length is used in generating the “sample salad” of phoneme strings that the learner compares to the learning data in figuring out what is missing from the learning data, and it must do so in finite time, so the shorter the words in the learning data, the better. See Daland (2015) for more.
  8. An anonymous reviewer asks whether the morpheme boundary marker was included in the feature set for the Induced Unparsed Model, despite not being present in the learning data. The presence of a segment in the feature file could influence the learning process, since the model uses the feature set to randomly sample the expected distribution of the given segments and compare that to the observed distribution in the data. We ran models under both conditions, and got similar results. Table 20 reports models with the morpheme boundary symbol in the feature set.
  9. We thank two anonymous reviewers for pointing out the importance of syncope, leading to the discussion in this section.

Acknowledgements

For feedback on this and related work, we would like to thank audiences at NYU, AMP 6 at UC San Diego, and NELS 49 at Cornell. We would like to thank Ildikó Emese Szabó for assistance with building the corpus, and Colin Wilson for sharing the code for the gain-based version of the MaxEnt Phonotactic Learner.

Funding Information

This work was funded by NSF BCS-1724753 to the first two authors.

Competing Interests

The authors have no competing interests to declare.

References

Adriaans, Frans & René Kager. 2010. Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language 62. 311–331. DOI:  http://doi.org/10.1016/j.jml.2009.11.007

Archangeli, Diana. 1985. Yokuts harmony: Evidence for coplanar representation in nonlinear phonology. Linguistic Inquiry 16. 335–372.

Baker, C. L. 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10. 533–581.

Bates, Douglas, Martin Maechler, Ben Bolker & Steven Walker. 2014. lme4: Linear mixed-effects models using S4 classes. http://CRAN.R-project.org/package=lme4, R package version 1.7.

Becker, Michael & Blake Allen. Submitted. Learning alternations from surface forms with sublexical phonology. Phonology. http://ling.auf.net/lingbuzz/002503.

Becker, Michael & Maria Gouskova. 2016. Source-oriented generalizations as grammar inference in Russian vowel deletion. Linguistic Inquiry 47. 391–425. DOI:  http://doi.org/10.1162/LING_a_00217

Beckman, Jill. 1997. Positional faithfulness, positional neutralization, and Shona vowel harmony. Phonology 14. 1–46. DOI:  http://doi.org/10.1017/S0952675797003308

Bennett, William. 2013. Dissimilation, consonant harmony and surface correspondence. Rutgers, NJ: Rutgers University dissertation.

Berent, Iris, Colin Wilson, Gary Marcus & Doug Bemis. 2012. On the role of variables in phonology: Remarks on Hayes and Wilson (2008). Linguistic Inquiry 43. 97–119. DOI:  http://doi.org/10.1162/LING_a_00075

Bowerman, Melissa. 1988. The “no negative evidence” problem: How do children avoid constructing an overly general grammar. In John A. Hawkins (ed.), Explaining language universals, 73–101. Oxford, UK: Basil Blackwell.

Chomsky, Noam & Morris Halle. 1968. The sound pattern of English. New York: Harper & Row.

Clements, George N. 1976. Palatalization: Linking or assimilation? In Salikoko S. Mufwene, Carol A. Walker & Sanford B. Steever (eds.), Papers from Chicago Linguistic Society 12, 96–109. Chicago, IL: Chicago Linguistic Society.

Cser, András. 2010. The alis/aris allomorphy revisited. In Franz Rainer, Wolfgang Dressler, Dieter Kastovsky & Hans Luschuetzky (eds.), Variation and change in morphology: Selected papers from the 13th international morphology meeting, Vienna, 33–51. Amsterdam and Philadelphia: John Benjamins. DOI:  http://doi.org/10.1075/cilt.310.02cse

Daland, Robert. 2015. Long words in maximum entropy phonotactic grammars. Phonology 32. 353–383. DOI:  http://doi.org/10.1017/S0952675715000251

Daland, Robert, Bruce Hayes, James White, Marc Garellek, Andrea Davis & Ingrid Norrmann. 2011. Explaining sonority projection effects. Phonology 28. 197–234. DOI:  http://doi.org/10.1017/S0952675711000145

Della Pietra, Stephen, Vincent Della Pietra & John Lafferty. 1997. Inducing features of random fields. IEEE transactions on pattern analysis and machine intelligence 19. 380–393. DOI:  http://doi.org/10.1109/34.588021

de Lucca, Manuel. 1987. Diccionario práctico Aymara-Español, Español Ayamara. Cochabamba, Bolivia: Los Amigos del Libros.

Dresher, Elan. 1999. Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30. 27–67. DOI:  http://doi.org/10.1162/002438999553959

Dresher, Elan & Jonathan Kaye. 1990. A computational learning model for metrical phonology. Cognition 34. 137–195. DOI:  http://doi.org/10.1016/0010-0277(90)90042-I

Gallagher, Gillian. 2016. Asymmetries in the representation of categorical phonotactics. Language 92. 557–590. DOI:  http://doi.org/10.1353/lan.2016.0048

Gibson, Edward & Kenneth Wexler. 1994. Triggers. Linguistic Inquiry 25. 407–454.

Goldsmith, John. 1976. Autosegmental phonology. Cambridge, MA: Massachusetts Institute of Technology dissertation.

Goldwater, Sharon & Mark Johnson. 2003. Learning OT constraint rankings using a maximum entropy model. In Jennifer Spenader, Anders Eriksson & Östen Dahl (eds.), Proceedings of the Stockholm workshop on variation within Optimality Theory, 111–120. Stockholm: Stockholm University.

Gouskova, Maria & Gillian Gallagher. To appear. Inducing nonlocal constraints from baseline phonotactics. Natural Language and Linguistic Theory.

Gouskova, Maria & Michael Becker. 2013. Nonce words show that Russian yer alternations are governed by the grammar. Natural Language and Linguistic Theory 31. 735–765. DOI:  http://doi.org/10.1007/s11049-013-9197-5

Hardman, Martha James. 2001. Aymara. Munich: Lincom Europa.

Hayes, Bruce. 2004. Phonological acquisition in Optimality Theory: The early stages. In René Kager, Joe Pater & Wim Zonnevald (eds.), Fixing priorities: Constraints in phonological acquisition, 158–203. Cambridge: Cambridge University Press.

Hayes, Bruce & Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39. 379–440. DOI:  http://doi.org/10.1162/ling.2008.39.3.379

Hayes, Bruce & James White. 2013. Phonological naturalness and phonotactic learning. Linguistic Inquiry 44. 45–75. DOI:  http://doi.org/10.1162/LING_a_00119

Kastner, Itamar & Frans Adriaans. 2018. Linguistic constraints on statistical word segmentation: The role of constraints in Arabic and English. Cognitive Science 42. 494–518. DOI:  http://doi.org/10.1111/cogs.12521

MacEachern, Margaret. 1997. Laryngeal cooccurrence restrictions. Los Angeles, CA: University of California, Los Angeles dissertation.

Martin, Andrew. 2007. The evolving lexicon. Los Angeles, CA: University of California, Los Angeles dissertation.

McCarthy, John. 1979. Formal problems in Semitic phonology and morphology. Cambridge, MA: Massachusetts Institute of Technology dissertation.

McCarthy, John. 1989. Linear order in phonological representation. Linguistic Inquiry 20. 71–99.

McQueen, James. 1998. Segmentation of continuous speech using phonotactics. Journal of Memory and Language 39. 21–46. DOI:  http://doi.org/10.1006/jmla.1998.2568

Nazarov, Alexei & Gaja Jarosz. 2017. Learning parametric stress without domain-specific mechanisms. In Karen Jesney, Charlie O’Hara, Caitlin Smith & Rachel Walker (eds.), Proceedings of the Annual Meeting on Phonology 2016. Linguistic Society of America. DOI:  http://doi.org/10.3765/amp.v4i0.4010

Prince, Alan & Bruce Tesar. 2004. Learning phonotactic distributions. In René Kager, Joe Pater & Wim Zonnevald (eds.), Fixing priorities: Constraints in phonological acquisition, 245–291. Cambridge: Cambridge University Press.

Pyle, Charles. 1972. On eliminating BMs. In Paul Peranteau, Judith Levi & Gloria Phares (eds.), Papers from Chicago Linguistic Society 8, 516–532. Chicago, IL: Chicago Linguistic Society.

R development core team. 2018. R: A language and environment for statistical computing. Vienna, Austria. http://www.R-project.org.

Steriade, Donca. 1987. Redundant values. In Anna Bosch, Barbara Need & Eric Schiller (eds.), Papers from Chicago Linguistic Society 23, 339–362. Chicago, IL: Chicago Linguistic Society.

Suomi, Kari, James McQueen & Anne Cutler. 1997. Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language 36. 422–444. DOI:  http://doi.org/10.1006/jmla.1996.2495

Suzuki, Keiichiro. 1998. A typological investigation of dissimilation. Tucson, AZ: University of Arizona, Tucson dissertation.

Trubetzkoy, Nikolai. 1939. Grundzüge der phonologie [Foundations of phonology]. Prague: Travaux du cercle linguistique de Prague.

Wilson, Colin & Gillian Gallagher. 2018. Accidental gaps and surface-based phonotactic learning: A case study of South Bolivian Quechua. Linguistic Inquiry 49. 610–623. DOI:  http://doi.org/10.1162/ling_a_00285