1 Introduction

In many languages, nouns are categorized into classes on the basis of the agreement patterns they trigger on nominal modifiers and, in some cases, on the verb (see, e.g., Hockett 1958; Dixon 1986; Corbett 1991; Aikhenvald 2006; Wechsler 2009, among others). This phenomenon is known as grammatical gender. It includes both the social gender or sex-based grammatical agreement systems commonly found in Indo-European languages (characterised by 2–3 genders comprising masculine, feminine and neuter) and noun class systems (which range from 2 to more than 20 nominal classes). The agreement a noun triggers is in most cases associated with the form of the noun or with its semantic features (see, e.g., Lang 1976; Corbett 1991; Konishi 1993; Katamba 2003; Irmen & Kurovskaja 2010; Aikhenvald 2012). In French, for instance, nouns with the suffix -ette are typically feminine (e.g., camionnette “van” takes the feminine definite determiner la); by contrast, nouns ending in -on are mostly masculine (e.g., le ballon “the ball”). At the same time, nouns denoting referents of female or male social gender almost categorically trigger feminine and masculine grammatical agreement, respectively. Despite these form-based and meaning-based cues to gender in French, there are many exceptions. For instance, le squelette “the skeleton” takes the masculine determiner despite ending in -ette, while la maison “the house” takes the feminine determiner despite ending in the -on typically associated with masculine nouns. Likewise, a noun like la personne “the person” takes feminine agreement even when referring to male individuals.

These kinds of exceptions, found in French and in almost all other languages with grammatical gender, have raised theoretical and empirical questions as to how language learners and users form generalizations about such systems (see, e.g., Karmiloff-Smith 1981; Zapf & Smith 2007; Coppock 2009; Gagliardi & Lidz 2014; Björnsdóttir 2021, among many others). For example, a number of theories have been proposed to explain what makes a given generalization productive in the face of exceptions. Specifically, theories of morphological productivity are designed to predict when a rule (e.g., that nouns with a given form trigger a given agreement pattern) can be applied to novel lexical items (see, e.g., Aronoff 1976; Baayen 1993; Bauer 2001; Yang 2016). This general approach, which we follow in the present paper, assumes that users of a language may memorise the gender or class of some (or all) nouns (see, e.g., Bauer 2001), but that they also have knowledge of productive cues, like aspects of nominal form and meaning, which they can use to predict the class of a new noun. This idea is supported by experimental studies showing consistent patterns of class assignment for novel nouns in the presence of such cues, even in relatively young children (e.g., Karmiloff-Smith 1981; Pérez-Pereira 1991; Gagliardi & Lidz 2014). It is also revealed in patterns of loanword adaptation. A well-known example is the Swahili word kitabu (book), borrowed from Arabic into a gender class based on its initial ki, which matches a nominal prefix in Swahili. In Kîîtharaka, similar examples are mûciki (music), mûgate (bread) and mûbira (ball), all loanwords borrowed into a single class based on the initial prefix and showing the agreement pattern of that class.1

Here we make use of a well-known theory of morphological productivity, the Tolerance Principle (Yang 2016). The Tolerance Principle is essentially an evaluation metric which can be used (e.g., by language learners) to determine when to generalise a potential rule to novel items and when not to. It provides a threshold number of exceptional cases, past which a rule is predicted not to be productive. For instance, according to the Tolerance Principle, a learner of English whose lexicon includes 7 verbs forming the past tense with the -ed inflection and 3 exceptional cases (i.e., irregular verbs) should generalise the -ed rule: for a set of 10 items, the Tolerance Principle specifies a threshold of 10/ln(10) ≈ 4.3 exceptions, and 3 exceptions fall below it. The theory stipulates:

Let a rule R be defined over a set of N items. R is productive if and only if e, the number of items not supporting R, does not exceed θ_N, where

$$e \le \theta_N = \frac{N}{\ln N}$$

(Yang 2016: 64; 2018: 1)

$\theta_N$ here stands for the tolerance threshold, which is grounded in a theory of lexical search and processing (Sternberg 1969; Foster 1976). Two values are important for calculating the tolerance threshold: N (the total number of items over which the rule is defined, including exceptions) and e (the number of items that do not follow the rule, i.e., exceptions). The formula $N/\ln N$ (the total number of items divided by the natural logarithm of that number) thus determines the number of exceptions past which a rule is not predicted to be productive.
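To make the arithmetic concrete, the threshold can be computed directly. The following is a minimal Python sketch of the inequality above; the function names `tolerance_threshold` and `is_productive` are our own illustrative choices, not part of any published implementation.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Return theta_N = N / ln(N): the maximum number of exceptions a rule
    defined over N items (rule-followers plus exceptions) can tolerate."""
    if n < 2:
        # ln(1) = 0, so the threshold is undefined for N < 2.
        raise ValueError("N must be at least 2")
    return n / math.log(n)


def is_productive(n: int, e: int) -> bool:
    """A rule over N items with e exceptions is productive iff e <= N / ln(N)."""
    if not 0 <= e <= n:
        raise ValueError("e must lie between 0 and N")
    return e <= tolerance_threshold(n)


if __name__ == "__main__":
    # English past-tense example from the text: 7 regular and 3 irregular verbs.
    print(f"theta_10 = {tolerance_threshold(10):.2f}")    # theta_10 = 4.34
    print("-ed productive:", is_productive(10, 3))        # -ed productive: True
    print(f"theta_100 = {tolerance_threshold(100):.2f}")  # theta_100 = 21.71
```

For a lexicon of N = 100 the threshold is about 21.7, close to the figure of 22 exceptions that Yang (2016) derives for a 100-word lexicon.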

The theory rests on several assumptions. First, the distribution of lexical items is assumed to follow Zipf’s law (Zipf 1949), which states that the frequency of a word in a naturally occurring corpus is inversely proportional to its frequency rank. Items in the lexicon are therefore considered to be listed in order of frequency, with the most frequent item appearing first and the least frequent last. Second, access to these lexical items during lexical processing is assumed to take place by frequency-sensitive serial search, following earlier work by Sternberg (1969) and Foster (1976): items are frequency ranked and accessed starting with the highest-ranked item, proceeding through the lexicon to the lowest. Exceptions, if there are any, are listed and evaluated separately, also in order of their frequency. This follows the well-known Elsewhere Condition (Kiparsky 1973; 1982), which states that exceptions to a linguistic rule are listed and evaluated before the rule is applied.

The third assumption is that language learners have two options at their disposal: they can generalise (i.e., apply a rule to the majority of the items on the basis of positive evidence, and additionally list only the exceptions), or they can memorise all the lexical items individually. The choice depends on the cost of processing. The former option requires time $T(N,e)$ to check all the exceptions before applying the rule, whereas the latter (where all words are treated like exceptions) requires time $T(N,N)$. Yang (2016), using a model lexicon of 100 words, shows that the processing times of the two options become equal at e = 22, beyond which it becomes more costly to generalise (pp. 60–65). In this case, 22 is the number of exceptions a rule can tolerate and still be productive in a lexicon of 100 words. The learner thus applies an evaluation metric (the Tolerance Principle) to determine the tolerance threshold. Both Zipf’s law and the Elsewhere Condition are fundamental to the derivation of the formula for calculating this threshold. Under Zipf’s law, the probability of accessing a random word of rank i is $1/(iH_N)$, where $H_N$ is the Nth harmonic number (see, e.g., Yang 2016; 2018 for additional details on calculating this probability and the discussion therein). The expected time to access a word in the non-productive option, in which all N nouns are listed and searched serially, is therefore $N/H_N$. For mathematical convenience, $H_N$ can be replaced with the natural logarithm $\ln N$, since $H_N$ grows like $\ln N$ (their difference approaches a constant of about 0.577); thus $N/H_N \approx N/\ln N$. Hence, productivity depends on computational efficiency, calculated by comparing the time involved in processing linguistic items in a lexical search process. The Tolerance Principle makes testable predictions about which generalisations learners will form, and how language users represent and generalise their knowledge.
It has been used to study a wide variety of empirical phenomena (e.g., Coppock 2009; Yang 2016; Emond & Shi 2020; Björnsdóttir 2021; Li & Schuler 2023). Let us now consider why an evaluation metric such as the Tolerance Principle is critical for studying nominal classification systems characterised by multiple assignment rules, and many exceptions, as found in Bantu languages.
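The Zipfian serial-search quantities described above can be checked numerically. The Python sketch below is our own illustration, not Yang’s code: it computes $H_N$, the Zipf probabilities $1/(iH_N)$, and the expected search position $\sum_i i \cdot p_i$, which simplifies to $N/H_N$. It does not reproduce Yang’s full derivation of $T(N,e)$; the function names are hypothetical.

```python
import math


def harmonic(n: int) -> float:
    """N-th harmonic number H_N = 1 + 1/2 + ... + 1/N."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_probabilities(n: int) -> list[float]:
    """Probability of the i-th ranked item under Zipf's law: p_i = 1 / (i * H_N)."""
    h = harmonic(n)
    return [1.0 / (i * h) for i in range(1, n + 1)]


def expected_search_time(n: int) -> float:
    """Expected number of steps to reach a randomly drawn item when a
    frequency-ranked list is searched serially from rank 1 downwards.
    Reaching rank i costs i steps, so the expectation is sum(i * p_i) = N / H_N."""
    return sum(i * p for i, p in enumerate(zipf_probabilities(n), start=1))


if __name__ == "__main__":
    # Columns: N, simulated expectation, N / H_N, and the approximation N / ln(N).
    for n in (10, 100, 1000):
        print(n, round(expected_search_time(n), 2),
              round(n / harmonic(n), 2), round(n / math.log(n), 2))
```

Running it shows that $N/\ln N$ somewhat exceeds $N/H_N$ for small N (for N = 100, roughly 21.7 versus 19.3), because $H_N - \ln N$ approaches the Euler–Mascheroni constant (about 0.577) rather than zero; the relative gap shrinks as N grows.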

1.1 Theoretical approaches to the study of gender and noun classes in Bantu

Bantu languages in particular have drawn much attention for the complex behaviour of their nominal systems. One major locus of debate revolves around what motivates the categorization of nouns into classes in the systems found in the languages of this large family. All Bantu languages are characterized by a set of prefixes that attach to stems (e.g., mû-rîmi, “farmer”, kî-banga, “machete” [Kîîtharaka]) in addition to prefixal agreement markers which appear on dependent words (e.g., demonstratives, adjectives, possessives) (see, e.g., Bleek 1862; Guthrie 1948; Corbett 1991; Carstens 1991; Katamba 2003). These nominal prefixes have sometimes been treated as class markers, i.e., as indicating the class of the noun and in turn triggering agreement on dependent words. However, they are in fact very similar to nominal endings like -ette in French (or -o/-a in Spanish): a given prefix tends to be associated with a given agreement pattern (see, e.g., Harris 1991 on why Spanish -o and -a should be considered not gender markers per se but predictors of gender). For this reason, we treat the ubiquitous nominal prefixes in Bantu as potential predictors of, or cues to, agreement class and/or gender. Examples of noun class agreement in Kîîtharaka are given in (1), (2) and (3).2 Example (3) illustrates a case, similar to the French exceptions provided above, where the nominal prefix (class 9) and the agreement prefix (class 1) are not aligned. In this case, therefore, the gender of the noun cannot be predicted from the form of the noun.

(1)  rû-rara       rû-ûra        rû-tuune
     11-palm.leaf  11-dem.dist   11-red
     ‘That red palm-leaf’

(2)  gî-kombe  kî-mwe  kî-nene
     7-cup     7-one   7-big
     ‘One big cup’

(3)  n-dagitaarî  w-a      mw-ekûrû
     9-doctor     1-conn   1-female
     ‘A female doctor’

Just as in other gender systems, like French, both form (the morphophonological shape of the noun) and meaning have been argued to play a role in Bantu gender assignment systems. Early research argued for a semantically arbitrary system, where class is determined purely by noun prefixes (e.g., Krapf 1850; Bleek 1862; Meinhof 1906, among others). However, later influential research argued that noun classes are actually associated with semantic regularities (e.g., Guthrie 1967; Richardson 1967; Welmers 1973; Zawawi 1974). This motivated a reconstruction of Proto-Bantu placing semantics at the centre of its nominal classification system (e.g., Givón 1971a; 1971b; Welmers 1973; Denny & Creider 1986). For example, Denny & Creider (1986) suggested a two-way, semantically motivated system for count and mass nouns in Proto-Bantu. Count nouns were argued to be broadly categorised on the basis of kind (i.e., animate vs. inanimate) and configuration (i.e., solid objects versus objects with a clear outline, edges, or a distinct inside and outside). Mass nouns were argued to be categorised based on whether they are cohesive or dispersive.

However, as in many complex nominal systems, descriptions of Bantu noun classes based purely on noun form or on semantics are problematic in several respects. First, similar to the French examples given above, nominal prefixes in Bantu languages do not always deterministically align with agreement patterns. In these cases, researchers typically consider that agreement, not the noun form, indicates the noun’s class/gender (see, e.g., Herbert 1985; Corbett 1991). That is, the agreement pattern a given noun will take is not always straightforwardly determined by the nominal prefix (see, e.g., Msaka 2019 for how the prefix-based system fails to characterise the Chichewa nominal classification system). We will further illustrate this using Kîîtharaka data in the next section. Second, there are many nouns in Bantu languages without overt prefixes that nonetheless trigger agreement on nominal dependents. This complicates a purely form-based system (though positing something like a null prefix or a default class is of course possible). At the same time, the productivity of semantic rules in Bantu nominal systems has been a subject of vigorous debate, with some scholars (e.g., Richardson 1967) arguing that the semantic foundation of the Proto-Bantu system is empirically unverifiable, and others concluding that it is only relevant for animate/human nouns (see, e.g., Crisma et al. 2011; Msaka 2019). Regardless, it is clear that, as with nominal prefixes, the semantic features of nouns do not always perfectly align with the agreement patterns they take.

Typological studies of nominal classification systems have, nonetheless, highlighted the fact that in most complex gender systems, semantics and morphophonology interact to determine a noun’s gender, i.e., the agreement class it belongs to (see, e.g., Corbett 1991; Di Garbo 2014; Corbett & Fedden 2016; Güldemann & Fiedler 2019, among others). As such, a nominal classification system can be considered as having both a set of agreement classes and a set of nominal form classes (Güldemann & Fiedler 2019). Agreement classes consist of nouns that share the same agreement across all agreement targets. For instance, the Kîîtharaka noun phrases muntû û-mwe, “one person” and antû ba-îrî, “two people” indicate that the stem -ntû takes agreement û- in the singular and ba- in the plural. Pairing nouns on the basis of singular and plural agreement indicates the abstract gender of a noun, called target gender by Corbett (1991 et seq.). In this way, gender can be seen as an abstract feature of a stem, stored in a speaker’s mental lexicon. Nominal form classes, on the other hand, are based on shared morphophonological attributes of nouns, e.g., nominal prefixes. The Kîîtharaka nouns î-thaga, “metal” and î-rema, “tent” belong to the same form class, while ma-thaga “metals” and ma-rema “tents” belong to another one. Pairing nouns on the basis of singular and plural nominal forms gives rise to what have been called deriflection classes, the morphophonological equivalent of gender (Güldemann & Fiedler 2019).3

Recent research on Bantu nominal classes has used corpus evidence—in particular, lists of nouns along with their nominal form and agreement classes—in order to quantitatively evaluate how both morphophonological and semantic cues together can predict the gender of a noun (e.g., De Schryver & Nabirye 2010; Ngcobo 2010; Taljard & De Schryver 2016; Msaka 2019). However, while these analyses increase the reliability of the data on which theories of nominal classes are based, there has not yet been research applying any specific quantitative theory of productivity to a Bantu gender system.

Therefore, the goal of this paper is first to characterise the nominal classification system of Kîîtharaka, in terms of both gender and deriflection. We will then endeavour to determine whether morphophonological features of nouns (i.e., nominal prefixes) and particular aspects of noun meaning (i.e., noun semantics) are productive cues to gender, i.e., to the agreement patterns that a given noun displays. To do this, we will use the Tolerance Principle, described above, which allows us to evaluate the empirical data we have from a large list of Kîîtharaka nouns, and make predictions about productivity in the face of exceptions in this complex system. We introduce the Kîîtharaka nominal classification system and its particular complexities in the next section.

1.2 The Kîîtharaka nominal classification system

Kîîtharaka [Bantu, E54] is spoken in Tharaka Nithi County in Kenya by about 215,000 Atharaka people who mainly live in Tharaka North, Tharaka South and Chiakariga Sub-Counties (KNBS 2019: 424).4 The language is relatively understudied, though there is some previous work describing its nominal system (see, e.g., Bible Translation & Literacy 1993; wa Mberia 1993). In a brief description, wa Mberia (1993), for example, characterises Kîîtharaka as having 17 “noun classes” based on the nominal prefix, most of which can be organized as singular/plural (deriflection) pairs, as is traditional in the Bantuist literature. Notably, he also outlines a set of potential semantic features, such as cultivated fruits, shrubs and trees, birds, insects, human beings, etc., that characterize the noun system in Kîîtharaka.

As with many semantic accounts of noun class systems in Bantu, this characterisation is largely based on subjective impression or intuition. Further, wa Mberia (1993) notes that there are many exceptions in each class, i.e., nouns which do not share the relevant semantic feature(s). Many of the semantic features he mentions are also invoked for multiple classes. As discussed above, exceptions are common in these types of systems, and that is part of what makes quantitative evaluation using a theory of morphological productivity important. We return to these issues below. Table 1 shows the kind of description of Kîîtharaka given by wa Mberia (1993), with a list of the class numbers for reference. The table includes the nominal prefix associated with each class and an example noun with agreement (on a numeral) in each class.5 Prosthetic consonants [y] and [c] are inserted before a stem that starts with a vowel (as in yî-îrî and ci-îrî in Table 1), to satisfy the phonological constraint that two distinct vowels cannot appear at the beginning of a word.

Table 1: Kîîtharaka noun classes, as described by wa Mberia (1993). Horizontal lines delimit singular/plural pairs, with corresponding nominal and agreement prefixes. Sample nouns with agreeing numerals are also provided. Note: (i) Some nouns in classes 14/15 (in parentheses) take class 6 agreement, see below for additional discussion; (ii) Parentheses in class 16 prefix mark optional use of the initial bilabial fricative (B) across speakers.

| Class number | Nominal prefix | Agreement prefix | Sample noun and agreement | Gloss |
|---|---|---|---|---|
| 1 | mû(u)- | û- | muntû û-mwe | one person |
| (1a) | ∅- | û- | chibû û-mwe | one chief |
| 2 | a- | ba- | antû ba-îrî | two people |
| 2a | ∅- | ba- | chibû ba-îrî | two chiefs |
| 3 | mû(u)- | û- | mûtî û-mwe | one tree |
| 4 | mî- | î- | mîtî yî-îrî | two trees |
| 5 | î- | rî- | îgûna rî-mwe | one baboon |
| 6 | ma- | ma- | magûna ma-îrî | two baboons |
| 7 | k(g)î- | k(g)î- | gîkaabû kî-mwe | one basket |
| 8 | i- | bi- | ikaabû bi-îrî | two baskets |
| 9 | n-/∅- | î- | ngûkû î-mwe | one chicken |
| 10 | n-/∅- | i- | ngûkû ci-îrî | two chickens |
| 11 | rû- | rû- | rûrigi rû-mwe | one thread |
| 10 | n- | i- | ndigi ci-îrî | two threads |
| 12 | k(g)a- | k(g)a- | kaana ka-mwe | one child |
| 13 | tû- | tû- | twana tw-îrî | two children |
| (14) | û- | bû- | ûcûrû bû-bû | this porridge |
| (15) | k(g)û- | k(g)û- | kûruga gû-kû | this cooking |
| 16 | (b)a- | a- | (b)antû a-mwe | one place |
| 17 | k(g)û- | k(g)û- | gûntû kw-ingî | many places |

Notably, Table 1 obscures a number of more complex properties of the Kîîtharaka nominal system. First, some classes are treated as distinct despite having identical nominal prefixes (classes 1 and 3, 9 and 10, 15 and 17), which is unexpected if classes are defined on this basis alone. Agreement provides evidence that the class pairs 1/2 and 3/4 should be distinguished, but agreement is not used as the criterion in all cases. For example, classes 2 and 2a are not treated as fully distinct, and many classes which share the same agreement prefix are not collapsed. This reflects the problem of conflating the two different notions of gender and deriflection discussed above (as noted by Güldemann & Fiedler 2019). Moreover, Table 1 shows only the typical patterning of nouns in the system; in fact, there is considerable variation. For example, some nouns with the typical class 7 prefix k(g)î-, the class 9 prefix n-, or no prefix at all actually take class 1 (û-) agreement.6 Similarly, nouns in various nominal form (prefix-based) classes, e.g., classes 1, 7 and 9, take the plural agreement prefix ma-, the pattern normally used with class 5/6 nouns (see Figure 1 for an illustration of this class mismatch, and Section 2.2 for how such cases are accounted for in our approach).

Figure 1: Prefix-agreement mismatch in Kîîtharaka. Here, we reorganise the traditional class number notation to collapse nouns that share a nominal prefix. Nouns with the same prefix can take different forms of agreement.

Second, although nouns in classes 14 and 15 appear not to have plural counterparts, this is not entirely correct. For example, the bulk of nouns in class 15 are derived from verbs (specifically infinitives), but there are also some body parts, e.g., gûtû “ear” and kûgûrû “leg”, which take class 15 agreement in the singular and class 6 ma- agreement in the plural.7 Similarly, class 14 includes mainly abstract nouns, but it also contains some other nouns, which take ma- agreement when interpreted as a collective plural and n- plural agreement otherwise.

Table 2: Genders of Kîîtharaka, based on the pairing of singular and plural agreement classes. The agreement indicated is in line with traditional accounts. Note: the plural agreement of classes 15 and 17 is the same; henceforth, we refer to both as class 15. The singular agreement prefixes of classes 1 and 3 are also orthographically identical, but in speech the class 3 agreement is articulated with a higher tone. For this reason, we retain the distinction in our analyses.

| Gender | Agreement class prefixes | Traditional class number pairs |
|---|---|---|
| A | û–a | 1/2 |
| B | û–î | 3/4 |
| C | rî–ma | 5/6 |
| D | k(g)î–k(g)î | 7/8 |
| E | n–n | 9/10 |
| F | rû–n | 11/10 |
| G | k(g)a–k(g)a | 12/13 |
| H | a–k(g)û | 16/17 |
| GAC | bû–bû | 14 |
| GAC | k(g)û–k(g)û | 15 |
| GAC | all transnumerals (TransN) | |

To capture these patterns, we describe Kîîtharaka using both nominal form classes and agreement classes, which correspond to a deriflection system and a gender system, respectively (Güldemann & Fiedler 2019). We leave aside here nouns that lack either a singular or a plural form, either because they are singularia or pluralia tantum or because they do not make number distinctions. Following Güldemann & Fiedler (2019), we treat such nouns, together with infinitives, as transnumerals, and hence as outside the gender system. We therefore consider the agreement triggered by such nouns to constitute general agreement classes (GAC) (see Msaka 2019 for a similar treatment of infinitives, locatives and CPs). This results in 8 deriflectional and 8 gender categories for Kîîtharaka, shown in Table 2. Following established tradition in Bantu linguistics (see, e.g., Carstens 1991), we label the genders as presented in Table 2. Diagrammatic representations of the deriflection and gender systems are shown in Figure 2.

Figure 2: The gender and deriflection systems of Kîîtharaka, including an illustration of possible variation. Note: The dotted lines represent non-deriflectional/gender paradigms used with nouns which can appear variably in different classes. For example, when the plural of a noun is treated as collective, or when special humans that typically appear in class 9 are treated as normal humans or transnumerals.

An obvious observation from Figure 2, in comparison with Table 1, is that class 17 has disappeared and class 3 is expressed in the same way as class 1: both take the agreement û- in the singular. We have come across only one exception: in the context of the numeral “one”, the class 3 agreement is articulated with a high tone. For the purpose of comparison with established tradition, we do not collapse these classes in our analyses but position them together to show that they share similar agreement patterns. From this analysis, Kîîtharaka can be characterized as having a “crossed” gender system, in the terminology of Corbett (1991), with variable convergence at classes 6 and 10. Class 6 has elsewhere in the Bantu literature been argued to function as a default plural (see, e.g., Bosire 2006; Ström 2012; Fuchs et al. 2018; Fuchs & van der Wal 2022). However, outside of regular class 5 nouns, many of the other nouns that pluralise in this class have a collective interpretation in the plural. Another observation is that the deriflection system lumps the prefix-less nouns together, although these nouns belong to different agreement classes, and hence to different genders. These prefix-less nouns (characterized as belonging to class 1a/2a or 9/10 in the traditional prefix-based system) belong to class 1/2 or 9/10 on the basis of agreement.

A number of these observations point to the fact that, as in other Bantu languages, some of the nominal prefixes of Kîîtharaka can be characterised as multifunctional: they function as the primary prefixes for a particular set of nouns, but other nouns can also take these prefixes (either alone or in combination with their primary prefix) in order to express particular evaluative meanings. For example, the prefix k(g)î- is typically associated with class 7 (Gender D) nouns. However, a noun from Gender A (e.g., mû-ana) can take this prefix (as in kî-ana) to derive the pejorative meaning “ugly child”. In the Kîîtharaka system, class 12/13 can be used to express diminutive meaning, class 5/6 augmentative meaning, and class 7/8 pejorative meaning. Prefixes used with this more overtly derivational function are sometimes referred to as secondary noun class prefixes (see, e.g., Fortune 1970; Dembetembe 1995; Harjula 2006; Déchaine et al. 2014; Di Garbo 2014; Dube et al. 2014; Taraldsen et al. 2018, among others), or as multifunctional morphemes whose particular meaning depends on context (as in Di Garbo 2014; Msaka 2019). In many Bantu languages, the primary nominal prefixes of noun classes with loose semantics (i.e., classes which appear to contain nouns from diverse semantic domains), like classes 5/6 and 7/8, have apparently been “recycled” to express evaluative meaning (Déchaine et al. 2014; Di Garbo 2014; Msaka 2019). It is, however, important to note that even in this recycled use, the evaluative prefixes typically stack on the primary prefixes and obligatorily control agreement, thus behaving like their primary counterparts.

An important empirical question is how to accommodate these productive derivational elements within a noun class or gender system in a way that accounts for their unique morphosyntactic properties. Some researchers treat these prefixes as theoretically identical to other prefixes, i.e., as bearing noun class features (see, e.g., Mufwene 1980; Myers 1987; Carstens 1991; Bresnan & Mchombo 1995; Maho 1999). Others attribute the multifunctionality to the different syntactic positions that the two sets of prefixes occupy (see, e.g., Déchaine et al. 2014; Fuchs & van der Wal 2022). We adopt an approach along the lines of the latter here, but leave a full treatment of the issue for future research.

To summarize this section, we have here described the major trends in the Kîîtharaka noun class system. The focus here was the relationship between gender (as defined by singular and plural agreement pairs) and deriflection (as defined by singular and plural noun class prefix pairs). We have also highlighted the fact that these two systems do not perfectly match up: there are nouns in particular agreement classes that do not share a nominal prefix (and vice versa). In the next section, we bring in semantic features, which share this property: there are potential correlations between these features and gender, but there are also clearly exceptions. We then assess the degree to which nominal prefixes and semantic features predict Kîîtharaka gender using the Tolerance Principle as an evaluation metric.

2 Methodology

2.1 The corpus

As noted above, the Tolerance Principle is a theory of productivity in learning, and thus the most appropriate corpus from which to derive predictions for Kîîtharaka might be a corpus of child-directed speech (CDS) (Yang 2016; 2018). However, it has been shown that in the absence of such a corpus, common nouns can be sampled from an adult corpus (e.g., the most frequent 1000–2000 words) to represent the kinds of nouns children are likely to encounter in the acquisition process (see, e.g., Yang 2018; Kodner 2020). Here we use two adult language corpora: a large extract from the Kîîtharaka translation of the bible, and the Summer Institute of Linguistics (SIL) African Wordlist. While frequency data would in theory be available for the bible sub-corpus, this sub-corpus alone may not represent the kinds of nouns children come across. The SIL African Wordlist sub-corpus therefore supplements it, comprising the most common nouns in an African context.8 A total of 45,844 word tokens (9,656 word types) was extracted from the Kîîtharaka bible (Bible Translation & Literacy 2019). This text source was chosen because it is the only existing Kîîtharaka corpus. As it was recently translated (published 2019), the bible text can be considered a reasonable source of synchronic language data. The bible text used here includes sections from both the Old and New Testaments to increase type variation across the corpus (see Evans 2007). Chapters that mainly contain lists of proper names (as in Numbers and Chronicles) were excluded. Around 60% of the corpus is from the Old Testament and the remaining 40% from the New Testament. The second part of our corpus, the SIL African Wordlist, was translated by the first author (see Snider & Roberts 2004). The list contains names of common things like local birds, trees, animals, insects and other phenomena that may be missing from the bible text.

2.2 Corpus processing and coding

The bible text was uploaded to LancsBox, a corpus software tool developed by Lancaster University (Brezina et al. 2020). We then extracted a word list from which we manually identified nouns (see Ngcobo 2010 for a similar approach), retaining only a single lemma form for each noun. This resulted in a total of 901 nouns from the bible text. Of the 1,622 translatable words in the SIL Comparative African Word List (between items 1.1 and 11.1),9 1,426 noun types were not already present in the bible corpus. The complete noun corpus therefore consists of 2,327 nouns. Each noun was evaluated against a set of semantic features collected from previous work on Kîîtharaka specifically and on Bantu more generally (see Creider 1975; Denny & Creider 1986; wa Mberia 1993).

The complete set of semantic features is shown in Table 3 along with the classes for which these features are expected to be potentially relevant. All the nouns in the corpus were coded for all features, and we tested the predicted productivity of each feature for each agreement class.10 The corpus can be accessed on the Open Science Framework (OSF) using the link: https://rb.gy/ows7r1.

Table 3: Semantic features tested, along with the class(es) expected to be potentially associated with each feature. Note: Dispersive mass refers to substances composed of particles that can be dispersed, such as flour and soil. Extended describes dimensions that are relatively long. Spread is used for things that extend in two dimensions in space, such as a mat. Cohesive mass denotes substances that stick together, like liquids. Plant part includes parts of a plant other than fruits, such as flowers or leaves. Artefacts are small man-made objects that can be held by hand, such as tools. Pejorative applies to terms with negative social connotations, for example, “ugly-looking”. Manner refers to nominals, mainly derived from other nouns, that indicate a method or way of doing things, e.g., speaking “like people of Tharaka”. Narrow describes thin and extended-looking things, and Derived refers to derived nominals other than those denoting manner, e.g., infinitives and nouns derived from other words to form abstract notions, such as ‘teaching’ derived from ‘teach’. Nouns coded with the Human feature include those referring to human beings and human professions, as well as other beings such as God, spirits and the devil (also known as superhumans).

Semantic Feature Expected Gender Agreement class pairs
Human A 1/2
Tree B 3/4
Dispersive mass B 3/4
Extended B 3/4
Spread B 3/4
Cohesive mass C 5/6, 14
Augmentative C 5/6
Plant part C 5/6
Fruit C 5/6
Round C 5/6
Artefact D 7/8
Pejorative D 7/8
Plant D 7/8
Manner D 7/8
Animal E 9/10
Loan E 9/10
Narrow F 11/10
Wavy F 11/10
Diminutive G 12/13
Abstract/Concrete GAC 14, 14/10, 14/6
Derived GAC 15
Infinitive GAC 15

Additionally, morphophonological features—the set of all nominal prefixes in Kîîtharaka, including the null prefix—were also identified and coded for in the same manner as semantic features. The prefixes coded for and their respective expected classes are shown in Table 4. As with the semantic features, all nouns were coded for each prefix, and we tested the productivity of each prefix for each agreement class.

Table 4: Morphophonological features tested, along with the class(es) expected to have this feature potentially associated with them.

Morphophonological feature Expected gender Agreement class
mû(u)- A,B 1,3
a- A 2
mî- B 4
î- C 5
ma- C 6
k(g)î- D 7
i- D 8
n- E 9/10
Ø- (null prefix) E 9/10
rû- F 11
k(g)a- G 12
tû- G 13
û- GAC 14
k(g)û- GAC,H 15
ba- H 16

In addition to these features, each noun was coded for agreement class and gender based on the observed pattern of singular/plural agreement classes. For example, there are 423 nouns that take û- (class 1, 3) agreement in the singular. Of these, 196 take ba- (class 2) agreement in the plural and are therefore classified as gender A, while 208 take î- (class 4) agreement and are coded as gender B. Of the remainder, 11 make no number distinction and are coded as transnumeral (TransN), and 1 takes a class 10 prefix and the corresponding agreement, and is therefore placed in gender D. There were some cases where the same noun lemma occurred with different singular agreement patterns in the corpus (i.e., across different instances in the bible). We dealt with these as follows: where a lemma appeared with two alternative singular agreement prefixes, e.g., nkoma yathi (class 9 agreement) and nkoma athi (class 1 agreement) “(the) devil went”, we assigned the noun to the agreement class matching its nominal class prefix.11 Cases of alternative plurals in class 6 were treated as noted above.12 As far as we can tell, this kind of variation is not predictable from any obvious contextual feature. It may be, however, that alternative agreement patterns reflect speaker evaluation, for example whether the speaker wants to treat the referent as more or less human-like. We return to this in the Discussion below. There were also cases where the same noun lemma had clearly different meanings across contexts, with each meaning consistently corresponding to a different agreement pattern, e.g., kîrundu wa Ngai “the Holy spirit of God” (class 1 agreement) and kîrundu kîa ûrongo “the spirit of deceit” (class 7 agreement). These were treated as two distinct lemmas.
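The gender coding step can be pictured as a lookup from an observed singular/plural agreement pair to a gender label. The sketch below is hypothetical: the dictionary, the function name `assign_gender` and the `"unclassified"` fallback are our own illustration, covering only the regular class pairs in Table 3 and ignoring the variant-agreement and lemma-splitting cases described above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup from (singular, plural) agreement class to gender,
# following the class pairs listed in Table 3. Illustrative only; this is
# not the procedure actually used to code the corpus.
GENDER_BY_PAIR: Dict[Tuple[int, int], str] = {
    (1, 2): "A",
    (3, 4): "B",
    (5, 6): "C",
    (7, 8): "D",
    (9, 10): "E",
    (11, 10): "F",
    (12, 13): "G",
}


def assign_gender(singular: int, plural: Optional[int]) -> str:
    """Map an observed singular/plural agreement pattern to a gender label."""
    if plural is None:
        # No number distinction observed: coded as transnumeral.
        return "TransN"
    # Pairs not in the table are flagged for manual inspection.
    return GENDER_BY_PAIR.get((singular, plural), "unclassified")


# Singular classes 1 and 3 share û- agreement, so the plural decides.
print(assign_gender(1, 2), assign_gender(3, 4), assign_gender(1, None))  # A B TransN
```

In this toy version, the plural class does the work of separating gender A from gender B, mirroring the 196/208 split reported above.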

2.3 Application of the Tolerance Principle

Each of the semantic and morphophonological features coded for was subjected to a quantitative analysis to establish its productivity using the Tolerance Principle (Yang 2016). As mentioned in Section 1, the Tolerance Principle is an evaluation metric for predicting whether a linguistic rule is likely to be productive in the presence of exceptions. It was formulated to predict how and when learners of a language should generalise a potential rule. Here, we use it not to predict when learners will acquire a rule, but to identify productive rules for gender assignment that speakers of the language may represent. As highlighted in the preceding sections, gender systems in Bantu (and beyond) are often characterised by exceptions: multiple semantic features may be associated with one class, and a single semantic feature may span several classes. We apply the same evaluation to the nominal prefixes, which we treat here as morphophonological cues to gender assignment.

In our case, the relevant variables are the number of nouns, N, with a given semantic or morphophonological feature across the entire corpus, and the number of exceptions, e: nouns with that feature that do not take the agreement class or gender for which we are evaluating productivity. The tolerance threshold θN = N/ln N depends only on N, and a potential rule is predicted to be productive if e ≤ θN. For example, to evaluate whether the rule Narrow → gender B is productive, N is the number of nouns with the Narrow feature in the corpus, and the exceptions are the nouns with this feature that take some other agreement pattern, i.e., belong to another gender. Table 5 shows how this potential rule (among others) fares under the Tolerance Principle. There are 115 nouns with this semantic feature in the corpus. Of these, only 32 are in gender B (agreement class 3/4) and 83 are exceptions (i.e., are in other agreement classes). The tolerance threshold in this case is θN = 115/ln 115 ≈ 24; since 83 > 24, the rule is not predicted to be productive. Table 5 further shows that this semantic feature is not predicted to be productive for any other gender either.
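The threshold computation can be made concrete with a short Python sketch. It assumes the standard formulation from Yang (2016), θN = N/ln N, with a rule counted as productive when e ≤ θN; the function names are ours. The comparison uses the unrounded threshold, whereas the tables appear to report θN rounded to an integer.

```python
import math


def tolerance_threshold(n_items: int) -> float:
    """Tolerance threshold theta_N = N / ln N (Yang 2016).

    Undefined for N < 2, since ln(1) = 0.
    """
    if n_items < 2:
        raise ValueError("theta_N is undefined for N < 2")
    return n_items / math.log(n_items)


def is_productive(n_items: int, exceptions: int) -> bool:
    """A rule over N items is predicted productive iff e <= theta_N."""
    return exceptions <= tolerance_threshold(n_items)


# Narrow -> gender B: N = 115 nouns with the feature, e = 83 exceptions.
theta = tolerance_threshold(115)
print(round(theta), is_productive(115, 83))  # 24 False
```

Because θN depends only on N, every candidate rule for the Narrow feature in Table 5 faces the same threshold of about 24; only the exception count changes from gender to gender.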

Table 5: Evaluation of the productivity of the semantic feature “Narrow” using the Tolerance Principle. Bolding highlights the potential rule, Narrow → gender B, discussed in the text. None of these hypothetical rules are predicted to be productive.

Gender Agreement class pairs Narrow N e θN Productive
A 1/2 0 115 115 24 No
B 3/4 32 115 83 24 No
C 5/6 2 115 113 24 No
D 7/8 5 115 110 24 No
E 9/10 6 115 109 24 No
F 11/10 65 115 50 24 No
G 12/13 4 115 111 24 No
H 16/15 0 115 115 24 No
GAC 14,15 0 115 115 24 No
Grand Total 115

3 Results and discussion

3.1 Results

Table 6 summarizes the results of our analysis of semantic features using the Tolerance Principle. Here we show only the evaluation of each feature with respect to the gender it was predicted to be relevant for; as expected, all other rules evaluated were predicted to be unproductive. Our results reveal that six of the semantic features tested are predicted to be productive cues for gender determination in Kîîtharaka: Tree, Augmentative, Pejorative, Diminutive, Manner (as a cue to GAC rather than gender D; see below) and Infinitive. All three evaluative features (Augmentative, Pejorative and Diminutive) are predicted to be productive both for the subset of nouns that occur only in the relevant classes (i.e., for which this is the primary gender) and for nouns that have the relevant class as an alternative (i.e., for which this is a derived meaning, or secondary gender).13 We show analyses separating these two types of nouns in Table 7.14

Table 6: Predicted productivity of semantic features as cues to Kîîtharaka gender assignment based on the Tolerance Principle. Here we show only the results of particular features with their expected classes. N is the total number of nouns with the relevant feature, n is the number of rule-compliant nouns, e is the number of exceptions, and θN is the threshold defined by the Tolerance Principle. We flag the predicted productivity of the features Human and Manner with an asterisk, since we analyze them further below.

Semantic feature Expected class Gender N n e θN Productive
Human 1/2 A 252 196 56 46 No*
Tree 3/4 B 16 15 1 6 Yes
Dispersive mass 3/4 B 64 10 54 15 No
Extended 3/4 B 161 80 81 32 No
Spread 3/4 B 52 3 49 13 No
Cohesive mass 5/6 C 58 3 55 14 No
Augmentative 5/6 C 32 25 8 9 Yes
Plant part 5/6 C 47 5 42 12 No
Fruit 5/6 C 17 8 9 6 No
Round 5/6 C 44 12 32 12 No
Artefact 7/8 D 136 35 101 28 No
Plant 7/8 D 33 13 20 9 No
Pejorative 7/8 D 24 20 4 8 Yes
Manner 7/8 D 10 1 9 4 No*
Animal 9/10 E 155 89 66 31 No
Loan 9/10 E 151 68 83 28 No
Narrow 11/10 F 115 64 51 24 No
Wavy 11/10 F 54 26 28 14 No
Diminutive 12/13 G 76 74 2 18 Yes
Abstract 14 GAC 1006 821 185 146 No
Cohesive mass 14 GAC 58 34 24 14 No
Derived 15 GAC 1082 778 304 155 No
Infinitive 15 GAC 671 671 0 103 Yes
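As a consistency check, a few rows of Table 6 can be re-evaluated from N and e alone. The sketch below (row selection and names are ours) recomputes θN for each row and compares e against the unrounded value; for these rows it reproduces both the rounded thresholds and the Yes/No column shown in the table.

```python
import math

# (feature, expected gender, N, e) for a selection of rows from Table 6.
ROWS = [
    ("Human", "A", 252, 56),
    ("Tree", "B", 16, 1),
    ("Augmentative", "C", 32, 8),
    ("Pejorative", "D", 24, 4),
    ("Animal", "E", 155, 66),
    ("Diminutive", "G", 76, 2),
    ("Infinitive", "GAC", 671, 0),
]


def evaluate(rows):
    """Yield (feature, gender, rounded theta_N, productive?) for each row."""
    for feature, gender, n_items, exceptions in rows:
        theta = n_items / math.log(n_items)  # Yang's (2016) threshold
        yield feature, gender, round(theta), exceptions <= theta


for feature, gender, theta, ok in evaluate(ROWS):
    print(f"{feature:<13} {gender:<4} theta={theta:<4} {'Yes' if ok else 'No'}")
```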

Table 7: Productivity of evaluative classes, separating inherent and derivational uses of each feature. Note: N is the number of inherent or derived nouns with the target feature, n is the number of these nouns in the respective gender, and e is the number of nouns with this feature in other genders.

Rule type Feature Gender N n e θN Productive
Inherent Augmentative C 26 20 6 8 Yes
Inherent Pejorative D 9 8 1 4 Yes
Inherent Diminutive G 15 15 0 6 Yes
Derivational Augmentative C 9 7 2 4 Yes
Derivational Pejorative D 15 13 2 6 Yes
Derivational Diminutive G 61 59 2 15 Yes

The feature Manner appears unproductive for gender D (class 7/8): of 10 nouns, only 1 belongs to this gender. However, there is a clear regularity: all 9 exceptions are transnumerals by virtue of being singularia tantum. Following the mechanism established in Section 2.1 to account for the non-declensional paradigm, such nouns are not categorized as gender D but as cases of GAC. Manner is therefore a productive cue to GAC, in the same way as the evaluative features described above. Manner nominals of this kind include names of languages, e.g., Kîî-ibirania “Hebrew language”, which serves as a nominal denoting the manner of “speaking like a Hebrew”, and other expressions denoting behavioural attributes, e.g., kîî-muntû “like a human”. Finally, and most notably, the feature Human, perhaps the semantic feature most often claimed to be relevant for Bantu, is not predicted to be productive. This is because, although all nouns in gender A are Human, the number of human nouns in other genders exceeds the productivity threshold defined by the Tolerance Principle. However, the Tolerance Principle allows recursive application to subsets of the lexicon defined by conjunctions of features. Intuitively, this can make a feature productive in combination with another feature, though not in isolation. We return to this in the Discussion below.
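The reclassification can be checked by hand with the same threshold, θN = N/ln N (Yang 2016). Assuming, as described above, that the 9 transnumeral Manner nouns count as GAC and that the single gender D noun becomes the only exception, the two candidate rules compare as follows:

```latex
\begin{aligned}
\text{Manner} \rightarrow \text{gender D:}\quad & N = 10,\; e = 9,\; \theta_{10} = \tfrac{10}{\ln 10} \approx 4.34 < 9 \;\Rightarrow\; \text{not productive}\\
\text{Manner} \rightarrow \text{GAC:}\quad & N = 10,\; e = 1,\; e \le \theta_{10} \approx 4.34 \;\Rightarrow\; \text{productive}
\end{aligned}
```

With N this small the margin is narrow, which is worth keeping in mind given the caveat about small lexicon sizes in note 14.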

Turning to morphophonological features, i.e., nominal prefixes, almost all of those tested are predicted to be productive, in contrast to the semantic features. This is illustrated in Table 8, which shows the evaluation of each nominal prefix with respect to its expected gender. The prefix mû(u)- is unproductive because it is associated (nearly equally) with two genders, A and B. The same applies to the class 10 prefix n-, which is associated with two genders, E (class 9/10) and F (class 11/10); the null prefix is likewise not predicted to be productive for gender E. The prefixes û- and k(g)û- are productive for GAC. For ba-, there is only one noun bearing the prefix; since θN = N/ln N is undefined when N = 1 (ln 1 = 0), the Tolerance Principle yields no threshold in this case.

Table 8: Predicted productivity of morphophonological features, here nominal prefixes, as cues to Kîîtharaka gender assignment based on the Tolerance Principle. Here we show only the results of particular features with their expected genders. N is the total number of nouns with the relevant feature, n is the number of rule-compliant nouns, e is the number of exceptions, and θN is the threshold defined by the Tolerance Principle. Note that the prefix mû(u)- is tested for association with two genders; we flag its predicted productivity with an asterisk, as we discuss it below. For the prefix ba-, no threshold can be computed, since θN is undefined when N = 1.

Morphophonological feature Expected class Gender N n e θN Productive
mû(u)- 1 A 397 161 236 66 No*
a- 2 A 188 188 0 36 Yes
mû(u)- 3 B 397 208 189 66 No*
mî- 4 B 208 208 0 39 Yes
î- 5 C 187 153 34 36 Yes
ma- 6 C 145 143 2 29 Yes
k(g)î- 7 D 269 237 32 48 Yes
i- 8 D 240 237 3 44 Yes
n- 9 E 300 292 8 53 Yes
Ø- (null prefix) 9 E 148 69 81 30 No
n- 10 E 499 377 120 80 No
n- 10 F 497 107 390 80 No
rû- 11 F 122 108 14 25 Yes
k(g)a- 12 G 83 77 6 19 Yes
tû- 13 G 77 77 0 19 Yes
û- 14 GAC 106 103 3 23 Yes
k(g)û- 15 GAC 677 674 3 103 Yes
ba- 16 H 1 1 0 n/a n/a
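The prefix evaluations follow the same arithmetic, with one edge case: for ba-, N = 1 and ln 1 = 0, so θN = N/ln N cannot be computed. The sketch below (the helper name `prefix_productive` and the row selection are ours) returns `None` in that case instead of a Yes/No verdict.

```python
import math
from typing import Optional


def prefix_productive(n_items: int, exceptions: int) -> Optional[bool]:
    """Tolerance Principle verdict, or None when theta_N is undefined."""
    if n_items < 2:
        return None  # ln(1) = 0, so theta_N = N / ln N cannot be computed
    return exceptions <= n_items / math.log(n_items)


# (prefix, expected gender, N, e) for a selection of rows from Table 8.
PREFIX_ROWS = [
    ("mû(u)-", "A", 397, 236),
    ("mû(u)-", "B", 397, 189),
    ("k(g)î-", "D", 269, 32),
    ("rû-", "F", 122, 14),
    ("ba-", "H", 1, 0),
]

for prefix, gender, n_items, exceptions in PREFIX_ROWS:
    print(prefix, gender, prefix_productive(n_items, exceptions))
```

Returning `None` rather than `False` keeps "no evidence either way" distinct from "predicted unproductive", which matches how the ba- row is reported.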

3.2 Discussion

Gender assignment systems are one of the most well-studied features of Bantu languages, but there is longstanding debate as to whether these systems are based on semantics, morphophonology, or both.15 We have argued that part of the reason this debate rages on is that previous work has largely shied away from quantifying the robustness of the features proposed to underlie gender assignment (though cf. Ngcobo 2010; Msaka 2019). Importantly, no previous work on gender assignment in Bantu has engaged with theories of productivity. In this study, we aimed to illustrate how a quantitative approach, couched within a theory of productivity, makes it possible to go beyond intuitions and to deal with exceptional cases in a principled way. We took gender assignment in Kîîtharaka as our test case. Although there are many theories of morphological productivity, here we used the Tolerance Principle (Yang 2016).

Our results suggest that for Kîîtharaka, only a few semantic features are predicted to productively cue agreement class, namely Augmentative (gender C), Pejorative (gender D), Diminutive (gender G), Tree (gender B) and Infinitive (GAC, class 15). Morphophonological features, the nominal prefixes, are on the other hand widely predicted to productively cue gender in Kîîtharaka. In other words, Kîîtharaka speakers are predicted to be able, in most cases, to determine the gender of a novel noun (here taken as the agreement pattern it shows) on the basis of its nominal prefix alone. The only apparent exceptions were the prefixes mû(u)- (shared by genders A and B) and n- (which appears with the plural agreement classes of genders E and F).

In fact, across both parts of our analysis, it is notable that agreement classes 1 and 3 were not predictable from any of the features tested. Perhaps most surprisingly, the feature Human, the target feature for classes 1 and 2 (gender A), was not predicted to be productive. Humanness/animacy has been claimed to play a key role in motivating the assignment of nouns (to gender A) across the Bantu language family (see e.g., Wald 1975; Contini-Morava 2008; Ngcobo 2010; Crisma et al. 2011: and many others). Likewise, if singular prefixes alone marked agreement class or gender, it would be impossible to predict whether a mû(u)-prefixed noun is in gender A (class 1/2) or gender B (class 3/4). Assuming an explicit theory of productivity forces us to specify what it means for a particular feature to be productive. In this case, the theory requires that the number of nouns with the relevant feature (or features) assigned to other classes not exceed a certain threshold, and this threshold is clearly exceeded for both humanness and mû(u)- in Kîîtharaka.

However, while it is possible that these features simply are not productive in Kîîtharaka, it is worth considering some other possibilities. First, it could be that our corpus, which is partially extracted from the bible, contains more nouns of certain kinds than one would find in other texts or in spontaneous speech (e.g., spirits, God and other superhumans, all assigned the feature Human in our analysis). Indeed, some of these nouns are exceptions in our corpus (i.e., they are not in gender A). For instance, kîroria “prophet”, though Human, is in agreement class 7/8 (gender D). An over-representation of such nouns could inflate the number of exceptions relative to rule-compliant cases, potentially leading to the incorrect prediction that the Human feature is unproductive for gender A. It is not clear, however, how likely this is. A large portion of our corpus, 1,426 nouns or more than half of the total, comes from the translated SIL Comparative African Word List rather than the bible text, and this list is potentially a more balanced representation of commonly used nouns across African languages. We therefore re-ran our Tolerance Principle analysis on just the subset of nouns from the SIL word list. The results were qualitatively similar; for example, the Human feature was still not productive (see the SIL list only section of the corpus analyses, URL provided in Section 2.2).

A second possibility is that both the semantic feature Human and the morphophonological feature mû(u)- do productively cue genders A and B, but only under recursive application of the Tolerance Principle. In other words, these features would be productive when re-evaluated on a subset of the lexicon. In the analysis described above, we treated each semantic and morphophonological feature individually and independently of all other features. However, semantics and morphophonology may jointly determine certain aspects of gender assignment. The Kîîtharaka corpus data indeed show that nearly all human nouns with the prefix mû(u)- are in gender A, and nearly all non-human nouns with the prefix mû(u)- are in gender B. If we treat Human as a binary feature (a possibility for gender features according to Lumsden 1992; Rooryck 1994: and others), the lexicon can be divided into [+Human] and [–Human] nouns, and the remaining features, like the morphophonological feature mû(u)-, can then be re-evaluated on each subset. In its original conception, the Tolerance Principle was indeed designed to be applied recursively in this way. This was motivated by the Maximise Productivity principle, which holds that learners actively search for productive rules (see Yang 2016; 2018). Learners could thus evaluate not only rules that apply to the full lexicon but also rules that apply only to a subset of it. Recursive application is predicted to affect the time course of acquisition, but not the eventual productivity of the resulting rule. While recursive rule application makes the space of possible rules to be tested much larger, this particular combination of features stands out immediately in our data.
To explore this, we retested the productivity of the prefix mû(u)- and found that the rule [mû(u)-] → gender A was productive for the subset of the lexicon with the feature [+Human] (N = 166, n = 161, e = 5, θN = 32), and that the rule [mû(u)-] → gender B was also productive for the subset of the lexicon with the feature [–Human] (N = 231, n = 207, e = 24, θN = 42). In the same vein, the semantic rule [+Human] → gender A can be re-evaluated on the subset of the corpus containing nouns with the prefix mû(u)-. For this subset, the rule [+Human] → gender A is productive (N = 165, n = 161, e = 4, θN = 32). In other words, recursive application of rules targeting genders A and B with the features mû(u)- and [+Human] provides additional evidence for productivity.
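The recursive step can be illustrated by partitioning the mû(u)- nouns on [±Human] and re-running the same test on each part. The toy lexicon below is rebuilt from the counts reported in this paragraph; because the text does not list the genders of the exceptions, they are lumped under a placeholder label "other", and all names are ours. On the whole mû(u)- set, the rule mû(u)- → gender A fails (N = 397, e = 236, as in Table 8); on the [+Human] subset it passes.

```python
import math
from collections import Counter
from typing import List, Tuple

Noun = Tuple[bool, str]  # (is_human, gender)


def productive(n_items: int, exceptions: int) -> bool:
    """Tolerance Principle: productive iff e <= N / ln N (Yang 2016)."""
    return exceptions <= n_items / math.log(n_items)


# Toy mû(u)- lexicon rebuilt from the reported counts; "other" is a
# placeholder for exceptions whose genders are not itemised in the text.
LEXICON: List[Noun] = (
    [(True, "A")] * 161 + [(True, "other")] * 5
    + [(False, "B")] * 207 + [(False, "other")] * 24
)


def evaluate(nouns: List[Noun], target: str) -> Tuple[int, int, bool]:
    """Return (N, e, productive?) for the rule 'nouns -> target gender'."""
    counts = Counter(gender for _, gender in nouns)
    n_items = len(nouns)
    exceptions = n_items - counts[target]
    return n_items, exceptions, productive(n_items, exceptions)


human = [noun for noun in LEXICON if noun[0]]
nonhuman = [noun for noun in LEXICON if not noun[0]]

print(evaluate(LEXICON, "A"))   # (397, 236, False)
print(evaluate(human, "A"))     # (166, 5, True)
print(evaluate(nonhuman, "B"))  # (231, 24, True)
```

Splitting first on [±Human] shrinks each exception set far more than it shrinks the threshold, which is why the same prefix rule flips from unproductive to productive.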

Finally, it is worth looking more closely at the exceptions to the semantic rule [+Human] → gender A. In the entire corpus, there are 252 human nouns, of which 196 are in gender A. The remaining 56 human nouns are distributed across gender C (N = 2), gender D (N = 21) and gender E (N = 21), with the remaining 12 in genders B and G. While the rule is not productive with this number of exceptions, the exceptional nouns largely denote humans with special attributes. For instance, some have supernatural abilities, e.g., kîroria “prophet” (gender D) and ngai “a lesser god” (gender E). Others have properties or behaviours that are deemed socially undesirable, e.g., îrwaya “prostitute” (gender C) or kîonje “a cripple” (gender D). Some of these nouns also convey pejorative meanings. For example, the noun kîrundu can mean “spirit of God” (gender A), “demon/evil spirit” (gender D) or “spirit of the dead” (gender D).16 These exceptional nouns are given in Table 9.

Table 9: Examples of human (and superhuman) nouns outside of class 1, illustrating that some exceptional nouns have a pejorative meaning.

Noun Gloss Class Unique feature
îrwaya prostitute 5 pejorative
îrimorimo giant 5 superhuman
kîrîndî a large group of people 7 human collectivity
kîrundu a demon/spirit of the dead 7 superhuman
kiuno a fetus 7 pejorative
kîroria a prophet 7 superhuman
kîonje a cripple 7 pejorative
kîîmbere a firstborn 7 human collectivity
kîa a fool 7 pejorative
nkoma a devil/ghost 9 superhuman
ngai a lesser god/idol 9 superhuman
nthuke generation 9 human collectivity
nkombo a slave 9 pejorative
ntigwa widow 9 pejorative
nthaka a (circumcised) young man/son 9 (special) human
nthaata a barren person 9 pejorative
nkea a poor person 9 pejorative

These examples suggest that at least some human nouns are not in gender A in order to convey a specific, mostly negative, meaning. This accords with our general finding that the semantic features predicted to be productive in Kîîtharaka are mostly evaluative in nature: Augmentative, Pejorative and Diminutive. Indeed, it has been argued that evaluation plays a key role in nominal classification in other Bantu languages (Castagneto 2017; Msaka 2019). If these nouns are considered not to have the feature Human, i.e., if they are treated as having a distinct feature that combines Human and Pejorative, this could affect how Kîîtharaka speakers represent these nouns and the classes they belong to. For example, re-evaluating productivity after removing nouns coded as Pejorative renders the rule [Human] → gender A productive (N = 235, n = 196, e = 39, θN = 43). This is another example of recursive application of the Tolerance Principle, this time re-evaluating a semantic feature ([Human]) on the subset of nouns without a particular evaluative meaning ([Pejorative]).17
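The effect of this re-evaluation can be checked by hand with θN = N/ln N (Yang 2016), using only the counts reported in this section. Since n = 196 in both cases, the 17 nouns removed (252 - 235) are all among the exceptions (56 - 39 = 17):

```latex
\begin{aligned}
\text{All human nouns:}\quad & \theta_{252} = \frac{252}{\ln 252} \approx 45.6 < e = 56 \;\Rightarrow\; \text{not productive}\\
\text{Pejorative nouns removed:}\quad & \theta_{235} = \frac{235}{\ln 235} \approx 43.0 \ge e = 39 \;\Rightarrow\; \text{productive}
\end{aligned}
```

Removing a small, semantically coherent group of exceptions is enough here because the threshold falls only slightly (from about 45.6 to 43.0) while e falls by 17.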

Regardless of exactly how learners come to form this generalisation, our guess is that the feature [+Human] can be used productively by Kîîtharaka speakers to predict the gender of novel nouns. This feature and related features referencing animacy (or agency) have, in addition to evaluative features, been argued to be relevant in many Bantu languages (see e.g., Ström 2012: 274–281; Di Garbo 2014: 42, 148, 176; Güldemann 2023). The remaining inherent feature found to be productive is Tree. This is somewhat surprising, and the result rests on a relatively small set of nouns (N = 16). Of course, the Tolerance Principle yields only predictions about rule productivity, and the next logical step is to test whether these predictions are borne out for speakers of Kîîtharaka. For example, psycholinguistic experiments could test whether Kîîtharaka speakers, either adults or children, use the features predicted to be productive by this analysis, and fail to use those predicted to be unproductive, when assigning novel nouns to classes. As noted above, the corpus of nouns used here may not precisely reflect the kinds of nouns young children encounter (though the SIL database may be a reasonable approximation); testing this would require additional data on child-directed speech in Kîîtharaka. In addition, while we have proposed at least one case where semantics and morphology likely work together, the questions of whether speakers privilege one type of cue over the other (e.g., Karmiloff-Smith 1981; Pérez-Pereira 1991; Gagliardi & Lidz 2014; Lawyer et al. 2024), and of the time course of acquisition in Kîîtharaka, remain open. We leave these issues for future work.

It is also worth noting that the approach taken here treats meaning and form class markers (prefixes) as potential cues to agreement class, rather than the other way around. In some sense, this gives noun-internal cues a special status (Gagliardi & Lidz 2014), and at the same time assumes that noun form and meaning have equal status in the minds of speakers. However, acquisition data suggest that children learning at least some languages exhibit knowledge of nominal prefixes much earlier than of semantic cues (e.g., Demuth 2003: and the references therein), and show sensitivity to noun form earlier than to noun-external syntactic alternations like agreeing determiners or adjectives (see e.g., Karmiloff-Smith 1981). This is likely to change at later stages of acquisition: studies on the acquisition of Romance languages show that from age 10, children begin to make use of both semantic and syntactic distributional cues to determine gender (see, e.g., Karmiloff-Smith 1981; Pérez-Pereira 1991). Both child learners and adult users of a language could therefore use their knowledge of deriflection to predict agreement, and vice versa; in other words, gender could in fact provide a cue to deriflection. This is argued to be the case in Icelandic, for example, where knowledge of nominal inflection class provides a productive cue to gender in some cases, while in others it is gender that provides the only productive cue to inflection class (Björnsdóttir 2021; 2023). In Kîîtharaka, the mapping between deriflection and gender is for the most part one-to-one, and the gender of most nouns can be determined from their deriflection class. But this is not always the case. For example, as shown in Figure 1, when a noun lacks overt form class features, i.e., does not have a nominal prefix, it is not clear whether it is in gender A or gender E.
In this case, learners may derive productive generalisations based on alternative cues, like the semantics of a noun or the agreement pattern it takes, and use these when encountering novel nouns.

At a more general level, while determining the psychological reality of specific cues and how they interact in child learners and adult users of Kîîtharaka are important questions for future research, our findings here highlight that robust predictions can be made about gender assignment in Bantu languages using a data-driven approach combined with an explicit theory of productivity. The same approach can in principle be applied to other Bantu languages. The results may align with ours—which suggest that morphophonological features are generally productive but many semantic features are not—or they may suggest differences across Bantu languages. Either way, the approach has the potential to lead to better understanding of how gender assignment works in this family of languages.

4 Conclusion

Gender and noun class systems are found pervasively in the languages of the world. They are often complex systems, in which a variety of cues can determine how nouns are categorized and what agreement patterns they take. Bantu noun class systems present a particularly well-studied case. A common feature of traditional accounts of gender assignment in Bantu is the characterization of the systems on the basis of semantics. Often, numerous abstract and subjective semantic features are considered to motivate the classes nouns belong to. At the same time, the ubiquitous nominal prefixes found in Bantu also present an obvious morphophonological cue to nominal classes. Here we introduced the noun class system of Kîîtharaka, describing general patterns and exceptions, and highlighting the utility of two distinct notions: gender (based on agreement classes) and deriflection (based on nominal prefixes). We then used a data-driven approach to evaluate the predicted productivity of both semantic and morphophonological features as cues to gender in Kîîtharaka. To do this, we created a new corpus of feature-tagged Kîîtharaka nouns and used a well-known theory of productivity, the Tolerance Principle, to test a large set of potential rules based on semantics and morphophonology. The results show that while morphophonology is predicted to be highly productive, semantic features do not appear to be at the core of the classification system; only five (mainly evaluative) features were predicted to be productive in our analysis. One additional feature, Human, is predicted to be productive under recursive application of the Tolerance Principle. These results suggest that morphophonology, not semantics, provides the strongest cue for nominal agreement patterns in Kîîtharaka. We hope to have illustrated that this approach can help more robustly characterise noun class systems in the modern Bantu languages, and make testable predictions about speakers’ knowledge of these systems.
In future work, we aim to test these predictions for Kîîtharaka.

Supplementary files

A. Number of nouns by assignment rules and gender. DOI: https://doi.org/10.16995/glossa.11755.s1

B. Classification of nouns by assignment rules and gender. DOI: https://doi.org/10.16995/glossa.11755.s2

Acknowledgements

We would like to thank Fang Wang for reading and providing valuable feedback on earlier versions of this paper. Special thanks to audiences at the Bantu 8 conference in Malawi, the This Time for Africa (TTFA) Talk series at Leiden University, and the Meaning and Grammar Research Group and the Centre for Language Evolution at the University of Edinburgh, for providing valuable feedback to the first author on this work.

Competing interests

The authors have no competing interests to declare.

Notes

  1. We remain agnostic here as to what degree adult users of a language might actively use these learned associations to derive agreement patterns for known nouns during normal comprehension and production. This is assumed to be the case by e.g., Yang (2016) where the focus is on generalisation during learning. However, in theories of gender and the mental lexicon, gender is often assumed to be a stored feature of known nouns (e.g., Harris 1991; Clahsen et al. 2001; Alexiadou 2004; Gor 2017; Ellingson Eddington 2022), and there is evidence that speakers of Zulu for example access the lexicon to determine the gender even when productive rules apply (Zeller et al. 2022). [^]
  2. In Kîîtharaka and several other Bantu languages, the vowel <û> [o] is realized as <w> before a vowel, with the exception of <u>. [^]
  3. Form classes which constitute deriflection in Bantu are equivalent to what has been called inflectional classes in languages like Russian or Icelandic (see e.g., Bjarnadóttir 2012; Madariaga & Romanova 2022; Markússon 2023). Although agreement classes may be sufficient to describe the gender system of Bantu, distinguishing gender from deriflection highlights aspects of the system that differ from traditional descriptions of Kîîtharaka and other Bantu languages, based on the nominal prefix alone. Of course, the distinction between gender, inflection, and deriflection is a complex issue, and inflectional class status has been widely debated cross-linguistically (see e.g., Corbett 1991; Harris 1991; Carstairs-McCarthy 1994; Bonami & Beniamine 2016; Stump 2016). Here, we treat them as potential cues to gender assignment (see e.g., Corbett 1991; Harris 1991; Kanampiu et al. 2025). How they are acquired and represented by speakers, we leave to future research. [^]
  4. A part of the population living in Maara and Meru South Sub-Counties, and another population mainly occupying Tharaka Sub-County in Kitui County are also considered to speak Kîîtharaka. [^]
  5. Notes: (i) The back vowel [o] orthographically presented as û and mid-front [e] written as î are realised as w and y before another vowel. (ii) On class 2 and 10 plural agreement: because the agreement prefixes are î- and i-, respectively [^]
  6. Note that class 1 agreement on the verb is usually a-, but here we focus on the agreement on the nominal dependents, which is usually û-. [^]
  7. wa Mberia (1993) notes these exceptional nouns but does not classify them into any class. Under a Tolerance Principle approach, these are cases that won’t be productive for any rule, and are thus likely acquired through lexicalization/memorization. [^]
  8. See section 3.2 where we verify that the predictions of the entire corpus also hold for just this corpus. [^]
  9. Verbs were translated as infinitives as this was the only way to ensure noun class 15 was represented. 37 words were not translated as they denoted things that lacked a native equivalent. [^]
  10. The locative feature was not coded for since this class has diachronically diminished in Kîîtharaka and is made up of only two nouns (one each for singular and plural) (see also Fuchs et al. 2018). [^]
  11. There were 5 cases of this kind among all nouns in the bible corpus. As noted in Section 1.2, Kîîtharaka has highly regular singular-plural class mappings. There are some additional cases when plurals can be perceived as collective, which gives rise to a certain amount of variation. For example the class 6 prefix ma- may alternate with the regular plural prefix (see, e.g., Contini-Morava 2000: 8,16; for such cases in Kiswahili.) [^]
  12. The ma- prefix is probably used in general to express collective plural. However, in other Bantu languages, this prefix has also been argued to be a default plural (see, e.g., Bosire 2006; Ström 2012; Fuchs & van der Wal 2022). For the purposes of this paper, we do not treat nouns that can take ma- in the plural as separate genders, but this could be reassessed in future research.
  13. In other words, if an entity is big, bad-looking, or small, it will belong to gender C, D, or G, respectively. However, for each of these three genders, there are nouns that can be thought of as inherently bearing the relevant evaluative feature, while other nouns are derived: they typically belong to other genders but occur in C, D, or G when the speaker evaluates them as such in context. For both sets of nouns, the relevant semantic feature is productive.
  14. An anonymous reviewer suggests that we could explore whether our predictions would remain if we used smaller corpus samples, following, e.g., Kodner (2020). However, we do not believe that Kîîtharaka nouns show a relationship between lexicon size and frequency comparable to that observed for English past-tense inflection. In English, past-tense inflection presents a unique challenge because of high-frequency irregular verbs, many of which make up a significant portion of learners’ early lexicons. This limits generalization in a sparse lexicon, requiring learners to memorize a larger set of irregular forms until they have learnt more words. As we have shown in Section 1.2, nominal classification in Kîîtharaka largely comprises a regular formal paradigm. Additionally, Yang (2016, 2018) notes that the Tolerance Principle may not work well with small lexicon sizes: (un)productivity based on small values of N may be more a matter of sampling than of the (in)ability to apply a rule.
  15. As a reviewer points out, the term ‘based on’ may imply a diachronic rather than a synchronic phenomenon. In this paper, we are of course focused on the latter. However, as Bauer (2001) notes, the two are not entirely dissociable: the synchronic status of the elements in the lexicon is the result of diachronic events. In other words, once a word derived by a rule enters the lexicon, it becomes the product of a diachronic process. Whether the rule that derived it remains productive is a further question.
  16. As noted in Section 2.1, nouns whose singular agreement varies consistently with their meaning were treated as different types (i.e., each meaning belongs to a different class). Both “spirit of the dead” and “demon/evil spirit” appear in the SIL wordlist as different types. Similarly, the noun kîûyûûyû appeared twice, representing “grandchild” (gender A) and “ancestor” (gender D).
  17. Recursive application is part of the original formulation of the Tolerance Principle, as noted above, but it raises potential questions about how to limit this powerful mechanism. For example, many intuitively unproductive rules could in principle be predicted to be productive on sufficiently small subsets of nouns. This could be addressed by arguing that productivity must be supported by a sufficient number of examples (see, e.g., Ellis 2006; Bauer 2005; Yang 2018; Plag 2018 for relevant comments). Nevertheless, this points to the need for behavioral data to support claims about predictive productivity, as we discuss just below.
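
Notes 14 and 17 both turn on how the Tolerance Principle behaves when N is small. The Python sketch below illustrates this using the threshold from Yang (2016): a rule over N items is predicted to be productive only if its number of exceptions e satisfies e ≤ θ_N = N / ln N. The function names are hypothetical. This is not the implementation used in our analyses. It only shows that the share of exceptions a rule can tolerate grows as N shrinks.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Hypothetical helper: theta_N = N / ln N, the maximum number of
    exceptions a rule over N items can tolerate (Yang 2016)."""
    if n < 2:
        # ln 1 = 0, so the threshold is undefined for N < 2
        raise ValueError("N must be at least 2")
    return n / math.log(n)


def is_productive(n: int, exceptions: int) -> bool:
    """Hypothetical helper: a rule is predicted productive iff e <= theta_N."""
    return exceptions <= tolerance_threshold(n)


# Print the threshold and the tolerated share of exceptions for several N
for n in (3, 10, 100, 1000):
    theta = tolerance_threshold(n)
    print(f"N={n:5d}  theta_N={theta:7.2f}  tolerated share={theta / n:.0%}")
```

For example, with N = 3 we get θ_N ≈ 2.73. A rule that holds of only one of three nouns therefore still counts as productive. At N = 100, the same rule could tolerate at most 21 exceptions. This is why recursive application to very small subsets needs some independent limit.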

References

Aikhenvald, Alexandra Y. 2006. Classifiers and noun classes: Semantics. In Brown, Keith (ed.), Encyclopedia of languages and linguistics, 463–471. Oxford: Elsevier. DOI:  http://doi.org/10.1016/B0-08-044854-2/01111-1

Aikhenvald, Alexandra Y. 2012. Round women and long men: Shape, size, and the meanings of gender in New Guinea and beyond. Anthropological Linguistics 54(1). 33–86. DOI:  http://doi.org/10.1353/anl.2012.0005

Alexiadou, Artemis. 2004. Inflection class, gender and DP internal structure. In Müller, Gereon & Gunkel, Lutz & Zifonun, Gisela (eds.), Explorations in nominal inflection, 21–50. Berlin: Mouton de Gruyter. DOI:  http://doi.org/10.1515/9783110197501.21

Aronoff, Mark. 1976. Word formation in generative grammar (Linguistic Inquiry Monographs 1). Cambridge, MA: MIT Press.

Baayen, Harald. 1993. On frequency, transparency and productivity. In Booij, Geert & van Marle, Jaap (eds.), Yearbook of morphology 1992, 181–208. Springer. DOI:  http://doi.org/10.1007/978-94-017-3710-4_7

Bauer, Laurie. 2001. Morphological productivity (Cambridge Studies in Linguistics 95). Cambridge: Cambridge University Press.

Bauer, Laurie. 2005. Productivity: Theories. In Štekauer, Pavol & Lieber, Rochelle (eds.), Handbook of word-formation, 315–334. Springer. DOI:  http://doi.org/10.1007/1-4020-3596-9_13

Bible Translation & Literacy, East Africa. 1993. Tharaka Project. The noun phrase in Kîîtharaka: A description of the noun class system, adjectives, demonstratives, numerals, and relative clauses in the Tharaka language of Kenya. Nairobi: Bible Translation & Literacy (E.A.), Tharaka Project. https://books.google.co.ke/books?id=K-hJHQAACAAJ.

Bible Translation & Literacy, East Africa. 2019. Bible in Tharaka language. Nairobi: Bible Translation & Literacy (EA).

Bjarnadóttir, Kristín. 2012. The database of modern Icelandic inflection (Beygingarlýsing íslensks nútímamáls). Language Technology for Normalisation of Less-Resourced Languages 13.

Björnsdóttir, Sigríður. 2021. Productivity and the acquisition of gender. Journal of Child Language 48(6). 1209–1234. DOI:  http://doi.org/10.1017/S0305000920000732

Björnsdóttir, Sigríður. 2023. Predicting ineffability: Grammatical gender and noun pluralization in Icelandic. Glossa: A Journal of General Linguistics 8(1). 1–40. DOI:  http://doi.org/10.16995/glossa.5823

Bleek, Wilhelm Heinrich Immanuel. 1862. A comparative grammar of South African languages. London: Trübner.

Bonami, Olivier & Beniamine, Sacha. 2016. Joint predictiveness in inflectional paradigms. Word Structure 9(2). 156–182. DOI:  http://doi.org/10.3366/word.2016.0092

Bosire, Mokaya. 2006. Hybrid languages: The case of Sheng. In Selected proceedings of the 36th annual conference on African linguistics, 185–193.

Bresnan, Joan & Mchombo, Sam A. 1995. The lexical integrity principle: Evidence from Bantu. Natural Language & Linguistic Theory 13(2). 181–254. DOI:  http://doi.org/10.1007/BF00992782

Brezina, V. & Weill-Tessier, P. & McEnery, A. 2020. #LancsBox (version 5) [Computer software]. corpora.lancs.ac.uk/lancsbox.

Carstairs-McCarthy, Andrew. 1994. Inflection classes, gender, and the principle of contrast. Language 70(4). 737–788. DOI:  http://doi.org/10.2307/416326

Carstens, Vicki May. 1991. The morphology and syntax of determiner phrases in Kiswahili. University of California, Los Angeles dissertation.

Castagneto, Marina. 2017. Noun classification in Kiswahili. In Napoli, Maria & Ravetto, Miriam (eds.), Exploring intensification: Synchronic, diachronic and cross-linguistic perspectives, 79–97. John Benjamins Publishing Company. DOI:  http://doi.org/10.1075/slcs.189.05cas

Clahsen, Harald & Eisenbeiss, Sonja & Hadler, Meike & Sonnenstuhl, Ingrid. 2001. The mental representation of inflected words: An experimental study of adjectives and verbs in German. Language 77(3). 510–543. DOI:  http://doi.org/10.1353/lan.2001.0140

Contini-Morava, Ellen. 2000. Noun class as number in Swahili. In Contini-Morava, Ellen & Tobin, Yishai (eds.), Between grammar and lexicon, 3–30. Amsterdam/Philadelphia: John Benjamins Publishing. DOI:  http://doi.org/10.1075/cilt.183.05con

Contini-Morava, Ellen. 2008. Human relationship terms, discourse prominence, and asymmetrical animacy in Swahili. Journal of African Languages and Linguistics 29. 127–171. DOI:  http://doi.org/10.1515/JALL.2008.008

Coppock, Elizabeth. 2009. The logical and empirical foundations of Baker’s paradox. Stanford University dissertation.

Corbett, Greville G. 1991. Gender. Cambridge: Cambridge University Press.

Corbett, Greville G. & Fedden, Sebastian. 2016. Canonical gender. Journal of Linguistics 52(3). 495–531. DOI:  http://doi.org/10.1017/S0022226715000195

Creider, Chet A. 1975. The semantic system of noun classes in Proto-Bantu. Anthropological Linguistics, 127–138.

Crisma, Paola & Marten, Lutz & Sybesma, Rint. 2011. The point of Bantu, Chinese and Romance nominal classification. Rivista di Linguistica 23(2). 251–299.

De Schryver, Gilles-Maurice & Nabirye, Minah. 2010. A quantitative analysis of the morphology, morphophonology and semantic import of the Lusoga noun. Africana Linguistica 16. 97–153. DOI:  http://doi.org/10.3406/aflin.2010.989

Déchaine, Rose-Marie & Girard, Raphaël & Mudzingwa, Calisto & Wiltschko, Martina. 2014. The internal syntax of Shona class prefixes. Language Sciences 43. 18–46. DOI:  http://doi.org/10.1016/j.langsci.2013.10.008

Dembetembe, Norris Clemens. 1995. Secondary noun prefixes taken further with special reference to Shona. South African Journal of African Languages 15(3). 100–108. DOI:  http://doi.org/10.1080/02572117.1995.10587065

Demuth, Katherine. 2003. The acquisition of Bantu languages. In Nurse, Derek & Philippson, Gérard (eds.), The Bantu languages, 209–222. Routledge.

Denny, J. Peter & Creider, Chet A. 1986. The semantics of noun classes in Proto-Bantu. In Craig, Colette G. (ed.), Noun classes and categorization, 217–239. Amsterdam/Philadelphia: John Benjamins Publishing. DOI:  http://doi.org/10.1075/tsl.7.15den

Di Garbo, Francesca. 2014. Gender and its interaction with number and evaluative morphology: An intra- and intergenealogical typological survey of Africa. Department of Linguistics, Stockholm University dissertation.

Dixon, Robert M. W. 1986. Noun classes and noun classification in typological perspective. In Craig, Colette G. (ed.), Noun classes and categorization, 105–112. Amsterdam/Philadelphia: John Benjamins Publishing. DOI:  http://doi.org/10.1075/tsl.7.09dix

Dube, Progress & Ndebele, Lickel & Ndlovu, Mbulisi. 2014. An analysis of the status of the secondary noun prefixes in Ndebele. South African Journal of African Languages 34(2). 145–149. DOI:  http://doi.org/10.1080/02572117.2014.997050

Ellingson Eddington, David. 2022. Processing Spanish gender in a usage-based model with special reference to dual-gendered nouns. The Mental Lexicon 17(1). 34–75. DOI:  http://doi.org/10.1075/ml.21011.ell

Ellis, Nick C. 2006. Language acquisition as rational contingency learning. Applied Linguistics 27(1). 1–24. DOI:  http://doi.org/10.1093/applin/ami038

Emond, Emeryse & Shi, Rushen. 2020. Infants’ rule generalization is governed by the tolerance principle. In Dionne, Danielle & Vidal Covas, Lee-Ann (eds.), 45th annual Boston University conference on language development, 191–204. Somerville, MA: Cascadilla Press.

Evans, David. 2007. Corpus building and investigation for the humanities. https://www.birmingham.ac.uk/Documents/college-artslaw/corpus/Intro/Unit1.pdf. Accessed: 2021-10-23.

Fortune, George. 1970. The references of primary and secondary noun prefixes in Zezuru. African Studies 29(2). 81–110. DOI:  http://doi.org/10.1080/00020187008707322

Forster, Kenneth I. 1976. Accessing the mental lexicon. In New approaches to language mechanisms, 257–287.

Fuchs, Zuzanna & van der Wal, Jenneke. 2022. The locus of parametric variation in Bantu gender and nominal derivation. Linguistic Variation 22(2). 268–324. DOI:  http://doi.org/10.1075/lv.20007.fuc

Fuchs, Zuzanna & van der Wal, Jenneke. 2018. Nominal syntax in Bantu languages: How far can we get with gender on n? Talk presented at SMircle, Stanford University.

Gagliardi, Annie & Lidz, Jeffrey. 2014. Statistical insensitivity in the acquisition of Tsez noun classes. Language 90(1). 58–89. DOI:  http://doi.org/10.1353/lan.2014.0013

Givón, Talmy. 1971a. Historical syntax and synchronic morphology: An archaeologist’s field trip. In Chicago linguistic society, vol. 7. 394–415.

Givón, Talmy. 1971b. Some historical changes in the noun-class system of Bantu, their possible causes and wider implications. Papers in African Linguistics, 33–54.

Gor, Kira. 2017. The mental lexicon of L2 learners of Russian: Phonology and morphology in lexical storage and access. Journal of Slavic Linguistics 25(2). 277–302. DOI:  http://doi.org/10.1353/jsl.2017.0011

Güldemann, Tom. 2023. Animacy-based gender systems in central Africa. Africana Linguistica 29. 67–123.

Güldemann, Tom & Fiedler, Ines. 2019. Niger-Congo “noun classes” conflate gender with deriflection. In Di Garbo, Francesca & Olsson, Bruno & Wälchli, Bernhard (eds.), Grammatical gender and linguistic complexity, vol. 1, 95–145. Berlin: Language Science Press.

Guthrie, Malcolm. 1948. Gender, number and person in Bantu languages. Bulletin of the School of Oriental and African Studies 12(3–4). 847–856. DOI:  http://doi.org/10.1017/S0041977X00083427

Guthrie, Malcolm. 1967. Comparative Bantu: an introduction to the comparative linguistics and prehistory of the Bantu languages, vol. 3. Farnborough: Gregg.

Harjula, Lotta. 2006. The Ha noun class system revisited. A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. Special Supplement to SKY Journal of Linguistics 19. 200–208.

Harris, James W. 1991. The exponence of gender in Spanish. Linguistic Inquiry 22(1). 27–62.

Herbert, Robert K. 1985. Gender systems and semanticity: two case histories from Bantu. In Winter, Werner (ed.), Historical semantics/historical word-formation, 171–197. Mouton Publishers. DOI:  http://doi.org/10.1515/9783110850178.171

Hockett, Charles F. 1958. A course in modern linguistics. New Delhi: Oxford & IBH Publishing Co.

Irmen, Lisa & Kurovskaja, Julia. 2010. On the semantic content of grammatical gender and its impact on the representation of human referents. Experimental Psychology 57(5). 367–375. DOI:  http://doi.org/10.1027/1618-3169/a000044

Kanampiu, Patrick & Martin, Alexander & Culbertson, Jennifer. 2025. Experimental evidence for semantic and morphophonological productivity in Kîîtharaka noun classes. Glossa Psycholinguistics 4(1). 1–45. DOI:  http://doi.org/10.5070/G6011.20527

Karmiloff-Smith, Annette. 1981. A functional approach to child language: A study of determiners and reference, vol. 24. Cambridge: Cambridge University Press.

Katamba, Francis. 2003. Bantu nominal morphology. In Nurse, Derek & Philippson, Gérard (eds.), The Bantu languages, 102–120. Routledge.

Kiparsky, Paul. 1973. Elsewhere in phonology. In Anderson, Stephen & Kiparsky, Paul (eds.), A festschrift for Morris Halle, 93–106. New York: Holt, Rinehart & Winston.

Kiparsky, Paul. 1982. Word-formation and the lexicon. Mid-America Linguistics Conference. https://web.stanford.edu/~kiparsky/Papers/WordFormationMALC1982.pdf.

KNBS. 2019. Kenya Population and Housing Census: Volume III. https://housingfinanceafrica.org/app/uploads/VOLUME-III-KPHC-2019.pdf.

Kodner, Jordan. 2020. Language acquisition in the past. University of Pennsylvania dissertation.

Konishi, Toshi. 1993. The semantics of grammatical gender: A cross-cultural study. Journal of Psycholinguistic Research 22. 519–534. DOI:  http://doi.org/10.1007/BF01068252

Krapf, Johann Ludwig. 1850. Outline of the elements of the Kishuáheli language, with special reference to the Kiníka dialect. Tübingen: L.F. Fues.

Lang, Adrianne. 1976. The semantic base of gender in German. Lingua 40(1). 55–68. DOI:  http://doi.org/10.1016/0024-3841(76)90032-2

Lawyer, Laurel A. & O’Gara, Fate & Ngoboka, Jean P. & van Boxtel, Willem & Jerro, Kyle. 2024. Meaning or morphology: Individual differences in the categorization of Kinyarwanda nouns. Glossa Psycholinguistics 3(1). 1–27. DOI:  http://doi.org/10.5070/G6011226

Li, Daoxin & Schuler, Kathryn D. 2023. Acquiring recursive structures through distributional learning. Language Acquisition 30(3–4). 323–336. DOI:  http://doi.org/10.1080/10489223.2023.2185522

Lumsden, John S. 1992. Underspecification in grammatical and natural gender. Linguistic Inquiry 23. 469–486.

Madariaga, Nerea & Romanova, Olga. 2022. Simplifying grammatical gender in inflectional languages: Odessa Russian and beyond. Zeitschrift für Slawistik 67(2). 244–277. DOI:  http://doi.org/10.1515/slaw-2022-0011

Maho, Jouni Filip. 1999. A comparative study of Bantu noun classes. Göteborg: Acta Universitatis Gothoburgensis.

Markússon, Jón Símon. 2023. Accounting for different rates of gender reanalysis among Icelandic masculine forms in plural -ur. Nordic Journal of Linguistics 46(3). 331–356. DOI:  http://doi.org/10.1017/S0332586522000166

Meinhof, Carl. 1906. Grundzüge einer vergleichenden Grammatik der Bantusprachen. Berlin: Dietrich Reimer (Ernst Vohsen).

Msaka, Peter Kondwani. 2019. Nominal classification in Bantu revisited: The perspective from Chichewa. Stellenbosch University dissertation.

Mufwene, Salikoko S. 1980. Bantu class prefixes: Inflectional or derivational? In Kreiman, Jody & Ojeda, Almerindo (eds.), Papers from the sixteenth regional meeting Chicago Linguistic Society. 246–258.

Myers, Scott P. 1987. Tone and the structure of words in Shona. University of Massachusetts, Amherst dissertation.

Ngcobo, Mtholeni N. 2010. Zulu noun classes revisited: A spoken corpus-based approach. South African Journal of African Languages 30(1). 11–21. DOI:  http://doi.org/10.1080/02572117.2010.10587332

Pérez-Pereira, Miguel. 1991. The acquisition of gender: What Spanish children tell us. Journal of Child Language 18(3). 571–590. DOI:  http://doi.org/10.1017/S0305000900011259

Plag, Ingo. 2018. Word-formation in English. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/9781316771402

Richardson, Irvine. 1967. Linguistic evolution and Bantu noun class system. In Manessy, Gabriel (ed.), La classification nominale dans les langues négro-africaines. Paris: Centre national de la recherche scientifique.

Rooryck, Johan. 1994. On two types of underspecification: Towards a feature theory shared by syntax and phonology. Probus 6. 207–223. DOI:  http://doi.org/10.1515/prbs.1994.6.2-3.207

Snider, Keith & Roberts, James. 2004. SIL comparative African wordlist (SILCAWL). Journal of West African Languages 31(2). 73–122.

Sternberg, Saul. 1969. Memory-scanning: Mental processes revealed by reaction-time experiments. American Scientist 57(4). 421–457.

Ström, Eva-Marie. 2012. The increasing importance of animacy in the agreement systems of Ndengeleko and other Southern Coastal Bantu languages. In Burenhult, Niclas & Holmer, Arthur & Karlsson, Anastasia & Lundström, Håkan & Svantesson, Jan-Olof (eds.), Language documentation and description, 265–285. SOAS. DOI:  http://doi.org/10.25894/ldd198

Stump, Gregory. 2016. Inflection classes. In Baerman, Matthew (ed.), The Oxford handbook of inflection, 113–140. Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780199591428.013.6

Taljard, Elsabé & De Schryver, Gilles-Maurice. 2016. A corpus-driven account of the noun classes and genders in Northern Sotho. Southern African Linguistics and Applied Language Studies 34(2). 169–185. DOI:  http://doi.org/10.2989/16073614.2016.1206478

Taraldsen, Knut Tarald & Taraldsen Medová, Lucie & Langa, David. 2018. Class prefixes as specifiers in Southern Bantu. Natural Language & Linguistic Theory 36. 1339–1394. DOI:  http://doi.org/10.1007/s11049-017-9394-8

wa Mberia, Kithaka. 1993. Kitharaka segmental morphophonology with special reference to the noun and the verb. University of Nairobi dissertation.

Wald, Benji. 1975. Animate concord in Northeast Coastal Bantu: Its linguistic and social implications as a case of grammatical convergence. Studies in African Linguistics 6(3). 267–314.

Wechsler, Stephen. 2009. Agreement features. Language and Linguistics Compass 3(1). 384–405. DOI:  http://doi.org/10.1111/j.1749-818X.2008.00100.x

Welmers, W. M. 1973. African language structures. Berkeley: University of California Press.

Yang, Charles. 2016. The price of linguistic productivity: How children learn to break the rules of language. Cambridge: MIT Press. DOI:  http://doi.org/10.7551/mitpress/9780262035323.001.0001

Yang, Charles. 2018. A user’s guide to the Tolerance Principle. Manuscript. University of Pennsylvania. http://ling.auf.net/lingbuzz/004146.

Zapf, Jennifer A. & Smith, Linda B. 2007. When do children generalize the plural to novel nouns? First Language 27(1). 53–73. DOI:  http://doi.org/10.1177/0142723707070286

Zawawi, Sharifa M. 1974. Loan words and their effect on the classification of Swahili nominals: A morphological treatment. New York: Columbia University.

Zeller, Jochen & Bylund, Emanuel & Lewis, Ashley Glen. 2022. The parser consults the lexicon in spite of transparent gender marking: EEG evidence from noun class agreement processing in Zulu. Cognition 226. 105148. DOI:  http://doi.org/10.1016/j.cognition.2022.105148

Zipf, George Kingsley. 1949. Human behavior and the principle of least effort: An introduction to human ecology. Cambridge: Addison-Wesley Press.