1 Introduction
Differential Object Marking (DOM) is a cover term for splits in argument marking in which some but not all direct objects (DOs) receive a special marker, morphological or other. Cross-linguistically, this is a widespread pattern which is attested in historically unrelated languages, and for which a series of commonalities have been observed (Bossong 1985; Witzlack-Makarevich & Seržant 2018). Generally speaking, more prominent objects tend to be marked, whereas less prominent ones tend to be unmarked (for the notion of referential prominence, cf. Haspelmath 2021). Spanish is well-known as a DOM language within the Romance language family (Kabatek et al. 2021; Gerards 2023) and its complex grammatical patterns have been discussed widely from different perspectives (e.g. López 2012; Fábregas 2013; García García 2014; Kabatek 2016).
Animacy and definiteness of the DO are among the strongest prominence factors. Consider (1) and (2) from Standard Spanish: according to the literature, definite animate objects must be marked, whereas definite inanimate objects cannot be.1
(1) Veo          (*a)  la   guitarra.
    see-prs.1sg  dom   def  guitar
    ‘I see the guitar.’
(2) Veo          *(a)  la   guitarrista.
    see-prs.1sg  dom   def  guitar player
    ‘I see the guitar player.’
According to the literature, indefinite animates can also be marked, and marking affects the interpretation of the DO. This gives rise to a broad range of variational patterns depending on discourse status, such as specificity (Leonetti 2004), and on properties of the verbal predicate, such as the degree of transitivity and the affectedness of the DO (García García 2014). Furthermore, a series of sociolinguistic studies has observed regional differences: DOM occurs more frequently in certain structures in some varieties than in others, and the impact of the grammatical and discourse factors also seems to differ (see Section 1.1). However, since these studies draw on corpora with different designs, it is unclear to what degree their results are comparable across varieties. Moreover, many theoretically relevant contexts of variation occur so rarely in spontaneous language that the respective findings of those studies are difficult to interpret.
This paper explores how highly comparable variational data can be collected across varieties by means of a specifically tailored oral elicitation paradigm. University students were recruited in order to keep participant profiles highly comparable. This may limit the generalizability of the findings to the population as a whole; however, it is a necessary first step, and follow-up studies can be conducted with more varied participant pools. The paper argues that this method allows for an assessment of quantitative differences in the variation of relevant constructions and provides crucial data on phenomena for which it is extremely difficult, if not impossible, to collect sufficient data with other methods. The study focuses on a-marking of inanimate DOs because, contrary to the normative judgment in (1), marking occurs to different degrees in canonical transitive sentences in many varieties, cf. (3) from Venezuela. Furthermore, the claim that marking is almost mandatory, independently of animacy, for a specific set of verbs of very low transitivity, as in (4), has not yet been tested empirically.
(3) y    entonces  ya       tengo         que   estar   esperando  al       autobús
    and  so        already  have-prs.1sg  that  be-inf  wait-ger   dom+def  bus
    ‘and so I have to wait for the bus’ (Balasch 2011a: 76)
(4) El   artículo  acompaña           al/*el       sustantivo.
    def  article   accompany-prs.3sg  dom+def/def  noun
    ‘The article accompanies the noun.’ (García García 2014: 144)
In the elicitation study, we also collected exploratory data on four additional constructions that have been argued to be sensitive to DOM, either by triggering it (complement small clauses and accusative-with-infinitive constructions) or by blocking it (ditransitive sentences and secondary predication with tener ‘to have’).
The paper is structured as follows: Section 1.1 introduces the main findings from the sociolinguistic literature and explains the selection of urban varieties for comparison, namely from Montevideo, Lima, Mexico City, and Madrid. Section 1.2 then presents the relevant grammatical constructions for the present study, derives a series of hypotheses, and states the precise research questions. Subsequently, Section 2 and Section 3 report the oral elicitation experiment and its results. Section 4 offers a discussion of the findings and their implications.
1.1 The variational literature on Spanish DOM
The sociolinguistic literature has focused primarily on the varieties of Argentina, Cuba, Mexico, Spain, and Venezuela. Tippets (2011) investigates the variation of DOM in the spoken language of Mexico City, Buenos Aires, and Madrid on the basis of the Habla Culta corpora from the 1960s and 1970s. In his study, all instances of DOM were first extracted, and the verbs from these sentences were then used for an exhaustive compilation of DOs. Verbs that never occur with marked objects were therefore not included. The subsequent multivariate analysis yielded the following factors for each city, in this order: (i) for Buenos Aires, relative animacy between subject and DO, animacy of DO, specificity of DO, and form of DO (lexical vs. proper noun); (ii) for Mexico City, animacy of DO, relative animacy, specificity, and form of DO; (iii) for Madrid, relative animacy, animacy of DO, and form of DO. The discourse status (roughly, whether the referent of the DO had been introduced in the context) was not found to be significant, although Tippets (2011: 116) suggests that this might be due to the way this variable was implemented in the data annotation. Interestingly, Tippets (2011: 115) reports that the rate of marking of animate and specific nouns is highest for Mexico City (96%), the only sample in which near-obligatoriness is reached (Buenos Aires: 88%, Madrid: 79%).
Balasch (2011a; b) compares the spoken language of Mérida (Venezuela) with that of Madrid, again using the Habla Culta corpus for Madrid and a comparable corpus for Mérida. Her exclusion protocol differs from Tippets’, but the main goal is likewise to consider only those verbs for which variation actually occurs. While Tippets did not use the category of (in)definiteness as such, attempting instead to split it into two independent properties, Balasch does include (in)definiteness among her variables. Balasch (2011b: 119) also points out that analyzing animate and inanimate DOs together through traditional sociolinguistic variational analysis is a “methodological error” given their unbalanced distribution. Consequently, she analyzes the two groups separately rather than using a statistical tool designed to handle unbalanced samples. In the multivariate analysis, (in)definiteness and “co-reference of DO” (Balasch’s operationalization of the discourse status of the DO) turn out to be significant within the animate group (there are too few marked inanimates to run a separate analysis). Comparing Mérida and Madrid, Balasch (2011a: 86) reports higher overall rates of marking with animates in the latter (46% vs. 61%). In the multivariate analysis for Madrid, co-reference is not selected, and only (in)definiteness remains significant. Compared with Tippets’ figures for Madrid, Balasch’s frequency of marking is surprisingly low, given that both studies used the same corpus for Madrid. It can therefore be assumed that the two studies differ considerably in data selection and annotation. For instance, neither Tippets nor Balasch discusses dislocation of DOs, which is known to strongly affect a-marking (Sanz 2011).
Regarding the focus of the present study, i.e. the marking of inanimate DOs, Tippets (2011: 113) reports 5% a-marking for Madrid, 8% for Buenos Aires, and 15% for Mexico City in the Habla Culta data. A rather high frequency of marking with inanimate DOs in spoken Mexican Spanish is also reported by Company Company (2002). This study finds 17.2% marking in its data, which combine spoken and written texts, and suggests that DOM might become a general case marker in Mexican Spanish. García García (2015) criticizes this view and the underlying analysis. Barraza (2003) reports 5% marking of inanimate DOs in a corpus of Mexican texts, and Buyse (1998) 3.2% in a corpus composed mainly of Peninsular Spanish texts. Balasch (2011a) reports 2% for her Mérida corpus of spoken data. These results suggest that marking of inanimates occurs more frequently in spoken language, and that Mexico seems to have the highest rates among the varieties investigated, followed by Buenos Aires (increased use of DOM in this variety had already been reported by Barrenechea & Orecchi 1977).2 However, while the animacy feature groups together human and possibly some animal referents, inanimates form a much more heterogeneous class, including concrete objects, materials, and different kinds of abstract nouns. Since the aforementioned studies do not give precise information about the composition of their inanimate class, comparing the percentages across studies is problematic. Verbal semantics is also neglected, although it plays an important role, as will become evident in the next section. Thus, as already pointed out by García García (2014: 69), each currently available corpus study is in principle only representative of itself. A better controlled and more comparable dataset is required in order to discuss possible differences across varieties.
There is no sociolinguistic study on Peruvian Spanish comparable to those discussed above. However, several studies on varieties of Spanish in contact with local indigenous languages have found different degrees of deviation from the standard language, including distinct variational patterns and the replacement of the marker a by another preposition (Bossong 1991; Mayer & Sánchez 2021; Wall & Obrist 2021). Since there is no such study on Coastal Peruvian Spanish, and since this variety is traditionally considered the one closest to the peninsular standard (Lapesa 1981; Cerrón Palomino 2003), we included it in our investigation.
Summing up, the literature review shows that although there are a number of detailed studies on the variation of DOM, it is difficult to generalize from them. The most robust generalization possible at the moment seems to be that, compared to Spain, the (few) varieties investigated in the Americas have higher percentages of marking on inanimates (except for those where DOM in general is receding). Given that corpus work on DOM is laborious and non-trivial in many respects, one solution might be to complement corpus studies with controlled experimental investigations, in which a stable methodology can be applied more easily at different locations.
To the best of our knowledge, only one experimental study has used an elicitation approach similar to the one reported here to investigate the extension of DOM to inanimate objects, namely Bautista-Maldonado & Montrul (2019) on Mexican Spanish. Participants in that study marked definite inanimates only marginally (3% of cases) and almost never marked indefinites (1%). The authors do not address this finding in their discussion but rather point to the results of an acceptability judgment study in the same paper, in which such objects received higher ratings. One possible explanation for the results of their elicitation study is that they used different verbs for animate and inanimate objects, which also makes comparison problematic. We will revisit this issue in Section 4.1.
1.2 Six contexts of variation: research questions and hypotheses
In the present parallel experiment, we investigate six constructions: three that allow for DOM on inanimates to a certain degree, two that block marking on animates according to the literature, and one for which it has been claimed that DOM with indefinite animates is obligatory. Since we cannot discuss the technical details of the analyses proposed in the literature for each of these constructions, we point out the grammatical properties relevant for the present purpose and derive our hypotheses from them.
First, as already shown in (3), speakers sometimes a-mark inanimates even in canonical transitive sentences. Section 1.1 has cited some preliminary findings regarding patterns of variation. To account for such occurrences, García García (2014: 188f) points out that nouns referring to artifacts or objects, although inanimate, may still display certain proto-agent properties in the sense of Dowty (1991). Following this account, (3) could be explained by the fact that a bus, which moves along a predetermined path and interacts with human passengers, has some proto-agent properties, so that DOM is licensed. On this account, putative occurrences of DOM on objects without any proto-agent properties would, of course, be excluded. Furthermore, it is unclear whether only definite inanimates, as the examples suggest, or also indefinite inanimates can be marked by speakers, and whether this differs across varieties. Without recourse to proto-roles, example (3) could also be explained via (referential) specificity (Leonetti 2004) or the distinction between weak and strong readings (Bleam 2005): waiting for the bus may have a weak as well as a strong reading, and on the latter, there is a specific bus the speaker is waiting for. Based on these observations and the results from the sociolinguistic literature, we formulate the following two initial hypotheses, H1 and H2, for indefinite DOs in canonical transitive sentences in the four varieties under investigation:
H1: Inanimates in canonical transitive sentences are more frequently marked in Mexico City and Montevideo than in Madrid and Lima.
H2: Establishing the referent of an indefinite NP in previous context leads to more a-marking.
The second relevant set consists of low-transitivity constructions (as opposed to canonical transitives), as in (4). Although they concern only a rather limited number of verbs, they merit consideration, since they seem to allow for systematic marking of inanimate DOs (Weissenrieder 1991; García García 2014; Zdrojewski 2020; Camacho 2023). In his corpus study, García García (2014: 143–168) finds that with verbs of substitution (reemplazar ‘to replace’, sustituir ‘to substitute’), symmetrical verbs (acompañar ‘to accompany’, igualar ‘to equal’), and reversible-converse verbs (preceder ‘to precede’, seguir ‘to follow’), marking is indeed required, precisely when the low-transitivity reading obtains. In other readings, this is not the case. The relevant contrast concerns the thematic roles assigned to the arguments: readings implying an agentivity cline assign a control, cause, or experiencer role to one of the arguments, whereas in the low-transitivity readings both arguments receive the same role, often simply exist or act-like. To exemplify this contrast, compare the analysis of (4) in (5) with (6), a statement about silent films. While the semantics of (5) is symmetrical (note that the predicate structure on both sides of the arrow in the semantic formula is identical), that of (6) is not. Here, the relevant reading is not that the pianist and the film simply coexist in the same situation, but that the pianist (represented by x) actually adds something to the projection, namely the background music (represented as x’), and is thus the controller of this activity.
(5) El   artículo  acompaña           al/*el       sustantivo.
    def  article   accompany-prs.3sg  dom+def/def  noun
    ‘The article accompanies the noun.’ (García García 2014: 144)
    NEXT-TO(EXIST(x), EXIST(y)) → NEXT-TO(EXIST(y), EXIST(x))
(6) Un    pianista  acompañaba          siempre  las  proyecciones.
    indf  pianist   accompany-impf.3sg  always   def  projections
    ‘A pianist always accompanied the projections.’ (García García 2014: 155)
    CTRL(x, NEXT-TO(EXIST(x’), EXIST(y)))
In García García’s corpus study, only very few examples do not follow this generalization, and the author suggests alternative explanations for them. He proposes that (almost) obligatory marking of these verbs in the relevant readings follows from the generalization that if the DO has the same number of proto-agent properties as the subject, the DO receives the marker. Camacho (2023: 175) criticizes this approach, but the criticism seems to rest on a misunderstanding: symmetrical does not mean that either argument can occur as subject or as DO, but that the predicate does not assign an unequal number of proto-agent properties to them. Furthermore, García García (2014) explicitly argues that it is an advantage of his approach that the notion of animacy plays no role in it (another point criticized by Camacho 2023), while the fact that animate DOs are marked still follows from his account.3 Although García García (2014) focuses on inanimate objects, his account does not predict different outcomes for animate objects (with animate or inanimate subjects), since keeping animacy out of the account is a deliberate choice. From these observations, we derive H3 and H4. If H4 is not supported by the data, this does not contradict García García’s account, but it would call for additional explanations.
H3: Reversible-symmetrical verbal predicates in the sense of García García (2014) have (quasi-)obligatory DOM if the interpretation ensures low transitivity.
H4: In reversible-symmetrical verbal predicates, animate and inanimate objects are marked with similar frequency (almost obligatorily).
Complement small clauses (traditionally also labeled double accusative constructions) have been reported to show high frequencies of DOM on inanimates as well (Weissenrieder 1991). In his corpus study, García García (2014: 103) observes that when the predicative is verb-adjacent and followed by the DO, as in (7), DOM occurs categorically on inanimate DOs, whereas marking drops to 21% when the predicative follows the DO.
(7) Algunos  gramáticos[…]  no   consideran        oración   a    la   secuencia  con   verbo.
    some     grammarians    neg  consider-prs.3pl  sentence  dom  def  sequence   with  verb
    ‘Some grammarians do not consider the sequence with a verb (to be) a sentence.’
López (2012: 23) furthermore claims that in such constructions DOM is also obligatory with indefinite and animate DOs, an intuition that still awaits more systematic empirical confirmation. From these claims, we derive H5 and H6.
H5: In complement small clauses, inanimates are more frequently marked when the predicative is adjacent to the verb and precedes the DO.
H6: In complement small clauses, indefinite NPs with animate referents are obligatorily marked.
In the discussion of H1 and H2, we pointed out that marking of indefinite animates is usually considered to be strongly conditioned by specificity. However, López (2012) identifies a syntactic context where this does not seem to hold, namely what he calls ‘clause union’, i.e. an infinitival complement clause. In such structures, according to his judgment, marking is generally obligatory regardless of specificity (8). The author explicitly mentions causative and perception verbs, among others. In García García’s corpus study, however, inanimate DOs were marked only occasionally with causative verbs (9), and never with perception verbs (García García 2014: 105).
(8) a. María  hizo          llegar      tarde  *(a)  un    niño.
       M.     make-prf.3sg  arrive-inf  late   dom   indf  boy
       ‘María made a boy be late.’
    b. María  vio          caer      *(a)  un    niño.
       M.     see-prf.3sg  fall-inf  dom   indf  boy
       ‘María saw a boy fall.’ (López 2012: 24)
(9) muy   peligroso  porque   puede        hacer     volcar    al       avión
    very  dangerous  because  can-prs.3sg  make-inf  tilt-inf  dom+def  plane
    ‘(it is) very dangerous because it can make the plane tilt’ (García García 2014: 105)
López’s claim regarding the obligatoriness of marking has not yet been tested more broadly, but if it holds, clause union could serve as a control condition for obligatory marking, against which the indefinites in canonical transitive sentences can be compared. His account of these structures is purely syntactic: in brief, he argues that other means of case assignment are unavailable in this configuration, so that the DO has to raise to a position associated with a-marking (López 2012: 56f). If this categorical prediction is not confirmed by the data, the most immediate alternative would be to assume that agentivity still plays a role, in which case more marking would be expected with causative than with perception verbs. This gives rise to two alternative hypotheses, H7 and H8:
H7: A-marking in clause union (accusative-with-infinitive) is obligatory and, hence, all animate indefinite DOs will be marked.
H8: A-marking in clause union is not obligatory, and agentivity favors marking. Hence, DOs will be more frequently marked with causative than with perception verbs.
Finally, there are two constructions that (partially) block DOM according to the literature. One case in point is the verb tener (‘to have’), which rejects DOM in most contexts (10a). However, marking is claimed to be possible, as in (10b), when the VP contains a secondary predicate (López 2012: 20).4
(10) a. Ana  tiene         (*a)  una   hija.
        A.   have-prs.3sg  dom   indf  daughter
        ‘Ana has a daughter.’
     b. Ana  tiene         *(a)  una   hija      estudiando  derecho.
        A.   have-prs.3sg  dom   indf  daughter  study-ger   law
        ‘Ana has a daughter who is studying law.’
From a different perspective, Rodríguez Mondoñedo (2007) claims that even definite animate DOs of tener can occur without marking (11), yielding an interpretation of inalienable possession, whereas marking remains obligatory in other readings.
(11) Ella  tiene         el   hermano  en  la   cárcel.
     she   have-prs.3sg  def  brother  in  def  prison
     ‘She has her brother in jail.’ (Rodríguez Mondoñedo 2007: 271)
This raises the question of what happens when a sentence meets both criteria, i.e. when the DO receives agentive properties from a small clause but also stands in a relation of inalienable possession to the subject. The following two hypotheses can be derived, leaving room for variation:
H9: If a definite and animate DO of the predicate tener receives the interpretation of inalienable possession, it is not marked.
H10: DOs of the predicate tener can be marked if the secondary predicate assigns agentive properties to the DO.
The second construction that (partially) blocks DOM is the ditransitive sentence, in which the indirect object is marked by the preposition a. In such sentences, DOM would lead to two a-marked objects, which is claimed to be strongly disfavored (12).
(12) Pedro  presentó         ??(a)  su   mujer  a   sus  amigos.
     P.     present-prf.3sg  dom    his  woman  to  his  friends
     ‘Pedro introduced his wife to his friends.’ (García García 2014: 53)
von Heusinger et al. (2016) report confirming evidence from a written questionnaire study in which participants from Spain had to decide in a two-way forced-choice task whether or not the DO should be marked. Overall, participants chose the option without DOM in roughly 40% of the cases, a surprisingly low number according to the authors, possibly influenced by the participants’ awareness of the phenomenon. Presumably, speakers would mark DOs in such contexts less often in spontaneous production, where they are not made aware of the options. Diachronic data provided in von Heusinger (2018) also confirm lower rates of DOM in the relevant constructions. Since ditransitive sentences have not yet been studied in language production data, we advance the following hypothesis:
H11: The presence of a full indirect object NP reduces the frequency of marking on definite and animate DOs.
As has been repeatedly mentioned above, there are two further, over-arching research questions related to all these constructions:
RQ1: What is the quantitative range of variation within each construction under investigation?
RQ2: Are there (quantitative) differences for the constructions under investigation across different varieties?
We examined all the constructions together in one elicitation study in order to obtain a general overview of this domain of variation in Spanish DOM. Since a large number of constructions cannot be tested robustly in a single experiment, the study focused on the first two constructions, whereas the other four were tested less exhaustively and also serve as control items.
As stated at the beginning of this section, space constraints do not allow us to delve deeper into the technical details of the theoretical approaches cited above. Broadly speaking, however, our results primarily relate to the so-called Ambiguity Thesis (von Heusinger & Kaiser 2007: 83), which states that if a language does not morphologically distinguish subjects from objects, it may develop a marker for objects that are too similar to subjects at some level of representation. This thesis has been implemented in many ways, for instance as a weak universal in the typological-functional sense (Seržant 2019),5 where it is called the weak discrimination hypothesis, or as an antisymmetry condition on linearization within Generative Grammar (Richards 2010), among others. García García’s (2014) account based on the identity of proto-role properties essentially falls into this category as well, so the second set of constructions directly addresses this thesis. The complement small clause constructions are not directly relevant to the thesis as formulated above, but the pattern predicted by H5 could still be interpreted as a means of distinguishing the direct object from the predicative at the surface. We return to the overall implications of our study for this thesis in Section 4.5. The alternative Transitivity Thesis is not directly addressed by our study; rather, its putative effects are controlled for. For instance, in our canonical transitive item set, the verbs selected for the stimuli are balanced in terms of how the object is affected by the described event. The other parameters of transitivity, such as participants, aspect, and punctuality (von Heusinger & Kaiser 2007: 83), are kept stable across the stimulus sets.
2 Method: the elicitation task
The elicitation task was developed specifically for investigating the variation of DOM across varieties of Spanish. In this paper, we report findings from a broad sample of speakers from the capital cities of four countries. The tasks required participants to spontaneously produce a sentence using linguistic material presented on a screen. Participants received instructions for each task type at the beginning of the experiment and were allowed to ask questions at any time. Sentences were recorded with the SpeechRecorder software (Draxler & Jänsch 2004) for subsequent analysis, which consisted of transcribing and annotating the utterances and quantifying the presence or absence of a-marking on direct objects. The stimulus items, datasets, further appendices, and R scripts with the analyses can be accessed on the OSF platform via the following DOI: https://doi.org/10.17605/OSF.IO/ASXCW.
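To make the final quantification step concrete, the sketch below computes raw a-marking proportions per variety and item set from annotated productions. It is a minimal illustration under stated assumptions, not the authors’ procedure: the record type `Token`, its fields, and the function `marking_rates` are hypothetical, and only productions containing an analyzable DO are assumed to be included. It does not replace the statistical analyses in the R scripts on OSF.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated DO production (hypothetical annotation scheme)."""
    variety: str    # e.g. "Lima", "Madrid", "Mexico City", "Montevideo"
    item_set: str   # e.g. "canonical_transitive"
    a_marked: bool  # True if the speaker produced the marker a before the DO


def marking_rates(tokens):
    """Return {(variety, item_set): proportion of a-marked DOs}.

    Only raw descriptive proportions are computed; no statistical
    modelling of speaker or item effects is attempted here.
    """
    marked = defaultdict(int)
    total = defaultdict(int)
    for tok in tokens:
        key = (tok.variety, tok.item_set)
        total[key] += 1
        marked[key] += int(tok.a_marked)
    return {key: marked[key] / total[key] for key in total}
```

Grouping by (variety, item set) mirrors RQ1 and RQ2; finer cells, for instance adding the animacy or context condition to the key, would follow the same pattern.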
2.1 Participants
In total, 174 university students from Lima, Madrid, Mexico City, and Montevideo completed the elicitation task. All were native speakers of the local variety and had lived in the respective city for at least five years. They were recruited from all disciplines except language-related ones. Table 1 provides more detailed information on their profiles. We chose university students in order to obtain groups that are highly comparable across varieties, which would otherwise be difficult. Presumably, more variation would be found if groups with less exposure to the (regional) standard were included. However, if regional variation is found among speakers with highly similar profiles, this would be a strong indication of deeper grammatical differences between the varieties under investigation.
Table 1: Participant groups.
Region | N | Gender: male | Gender: female | Age: range | Age: mean | Age: SD
Lima | 42 | 24 | 18 | 17–24 | 19.7 | 1.8
Madrid | 45 | 27 | 18 | 18–35 | 28.1 | 3.4
Mexico City | 42 | 23 | 19 | 18–34 | 22.7 | 3.8
Montevideo | 45 | 28 | 17 | 18–36 | 22.5 | 4.3
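Table 1 can be sanity-checked arithmetically: in each row, the gender counts should add up to N, and the four values of N should add up to the 174 participants reported above. The sketch below transcribes only the N, male, and female columns; the names `TABLE_1` and `check_table` are our own illustrative choices.

```python
# Rows transcribed from Table 1: (region, N, male, female).
TABLE_1 = [
    ("Lima", 42, 24, 18),
    ("Madrid", 45, 27, 18),
    ("Mexico City", 42, 23, 19),
    ("Montevideo", 45, 28, 17),
]


def check_table(rows, reported_total):
    """Check that male + female == N per row and that N sums to the total."""
    for region, n, male, female in rows:
        if male + female != n:
            raise ValueError(f"{region}: {male} + {female} != {n}")
    total = sum(n for _, n, _, _ in rows)
    if total != reported_total:
        raise ValueError(f"sum of N is {total}, not {reported_total}")
    return total


print(check_table(TABLE_1, reported_total=174))  # → 174
```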
2.2 The stimulus materials
The experiment consisted of six item sets, one for each context of variation discussed in Section 1.2. The present section exemplifies the stimulus materials and explains how they were constructed (Appendix A contains all items), while Section 2.3 covers the general experimental design and the task types. Overall, special attention was paid to verbal and nominal semantics, either by controlling for variation or by using lexical items from the same or similar lexical fields within an item set. Furthermore, no target verb or argument NP appeared in more than one item, and only verb forms not ending in the vowel [a] were used, since a final [a] would have made it impossible to determine whether the following DO was preceded by the marker a, which has the same vowel quality.
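The constraint on final [a] can be illustrated with a small filter over candidate verb forms. This is a sketch under an explicit simplification: it uses spelling as a proxy for pronunciation (treating both a and á as a final [a]), and the function names are hypothetical rather than part of the authors’ materials. The forms vio, sustituyó, and consideraron come from the stimuli, acompaña from (4); está is added for illustration.

```python
import unicodedata


def ends_in_a(verb_form: str) -> bool:
    """Return True if the orthographic form ends in 'a' or 'á'.

    Spelling is used as a rough proxy for a final [a] (an assumption).
    """
    # NFD splits 'á' into 'a' + a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", verb_form.strip().lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.endswith("a")


def usable_verb_forms(forms):
    """Keep only forms after which an a-marker would remain audible."""
    return [form for form in forms if not ends_in_a(form)]


candidates = ["vio", "sustituyó", "consideraron", "acompaña", "está"]
print(usable_verb_forms(candidates))  # → ['vio', 'sustituyó', 'consideraron']
```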
Since the experiment was designed to test the marking of inanimate objects, the corresponding item sets are the largest. Item set 1 has four repetitions per condition and list, resulting in a total of 16 different items (cf. Table 2, which summarizes the experimental design per item set). One manipulation was to provide either a human or an inanimate NP for the second NP slot, the other to provide different types of context sentences. In one type, the target referents were already introduced in the context sentence by plural NPs, because a single previously mentioned referent would most naturally be picked up by a definite rather than an indefinite NP. The other type of context sentence did not mention the referents. Furthermore, the affectedness of the direct object was controlled for in the stimuli using four of the five affectedness categories proposed in von Heusinger & Kaiser (2011), which are ranked from high to low affectedness: verbs that have a direct impact on the object (kill, hit) > verbs of perception (see, hear) > verbs of pursuit (look out for, wait for) > verbs of knowledge (know, understand). (13) is an example of one complete experimental item of the canonical transitives set, and Figure 1 shows how such an item appeared on the participant’s screen.
(13) Context sentences:
     a. Context sentences not introducing (i) and introducing (ii) the specific human DO referents:
        (i)  En el puerto se encontraron muchas personas.
             ‘At the harbor, there were many people.’
        (ii) En un crucero, dos pasajeros y dos tripulantes se encontraron en el mismo bar.
             ‘On a cruise ship, two passengers and two crew members met in the same bar.’
     b. Context sentences not introducing (i) and introducing (ii) the inanimate object referents:
        (i)  En un viaje en crucero hubo mucho que descubrir.
             ‘On a cruise trip, there was a lot to discover.’
        (ii) Dos pasajeros de un crucero se aproximaron a dos islas.
             ‘Two passengers of a cruise ship approached two islands.’
     c. Material for sentence production (human vs. inanimate object):
        (i)  un  pasajero   →  un  tripulante   |  vio
             a   passenger     a   crew member     see-prf.3sg
        (ii) un  pasajero   →  una  isla    |  vio
             a   passenger     a    island     see-prf.3sg
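The design of item set 1 (two two-level manipulations, four repetitions per condition and list, 16 items in total) is compatible with a standard Latin-square distribution, in which each item appears in a different condition on each of four lists. The sketch below assumes such a cyclic rotation; the actual list construction is summarized in Table 2 and Section 2.3 and may differ, the condition labels are hypothetical, and the balancing of affectedness classes is not modelled.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the 2 x 2 design: DO animacy x context type.
CONDITIONS = list(product(["human", "inanimate"],
                          ["referent_not_introduced", "referent_introduced"]))
N_ITEMS = 16


def latin_square_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Rotate conditions cyclically so that each list shows every item once.

    Item i appears in condition (i + list_index) mod k on each list, so each
    item occurs in every condition exactly once across the k lists.
    """
    k = len(conditions)
    return [[(item, conditions[(item + list_index) % k])
             for item in range(n_items)]
            for list_index in range(k)]


lists = latin_square_lists()
for item_list in lists:
    per_condition = Counter(cond for _, cond in item_list)
    # With 16 items and 4 conditions, each condition occurs 4 times per list.
    assert set(per_condition.values()) == {N_ITEMS // len(CONDITIONS)}
```

Each participant would then see one list, so no item is encountered twice by the same speaker, while every item contributes to every condition across the sample.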
Example (14) contains one complete item of item set 2. The corresponding task was to paraphrase a given sentence (i) using the linguistic material in (ii). The intended low-transitivity interpretation is crucial here (as discussed in Section 1.2), which is why the sentence to be paraphrased serves as a key interpretational prompt. All sentences to be paraphrased in (14) allocate the same number of proto-agent properties to both arguments. This set consisted of eight items, since only a small number of verbs fall into the class of reversible-symmetrical low-transitivity verbs. Furthermore, since it was impossible to construct meaningful sentences using the same nouns systematically across all conditions, the nouns were changed in the condition with two inanimate arguments (14d), whereas the same nouns were used in different permutations in the others. For the human nouns, designations of professions or occupations were used, and for the inanimates, nouns referring to concrete objects that fit into the respective professional or occupational frame.
- (14)
- a.
- human subject – human DO
- (i)
- El alumno tomó el lugar del instructor.
- ‘The pupil took the place of the instructor.’
- (ii)
- el alumno → el instructor | sustituyó
- the pupil → the instructor | substitute-prf.3sg
- b.
- human subject – inanimate DO
- (i)
- El alumno se hizo cargo del trabajo de la máquina.
- ‘The pupil took on the machine’s work.’
- (ii)
- el alumno → la máquina | sustituyó
- the pupil → the machine | substitute-prf.3sg
- c.
- inanimate subject – human DO
- (i)
- La máquina continuó con el trabajo del alumno.
- ‘The machine continued with the pupil’s work.’
- (ii)
- la máquina → el alumno | sustituyó
- the machine → the pupil | substitute-prf.3sg
- d.
- inanimate subject – inanimate DO
- (i)
- Hoy en día se usa más el bolígrafo en vez del lápiz.
- ‘Nowadays, the pen is used more often in place of the pencil.’
- (ii)
- el bolígrafo → el lápiz | sustituyó
- the pen → the pencil | substitute-prf.3sg
The remaining four item sets consisted of four items each (cf. Table 2). Example (15) shows a complete item of the complement small clause set. Here, no overt subject NP is given and the verb is kept constant across all manipulations. For each item, a human noun (again a profession) is combined with a predicative complement that is plausible for the verb-object combination, and the same is done with an inanimate noun. The second manipulation inverts the linear order of DO and predicative; the arrow, which always points from the DO to the predicative, changes direction accordingly.
- (15)
- a.
- DO in situ / human
- Consideraron | un informático → testigo experto
- consider-prf.3pl | a computer scientist → witness expert
- b.
- DO dislocated / human
- Consideraron | testigo experto ← un informático
- consider-prf.3pl | witness expert ← a computer scientist
- c.
- DO in situ / inanimate
- Consideraron | una fábrica → empresa pionera
- consider-prf.3pl | a factory → company pioneer
- d.
- DO dislocated / inanimate
- Consideraron | empresa pionera ← una fábrica
- consider-prf.3pl | company pioneer ← a factory
Example (16) shows an item of item set 4 (accusative-with-infinitive). In García García’s (2014: 104f) corpus study, DOM occurs more frequently with perception verbs than with causative verbs. Furthermore, DOM should become more frequent the more agentive the DO is. In the item design, the verb-type factor is operationalized with a periphrastic verb form whose auxiliary is either a causative or a perception verb, while the main verb remains constant; the agentivity manipulation consists in the presence vs. absence of an intensifying adverbial.
- (16)
- a.
- causative / –intense
- Hicieron | correr | un mensajero
- make-prf.3pl | run-inf | a messenger
- b.
- causative / +intense
- Hicieron | correr rápidamente | un mensajero
- make-prf.3pl | run-inf fast | a messenger
- c.
- perception / –intense
- Vieron | correr | un mensajero
- see-prf.3pl | run-inf | a messenger
- d.
- perception / +intense
- Vieron | correr rápidamente | un mensajero
- see-prf.3pl | run-inf fast | a messenger
In the tener item set (17), no stimulus for the subject was given either. The DO NP is always a kinship term, which ensures an inalienable possession reading (H9). Agentivity is manipulated by including a gerund that denotes an activity (e.g. working) or a state of rest (e.g. resting), and an adjective that denotes a higher or lower degree of involvement. This allows us to compare different degrees and possible sources of agentivity (H10).
- (17)
- a.
- –active / +involved
- padre | muy contento | descansando en el sofá | tengo
- father | very happy | rest-ger on the sofa | have-prs.1sg
- b.
- –active / –involved
- padre | muy exhausto | descansando en el sofá | tengo
- father | very exhausted | rest-ger on the sofa | have-prs.1sg
- c.
- +active / +involved
- padre | muy contento | trabajando en la oficina | tengo
- father | very happy | work-ger in the office | have-prs.1sg
- d.
- +active / –involved
- padre | muy exhausto | trabajando en la oficina | tengo
- father | very exhausted | work-ger in the office | have-prs.1sg
The ditransitive item set (18) uses transferential verbs and crosses animate and inanimate DOs with indefinite and possessive NPs. The animate DOs were all animals that are kept as pets, and the inanimates were all artifacts. Subject and indirect object were always human, and the indirect object co-varied with the DO between an indefinite and a possessive NP.
- (18)
- a.
- –def / +anim
- Cristina | un loro | una hermana | ayer | entregó
- C. | a parrot | a sister | yesterday | give-prf.3sg
- b.
- –def / –anim
- Cristina | una bicicleta | una hermana | ayer | entregó
- C. | a bicycle | a sister | yesterday | give-prf.3sg
- c.
- +def / +anim
- Cristina | su loro | su hermana | ayer | entregó
- C. | her parrot | her sister | yesterday | give-prf.3sg
- d.
- +def / –anim
- Cristina | su bicicleta | su hermana | ayer | entregó
- C. | her bicycle | her sister | yesterday | give-prf.3sg
2.3 The experimental design
As seen in the preceding section, each item set was created according to a 2×2 factorial design. The items of all sets were distributed across four lists following a Latin square design. Table 2 gives an overview of the factors within each set and the number of items each participant saw per condition. All item sets except the ditransitives use human-denoting nouns for the animate category; only the ditransitives use animal-denoting nouns.
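The Latin square distribution can be illustrated with a minimal Python sketch. It assumes that the four cells of each 2×2 item set are coded 0–3 and that items are rotated through conditions with the rule `(item + list) % 4`; this rotation rule and the function name `latin_square` are hypothetical and not the authors’ actual assignment procedure. The sketch only shows the defining property: every list contains each item exactly once, and each condition equally often.

```python
from collections import Counter

N_CONDITIONS = 4  # 2x2 design: four cells per item set


def latin_square(n_items: int, n_lists: int = 4) -> dict[int, list[tuple[int, int]]]:
    """Assign every item to exactly one condition per list by cyclic rotation.

    Returns {list_index: [(item_index, condition_index), ...]}.
    The rotation rule (item + list) % N_CONDITIONS is a hypothetical scheme.
    """
    return {
        lst: [(item, (item + lst) % N_CONDITIONS) for item in range(n_items)]
        for lst in range(n_lists)
    }


if __name__ == "__main__":
    # Item set 1 has 16 items, i.e. 4 items per condition on each list (cf. Table 2).
    for lst, trials in latin_square(16).items():
        per_condition = Counter(cond for _, cond in trials)
        print(f"list {lst}: {dict(sorted(per_condition.items()))}")
```

With eight items (item set 2), the same rotation yields two items per condition and list, and with four items (item sets 3–6) one item per condition and list, matching the counts in Table 2.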
Table 2: Experimental manipulations per construction, number of items per list and condition.
Item set (construction) | Factor A (levels) | Factor B (levels) | Items per list and condition |
1. Canonical transitives | DO discourse status: given / new | DO animacy: human / inanimate | 4 |
2. Reversible-symmetrical predicates | Subject animacy: human / inanimate | DO animacy: human / inanimate | 2 |
3. Complement small clauses | DO position: adjacent / not adjacent to verb | DO animacy: human / inanimate | 1 |
4. Accusative-with-infinitive structures | Verb type: causative / perception | Adverbial modifier: present / absent | 1 |
5. tener with secondary predicate | DO involvement: + involved / – involved | DO activity: + active / – active | 1 |
6. Ditransitive sentences | DO definiteness: indefinite NP / definite possessive NP | DO animacy: animal / inanimate | 1 |
Participants had to perform four slightly different tasks during the elicitation experiment. For item set 1 (canonical transitives), a context sentence was presented together with some additional unconnected words (in red letters). Figure 1 gives one example of how Task 1 was prompted by written stimuli and displayed on the participants’ screen. Participants had to produce a sentence using the material in red. For item set 2 (reversible-symmetrical predicates), a sentence (in black) was presented together with some unconnected words (in red) and participants were asked to paraphrase the presented sentence with the material in red. For item set 3 (complement small clauses), unconnected words and phrases were presented (in red) and participants were asked to build a sentence with this material while maintaining the relative order of the presented words. For item sets 4–6, again unconnected words were presented (in red) and participants had to construct a sentence, this time without any restrictions regarding word order. In all tasks, participants were explicitly allowed to add more words to the sentence and to arrange the presented material and the additionally included words as they liked (with the exception of item set 3).
As can be seen in Figure 1, context sentences and sentences to be paraphrased were presented in black, and the unconnected words for sentence production in red on a separate line below. In most cases (see Section 2.2), a vertical bar separated words or phrases. In some item sets, an arrow was placed between two nominal arguments as a non-verbal, implicit cue to induce a transitive construal: it pointed from the potential subject to the potential object or, in the complement small clause set (cf. 15), from the potential object NP to the predicative complement. Participants were not explicitly told about the function of the arrow. If they asked, it was explained that the arrow represented a connection in meaning between the two elements, which had to be related to one another in the sentence.
2.4 Procedure
The elicitation experiments were conducted in closed rooms at the Pontificia Universidad Católica del Perú, the Universidad Nacional Autónoma de México, the Universidad Complutense de Madrid, and the Universidad de la República, Montevideo, always following the same procedure. Experimenter and participant sat across from each other at a table. While the experimenter operated the experimental software from a laptop, the participant saw only the instructions and the stimulus material on a separate screen. The audio data were recorded with a directional microphone (RØDE NTG4) connected via an audio interface (ZOOM U-22). During the instructions, the four tasks were introduced. Participants were told that they were free to add words to the ones presented on the screen, but that they had to use all of the presented words without modifying their form. They were also allowed to arrange the words as they liked (except for the stimuli of item set 3). During the experiment, participants were given time to read the complete instructions and stimulus material on the screen and to think of a possible response. Once they had a sentence in mind and signaled this to the experimenter, the microphone was activated and their response was recorded. Participants could ask questions between recordings, but the experimenter did not comment on the sentences they produced. The experimenter intervened only in the following cases: (i) if a participant distributed the stimulus words over more than one sentence (including coordinated structures); (ii) if a participant heavily modified the stimulus material; (iii) if a participant changed the predefined word order in item set 3.
2.5 Data processing and analysis
All recorded sentences were transcribed semi-orthographically by one author and double-checked by another. Only trials that maintained the intended argument structure were included in the analysis. In some cases, it was difficult to decide on the basis of auditory perception alone whether the marker was present or absent. In such cases, we inspected the spectrograms for vocalic formants. If this still did not lead to a clear result, the trial was excluded. Table 3 shows the total number of trials from all participants across item sets. Around 70% of all trials were usable for the canonical transitive dataset (1) and around 65% for the reversible-symmetrical set (2). Item sets 3 (complement small clauses) and 5 (tener) have fewer than 50% usable trials: with these stimuli, participants frequently did not follow the intended verbalization strategy and opted for alternative sentence structures. Still, the first two datasets, which are the most relevant for the research questions of this paper, contain roughly 500 and 230 observations per city, respectively.
Table 3: Number of trials per item set.
Item set | Total trials | Usable for analysis | DOM marker uncertain | Non-target |
1. Canonical transitives | 2784 | 2003 (71.95%) | 42 (1.51%) | 739 (26.54%) |
2. Reversible-symmetrical predicates | 1392 | 917 (65.88%) | 27 (1.94%) | 448 (32.18%) |
3. Complement small clauses | 696 | 239 (34.34%) | 6 (0.86%) | 451 (64.80%) |
4. Accusative-with-infinitive | 696 | 549 (78.88%) | 2 (0.29%) | 145 (20.83%) |
5. tener | 696 | 227 (32.61%) | 12 (1.72%) | 457 (65.66%) |
6. Ditransitives | 696 | 471 (67.67%) | 6 (0.86%) | 219 (31.47%) |
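The percentages in Table 3 and the per-city figures mentioned in the text follow directly from the raw counts. The short Python sketch below recomputes them. The per-city estimate assumes, as a simplification of ours, that usable trials are spread evenly across the four cities; Table 3 does not give the actual per-city counts.

```python
N_CITIES = 4

# Counts copied from Table 3: (total, usable, marker uncertain, non-target).
TRIALS = {
    "1. Canonical transitives": (2784, 2003, 42, 739),
    "2. Reversible-symmetrical predicates": (1392, 917, 27, 448),
    "3. Complement small clauses": (696, 239, 6, 451),
    "4. Accusative-with-infinitive": (696, 549, 2, 145),
    "5. tener": (696, 227, 12, 457),
    "6. Ditransitives": (696, 471, 6, 219),
}


def shares(total: int, *parts: int) -> tuple[float, ...]:
    """Percentage of each part relative to total, rounded to two decimals."""
    if sum(parts) != total:
        raise ValueError("categories must add up to the total")
    return tuple(round(100 * p / total, 2) for p in parts)


if __name__ == "__main__":
    for name, (total, *parts) in TRIALS.items():
        usable = parts[0]
        # Rough per-city figure; assumes usable trials are spread evenly over the cities.
        print(f"{name}: {shares(total, *parts)} % | ~{usable / N_CITIES:.0f} usable per city")
```

For datasets 1 and 2 this gives about 501 and 229 usable trials per city, in line with the "roughly 500 and 230" stated above.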
For the statistical analysis, we fitted Generalized Linear Mixed Models (GLMMs) for item sets 1 and 2 with the glmer function from the package lme4 (v. 1.1.35.1, Bates et al. 2015) in R (R Core Team 2024). GLMMs model fixed as well as random effects and hence account for the variation introduced by different participants and by the lexical material of the items in the repeated-measures design. Although the summary of a glmer model reports (Wald) p-values, these may be less reliable, as discussed e.g. by Winter (2020). We therefore obtained p-values for the fixed effects via likelihood ratio tests (LRTs) on nested models. Marginal and conditional R2 values were calculated with r.squaredGLMM from the MuMIn package (v. 1.47.5, Bartoń 2009) to estimate the proportion of variance accounted for by the fixed effects alone and by the fixed and random effects together. The GLMMs for datasets 1 and 2 predict the likelihood of producing unmarked DOs; they were run with scaled sum contrasts for all two-level variables and reverse Helmert contrasts for the four-level variable region (for details, see the analysis scripts in Appendix B). For further exploration of the data, we used conditional inference trees (CItrees), as suggested by Tagliamonte & Baayen (2012) and Levshina (2020). For the remaining datasets, we used a combination of random forests and CItrees, both fitted with the party package (v. 1.3.14, Hothorn et al. 2006; Strobl et al. 2008) using the functions ctree and cforest. This approach is useful for smaller datasets or when observations are distributed too unevenly across the factors for a regression model to be fitted (Tagliamonte & Baayen 2012). CItrees employ recursive partitioning and aim to find the significant splits (following the levels of the independent variables) that best characterize the data (e.g. Tagliamonte & Baayen 2012: 159).
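The contrast coding can be made concrete with a small Python sketch. It builds a reverse Helmert matrix in the same layout as R’s `contr.helmert` (rows = levels, columns = contrasts) and a two-level sum contrast coded as −0.5/+0.5. The level order Mexico City, Montevideo, Lima, Madrid is our assumption, inferred from the contrast labels Region1–3 in Tables 4 and 5; which level of a two-level factor receives +0.5 is likewise assumed. The function names are illustrative only.

```python
# Assumed level order, inferred from the contrast labels in Tables 4 and 5.
REGIONS = ["Mexico City", "Montevideo", "Lima", "Madrid"]


def reverse_helmert(n_levels: int) -> list[list[int]]:
    """Contrast matrix with rows = levels and columns = contrasts (layout of R's contr.helmert).

    Column j (1-based) compares level j+1 with the mean of levels 1..j.
    """
    matrix = []
    for level in range(n_levels):
        row = []
        for j in range(1, n_levels):
            if level < j:
                row.append(-1)   # earlier levels: pooled reference group
            elif level == j:
                row.append(j)    # the level being compared
            else:
                row.append(0)    # later levels: not involved in this contrast
        matrix.append(row)
    return matrix


def scaled_sum(levels: list[str]) -> dict[str, float]:
    """Two-level sum contrast coded as -0.5/+0.5 (which level gets +0.5 is an assumption)."""
    return {levels[0]: -0.5, levels[1]: 0.5}


if __name__ == "__main__":
    for region, row in zip(REGIONS, reverse_helmert(len(REGIONS))):
        print(f"{region:<12} {row}")
    print(scaled_sum(["inanimate", "human"]))
```

Under this ordering, column 1 corresponds to Region1 (Montevideo vs. Mexico City), column 2 to Region2 (Lima vs. Montevideo/Mexico City), and column 3 to Region3 (Madrid vs. LATAM). The columns sum to zero and are mutually orthogonal. With ±0.5 coding, the coefficient of a two-level predictor corresponds to the difference between its two levels on the log-odds scale, averaged over the other centered predictors.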
Random forests grow a large number of trees, each fitted to a random subset of the data, and test the predictive accuracy of the predictors on the respective unused data (Levshina 2020). The forest then aggregates the results of all trees to evaluate the overall importance of each predictor via permutation-based conditional variable importance scores (computed with the varimp function from party). A brief explanation of variable importance scores is provided in Appendix C.
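The permutation principle behind these importance scores can be sketched in a few lines of Python: shuffle one predictor, and measure how much predictive accuracy drops. This sketch deliberately simplifies what party’s varimp does. It uses a single toy model (the majority response per combination of predictor values, a hypothetical stand-in for a tree), evaluates it on the same data rather than on out-of-bag observations, and applies plain marginal permutation rather than the conditional scheme of Strobl et al. (2008). The data and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_cell_majority(rows, predictors, response):
    """Toy stand-in for a fitted tree: majority response per combination of predictor values."""
    cells = defaultdict(Counter)
    for r in rows:
        cells[tuple(r[p] for p in predictors)][r[response]] += 1
    table = {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}
    overall = Counter(r[response] for r in rows).most_common(1)[0][0]
    return table, overall  # overall majority is the fallback for unseen cells


def accuracy(model, rows, predictors, response):
    table, overall = model
    hits = sum(table.get(tuple(r[p] for p in predictors), overall) == r[response] for r in rows)
    return hits / len(rows)


def permutation_importance(rows, predictors, response, n_repeats=50, seed=1):
    """Mean drop in accuracy after shuffling one predictor at a time (marginal, not conditional)."""
    rng = random.Random(seed)
    model = fit_cell_majority(rows, predictors, response)
    baseline = accuracy(model, rows, predictors, response)
    scores = {}
    for p in predictors:
        drops = []
        for _ in range(n_repeats):
            values = [r[p] for r in rows]
            rng.shuffle(values)
            permuted = [{**r, p: v} for r, v in zip(rows, values)]
            drops.append(baseline - accuracy(model, permuted, predictors, response))
        scores[p] = sum(drops) / n_repeats
    return scores


# Hypothetical toy data: marking depends on animacy only, not on region.
toy = [
    {"animacy": a, "region": reg, "dom": int(a == "human")}
    for a in ("human", "inanimate")
    for reg in ("Madrid", "Lima", "Mexico City", "Montevideo")
    for _ in range(5)
]
print(permutation_importance(toy, ["animacy", "region"], "dom"))
```

Shuffling region leaves every prediction unchanged, so its score is exactly 0. Shuffling animacy destroys about half of the correct predictions. A score of 0, as reported below for several predictors in the smaller datasets, therefore means that permuting the predictor did not reduce predictive accuracy.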
3 Results
3.1 Canonical transitives
Two hypotheses were raised with respect to item set 1. H1 predicts that a-marking on inanimates in canonical transitive sentences occurs more often in Montevideo and Mexico City than in Lima and Madrid. H2, in turn, predicts increased marking when the referent has already been introduced in the context. As can be seen in Figure 2, there is partial evidence for both hypotheses. Regarding H1, Madrid shows the lowest percentage of inanimate marking (slightly below 5%), whereas Lima patterns with the other American cities, with percentages above 10%.
Regarding H2, the expected tendency appears only in Mexico City. Madrid and Lima show no difference at all, and Montevideo shows the opposite pattern: here, participants marked human DOs more often when they had not been mentioned before than when they had. It should also be noted that marking of human indefinites is always above 75%, leaving little room for variation.
Table 4: Canonical transitives model parameters and likelihood ratio test results.
Note: Model estimates and standard error (SE) are shown per predictor. P-values were obtained via LRTs. For Region (full), no estimate/SE is given, since this was only tested via LRT by excluding all three main contrasts for region.
Predictor | GLMM estimate | GLMM SE | LRT χ² (df) | LRT p-value |
(Intercept) | 0.215 | 0.211 | – | – |
Animacy | 5.379 | 0.244 | (1) = 1439 | <.001 |
Discourse status | 0.129 | 0.165 | (1) = 0.614 | .434 |
Region1 (Montevideo vs. Mexico City) | 0.002 | 0.168 | (1) = 0.0002 | .989 |
Region2 (Lima vs. Montevideo/Mexico City) | –0.121 | 0.099 | (1) = 1.490 | .222 |
Region3 (Madrid vs. LATAM) | 0.065 | 0.073 | (1) = 0.776 | .378 |
Region (full) | – | – | (3) = 2.225 | .527 |
Animacy:Discourse status | –0.119 | 0.323 | (1) = 0.137 | .712 |
Animacy:Region1 | –0.217 | 0.239 | (1) = 0.827 | .363 |
Animacy:Region2 | 0.025 | 0.143 | (1) = 0.03 | .863 |
Animacy:Region3 | 0.405 | 0.118 | (1) = 13.621 | <.001 |
Discourse status:Region1 | –0.544 | 0.217 | (1) = 6.419 | .011 |
Discourse status:Region2 | 0.055 | 0.130 | (1) = 0.181 | .671 |
Discourse status:Region3 | 0.005 | 0.103 | (1) = 0.003 | .960 |
For the statistical analysis, we fitted a binomial GLMM to the dataset with production of DOM (yes/no) as the binary dependent variable. Animacy, discourse status, and region were entered as fixed effects, and participants and items as random effects. The fixed effects were entered in a three-way interaction, so that all two-way interactions were included as well, and the model was fitted with the bobyqa optimizer.6 Since the three-way interaction did not improve the model compared to the model without it, we used the latter as our “full model”. As shown in Table 4, this analysis yields a main effect of animacy and two interaction effects. The first involves animacy and the contrast between Madrid and the Latin American cities (cf. Animacy:Region3 in Table 4), indicating that the difference between Madrid and the other cities varies with the animacy of the object. The second involves discourse status and the contrast between Montevideo and Mexico City (cf. Discourse status:Region1 in Table 4), indicating that the difference between these two cities varies with the discourse status of the referent. None of the other contrasts reached significance. As for explanatory power, the fixed effects accounted for 59% of the variance, and including the random effects raised this to 74% (R2m = 0.59, R2c = 0.74).
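The likelihood ratio tests behind Tables 4 and 5 compare a full model with a nested model lacking one term. The test statistic is twice the difference in log-likelihood, and the p-value is its upper-tail probability under a χ² distribution whose df equals the number of dropped parameters. In R, this is typically obtained with anova() on the two nested models or with pchisq(stat, df, lower.tail = FALSE). The standard-library Python sketch below is our own reimplementation of that last step, not the authors’ script; the log-likelihoods in the usage line are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the closed forms for even and odd df, so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def lrt(loglik_reduced: float, loglik_full: float, df: int) -> tuple[float, float]:
    """Likelihood ratio test for nested models: chi2 = 2 * (logLik_full - logLik_reduced)."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    print(round(chi2_sf(1.490, 1), 3))   # 0.222, cf. Region2 in Table 4
    print(round(chi2_sf(2.225, 3), 3))   # 0.527, cf. Region (full) in Table 4
    # Hypothetical log-likelihoods of two nested models differing by one parameter:
    print(lrt(loglik_reduced=-1050.3, loglik_full=-1047.1, df=1))
```

The Region (full) row uses df = 3 because all three region contrasts are dropped at once. This is why a joint test can be non-significant even when single contrasts or their interactions are.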
The results of the statistical analysis thus suggest that animacy plays the most important role overall and that region interacts with the other factors: the difference in a-marking of inanimates between Madrid and the other cities in Figure 2 is statistically significant, as is the difference in marking of discourse-given DOs between Montevideo and Mexico City. To further explore the data structure, we fitted a conditional inference tree with the three fixed effects from the GLMM (animacy, discourse status, and region) as predictors. The splitting algorithm selected animacy as the best predictor. Once humans and inanimates are considered separately, region plays a role only for inanimate DOs, where the algorithm separates Madrid from the three American cities (see Figure 3). Together with the GLMM result, this suggests that the region effect stems only from the inanimate DOs. The CItree does not find a relevant split for Montevideo associated with discourse status. While a regional effect tied to animacy was expected under H1, the discourse status effect for Montevideo in the GLMM is surprising, especially since it points in the wrong direction: discourse-new referents should receive less marking, not more. We will return to this issue in the discussion.
3.2 Reversible-symmetrical predicates
For the reversible-symmetrical predicates, H3 stated that in low-transitivity situations (both subject and DO inanimate) marking should be almost obligatory. Consequently, H4 predicted no difference between human and inanimate DOs with respect to a-marking. Figure 4 shows the results for this dataset. The high frequencies of marking on inanimates stand in stark contrast to the inanimates in the previous dataset. However, although human definite DOs would be expected to be marked categorically, the data show slight margins of variation in these conditions as well. Mexico City stands out with a conspicuously high proportion of unmarked human DOs. Regarding the two hypotheses, the results are less categorical than expected. Near-obligatoriness is not reached for inanimates in low-transitivity sentences, although inanimates are still marked with very high frequency. Also contrary to expectation, animacy seems to play a role, since human DOs receive higher rates of marking throughout.
The statistical analysis paralleled that of dataset 1. In the binomial GLMM, production of DOM was again the binary dependent variable. We first tried a model with three fixed effects (animacy of the subject, animacy of the object, and region) and all possible interactions, with participants and items again as random effects. However, the model with the three-way interaction could not be fitted due to convergence problems, so we took the model with the three two-way interactions as our full model (cf. Appendix B for details). As seen in Table 5, we found significant main effects of DO animacy and of Region2 (i.e. the contrast between Lima and the other two American cities), as well as an interaction between subject animacy and Region2. Additionally, the main effect of subject animacy nearly reached significance (cf. Table 5). The fixed effects accounted for 20% of the variance; with the random effects added, this increased to about 60% (R2m = 0.2, R2c = 0.59).
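The gap between R2m and R2c can be read off the variance decomposition on which MuMIn’s r.squaredGLMM is based, the approach of Nakagawa and Schielzeth. The marginal R² divides the variance of the fixed-effects linear predictor by the total variance. The conditional R² adds the random-effect variances (here participants and items) to the numerator. The Python sketch below assumes the “theoretical” residual variance π²/3 for a logit-link binomial model; r.squaredGLMM also reports a “delta” variant, and which of the two is reported in this study is not stated. The variance components in the usage lines are hypothetical.

```python
import math

# Residual variance on the latent (logit) scale for a binomial GLMM with logit link.
LOGIT_RESIDUAL_VAR = math.pi ** 2 / 3


def r2_glmm(var_fixed: float, random_vars: list[float],
            residual_var: float = LOGIT_RESIDUAL_VAR) -> tuple[float, float]:
    """Return (marginal R2, conditional R2) from variance components.

    var_fixed:   variance of the linear predictor computed from the fixed effects only
    random_vars: variances of the random intercepts (e.g. participants, items)
    """
    var_random = sum(random_vars)
    total = var_fixed + var_random + residual_var
    return var_fixed / total, (var_fixed + var_random) / total


if __name__ == "__main__":
    # Hypothetical variance components, not the estimates from this study.
    r2m, r2c = r2_glmm(var_fixed=3.0, random_vars=[1.0, 0.5])
    print(f"R2m = {r2m:.2f}, R2c = {r2c:.2f}")
```

A large difference such as 0.2 vs. 0.59 for dataset 2 thus indicates that participant and item variances are large relative to the variance explained by the fixed effects.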
Table 5: Reversible-symmetrical predicates model parameters and likelihood ratio test results.
Predictor | GLMM estimate | GLMM SE | LRT χ² (df) | LRT p-value |
(Intercept) | –2.849 | 0.377 | – | – |
Object animacy | 2.289 | 0.306 | (1) = 79.234 | <.001 |
Subject animacy | –0.535 | 0.287 | (1) = 3.595 | .058 |
Region1 (Montevideo vs. Mexico City) | –0.137 | 0.258 | (1) = 0.281 | .596 |
Region2 (Lima vs. Montevideo/Mexico City) | –0.358 | 0.164 | (1) = 4.879 | .027 |
Region3 (Madrid vs. LATAM) | –0.110 | 0.120 | (1) = 0.860 | .354 |
Region (full) | – | – | (3) = 5.891 | .117 |
Object animacy:Subject animacy | 0.153 | 0.569 | (1) = 0.073 | .787 |
Object animacy:Region1 | 0.351 | 0.356 | (1) = 0.972 | .324 |
Object animacy:Region2 | –0.215 | 0.236 | (1) = 0.805 | .369 |
Object animacy:Region3 | 0.208 | 0.185 | (1) = 1.397 | .237 |
Subject animacy:Region1 | –0.099 | 0.332 | (1) = 0.089 | .765 |
Subject animacy:Region2 | –0.492 | 0.227 | (1) = 5.017 | .025 |
Subject animacy:Region3 | –0.097 | 0.143 | (1) = 0.458 | .498 |
We again fitted a CItree to this dataset for further exploration, using the three fixed effects of the GLMM as predictors. Figure 5 shows that DO animacy is again the strongest predictor. Further factors only seem to be relevant when inanimate DOs are considered separately: for these, region is selected as a predictor, and for Madrid and Lima, subject animacy is selected in addition.
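The variable-selection step that produces splits like those in Figures 3 and 5 can be approximated as follows. For each candidate predictor, test its association with the response, correct the p-values for the number of predictors, and split on the predictor with the smallest adjusted p-value if it falls below α = 0.05. The Python sketch below is a simplification, not a reimplementation of party’s ctree. It uses asymptotic Pearson χ² tests with a Bonferroni correction in place of party’s permutation-test framework, and it omits the recursion, in which the data are split on the selected predictor and the same step is repeated within each subset (e.g. within inanimate DOs only). The data and function names are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (closed forms, stdlib only)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def pearson_chi2(rows, predictor, response):
    """Pearson chi-square statistic and df for the predictor x response contingency table."""
    n = len(rows)
    cells = Counter((r[predictor], r[response]) for r in rows)
    row_tot = Counter(r[predictor] for r in rows)
    col_tot = Counter(r[response] for r in rows)
    stat = 0.0
    for level in row_tot:
        for outcome in col_tot:
            expected = row_tot[level] * col_tot[outcome] / n
            stat += (cells[(level, outcome)] - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def select_split_variable(rows, predictors, response, alpha=0.05):
    """Return (best predictor or None, Bonferroni-adjusted p) for one splitting step."""
    adjusted = {}
    for p in predictors:
        stat, df = pearson_chi2(rows, p, response)
        raw_p = chi2_sf(stat, df) if df > 0 else 1.0
        adjusted[p] = min(1.0, raw_p * len(predictors))
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] < alpha else None), adjusted[best]


# Hypothetical counts: marking tracks animacy; every region has the same marking rate.
toy = []
for region in ("Madrid", "Lima", "Mexico City", "Montevideo"):
    toy += [{"animacy": "human", "region": region, "dom": 1}] * 9
    toy += [{"animacy": "human", "region": region, "dom": 0}] * 1
    toy += [{"animacy": "inanimate", "region": region, "dom": 1}] * 2
    toy += [{"animacy": "inanimate", "region": region, "dom": 0}] * 8

if __name__ == "__main__":
    print(select_split_variable(toy, ["animacy", "region"], "dom"))
```

On these toy counts, animacy is selected with a very small adjusted p-value, whereas region shows no association at all (χ² = 0, df = 3).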
In summary, human DOs are marked more frequently overall, contrary to H4. Regional differences are not as strong as in the first dataset, but again they only seem to play a role for inanimate DOs. Although regional variation is limited, marking is not categorical, contrary to the claims in the literature, an issue to which we return in Section 4.2.
3.3 Further results
Given that the remaining datasets are smaller and also show less variation, their results are reported only briefly here, comparing some of the conditions to the other datasets. Despite the small samples, some of the observed tendencies are rather robust. Table 6 gives an overview of the quantitative findings, i.e. the percentage of a-marking per item set, condition, and region. As can be seen, there is little variation. Due to the low number of observations (cf. Table 3), it was not possible to fit GLMMs to these datasets. Therefore, we determined the relevant factors by first growing random forests (always with 1500 trees) and then fitting CItrees with the same predictors.
Table 6: Production of DOM in datasets 3–6; ‘+’ and ‘–’ indicate categorical presence and categorical absence of marking, respectively.
Item set | Factors | Madrid | Lima | Mexico City | Montevideo |
3. Complement small clause | adjacent/hum. | + | + | + | 91% |
| adjacent/inan. | 87% | + | 92% | 85% |
| not adj./hum. | 95% | 75% | + | 87% |
| not adj./inan. | 68% | + | 50% | 75% |
4. Accusative-with-infinitive | caus./adv. | + | + | + | 94% |
| caus./no adv. | + | + | + | + |
| percept./adv. | 98% | + | 97% | 98% |
| percept./no adv. | + | + | 97% | 91% |
5. tener | +involved/+active | + | + | + | 94% |
| +involved/–active | 88% | 46% | 94% | 86% |
| –involved/+active | + | 78% | + | 93% |
| –involved/–active | + | 91% | + | + |
6. Ditransitives | indef./animal | – | – | – | – |
| indef./inan. | – | – | – | – |
| def./animal | 11% | 13% | 10% | – |
| def./inan. | – | – | – | – |
In the complement small clause dataset (15), marking of inanimate DOs is high, as expected (at least 68%, with one outlier at 50%). In addition, three out of four varieties exhibit a tendency, at least descriptively, towards more marking when the DO is verb-adjacent and precedes the predicative than when the linear order is inverted (H5). Only Lima shows a different pattern, with almost categorical marking. A more robust follow-up study is necessary to corroborate this finding. Regarding H6, the obligatoriness of marking indefinite animates in such sentences is only found in the adjacent configuration. In the inverse order, there is a margin of variation similar to that of the inanimate DOs in dataset 1 (except for Mexico City). For the statistical analysis, we included DO animacy, the relative order of DO and predicative, and region in a random forest. From this model, we calculated variable importance scores: the most important predictor was DO animacy (0.009), followed by the relative order of the two constituents (0.006), whereas region only received a score of 0.002. In a CItree, DO animacy is again the strongest predictor (p = .003). The order of constituents, as predicted by H5, became relevant only once inanimate DOs were considered separately (p = .031). Region is not selected by the CItree.
The accusative-with-infinitive dataset (16) contains very little variation. DOs are marked in over 90% of cases across all conditions, despite being indefinite and not introduced into the discourse. The contrasts are so marginal that they preclude any interpretation of possible effects of the manipulations. This is the dataset with the highest rates of a-marking overall. For all manipulations (adverb, verb type, and region), the variable importance score based on a random forest was 0, and the CItree did not find any relevant splits either.
Given that tener usually blocks DOM, the overall marking of DOs with secondary predicates is remarkably high, exceeding 90% in most conditions and varieties. However, the small size of this dataset complicates the interpretation of differences across conditions. Lima stands out in this dataset, with low percentages of marking in several conditions. The random forest produced the following variable importance scores: region (0.009), involvement of DO (0.008), and activity of DO (0.004). The corresponding CItree, however, only found one split, for region (Lima vs. the other cities; p = .006).
The ditransitive dataset contains the least variation of all. Here, even definite animate DOs (denoting animals) are overwhelmingly unmarked (marked in around 10% of cases in three varieties and never in Montevideo), and inanimates are never marked. This provides strong evidence in favor of H11, despite the rather low overall number of cases. Variable importance scores calculated from a random forest were zero for all factors (DO animacy, DO (in)definiteness, and region). A CItree nevertheless found the contrasts visible in Table 6, namely DO animacy (p = .002) and, once only animate DOs were considered, (in)definiteness (p = .01). Thus, while the random forest suggests that the effects in this dataset should not be considered statistically reliable, the splits found by the CItree hint that future studies with larger datasets might find the descriptively observed contrasts to be statistically relevant as well.
As an interim summary, the percentages and effects shown in these sections give a first tentative answer to RQ1 from Section 1.2 (quantitative range of variation per construction). RQ2 clearly receives a positive answer – variation between varieties for several of the investigated constructions has been detected and statistically verified in a highly comparable dataset, even for speaker groups that are in regular contact with the standard language. The next section further discusses the results and relates them to the hypotheses from Section 1.2.
4 Discussion
This experimental study aimed to answer a series of questions about Spanish DOM in configurations that are theoretically highly relevant but difficult to study in corpora, and for which hardly any empirical data were available. As seen in Section 3, this approach has produced quantitative results that bear on crucial theoretical questions, as well as a highly comparable dataset for different varieties of Spanish, although some questions require further and more refined experiments. Samples of university students are, of course, not fully representative of the entire population. However, if differences can be detected even among speaker groups with considerable exposure to standardization, we expect the contrasts to be even stronger in more diverse samples or among speakers with less exposure to normative Spanish, and the contrasts identified to be deeply rooted in the grammars of those varieties. We first comment on the results concerning the different constructions in Sections 4.1–4.3 and then derive some general conclusions on the regional differences in Section 4.4.
4.1 Canonical transitive sentences
As has been shown in Section 1.1, previous studies either do not state clearly which types of syntactic configuration entered into their quantitative analysis of differential marking on inanimate DOs, or they focus only on particular configurations with high rates of marking. In the first case, we do not know how many structures with different marking patterns have been conflated. In the second, we do not know how much data remains unaccounted for or how these approaches would treat it. The canonical transitive dataset in this study, in turn, provides highly controlled results for the structure that is least likely to receive marking: basic transitive SVO sentences with a human subject and an inanimate concrete DO, in which affectedness has been controlled for. Example (19) gives possible responses by participants for the four conditions of the example item in (13). In this structure, the main known factors that favor a-marking are excluded or controlled. Even so, we detect non-negligible frequencies of marking, at least for the American varieties.
- (19)
- Un pasajero vio a/Ø un tripulante / una isla.
- indf passenger see-prf.3sg dom indf crew member / indf island
- ‘A passenger saw a crew member / an island.’
While the percentage of DOM with inanimates in Madrid remains slightly below 5%, the other cities show at least twice this proportion, a statistically significant contrast that any account has to explain. As mentioned in Section 1.1, Bautista-Maldonado & Montrul (2019) used a similar task with Mexican participants (students from Ciudad del Carmen, southeast Mexico), which produced rather different results. One of their goals was to test empirically whether the claims about the extension of DOM to inanimates in this variety could be corroborated. However, they found only 3% of a-marking on definite inanimates and a mere 1% on indefinite inanimates. They relied on pictures instead of context sentences, presenting a transitive verb and two NPs to be used with it, which is quite similar to our study. However, they used different verbs for animate and inanimate DOs. While the verbs in our study can in principle all be used with animate and inanimate DOs, six of the twelve verbs that Bautista-Maldonado & Montrul (2019: 46) used with inanimate DOs would never be used with human DOs in the relevant sense: leer ‘read’, comprar ‘buy’, firmar ‘sign’, beber ‘drink’, comer ‘eat’, escribir ‘write’. This might not be the only reason for the lack of marking, but it seems likely that, since these verbs subcategorize exclusively for inanimate DOs, DOM has not yet extended to them.7 In itself, this might be an interesting finding. However, the authors do not compare these verbs to the other six, which lack this restricted subcategorization pattern. Rather, they report a follow-up acceptability study in which they indeed find higher acceptability of DOM on inanimate DOs, concluding that their data confirm the diachronic trend towards the extension of marking to inanimates. Comparing our dataset with Bautista-Maldonado and Montrul’s, it seems that the expansion of DOM to inanimates is more advanced with verbs that do not subcategorize exclusively for inanimates.
This would be a plausible pathway for the extension of the marker because DOM is already available on these, but not necessarily on the others. If correct, it also provides a new verb-based distinction to be explored in future studies: one could hypothesize that based on their subcategorization properties, verbs would interact differently with a-marking. Furthermore, the verbs in the stimuli in Bautista-Maldonado & Montrul (2019) also do not have the same degree of affectedness in the animate vs. inanimate conditions. Since affectedness is overall lower in the inanimate stimuli, this might also have contributed to low occurrences of DOM.
Returning to further results from our dataset, the expected contrast for the discourse status of the DO could only be observed in Mexico City. However, the GLMM did not yield an interaction effect for this combination. Instead, it yielded one for Montevideo, where already mentioned referents were marked less frequently than discourse-new ones, contrary to expectation. We have not yet been able to determine what underlies this effect on the basis of our dataset: we found no clear patterns regarding specific items or the degree of affectedness of the verbs. It could be a participant effect, since only 19 out of 42 participants ever omitted DOM in this condition, although no further pattern emerged from the personal data (age, gender, place of residence, field of study, etc.). We therefore leave this question for further studies. A more interesting finding from our perspective is that discourse-new indefinite human DOs showed a very high rate of DOM overall, close to that of DOs whose referents had already been introduced in the context. We suspect that this may be due to the tense of the stimulus verb, i.e. the pretérito indefinido. The sentences thus make propositions about a completed event in the past, which presupposes the existence of the discourse participants; even though they are not introduced explicitly, they are therefore highly individuated conceptually. Bautista-Maldonado & Montrul (2019) also used the pretérito indefinido in their stimuli, and they did not introduce the referents with a context sentence either. They also found rather high frequencies of DOM on indefinite humans (71%), which is quite close to our findings for Mexico (78%). Studies that used other tenses report somewhat lower production rates: in Zeugin (2025: 189–194, 281f), the production rate of DOM with animate non-specific indefinites is 70%, and in Benito Galdeano (2024: 324) it is 62%. Both studies use the present tense in their stimuli.
In fact, the task in the latter study appears somewhat biased against marking: it was a cloze task in which only the verb was missing. Participants could in principle supply both the verb and the DOM marker, but since the latter belongs to the object argument, they may have been inclined to focus on the verb rather than on the object. Given this, the overall production rate suggests that although marking seems to be optional, it is still largely favored by speakers.
4.2 Reversible-symmetrical predicates
As discussed in the literature review, it has been claimed that in sentences with reversible-symmetrical predicates, DOs are obligatorily marked (H3). This claim was not corroborated by the elicitation experiment. Example (20) gives possible responses by participants for the four conditions of the example item in (14).
- (20)
- a.
- El
- def
- alumno
- pupil
- sustituyó
- substitute-perf.3sg
- {al
- def+dom
- /
- el
- def
- instructor}
- instructor
- /
- {a/Ø
- dom
- la
- def
- máquina.}
- machine
- ‘The pupil substituted the instructor / the machine.’
- b.
- La
- def
- máquina
- machine
- sustituyó
- substitute-perf.3sg
- {al
- def+dom
- /
- el
- def
- alumno}
- pupil
- ‘The machine substituted the pupil.’
- c.
- El
- def
- bolígrafo
- pen
- sustituyó
- substitute-perf.3sg
- {al
- def+dom
- /
- el
- def
- lápiz.}
- pencil
- ‘The pen substituted the pencil.’
Although marking is frequent even on (definite) inanimates, its rate varies across regions and conditions, ranging from 63% to 76%. Near obligatoriness (90%) is only reached in Lima, and only when both subject and object are inanimate. The contrast with the inanimates in the canonical transitive dataset is remarkable and suggests a considerable impact of verb type. Although the DOs in that dataset were indefinite, this difference is unlikely to account for such a strong effect: in Bautista-Maldonado & Montrul (2019), definiteness increased a-marking by only two percentage points, from 1% to 3%, whereas the difference here amounts to 50 percentage points or more. As for H4, the data do not support the claim that the animacy of the DO is irrelevant for this type of verb either. In fact, it is the strongest of the factors we considered. Subject animacy and region were only relevant in one specific interaction, for Lima. As the split in the CItree in Figure 5 suggests, this effect concerns only inanimate DOs, which are more frequently marked when the subject is also inanimate than when it is human. The CItree also finds this split for Madrid, but since the GLMM found no interaction for this region, we are reluctant to draw strong conclusions from this finding. In summary, we find two effects of animacy: inanimates are less frequently marked than humans overall, and at least for one region the animacy of the subject also plays a role for DOM. This suggests that the account of García García (2014) should incorporate animacy in two ways: the overall higher marking of animate DOs, which also holds for this type of predicate, and what has been called a ‘global’ effect in Laca (2006), or ‘relative animacy’ in Tippets (2011). One possibility would be to argue that in these cases, animacy contrasts override the assignment of a-marking based on proto-roles.
In his account, García García only discusses low-transitivity configurations in which both arguments are animate or both are inanimate. However, our results show that animacy asymmetries between the arguments are possible in low-transitivity configurations, and that in these cases the proto-role account does not hold. Thus, at least for some varieties, a decreasing agentivity cline between subject and DO seems to reduce the rate of a-marking. Finally, all approaches have to account for the strong preference for, but not obligatoriness of, DOM with this kind of predicate.
4.3 Further findings
The smaller datasets highlight that DOM in Spanish is highly construction-specific and can be influenced by elements in the sentence other than the verb, subject, and DO. A second, predicative NP boosts marking on indefinite DOs, even if they are inanimate. Example (21) gives possible responses by participants for the four conditions of the example item in (15) from the small clause item set.
- (21)
- a.
- Consideraron
- consider-perf.3pl
- a/Ø
- dom
- un
- indf
- informático
- computer scientist
- testigo
- witness
- experto.
- expert
- b.
- Consideraron testigo experto a/Ø un informático.
- ‘They considered a computer scientist to be an expert witness.’
- c.
- Consideraron
- consider-perf.3pl
- a/Ø
- dom
- una
- indf
- fábrica
- factory
- empresa
- company
- pionera.
- pioneer
- d.
- Consideraron empresa pionera a/Ø una fábrica.
- ‘They considered a factory to be a pioneering company.’
Regarding the claim in López (2012: 23) (see H6), we only found obligatory marking on human DOs when the predicative is verb-adjacent. In the reverse order, frequencies drop, which López (2012) does not account for. It remains to be seen whether his account can be modified to accommodate these findings. The effect of order on inanimates is indeed found descriptively in three out of four varieties, and order is also selected by the CItree as a relevant predictor in our data. The drop is considerable in all regions except Lima, but not as pronounced as in the corpus study of García García (2014: 103), who accounts for this contrast in terms of the identifiability of the DO: if the DO is verb-adjacent, it is more easily identified as such, whereas if the predicative is adjacent, the marker is needed to make clear which of the two NPs is the actual DO.
Future studies will require more suitable tasks for the accusative-with-infinitive and tener constructions, given that less than half of the responses were usable for analysis. Example (22) gives possible responses by participants for the four conditions of the accusative-with-infinitive example item in (16).
- (22)
- a.
- Hicieron
- make-perf.3pl
- correr
- run-inf
- (rápidamente)
- fast
- a/Ø
- dom
- un
- indf
- mensajero.
- messenger
- ‘They made a messenger run (fast).’
- b.
- Vieron
- see-perf.3pl
- correr
- (rápidamente)
- a/Ø
- un
- mensajero.
- ‘They saw a messenger run (fast).’
The results in the accusative-with-infinitive dataset (16) at least seem clear enough to assess H7: as predicted by López (2012: 56), human indefinite DOs were marked almost categorically, with only the slightest margin of variation. Because of this ceiling effect, however, the results are inconclusive with respect to H8 (agentivity favors marking, i.e. causative vs. perception verbs). Given that García García’s observation of this effect was based on inanimate objects, a follow-up experiment could use inanimate rather than animate objects to investigate this hypothesis further.
Example (23) gives possible responses by participants for the four conditions of the tener example item in (17).
- (23)
- a.
- Tengo
- have-prs.1sg
- a/Ø
- dom
- mi
- my
- padre
- father
- muy
- very
- contento
- happy
- /
- exhausto
- exhausted
- trabajando
- working
- en
- in
- la
- def
- oficina.
- office
- b.
- Tengo
- have-prs.1sg
- a/Ø
- dom
- mi
- my
- padre
- father
- muy
- very
- contento
- happy
- /
- exhausto
- exhausted
- descansando
- resting
- en
- in
- el
- def
- sofá.
- sofa
- ‘I have my father happily/exhaustedly working in the office/resting on the sofa.’
The results showed that using kinship terms for inalienable possession did not lead to low rates of DOM (contrary to H9). As pointed out in the results section, the findings are inconclusive with respect to region and the agentivity of the DO. Thus, the main conclusion for this dataset is that inalienable possession might not play a major role, contrary to the claim in Rodríguez Mondoñedo (2007). Given that tener usually blocks DOM, the rather high production rates in our dataset need to be accounted for. One possible explanation is that we used secondary predicates in the stimuli, which makes these constructions rather similar to the small clause complement structures, for which the literature predicts high rates of marking and for which we also found such rates.8
The ditransitive dataset clearly stands out as the one with almost categorical blocking of DOM. Example (24) gives possible responses by participants for the four conditions of the example item in (18).
- (24)
- a.
- Ayer,
- yesterday
- Cristina
- C.
- entregó
- give-perf.3sg
- a/Ø
- dom
- {un
- indf
- loro}
- parrot
- /
- {una
- indf
- bicicleta}
- bicycle
- a
- to
- una
- indf
- hermana.
- sister
- b.
- Ayer,
- yesterday
- Cristina
- C.
- entregó
- give-perf.3sg
- a/Ø
- dom
- {su
- her
- loro}
- parrot
- /
- {su
- her
- bicicleta}
- bicycle
- a
- to
- su
- her
- hermana.
- sister
- ‘Yesterday, Cristina gave a/her parrot to a/her sister.’
Definite (animal-denoting) animates are the only DOs that showed any degree of marking at all, in three of the regions. This rate is much lower than the already “surprisingly” low acceptance rate of 40% in von Heusinger et al. (2016). However, as pointed out in Footnote 1, animal-denoting DOs show a broader range of variation, and therefore our data are not directly comparable with those in von Heusinger et al. (2016). Weissenrieder (1990: 225) reports an overall marking rate of 72% in a novel in which domesticated animals figure prominently, and of 18% in a novel in which mainly wild animals appear. Egetenmeyer (2019: 475ff) also reports a small-scale corpus search in which 33% of animal DOs were marked. The domestic animals in Weissenrieder (1990) seem to be the best point of comparison, since we also used domesticated animals (pets). Weissenrieder’s dataset allows for an even closer comparison, since it also reports the rate of marking for definite singular DOs, which is 85%. Taking this as a baseline, the roughly 10% of marked DOs in our data indeed suggests a strong blocking effect for animal-denoting DOs as well, and thus that the blocking effect extends to all animate DOs.
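The baseline comparison can be made explicit with a small calculation. The sketch below is an illustration, not part of the analyses reported in this paper: it contrasts Weissenrieder’s (1990) rate for definite singular DOs (85%) with the roughly 10% observed here, expressed as a difference in percentage points, a relative change and an odds ratio. The function names are hypothetical, and since the two figures come from a written corpus and an elicitation task respectively, the resulting numbers are indicative only.

```python
def odds(p: float) -> float:
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1 - p)


def compare_rates(baseline: float, observed: float) -> dict[str, float]:
    """Compare an observed marking rate with a baseline rate.

    Both arguments are proportions strictly between 0 and 1.
    Returns the difference in percentage points, the relative change
    with respect to the baseline, and the odds ratio (observed / baseline).
    """
    for p in (baseline, observed):
        if not 0 < p < 1:
            raise ValueError("rates must be proportions strictly between 0 and 1")
    return {
        "difference_pp": (observed - baseline) * 100,
        "relative_change": (observed - baseline) / baseline,
        "odds_ratio": odds(observed) / odds(baseline),
    }


# Baseline: definite singular DOs in Weissenrieder (1990), 85% marked.
# Observed: roughly 10% marked animal-denoting DOs in the ditransitive dataset.
result = compare_rates(baseline=0.85, observed=0.10)
for measure, value in result.items():
    print(f"{measure}: {value:.3f}")
```

With these inputs the odds of marking are about fifty times lower than the baseline odds (an odds ratio of roughly 0.02), which is one way of quantifying the blocking effect; the odds ratio is used because, unlike the percentage-point difference, it does not depend on where on the 0 to 100% scale the baseline lies.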
Summing up, our first research question about the quantitative range of variation within each construction has been answered in detail in the previous sections. In the following section, we therefore turn to our RQ2 and discuss the observed regional differences.
4.4 Regional differences
In comparison to the construction-specific effects and the grammatical manipulations, the effects of region might appear rather small. Still, the contrasts are statistically significant and separate either the American varieties from the Peninsular variety (dataset 1), or Lima (and possibly Madrid) from Montevideo and Mexico City (dataset 2). Recall that, in order to obtain a highly comparable dataset, we recruited only university students with high exposure to normative grammar. We therefore expect that contrasts found in this sample will become even stronger once other parts of the populations are included. Our findings from the canonical transitive dataset seem to confirm the observations in the sociolinguistic literature that certain American varieties of Spanish show increasing a-marking on inanimates. They match the numbers reported by Tippets (2011) quite well, while being much more comparable across varieties and better controlled. Therefore, our findings clearly support the idea that Peninsular Spanish is more conservative with respect to the extension of a-marking to inanimates in canonical SVO sentences with verbs that subcategorize for both animate and inanimate DOs. We also provide new data for Lima and show that in this respect, it follows the trend of the other American regions. In the reversible-symmetrical dataset, we found only one interaction involving animacy and region, namely for Lima (and perhaps Madrid). There, inanimate DOs were more frequently marked if the subject was also inanimate rather than human, an effect of relative animacy (cf. Tippets 2011). This did not occur in Mexico City and Montevideo. In Tippets (2011), relative animacy is one of the two most important factors for all investigated varieties. Our data suggest, however, that this general statement is too broad and that relative animacy does not have the same effect across all DOM-sensitive constructions.
Since this is an unexpected finding, further studies are needed to explain why exactly Lima (and possibly Madrid) behaves differently in this case. Recall that in studies such as Lapesa (1981) and Cerrón Palomino (2003), the Lima variety is considered to be closer to Peninsular Spanish than other American varieties. This observation might be a starting point for future research, but it should also be pointed out that the claims in those studies are not based on DOM data.
4.5 Broader implications
As pointed out towards the end of Section 1.2, our data have potential implications for the so-called Ambiguity/Discrimination Thesis. The high production rates in the reversible-symmetrical dataset suggest that the need to distinguish between highly similar argument NPs strongly influences DOM production. Production rates are also high in the complement small clause dataset, where two post-verbal NPs must be distinguished, and marking becomes categorical when the DO is not adjacent to the verb. These findings provide strong evidence for the Ambiguity Thesis. However, the data also show that disambiguation/discrimination is not the only determining factor, since the animacy of the DO also affects DOM production in these constructions, which seems to align with the idea of a “weak force” in Seržant (2019). In our view, this observation and the other construction-specific findings of our study further strengthen the view, reiterated in the literature, that DOM is determined by multiple factors, for which an overarching account is still missing. While our experiments control for possible effects of transitivity, they do not test the Transitivity Thesis directly. However, since we systematically controlled for affectedness in the canonical transitive dataset, we do have production data for verbs with different degrees of affectedness (see the design of item set 1 in the second paragraph of Section 2.2). Although affectedness is not part of the factorial design of the study, it is worth noting that we observe a tendency towards a positive correlation between DOM and degree of affectedness, suggesting a potential effect of transitivity, at least for human DOs (the overall percentage of DOM is 82.6% in the verb class with the lowest affectedness and 91.5% in the class with the highest affectedness).
Another important, and perhaps somewhat more controversial, implication of our study is that it points to the importance of inherent variability, suggesting that accounts based on categorical rules or restrictions may fail to capture a key property of DOM, at least in Spanish. Within each highly controlled construction-specific item set, our data show variability within and across participants and items that more often than not accounts for more of the variation than the actual experimental manipulations, as indicated by the differences between the marginal and conditional R² values in the results. Although attempts have been made to bridge what has been called the “variation gap” in the theoretical literature (Adger 2006; Parrott 2007; Nevins & Parrott 2010), we are not aware of any formal account of Spanish DOM that addresses this kind of variation.
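The gap between the two R² values can be illustrated as follows. For a binomial GLMM with a logit link, the marginal R² is commonly computed as the proportion of latent-scale variance explained by the fixed effects alone, σ²_f / (σ²_f + Σσ²_r + π²/3), and the conditional R² as the proportion explained by fixed and random effects together, (σ²_f + Σσ²_r) / (σ²_f + Σσ²_r + π²/3), where π²/3 is the variance of the standard logistic distribution. This corresponds to the “theoretical” variant returned by r.squaredGLMM in the MuMIn package (Bartoń 2009); which variant underlies the values reported in this paper is an assumption here. The sketch below is an illustration only: the function name and the variance components in the example are hypothetical and do not come from the study’s models.

```python
import math
from typing import Sequence


def r2_logit_glmm(var_fixed: float, var_random: Sequence[float]) -> tuple[float, float]:
    """Marginal and conditional R^2 for a binomial GLMM with a logit link.

    var_fixed:  variance of the fixed-effects linear predictor
    var_random: variances of the random effects (e.g. participants, items)
    The observation-level variance on the latent scale is pi^2 / 3,
    the variance of the standard logistic distribution.
    """
    var_latent = math.pi ** 2 / 3
    var_ranef = sum(var_random)
    total = var_fixed + var_ranef + var_latent
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_ranef) / total
    return r2_marginal, r2_conditional


# Hypothetical variance components, chosen only to illustrate the computation:
# fixed effects 1.0, random intercepts for participants 2.0 and items 0.5.
r2m, r2c = r2_logit_glmm(var_fixed=1.0, var_random=[2.0, 0.5])
print(f"marginal R2 = {r2m:.3f}, conditional R2 = {r2c:.3f}")
print(f"share attributable to random effects = {r2c - r2m:.3f}")
```

When the random-effect variances are large relative to the fixed-effect variance, as in this made-up example, the difference between the conditional and the marginal R² exceeds the marginal R² itself, which is the kind of pattern the paragraph above describes for participants and items.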
5 Conclusion
This study has presented a highly comparable dataset on the oral production of DOM in four varieties of Spanish, obtained with an experimental elicitation task. The dataset is unique in that it focuses on the production of configurations that are highly relevant in the theoretical literature but difficult to study with corpora of spontaneous spoken language. The results presented in this paper therefore allow for a more reliable empirical assessment of several claims in the literature regarding variation in Spanish DOM with inanimate (and animate) DOs. As a general result, we would like to highlight that in certain syntactic configurations, DOM may be influenced by predicate properties as strongly as, or perhaps even more strongly than, by argument properties. Thus, our data partly confirm the results of the corpus study of García García (2014) with an experimental approach, while also revealing more nuanced interactions of verbal and nominal properties. The primary construction-specific findings are that
- in canonical transitive sentences, inanimate DOs are marked more frequently in the American varieties, confirming and solidifying observations from the sociolinguistic literature;
- there are few or no regional differences with respect to the other constructions;
- inanimate DOs in reversible-symmetrical predicates are preferably, though not obligatorily, marked, and animacy asymmetries between the arguments can lead to less a-marking in some varieties;
- in complement small clauses, indefinite human DOs are marked categorically if the predicative is adjacent to the verb, as claimed in the literature;
- in accusative-with-infinitive constructions, indefinite human DOs are categorically marked, even if not introduced into the discourse;
- if tener is followed by a secondary predicate, the DO is preferably marked, and inalienable possession does not reduce marking;
- ditransitive structures have a strong blocking effect on DOM, which extends to animal-denoting DOs.
Besides these main results, the study yields many further observations of theoretical relevance, as discussed above. Overall, the data also confirm the inherent variability of Spanish DOM in many configurations, which, despite fine-grained and careful measurements, could not be entirely reduced to clear-cut grammatical properties or external variables. This variability remains a major challenge for any theoretical approach. Given the success of the experimental method developed here, the dataset could easily be extended to other varieties by replicating the experiment in other regions. Similarly, other relevant configurations could be tested empirically with this method, leading to a more complete picture of the range of variation and, hopefully, to a more comprehensive understanding of differential object marking in Spanish.
Abbreviations
All examples are glossed according to the Leipzig glossing rules. Abbreviations occurring in the text: adv. = adverb, anim. = animate, caus. = causative, CItree = conditional inference tree, def. = definite, do = direct object, dom = Differential Object Marking, glmm = Generalized Linear Mixed Model, hum. = human, inan. = inanimate, indef. = indefinite, lrt = likelihood ratio test, percept. = perception, sd = standard deviation
Data availability
All data related to this study can be found at: https://doi.org/10.17605/OSF.IO/ASXCW. This includes the experimental stimuli, all data sets and R code used in the analyses, and additional information on conditional inference trees and random forests.
Ethics and consent
The experimental work has been performed in accordance with the Declaration of Helsinki. The studies have been approved by the Ethics Committee of the Faculty of Philosophy of the University of Zurich (reference number 18.10.3).
Funding information
This research was funded in part by the Swiss National Science Foundation (SNF) 184923.
This research was funded in part by the Austrian Science Fund (FWF) 10.55776/F1003. For open access purposes, the author has applied a CC BY public copyright license to any author accepted manuscript version arising from this submission.
Acknowledgements
The authors would like to thank Larissa Binder, Piero Costa, Patricia de Ramos, Joaquín Ginés, Chantal Melis, Idanely Mora Peralta, Ismail Prada, Luis Fernando Rubio, and Iliana Quintanar Zárate for their help at different steps of the experimental work. We would also like to thank three anonymous reviewers of Glossa who helped to improve the paper.
Competing interests
The authors have no competing interests to declare.
Notes
- Animate entities are typically humans or animals. As Egetenmeyer (2019: 55) points out, most of the literature on Spanish DOM does not explicitly distinguish between different types of animate DOs, although DOs denoting animals show a broader range of variation than those denoting humans (Egetenmeyer 2019: 475–480). In this paper, we follow common practice and use the broader term animacy despite mostly discussing human-denoting DOs. However, we make the distinction when discussing the stimuli, since these include both human- and animal-denoting DOs. We also use the less precise but considerably shorter and more reader-friendly expression animate DO, or simply animate, instead of DO (or noun) denoting an animate entity.
- Notably, there is not only expansion of marking on inanimates but apparently also retraction of marking on animates: Alfaraz (2011) reports a decrease in Cuban Spanish in the second half of the 20th century, and Caro Reina et al. (2021) report data from the 19th and 20th centuries that seem to confirm this observation.
- Camacho (2023) claims that marking of both animate and inanimate DOs can be explained by his implementation of a syntactic labeling algorithm. Nonetheless, he still has to stipulate that this operation interacts with different lexical features of the verb, depending on the animacy of the DO, and that this difference only occurs with certain verbs, in interaction with the “idiocratic semantics” of these (Camacho 2023: 203).
- García García (2014: 50) gives examples showing that this is even the case with inanimate nouns.
- “Weak” is used in contrast to strong universals, because weak universals may be overridden by or interact with other constraints in a given language.
- First, the default optimizer was used, but convergence issues arose. Before removing theoretically motivated elements from the model, we tested whether a change of optimizer helped with convergence. The allFit function from the lme4 package was used, which refits the model with different optimizers and reports the estimates and possible problems for each. The output was inspected, and a change of optimizer was deemed acceptable if no convergence problem was reported for it and the estimates and log-likelihoods were near-identical across optimizers.
- Another reason might be that their participants were students of Humanities and Education, and thus perhaps more aware of the grammatical nuances in the stimuli, whereas we explicitly excluded participants from these fields.
- We would like to thank an anonymous reviewer for this observation.
References
Adger, David. 2006. Combinatorial variability. Journal of Linguistics 42. 503–530. DOI: http://doi.org/10.1017/S002222670600418X
Alfaraz, Gabriela G. 2011. Accusative object marking: A change in progress in Cuban Spanish? Spanish in Context 8(2). 213–234. DOI: http://doi.org/10.1075/sic.8.2.02alf
Balasch, Sonia. 2011a. Estudio sociolingüístico de la marca diferencial de objeto directo (DOM) en dos variedades del español contemporáneo. Albuquerque, NM: University of New Mexico dissertation.
Balasch, Sonia. 2011b. Factors determining Spanish differential object marking within its domain of variation. In Michnowicz, Jim & Dodsworth, Robin (eds.), Selected proceedings of the 5th workshop on Spanish sociolinguistics, 113–124. Somerville, MA: Cascadilla Proceedings Project.
Barraza, Georgina. 2003. Evolución del objeto directo inanimado en español. Mexico City: UNAM MA thesis.
Barrenechea, Ana María & Orecchi, Teresa. 1977. La duplicación de objetos directos e indirectos en el español hablado en Buenos Aires. In Lope Blanch, Juan M. (ed.), Estudios sobre el español hablado en las principales ciudades de América, 351–381. México: Universidad Nacional Autónoma de México.
Bartoń, Kamil. 2009. MuMIn: Multi-model inference. R package. https://CRAN.R-project.org/package=MuMIn.
Bates, Douglas & Mächler, Martin & Bolker, Ben & Walker, Steve. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. DOI: http://doi.org/10.18637/jss.v067.i01
Bautista-Maldonado, Salvador & Montrul, Silvina. 2019. An experimental investigation of differential object marking in Mexican Spanish. Spanish in Context 16(1). 22–50. DOI: http://doi.org/10.1075/sic.00025.bau
Benito Galdeano, Rut. 2024. L’estabilitat referencial i la dominància lingüística en el marcatge diferencial d’objecte en el bilingüisme català-espanyol. Barcelona: Universitat Pompeu Fabra dissertation.
Bleam, Tonia. 2005. The role of semantic type in differential object marking. Belgian Journal of Linguistics 19(1). 3–27. DOI: http://doi.org/10.1075/bjl.19.03ble
Bossong, Georg. 1985. Empirische Universalienforschung: Differentielle Objektmarkierung in den neuiranischen Sprachen. Tübingen: Narr.
Bossong, Georg. 1991. Differential object marking in Romance and beyond. In Wanner, Dieter & Kibbee, Douglas A. (eds.), New analyses in Romance linguistics: Selected papers from the XVIII linguistic symposium on Romance languages, Urbana-Champaign, April 7–9, 1988, 143–170. Amsterdam: Benjamins. DOI: http://doi.org/10.1075/cilt.69.14bos
Buyse, Kris. 1998. The Spanish prepositional accusative: What grammars say versus what corpora tell us about it. Leuvense Bijdragen 87. 371–385.
Camacho, Rafael. 2023. Differential object marking in inanimate objects in Spanish. Borealis: An International Journal of Hispanic Linguistics 12(1). 165–206. DOI: http://doi.org/10.7557/1.12.1.6951
Caro Reina, Javier & García García, Marco & von Heusinger, Klaus. 2021. Differential object marking in Cuban Spanish. In Kabatek, Johannes & Obrist, Philipp & Wall, Albert (eds.), Differential object marking in Romance: The third wave, 339–368. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110716207-012
Cerrón Palomino, Rodolfo. 2003. Castellano andino: Aspectos sociolingüísticos, pedagógicos y gramaticales. Lima: Fondo Editorial Pontificia Universidad Católica del Perú y GTZ Cooperación Técnica Alemana.
Company Company, Concepción. 2002. El avance diacrónico de la marcación prepositiva en objetos directos inanimados. In Bernabé, Alberto & Berenguer, José Antonio & Cantarero, Margarita & De Torres, José Carlos (eds.), Actas del II congreso de la sociedad española de lingüística, 146–154. Madrid: CSIC.
Dowty, David. 1991. Thematic proto-roles and argument selection. Language 67(3). 547–619. DOI: http://doi.org/10.1353/lan.1991.0021
Draxler, Christoph & Jänsch, Klaus. 2004. SpeechRecorder: A universal platform independent multi-channel audio recording software. In Proceedings of the IV international conference on language resources and evaluation, 559–562. Lisbon: European Language Resources Association.
Egetenmeyer, Jakob. 2019. Der Verbalanschluss im Spanischen: Kognitiv-syntaktische Analyse nominaler und satzwertiger Akkusativobjekte. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110595826
Fábregas, Antonio. 2013. Differential object marking in Spanish: State of the art. Borealis 2. 1–80. DOI: http://doi.org/10.7557/1.2.2.2603
García García, Marco. 2014. Differentielle Objektmarkierung bei unbelebten Objekten im Spanischen. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110290974
García García, Marco. 2015. Entwicklung und historischer Stillstand: Zur DOM im Spanischen. In Bernsen, Michael & Eggert, Elmar & Schrott, Angela (eds.), Historische Sprachwissenschaft als philologische Kulturwissenschaft: Festschrift für Franz Lebsanft zum 60. Geburtstag, 317–333. Bonn: Bonn University Press/V&R Unipress. DOI: http://doi.org/10.14220/9783737004473.317
Gerards, David. 2023. Differential object marking in the Romance languages. In Oxford Research Encyclopedia of Linguistics. DOI: http://doi.org/10.1093/acrefore/9780199384655.013.648
Haspelmath, Martin. 2021. Role-reference associations and the explanation of argument coding splits. Linguistics 59(1). 123–174. DOI: http://doi.org/10.1515/ling-2020-0252
Hothorn, Torsten & Hornik, Kurt & Zeileis, Achim. 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3). 651–674. DOI: http://doi.org/10.1198/106186006X133933
Kabatek, Johannes. 2016. Wohin strebt die differentielle Objektmarkierung im Spanischen? Romanistisches Jahrbuch 67(1). 211–239. DOI: http://doi.org/10.1515/roja-2016-0015
Kabatek, Johannes & Obrist, Philipp & Wall, Albert (eds.). 2021. Differential object marking in Romance: The third wave. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110716207
Laca, Brenda. 2006. El objeto directo: La marcación preposicional. In Company Company, Concepción (ed.), Sintaxis histórica de la lengua española, vol. 1, 423–478. Mexico City: Fondo de Cultura Económica (México), Universidad Nacional Autónoma de México (UNAM).
Lapesa, Rafael. 1981. Historia de la lengua española. 9th ed. Madrid: Editorial Gredos.
Leonetti, Manuel. 2004. Specificity and differential object marking in Spanish. Catalan Journal of Linguistics 3. 75–114. DOI: http://doi.org/10.5565/rev/catjl.106
Levshina, Natalia. 2020. Conditional inference trees and random forests. In Paquot, Magali & Gries, Stefan Th. (eds.), A practical handbook of corpus linguistics, 611–643. Cham: Springer. DOI: http://doi.org/10.1007/978-3-030-46216-1_25
López, Luis. 2012. Indefinite objects: Scrambling, choice functions and differential marking. Cambridge, MA: MIT Press. DOI: http://doi.org/10.7551/mitpress/9165.001.0001
Mayer, Elisabeth & Sánchez, Liliana. 2021. Emerging DOM patterns in clitic doubling and dislocated structures in Peruvian-Spanish contact varieties. In Kabatek, Johannes & Obrist, Philipp & Wall, Albert (eds.), Differential object marking in Romance: The third wave, 103–138. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110716207-005
Nevins, Andrew & Parrott, Jeffrey. 2010. Variable rules meet impoverishment theory: Patterns of agreement leveling in English varieties. Lingua 120. 1135–1159. DOI: http://doi.org/10.1016/j.lingua.2008.05.008
Parrott, Jeffrey K. 2007. Distributed morphological mechanisms of Labovian variation in morphosyntax. Washington, DC: Georgetown University dissertation.
R Core Team. 2024. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Richards, Norvin. 2010. Uttering trees. Cambridge, MA: MIT Press. DOI: http://doi.org/10.7551/mitpress/9780262013765.001.0001
Rodríguez Mondoñedo, Miguel. 2007. The syntax of objects: Agree and differential object marking. Storrs, CT: University of Connecticut dissertation.
Sanz, Blanca. 2011. La ausencia de marcación de caso en los objetos directos con referente humano en posición inicial. Revista Signos 44(76). 183–197. DOI: http://doi.org/10.4067/S0718-09342011000200006
Seržant, Ilja A. 2019. Weak universal forces: The discriminatory function of case in differential object marking systems. In Schmidtke-Bode, Karsten & Levshina, Natalia & Michaelis, Susanne Maria & Seržant, Ilja A. (eds.), Explanation in typology: Diachronic sources, functional motivations and the nature of the evidence, 149–178. Berlin: Language Science Press. DOI: http://doi.org/10.5281/zenodo.2583816
Strobl, Carolin & Boulesteix, Anne-Laure & Kneib, Thomas & Augustin, Thomas & Zeileis, Achim. 2008. Conditional variable importance for random forests. BMC Bioinformatics 9. 307. DOI: http://doi.org/10.1186/1471-2105-9-307
Tagliamonte, Sali A. & Baayen, R. Harald. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135–178. DOI: http://doi.org/10.1017/S0954394512000129
Tippets, Ian. 2011. Differential object marking: Quantitative evidence for underlying hierarchical constraints across Spanish dialects. In Ortiz-López, Luis A. (ed.), Selected proceedings of the 13th Hispanic linguistics symposium, 107–117. Somerville, MA: Cascadilla Proceedings Project.
von Heusinger, Klaus. 2018. The diachronic development of differential object marking in Spanish ditransitive constructions. In Seržant, Ilja A. & Witzlack-Makarevich, Alena (eds.), Diachrony of differential argument marking, 315–344. Berlin: Language Science Press.
von Heusinger, Klaus & Kaiser, Georg A. 2007. Differential object marking and the lexical semantics of verbs in Spanish. In Kaiser, Georg A. & Leonetti, Manuel (eds.), Proceedings of the workshop: Definiteness, specificity and animacy in Ibero-Romance languages (Arbeitspapier 122), 85–110. Konstanz: Universität Konstanz.
von Heusinger, Klaus & Kaiser, Georg A. 2011. Affectedness and differential object marking in Spanish. Morphology 21. 593–617. DOI: http://doi.org/10.1007/s11525-010-9177-y
von Heusinger, Klaus & Romero Heredero, Diego & Kaiser, Georg A. 2016. Differential object marking in Spanish ditransitive constructions: An empirical approach. In Fischer, Susann & Navarro, Mario (eds.), Proceedings of the VII Nereus international workshop: Clitic doubling and other issues of the syntax/semantic interface in Romance DPs (Arbeitspapier 128), 43–64. Konstanz: Universität Konstanz.
Wall, Albert & Obrist, Philipp. 2021. Multilingualism effects in an elicitation study on differential object marking in Cusco (Peru) and Misiones (Argentina). In Kabatek, Johannes & Obrist, Philipp & Wall, Albert (eds.), Differential object marking in Romance: The third wave, 139–172. Berlin, Boston: De Gruyter. DOI: http://doi.org/10.1515/9783110716207-006
Weissenrieder, Maureen. 1990. Variable uses of the direct-object marker a. Hispania 73(1). 223–231. DOI: http://doi.org/10.2307/343010
Weissenrieder, Maureen. 1991. A functional approach to the accusative ‘a’. Hispania 74(1). 146–156. DOI: http://doi.org/10.2307/344574
Winter, Bodo. 2020. Statistics for linguists: An introduction using R. New York: Routledge. DOI: http://doi.org/10.4324/9781315165547
Witzlack-Makarevich, Alena & Seržant, Ilja. 2018. Differential argument marking: Patterns of variation. In Seržant, Ilja & Witzlack-Makarevich, Alena (eds.), Diachrony of differential argument marking, 1–40. Berlin: Language Science Press.
Zdrojewski, Pablo. 2020. La gramaticalización de objetos inanimados en dos variedades del español de la Romania Nova. Cuadernos de la ALFAL 12(2). 448–466.
Zeugin, Senta. 2025. Differential object marking and bidirectional crosslinguistic influence: An experimental study on Romanian-Spanish bilingualism. Zürich: Universität Zürich dissertation.