1 Introduction

A key requirement for a text to be coherent is referential continuity. A major means to meet this requirement is by referring to the same set of referents across stretches of text. Since languages provide a variety of anaphoric expressions for this purpose, establishing referential continuity requires from speakers1 two types of choices – which referents to mention next and which expressions to use for referring to the selected referents. For communication to be successful, hearers, in turn, have to keep track of the referents referred to by the speaker. The complexity of this task depends on, among other factors, the particular anaphoric expressions that were chosen by the speaker. Pronouns, for example, provide little descriptive content and are therefore often ambiguous as to their intended referents. Determining the referent that the speaker intended for a pronoun can therefore be a challenging task. Ambiguity can be avoided by the use of definite descriptions, but at the cost of greater syntactic and lexical complexity. Many languages – although not English – provide an alternative to definite descriptions: Anaphoric d(emonstrative)-pronouns, which are structurally as simple as the more common p(ersonal)-pronouns. An example from German is given in (1).

    (1) a. Peter_i hat auf der Konferenz  einen Kollegen_j getroffen.
           P.      has at  the conference a     colleague  met.
           ‘Peter met a colleague at the conference.’
        b. Er_{i/(j)}/Der_{j/(i)} hat sich    sehr darüber  gefreut.
           he/he-DEM              has himself much about-it pleased
           ‘He was very pleased about it.’

Experimental results (see summaries in Ellert 2013 and Bader & Portele 2019) as well as corpus data (Bosch et al. 2003; Portele & Bader 2016) show that under a wide range of circumstances, the preferred antecedent of a p-pronoun is the referent of the subject NP whereas a d-pronoun preferentially refers to the referent of the object NP. Thus, p-pronouns and d-pronouns often provide means to reduce the ambiguity problem that arises with a single pronoun only. In addition to the antecedent’s grammatical function, its linear position and its discourse status affect the resolution of ambiguous pronouns, as shown by a large body of research (e.g., Gernsbacher & Hargreaves 1988; Crawley & Stevenson 1990; Crawley et al. 1990; Comrie 1997). In line with the previous literature (Kehler et al. 2008; Fukumura & van Gompel 2010), we use the term “structural” bias as an umbrella term encompassing these factors. The prototypical antecedent of a p-pronoun is a subject, occurs in sentence-initial position, and is the sentence topic. The converse holds for d-pronouns, for which the prototypical antecedent is an object, occupies a sentence-final position, and is not the topic. Thus, the same structural factors are involved in the interpretation of p- and d-pronouns, but their weight seems to be pronoun specific, as shown for Finnish by Kaiser & Trueswell (2008). For p-pronouns, Kaiser & Trueswell found the antecedent’s syntactic function to be most important. For d-pronouns, in contrast, the antecedent’s linear position and/or topic status were decisive. Based on these findings, Kaiser & Trueswell have argued in favor of a form-specific approach, according to which different types of anaphoric expressions can be associated with different constraints on interpretation. Their results have been replicated for German by Bader & Portele (2019).

Pronoun resolution is subject not only to structural biases but to semantic biases as well, where semantic biases encompass all effects due to linguistic meaning and world knowledge (such as semantic role and event-structure; e.g., Schumacher et al. 2015 and Stevenson et al. 1994). A prime example of a semantic bias affecting pronoun resolution is provided by so-called implicit causality verbs, which evoke strong expectations as to which referent causes the event described by the verb (e.g., Garvey & Caramazza 1974; Au 1986; Rudolph & Försterling 1997; Bott & Solstad 2014).

Psych verbs, for example, are typically associated with the expectation that the stimulus argument is responsible for the psychological state denoted by the verb, independent of whether the stimulus is the subject or object of the verb. When participants are asked to complete sentence fragments starting with a pronoun as in (2), the pronoun’s preferred antecedent is John in both (2a) and (2b) (e.g., Stevenson et al. 1994).

    (2) a. Peter admired John because he _______________________
        b. John impressed Peter because he _______________________

The preferred antecedent NP John denotes the stimulus argument in both (2a) and (2b), but it is the object in (2a) and the subject in (2b). Thus, pronoun resolution in this case varies with the meaning of the individual verbs and not with the structural properties of the potential antecedents, such as their syntactic function or linear position.

In sum, research on pronoun resolution has established that pronoun interpretation is subject to semantic as well as structural biases, and that neither can be reduced to the other. This has led to a shift from theories that capitalize either on semantic factors (e.g., Hobbs 1979) or on structural factors (e.g., Smyth 1994; Crawley et al. 1990) to theories that integrate the different factors. Most of these theories are built on the insight that pronoun interpretation and pronoun production are systematically related. For example, when participants are asked to complete sentence fragments as in (2) but without the pronoun (a so-called no-pronoun prompt), they mention the stimulus argument John next most of the time, both when it occurs as object as in (2a) and when it occurs as subject as in (2b) (e.g., Stevenson et al. 1994; Fukumura & van Gompel 2010; Bittner 2019 for German). This finds a parallel in the interpretive preferences discussed above insofar as a pronoun following the same contexts is preferentially interpreted as coreferent with the stimulus argument independently of whether the stimulus is realized as subject or object. Similar considerations hold for structural factors. For example, without strong semantic biases, personal pronouns are preferentially interpreted as coreferent with subjects when pronoun prompts are provided (e.g., Kaiser & Trueswell 2008). Conversely, with a no-pronoun prompt the preceding subject is taken up in the continuation by using a personal pronoun in the majority of cases (e.g., Stevenson et al. 1994).

Several theories of pronoun resolution have been advanced that account for both semantic and structural biases on pronoun resolution by linking pronoun interpretation to pronoun production (e.g., Stevenson et al. 1994; Arnold 2001; Kehler et al. 2008). Here, we consider only the Bayesian theory of pronoun resolution developed by Kehler and colleagues (Kehler et al. 2008; Kehler & Rohde 2013; Rohde & Kehler 2014) because this theory can be applied to d-pronouns in addition to p-pronouns without any further assumptions.

Building on the work of Stevenson et al. (1994), the Bayesian theory proposes that interpretation preferences are a function of top-down next-mention biases and bottom-up pronominalization biases. These biases are combined by Bayes' rule in the way shown in (3).

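    (3)
    \[
    P(\text{referent} \mid \text{pronoun}) = \frac{P(\text{pronoun} \mid \text{referent})\, P(\text{referent})}{\sum_{\text{referent} \in \text{referents}} P(\text{pronoun} \mid \text{referent})\, P(\text{referent})}
    \]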

On encountering a pronoun, the reader needs to determine which referent out of a set of available referents is the most likely referent intended by the writer. For each referent, the term P(referent|pronoun) on the left side of (3) gives the probability that referent is the referent intended for pronoun. In order to arrive at the most likely interpretation of the pronoun, the reader has to determine for which referent this probability is highest. According to Bayes' rule, P(referent|pronoun) can be computed from the production probabilities on the right side of (3). P(referent) is the probability that referent is mentioned next and thus quantifies next-mention biases. P(pronoun|referent) is the probability that a pronoun is used for referring to referent and thus quantifies pronominalization biases. The probabilities on the right side of Bayes' rule can be estimated in experiments with no-pronoun prompts, where participants can freely choose which referent to mention, allowing an estimate of P(referent), and which expression to use for referring to the selected referent, allowing an estimate of P(pronoun|referent).
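The estimation procedure just described can be illustrated with a minimal Python sketch. It is not the analysis code of any of the cited studies: the function name `interpretation_probabilities`, the coding of continuations as (referent, form) pairs, and the toy counts are all assumptions made for illustration only.

```python
from collections import Counter


def interpretation_probabilities(continuations, pronoun):
    """Estimate P(referent | pronoun) from no-pronoun-prompt continuations.

    continuations: list of (referent, form) pairs, one per continuation,
    giving the referent mentioned next and the expression used for it.
    (Hypothetical coding scheme, assumed for this sketch.)
    """
    total = len(continuations)
    mentions = Counter(referent for referent, _ in continuations)
    pronominalized = Counter(
        referent for referent, form in continuations if form == pronoun
    )

    # Numerator of (3) for each referent: P(pronoun | referent) * P(referent)
    joint = {}
    for referent, count in mentions.items():
        next_mention = count / total                     # P(referent)
        pronoun_rate = pronominalized[referent] / count  # P(pronoun | referent)
        joint[referent] = pronoun_rate * next_mention

    # Denominator of (3): sum over all available referents
    normalizer = sum(joint.values())
    if normalizer == 0:
        raise ValueError(f"{pronoun!r} was never produced; (3) is undefined")
    return {referent: value / normalizer for referent, value in joint.items()}


# Hypothetical toy data (not taken from any experiment): 10 continuations.
toy = (
    [("subject", "p-pronoun")] * 4
    + [("subject", "name")] * 2
    + [("object", "p-pronoun")] * 1
    + [("object", "name")] * 3
)
probs = interpretation_probabilities(toy, "p-pronoun")
print(probs)
```

With these invented counts, the subject is mentioned next in 6 of 10 continuations and pronominalized in 4 of those 6, so the predicted probability of a subject interpretation for the p-pronoun comes out at roughly 0.8. The predicted interpretation preference can thus be stronger than the next-mention bias alone, because the pronominalization bias contributes as well.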

The strong version of the Bayesian theory of pronoun resolution makes the additional claim of a division of labor between semantic and structural biases (Kehler & Rohde 2013; Rohde 2019). The next-mention bias P(referent) is claimed to reflect semantic biases imposed by world knowledge, verb semantics, and coherence relations. Structural biases, in contrast, are taken to govern the decision of using a pronoun or a different expression for each referent and thus establish the probability P(pronoun|referent). More specifically, Rohde & Kehler (2014) have advanced the Topichood Hypothesis, which claims that the decision to use a pronoun or not is a function of topichood. This hypothesis will be tested in Experiment 1.

The interaction of semantic and structural biases has been thoroughly investigated for p-pronouns, whereas research on d-pronouns has focused on structural biases. Studies investigating how semantic biases affect d-pronouns are rare, as reviewed below. The current study therefore had two major aims. The empirical aim was to provide new data on how semantic and structural factors affect the interpretation of d-pronouns. The theoretical aim was to test whether the Bayesian theory of pronoun resolution captures the interpretation of d-pronouns as well as the interpretation of p-pronouns.

To sum up, the experiments presented in this paper explore the role of semantic bias in pronoun production and pronoun resolution, taking both p-pronouns and d-pronouns into account. Four experiments presented participants with short contexts that varied the coherence relation connecting continuations and contexts. In Experiment 1, contexts appear with a no-pronoun prompt in order to test the effect of coherence on the next mention bias and on pronoun production. Because d-pronouns were produced rarely in Experiment 1, Experiment 2 focuses on the choice between p- and d-pronouns using a constrained continuation task. In Experiment 3, contexts appear with a pronoun prompt to study how coherence affects the interpretation of p- and d-pronouns. Experiment 4 presents acceptability data for the continuations that were provided by participants in Experiment 3.

2 Semantic bias in the interpretation of d-pronouns

Järvikivi et al.’s (2017) visual-world study of Finnish p- and d-pronouns is the first study comparing effects of semantic bias on p- and d-pronouns.2 Järvikivi et al. presented context sentences containing an implicit causality verb as in (4).

    (4) a. Vladimir Putin pelotti/pelkäsi George Bushia Valkoisessa talossa.
           ‘Vladimir Putin frightened/feared George Bush at the White House.’
        b. Koska hän/tämä oli kuluneen viikon aikana antanut useaan otteeseen ymmärtää, ettei maiden Irakin suhteissa olisi näkemyseroja.
           ‘Because he/DEM had during the past week given many times the impression that there would be no differences of opinion concerning the countries’ relations with Iraq.’

Järvikivi et al. (2017) manipulated the semantic bias of the context sentences by including either a subject-experiencer verb (pelkäsi ‘feared’ in (4a)) or an object-experiencer verb (pelotti ‘frightened’ in (4a)). The experiment yielded two major results. First, implicit causality affected the p-pronoun hän and the d-pronoun tämä in similar ways. For both pronouns, there were more looks to the stimulus than to the experiencer, in accordance with prior findings for implicit causality verbs. Second, order-of-mention had opposite effects on the two pronouns. Whereas a first-mention preference was observed for the p-pronoun, the d-pronoun preferred the second-mentioned referent as its antecedent. Järvikivi et al.’s results suggest that semantic and structural biases play different roles in pronoun resolution: Whereas the former pull p- and d-pronouns in the same direction, the latter have complementary effects on p- and d-pronouns. This is naturally captured by the Bayesian theory because semantic bias affects the pronoun-independent next-mention bias whereas structural biases (in this case linear position) govern the choice of a specific pronominal form.

The on-line visual world data presented in Järvikivi et al. (2017) do not show how often each referent was chosen in the interpretation of each pronoun. This issue was addressed by Portele & Bader (2020) investigating experimental stimuli as in (5)/(6).

    (5) Context:
        a. Vor    kurzem  wurde der Neubau         des    Museums fertiggestellt.
           before shortly was   the reconstruction of-the museum  completed
           ‘Recently, the reconstruction of the museum was completed.’
        b. Bei der Eröffnungsfeier hielt ein bekannter  Kunstprofessor eine Rede.
           at  the ceremony        held  a   well-known art-professor  a    speech
           ‘At the opening ceremony, a well-known professor of art delivered a speech.’
        c. Der Professor war sehr beeindruckt von dem Museumsleiter
           the professor was very impressed   by  the museum-director
           ‘The professor was very impressed by the museum’s director.’
    (6) Prompt:
        a. Er/Der hat nämlich … ‘This was for the reason that he/he-DEM …’
        b. Er/Der hat deshalb … ‘This had the consequence that he/he-DEM …’

In order to identify the topic of the sentence preceding the pronoun prompt in an unambiguous way, all contexts consisted of three sentences, following Kaiser & Trueswell (2008). After an initial scene-setting sentence, the second context sentence introduced a first referent. The third and final context sentence always contained a psych adjective with an experiencer argument realized as the subject NP and a stimulus argument as a prepositional object. The experiencer subject referred back to the referent introduced in the second context sentence. The stimulus object of the psych predicate was a second character newly introduced in the third context sentence. According to all major definitions of sentence topics (Reinhart 1981; Grosz et al. 1995; Lambrecht 1996), the referent introduced in the second sentence (5b) and taken up by a definite NP in the final sentence (5c) is the sentence topic of the final context sentence. The boxes above the subject and object NP of the third context sentence show how the two referents fare with regard to three major structural biases – syntactic function, order of mention, and topichood.

In one experiment, contexts were followed by a pronoun prompt containing either the p-pronoun er (‘he’) or the d-pronoun der (‘he-DEM’), followed by either the causal discourse marker nämlich (‘cause’) or the consequential discourse marker deshalb (‘therefore’).3 This experiment yielded two major results. First, as expected from the literature, the preferred antecedent of the p-pronoun depended on the specific coherence relation. For both coherence relations, about 80% of all continuations had the p-pronoun refer to the semantically favored referent – the experiencer subject with a consequence relation and the stimulus object with a causal relation. Second, the interpretation of the d-pronoun was also affected by the coherence relation, but the preference did not switch, being always in agreement with the purported object orientation of d-pronouns. The object preference of the d-pronoun was almost absolute when the stimulus object was favored semantically by a cause relation. When the experiencer subject was favored semantically by a consequence relation, the d-pronoun still showed a preference for the stimulus object, although a weaker one in comparison to sentences with a cause relation. In sum, the experiment of Portele & Bader (2020) found that the preferred antecedent of the p-pronoun varied with the coherence relation whereas the preferred antecedent of the d-pronoun was always the object referent and only the strength of the preference varied with the coherence relation.

A major question raised by these results is why the coherence manipulation did not result in a preference reversal for the d-pronoun although it did so for the p-pronoun. As pointed out above, Kaiser & Trueswell (2008) found that the various structural factors are differentially weighted for p- and d-pronouns, a finding replicated for German by Bader & Portele (2019). In the spirit of the form-specific approach proposed by Kaiser & Trueswell (2008), it is possible that semantic and structural biases are also differentially weighted for p- and d-pronouns. For p-pronouns, semantic biases, in particular implicit causality/consequentiality, must have more weight than structural biases. For d-pronouns, in contrast, topichood could have the strongest weight. This assumption would be in line with the often made claim that d-pronouns must refer to non-topical antecedents (Comrie 1997; Bosch & Umbach 2007; Hinterwimmer 2015).4 Since the object was always non-topical in Portele & Bader (2020), a dominant anti-topic bias would predict the lack of a preference reversal for d-pronouns but would still leave room for minor influences of semantic bias, modulating the strength of the observed object preference. Alternatively, it could be the object status itself that prevented participants from construing the experiencer subject as antecedent of the d-pronoun (see Patil et al. 2020).

An alternative account is provided by theories postulating a close connection between pronoun interpretation and pronoun production. Instead of a principled difference between p- and d-pronouns, the particular properties of the contexts investigated by Portele & Bader (2020) may be responsible for why coherence led to a preference reversal for p-pronouns, but not for d-pronouns. Relevant data for such an account come from Portele & Bader’s (2020) second experiment, which presented participants with the same contexts as shown in (5) followed by a no-pronoun prompt, that is, a blank line. The results of this second experiment showed a strong semantic bias induced by a cause relation, as evidenced by over 80% of continuations taking up the stimulus referent again. With a consequence relation, the experiencer argument was, as expected, the referent that was mentioned most often, but the preference was weak, reaching 46% of all cases. The stimulus was mentioned next in 30% of all cases, and the remaining continuations contained either no reference to experiencer or stimulus at all, or a joint reference to both of them. Since the experiencer was the subject in the materials of Portele & Bader (2020), the semantic bias of the consequence relation toward the experiencer subject may have been too weak to overcome the d-pronoun’s structural bias toward the object referent. An additional finding of this experiment was that p-pronouns were used more often for referring to the subject than to the object, but objects were still referred to with a p-pronoun in 10% of all cases. D-pronouns, in contrast, were used rarely, and they always referred to the object and never to the subject. This would be compatible both with a relatively strict anti-topic constraint for d-pronouns and with a strong object orientation of d-pronouns (see discussion in Kaiser & Fedele 2019).
However, since the subject was maximally prominent and the object maximally non-prominent in these materials, this finding is also compatible with a more gradient preference of d-pronouns referring to less prominent referents. If interpretation preferences match production preferences, the strong prominence asymmetry in the materials of Portele & Bader (2020) may have contributed to the lack of a preference reversal in the case of d-pronouns.
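As a rough illustration of how production preferences could block a reversal, the rule in (3) can be applied to the production figures reported above for the consequence relation. This is a sketch under stated assumptions, not a computation carried out by Portele & Bader (2020): only the subject and the object referent are considered, the reported next-mention rates of 46% and 30% are used as they stand, the d-pronoun rate for the subject is set to 0 because d-pronouns never referred to the subject, and \(\varepsilon > 0\) is a placeholder for the d-pronoun rate for the object, which is described only as rare.

```latex
\[
P(\text{subject} \mid \text{d-pronoun})
  = \frac{0 \cdot 0.46}{0 \cdot 0.46 + \varepsilon \cdot 0.30} = 0,
\qquad
P(\text{object} \mid \text{d-pronoun})
  = \frac{\varepsilon \cdot 0.30}{0 \cdot 0.46 + \varepsilon \cdot 0.30} = 1
\]
```

Under these assumptions, the object interpretation is predicted for the d-pronoun however small \(\varepsilon\) is, even though the next-mention bias favors the subject. Because d-pronouns were produced rarely, a rate of exactly 0 is best read as a noisy estimate of a very low rate, which would leave room for a strong but not absolute object preference.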

In sum, the finding that the coherence manipulation applied by Portele & Bader (2020) reversed the preferred antecedent of the p-pronoun from subject to object whereas the preferred antecedent of the d-pronoun remained the object could be due to a principled difference between p- and d-pronouns or to specific properties of the materials investigated by Portele & Bader (2020). In order to resolve this issue, we ran a series of experiments using contexts that are similar to those used by Portele & Bader (2020) but differ in some crucial respects. An example is provided in (7).

    (7) a. Vor    kurzem  wurde der Neubau         des    Museums fertiggestellt.
           before shortly was   the reconstruction of-the museum  completed
           ‘Recently, the reconstruction of the museum was completed.’
        b. Bei der Feier    war ein international   bekannter Journalist anwesend.
           at  the ceremony was a   internationally known     journalist present
           ‘At the opening ceremony, an internationally known journalist was present.’
        c. Ein innovativer Aussteller hat den Journalisten besonders  beeindruckt.
           a   innovative  exhibitor  has the journalist   especially impressed
           ‘An innovative exhibitor impressed the journalist especially.’

The context in (7) differs from the one in (5) in two major ways. First, instead of a psychological adjective with an experiencer subject and a stimulus realized as PP object, the final context sentence contains an object-experiencer verb, that is, a psych verb with the stimulus realized as subject and the experiencer as direct object. This change makes the context more similar to contexts used in prior research on how implicit causality and consequentiality affect pronoun processing (e.g., Stevenson et al. 1994; Crinean & Garnham 2006; Fukumura & van Gompel 2010; Rohde & Kehler 2014). Based on this research, we expect that the causal and the consequential discourse marker induce strong semantic biases of about equal strength.

Second, the referent introduced in the second context sentence again fills the experiencer role in the third context sentence, but because the third sentence now contains an object-experiencer verb, this referent is no longer the subject but the object of the sentence. Since the object referent is the only referent of the final context sentence already given in the preceding sentence, it is the sentence topic according to the sources cited above. Thus, in contrast to the materials of Portele & Bader (2020), the sentence topic is now in object position. Due to this change, neither of the two referents of the third context sentence is strongly favored by structural biases, as shown by the boxes above each referent in (7c). Whereas subjecthood and initial position make the subject referent more prominent, the prominence of the object referent is increased by being the topic. The risk that effects of semantic biases are masked by structural biases is therefore lowered.

3 Experiment 1

Experiment 1 was run to determine next-mention preferences and pronominalization rates induced by contexts as in (7). To this end, Experiment 1 presents contexts as in (7) followed by a no-pronoun prompt. Experiment 1 thereby allows us to validate that contexts as in (7) differ from the contexts used by Portele & Bader (2020) in the way discussed in the introduction. In particular, because subjecthood and topichood are not confounded in Experiment 1, it becomes possible to determine which of these two factors is the more important determinant for pronominalization. In addition, Experiment 1 yields production data needed for testing the Bayesian theory of pronoun resolution.

In order to have participants produce continuations standing in either a cause or consequence relation, a question asking for a reason or a consequence was inserted between the final context sentence and the prompt, as illustrated in (8) (see Kehler & Rohde 2017 for a similar procedure). This kind of manipulation differs from continuation studies using within-sentence coherence markers (e.g., a complementizer like because). Using within-sentence markers was not an option for the present study. Since d-pronouns are even less common in subordinate compared to main clauses, it was important that participants would produce main clauses. Questions were thus the most natural way to induce the relevant coherence relation.

    (8) Ein unheimlicher Zauberer hat den Jungen dabei   geängstigt.
        a   scary        magician has the boy    thereby frightened
        ‘A scary magician frightened the boy in the course of this.’
        a. Was war der Grund dafür? ‘What was the reason for this?’
        b. Was war die Folge davon? ‘What was the consequence of this?’
        Prompt: _____________________________________

Participants’ task was to write down a sensible answer to the question in the form of a full sentence, with no further constraints regarding form or content of the answer. Participants were thus free to mention whatever referent(s) they thought would make a good answer, choosing whatever referential form they considered appropriate.

With regard to the next-mention bias, prior research on psych verbs and coherence relations (e.g., Stevenson et al. 1994; Fukumura & van Gompel 2010; Holler & Suckow 2016) has found that participants preferentially mention the stimulus subject in continuations giving a cause for the psych verb clause whereas the experiencer object is preferentially mentioned when the continuation stands in a consequence relation with the preceding clause.

With regard to the choice of referential expressions, prior studies of psych verb contexts have provided evidence for structural factors as main or only determinants of pronoun choice. Stevenson et al. (1994) found that most of the time pronouns were used for referring back to subject antecedents. Their third experiment included both a causal as well as a consequential discourse marker, as shown in (9).

    (9) a. Geoff admired Ken because/so…
        b. Ken impressed Geoff because/so…

Participants’ choice of a referent was determined by an interaction of verb semantics and coherence relation. Whereas participants preferred to talk about the stimulus Ken in because completions, they preferred to mention the experiencer Geoff in so completions. The choice of using a pronoun, however, was determined by the referent’s syntactic function. Whereas pronouns were used to talk about subjects, objects were taken up in completions by using a name. Higher pronoun rates when referring back to subject compared to object referents were also found by Fukumura & van Gompel (2010) for sentences with two referents of different gender and for German by Holler & Suckow (2016).

Based on the often made observation of a close association between subjecthood and topichood, Rohde & Kehler (2014: 919) have proposed the Topichood Hypothesis which states that pronouns are used to indicate maintenance of the current topic, and that the seeming effect of syntactic function found in prior research is the result of the close although not absolute association of topic status with syntactic functions.5 In their second experiment, Rohde & Kehler (2014) had participants continue active (10a) and passive sentences (10b) including object-experiencer verbs with a no-pronoun prompt.

    (10) a. Amanda amazed Brittany. __________________
         b. Brittany was amazed by Amanda. __________________

Rohde & Kehler assume that the likelihood of interpreting the subject as topic is higher in passive than in active sentences because a main function of passive formation is to promote the underlying object to the syntactic subject and thus the preferred topic position in English. In accordance with this assumption, the pronominalization rate of the subject was higher for passive than for active clauses, whereas the pronoun rate for non-subjects did not differ significantly with voice. The results thus support the Topichood Hypothesis according to which topichood, and not grammatical function, drives pronominalization.

As shown in (7), in the contexts of Experiment 1 the object is the topic and the subject is a non-topic. Topichood and subjecthood are thus not confounded, which allows Experiment 1 to tease apart syntactic function and topic status as determinants of pronoun production. There are three main possibilities. First, if syntactic functions govern pronoun production (e.g., Stevenson et al. 1994; Fukumura & van Gompel 2010), the likelihood of using a p-pronoun should be higher for subject antecedents than for object antecedents. Second, if topichood is the main determinant of pronoun production (Rohde & Kehler 2014), participants should choose pronouns more often to refer to the object of the preceding sentence, since the experiencer object is always the topic in Experiment 1. A third possibility is that syntactic function and topic status jointly govern the use of pronouns. In this case, the rate of pronominalization should be more evenly distributed across subject and object because syntactic function and topic status pull in different directions (subject = non-topic, object = topic).

Research on contexts not involving implicit causality verbs has provided evidence that pronoun production is also influenced by likelihood of reference. Arnold (2001) investigated the choice of referential expressions following sentences with transfer of possession verbs (e.g., ‘Peter gave Mary a book’ versus ‘Mary received a book from Peter’) and found that participants were more likely to re-mention the goal character (‘Mary’) than the source character (‘Peter’). Additionally, participants were more likely to produce pronouns to refer to goal characters than to source characters. This finding was replicated by Rosa & Arnold (2017). Vogels (2019) found evidence for a next-mention effect on the choice of referring expressions in Dutch, but Hoek et al. (2021) did not find evidence for predictability influencing the rate of pronominalizations in contexts involving three referents in English. In more recent work, and in contrast to the studies mentioned above, Weatherford & Arnold (2021) extended effects of likelihood of reference on pronoun production to implicit causality contexts. The authors found that pronominalization rates were higher for the more likely stimulus referent compared to the experiencer, but this finding was limited to object referents. However, our materials only include experiencer objects, which are less likely to be mentioned again in causal contexts. Due to this difference and the mixed results for implicit causality contexts, we remain agnostic as to whether pronominalization rates will be influenced by the likelihood of reference. If semantic biases in terms of likelihood of reference influence pronoun production, we should find an interaction between coherence relation and referent for the production of pronouns in Experiment 1.
In causal contexts, the stimulus is most expected in the continuation, which should result in a higher pronoun rate for stimuli than for experiencers, whereas the likelihood of being re-mentioned is higher for experiencers in consequential contexts, resulting in the prediction of more pronouns used for experiencers than stimuli.

While the interpretation of d-pronouns has been the subject of numerous experimental studies, only a few studies have looked at their production. For German, Bader & Portele (2019) found that d-pronouns occur rarely in written continuations. Among the possible reasons for this finding are preferences in terms of register or style. In addition to the d-pronoun der (literally ‘the’) as used in the interpretation experiments above, German has a further demonstrative pronoun (dieser ‘this’). This pronoun is regularly observed in experiments with no-pronoun prompts, that is, when participants are free to choose a referential expression. The pronoun der is used more frequently in informal and spoken language than in formal settings; dieser, in contrast, is associated with a formal writing style (e.g., Ahrenholz 2007; Weinert 2011; Patil et al. 2020). Experimental research comparing the interpretation of der and dieser has not revealed any differences as far as semantic and structural factors are concerned (Patil et al. 2020), so we will subsume both under the term ‘d-pronoun’ in the following. From the linguistic literature on German d-pronouns, we expect participants to use them to refer back to non-topics or to objects. Furthermore, corpus studies have shown that d-pronouns are used more often for co-reference with objects than with subjects (Bosch et al. 2003; Portele & Bader 2016). In these studies, information structure was not annotated, so preferences due to topichood and preferences due to syntactic function could not be teased apart. In the experimental materials of Experiment 1, however, the object is the topic. Thus, under the anti-topic hypothesis, d-pronouns should be mainly used to talk about the non-topic subject referent. In contrast, if the grammatical function of the antecedent is the main structural determinant in the production of d-pronouns, participants should choose them more often for object than for subject referents, even though the object is the topic.

3.1 Method

3.1.1 Participants

Forty-two students of the Goethe University Frankfurt participated in Experiment 1 for course credit. All participants were native speakers of German and naive with respect to the purpose of the experiment.

3.1.2 Materials

Twenty contexts, each consisting of three sentences, were created for Experiment 1 (see the supplementary materials for all of the experimental items used). A complete example is shown in Table 1. The first sentence set up a scene for the following sentences. The second context sentence introduced a male referent by means of an indefinite NP. The third context sentence always contained an object-experiencer verb, with the male referent of the second context sentence as experiencer object. The stimulus subject of the psych verb was a second male character newly introduced in the third context sentence by means of an indefinite NP. Only male referents occurred in the context sentences because only masculine third-person pronouns are unambiguous in German with respect to case and number. The context was followed by a question asking either for the cause or the consequence of the final context sentence. A blank line was provided for participants to write down an answer to this question.

Table 1

A complete stimulus item for Experiment 1 including example continuations given by participants in the lower part.

Context sentences
[C1] Vor    kurzem  wurde der Neubau         des    Museums für moderne Kunst fertig gestellt.
     before shortly was   the reconstruction of-the museum  for modern  art   completed
[C2] Bei der Eröffnungsfeier  war ein international   bekannter Journalist anwesend.
     at  the opening-ceremony was a   internationally known     journalist present
[C3] Ein innovativer Ausstellungsmacher   hat den Journalisten besonders  beeindruckt.
     a   innovative  exhibition-organizer has the journalist   especially impressed

‘Recently, the reconstruction of the museum of modern art was completed. At the opening ceremony, an internationally known journalist was present. An innovative exhibition organizer impressed the journalist especially.’
Continuation Prompt:
    1. Cause:
    1. Was war der Grund dafür? ‘What was the reason for this?’______________________
    1. Consequence:
    1. Was war die Folge davon? ‘What was the consequence of this?’ _______________
Condition    Referent category   Completion
Cause        Stimulus subject    Er hatte ein beeindruckendes Werk aus Spiegeln gebaut.
                                 ‘He had built an impressive work out of mirrors.’
                                 Der Ausstellungsmacher sprach auf Latein.
                                 ‘The exhibition organizer spoke in Latin.’
             None                Die Präsentation war charismatisch.
                                 ‘The presentation was charismatic.’
Consequence  Experiencer object  Er schrieb einen langen Artikel über den Ausstellungsmacher.
                                 ‘He wrote a long article about the exhibition organizer.’
                                 Der Journalist stellte dem Ausstellungsmacher viele Fragen.
                                 ‘The journalist asked the exhibition organizer a lot of questions.’
             Both                Sie haben beschlossen zusammen zu arbeiten.
                                 ‘They decided to work together.’

Combining each context with one of two questions (cause versus consequence) resulted in two different versions of each experimental item. The 20 items were distributed across two lists according to a Latin square design. Each list contained exactly one version of each item and an equal number of items in each condition. Twenty filler items, which also contained female entities and temporal connectives, were added to each experimental list in such a way that experimental items were always separated by one filler item.
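The list construction described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the item labels (E1..E20, F1..F20), and the fixed presentation order are assumptions made purely for illustration, and the actual order of items on the questionnaires is not specified in the text.

```python
from collections import Counter


def latin_square_lists(n_items, conditions):
    """Build one list per condition by rotating conditions across items.

    Item i (0-based) on list j receives conditions[(i + j) % k], so every item
    appears once per list and in a different condition on every list.
    """
    k = len(conditions)
    return [
        {item + 1: conditions[(item + shift) % k] for item in range(n_items)}
        for shift in range(k)
    ]


def interleave(experimental, fillers):
    """Alternate experimental and filler items so that two experimental
    items are never adjacent (assumes equally many of each)."""
    if len(experimental) != len(fillers):
        raise ValueError("expected equally many experimental and filler items")
    sequence = []
    for exp_item, filler in zip(experimental, fillers):
        sequence.extend([exp_item, filler])
    return sequence


# Two lists for the 20 items and the two coherence conditions.
lists = latin_square_lists(20, ["cause", "consequence"])
for number, assignment in enumerate(lists, start=1):
    print(number, dict(Counter(assignment.values())))

# Hypothetical labels: E1..E20 for experimental items, F1..F20 for fillers.
order = interleave([f"E{i}" for i in range(1, 21)],
                   [f"F{i}" for i in range(1, 21)])
```

With 20 items and two conditions, the rotation gives each list ten items per condition, and each item appears in a different condition on the two lists.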

3.1.3 Procedure

Participants received a written questionnaire. They were asked to read each context and the following question and then to answer the question by writing a natural-sounding sentence. Participants completed the questionnaires during regular class sessions. Although all students in class got a questionnaire, there was no obligation to fill it out completely. Questionnaires that were returned with more than three items without a continuation were not considered for further analysis. Completing a questionnaire took about 20 minutes.

3.1.4 Scoring

Eighteen out of 840 trials had to be excluded because no continuation was given, because the continuation was semantically deviant (e.g., containing a feminine pronoun despite no feminine antecedent in the context, or giving a reason instead of a consequence), or because it was not a full clause. The remaining 822 continuations were scored with regard to which of the referents given in the context was mentioned first in the continuation and what kind of referential expression was used for this purpose. With regard to the choice of a referent, each continuation was scored by a student assistant and the second author using five categories: the experiencer was mentioned first, the stimulus was mentioned first, experiencer and stimulus were mentioned together using a single NP, neither experiencer nor stimulus was mentioned, or the continuation was ambiguous as to the intended referent. The two scorers agreed in 95% of all cases (Krippendorff’s alpha = 0.919). The 5% of continuations for which there was no agreement were removed from the following analyses.
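For readers unfamiliar with the two agreement measures reported above, the following Python sketch computes raw agreement and Krippendorff's alpha for the simplest case: two raters, nominal categories, no missing codes. The function names and the four-unit toy data are hypothetical. The sketch does not reproduce the reported values, which depend on the actual codings.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(rater_a, rater_b):
    """Share of units on which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Krippendorff's alpha for two raters, nominal categories, no missing values."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    # Coincidence matrix: every unit contributes both ordered pairs of its two codes.
    coincidences = Counter()
    for a, b in zip(rater_a, rater_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n_values = 2 * len(rater_a)
    # Marginal totals per category (row sums of the coincidence matrix).
    totals = Counter()
    for (category, _), count in coincidences.items():
        totals[category] += count
    observed = sum(count for (c, k), count in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        return float("nan")  # only one category was used; alpha is undefined
    return 1.0 - (n_values - 1) * observed / expected


# Toy data (hypothetical): four continuations coded by two raters.
coder_1 = ["experiencer", "experiencer", "stimulus", "stimulus"]
coder_2 = ["experiencer", "stimulus", "stimulus", "stimulus"]
print(percent_agreement(coder_1, coder_2))
print(krippendorff_alpha_nominal(coder_1, coder_2))
```

Unlike raw agreement, alpha corrects for the agreement expected by chance given how often each category was used, which is why it is lower than the raw agreement rate.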

For all continuations that contained either the experiencer or the stimulus as first-mentioned referent, we scored the linguistic expression used for referring back to that referent according to the following three categories: p-pronoun er, d-pronoun (der or dieser), definite NP (e.g., der Junge). Sample continuations including different referential forms given by participants for both discourse relations are shown in the lower part of Table 1. In addition to the linguistic expression, we coded the syntactic function (subject or not) and the position (sentence-initial or not) of each referent. As shown in Table 2, over 90% of all NPs referring back to the stimulus or experiencer occurred as subject in clause-initial position. When we talk about references to the stimulus or the experiencer without qualification in the following, we only mean continuations in which reference was made by a subject NP in initial position. These continuations correspond to the pronoun prompt used in Experiment 3 and thus provide the data for testing the relationship between pronoun interpretation and production in Experiment 3.

Table 2

Syntactic function and position within the continuation sentence of the NP referring back to the stimulus or experiencer argument of the final context sentence in Experiment 1.

         Coherence: Cause                     Coherence: Consequence
         Clause-initial  Not clause-initial   Clause-initial  Not clause-initial
Subject  363 (96.3%)     8 (2.1%)             291 (90.4%)     24 (7.5%)
Object   5 (1.3%)        1 (0.3%)             3 (0.9%)        4 (1.2%)

3.2 Results

All statistical analyses reported in this paper were conducted using the statistics software R (R Core Team 2020). For the inferential statistics involving proportions, we computed generalized mixed models using the R package lme4 (Bates et al. 2015). Main factors and interaction terms were entered as fixed effects into the models, using effect coding (0.5 vs. –0.5). In addition, we included random effects for items and subjects with maximal random slopes supported by the data, following the strategy proposed in Bates et al. (2015).

3.2.1 Choice of referent

Figure 1 shows how often the referents in the context were mentioned first when answering the question following the context. When the question asked for a cause, a clear majority of 86.6% of all continuations contained a sentence-initial subject referring to the stimulus subject of the preceding clause. References to the experiencer object were rare at 7%, and all other categories together occurred in less than 7% of all continuations. When the question asked for a consequence, the majority of the continuations contained a reference to the experiencer object. At 56.6%, the bias toward the experiencer was weaker than the stimulus bias observed with a cause relation. References to the stimulus subject occurred in 19.0% of all consequence continuations. The remaining roughly 25% of continuations were distributed across continuations with no reference to a context referent (11.4%), reference to the experiencer when it was not the sentence-initial subject (6.8%), or reference to both referents of the context (4.9%). A mixed-effects model with reference to the experiencer object as the dependent variable revealed that coherence had a significant effect (β̂ = 3.285, SE = 0.263, z-value = 12.49, p < 0.001).

Figure 1

Percentages of referents from the context re-mentioned in the continuations of Experiment 1. ‘S-Stimulus’ and ‘O-Experiencer’ refer to continuations in which the respective argument was taken up as a sentence-initial subject. Other references to the stimulus or experiencer are denoted by ‘S-Stimulus*’ and ‘O-Experiencer*’, respectively.

3.2.2 Choice of referential expression

Figure 2 shows how often the different referential forms were used for referring to the stimulus subject or the experiencer object following either a cause or consequence question. The vast majority of all references were made with a p-pronoun or a definite NP. Because pronouns are of main interest to the current study, the mixed-effects model summarized in Table 3 had proportions of p-pronouns as the dependent variable. As shown in Table 3, the proportions of p-pronouns were significantly affected by the two main factors Coherence and Referent; the interaction between the two factors was not significant. The rate of p-pronouns was higher when the question asked for a cause than when it asked for a consequence (62% versus 40%), and it was higher when referring to the stimulus subject than when referring to the experiencer object (55% versus 48%).

Figure 2

Percentages of referential forms used to refer to the stimulus subject or the experiencer object of the final context sentence in Experiment 1.

Table 3

Generalized mixed model for Experiment 1 with use of a p-pronoun as dependent variable.

                      β̂        SE      z-value  Pr(>|z|)
Intercept             –0.0899  0.3431  –0.26
Coherence             1.2366   0.3270  3.78     p < 0.01
Referent              0.7551   0.3347  2.26     p < 0.05
Coherence × Referent  0.3039   0.6419  0.47     n.s.
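The z-values in Table 3 are Wald statistics, that is, each estimate divided by its standard error, as reported by lme4 for generalized mixed models. The Python sketch below recomputes them from the tabled estimates and derives two-sided p-values under a standard normal reference distribution. The function names are hypothetical, and the p-values are approximations based on the rounded table entries, not output from the fitted model.

```python
import math


def wald_z(beta, se):
    """Wald statistic: estimate divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value, P(|Z| > |z|), for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Estimates and standard errors as printed in Table 3.
table3 = {
    "Intercept": (-0.0899, 0.3431),
    "Coherence": (1.2366, 0.3270),
    "Referent": (0.7551, 0.3347),
    "Coherence x Referent": (0.3039, 0.6419),
}

for term, (beta, se) in table3.items():
    z = wald_z(beta, se)
    print(f"{term:22s} z = {z:6.2f}   p = {two_sided_p(z):.4f}")
```

Because effect coding (0.5 vs. –0.5) was used, each main-effect estimate corresponds to the difference between the two levels of a factor on the logit scale, averaged over the levels of the other factor.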

As found in other research investigating the production of p- and d-pronouns (Bader & Portele 2019; Portele & Bader 2020), the rate of d-pronouns was quite low in Experiment 1. Overall, only 3.7% (n = 24) of all continuations contained a d-pronoun, with about equal numbers of der pronouns (n = 11) and dieser pronouns (n = 13). By far the largest number of d-pronouns was used for referring to the stimulus subject with a cause relation (n = 19, 5.5% of all references to the stimulus after a cause question), followed by references to the experiencer object with a consequence relation (n = 4, 1.8% of all references to the experiencer after a consequence question).

3.3 Discussion

With regard to likelihood of next mention, Experiment 1 replicates prior findings from experiments investigating psych verbs. A preference to mention the stimulus was found with a cause relation whereas a preference to mention the experiencer showed up with a consequence relation. As in Portele & Bader (2020), the next-mention preference was stronger with a cause than with a consequence relation, but the difference was substantially reduced. With a cause relation, a vast majority of about 88% of continuations contained a reference to the stimulus subject; with a consequence relation, the preference in favor of the experiencer object was at about 60%. We conjecture that the strength of the next-mention bias differs between cause and consequence relations because mental states, as described by the psych verbs used in our experimental materials, typically have a single cause attributed to the stimulus argument but varied consequences that can arise for both the experiencer and the stimulus argument (e.g., Crinean & Garnham 2006). So, in example (8) from above (a magician frightening a boy), the frightening can have consequences for the experiencer argument (e.g., the boy starting to cry) as well as the stimulus argument (e.g., the magician being fired).

With regard to the choice of referential expressions, Experiment 1 found that p-pronouns were used more often when the next-mentioned referent was the stimulus subject than when it was the experiencer object. This effect is in line with prior studies claiming that syntactic functions govern pronominalization (Stevenson et al. 1994; Fukumura & van Gompel 2010). Because the object was the sentence topic and the subject was non-topical in Experiment 1, finding a higher pronominalization rate for subjects contradicts the Topichood Hypothesis of Rohde & Kehler (2014), which predicted a higher pronoun rate for references to the object (=topic) than for references to the subject (=non-topic). However, with a pronoun rate of 55% for subject references and 48% for object references, the difference was much smaller than expected from the prior literature. For example, in Stevenson et al.’s (1994) Experiment 1, in which the continuation was a separate main clause as in our Experiment 1, the pronominalization rate was 64% for subjects and 21% for objects. In Portele & Bader (2020), a pronoun was used for over 80% of all references to the subject but for only about 10% of all references to the object.

We hypothesize that the rate of pronominalization was only slightly higher for subjects than for objects in Experiment 1 because subjecthood and topichood were not aligned, and pronominalization is driven both by syntactic function, as claimed by Stevenson et al. (1994) and Fukumura & van Gompel (2010), and by topic status, as claimed by Rohde & Kehler (2014). Thus, when the subject is the topic and the object is accordingly non-topical, as in most experiments, a large difference in pronominalization results. In Experiment 1, in contrast, the object was the topic and the subject was non-topical. Thus, subjecthood favored pronominal references to the subject and topichood favored pronominal references to the object, resulting in pronominalization rates near 50% for both subject and object. The fact that the pronominalization rate was still somewhat higher for the subject than for the object suggests that subjecthood is weighted more heavily than topichood as a determinant of pronominalization.

In line with Stevenson et al. (1994) and Fukumura & van Gompel (2010), our results showed no evidence for the hypothesis that likelihood of being mentioned next affects participants’ pronoun production. This hypothesis predicted an interaction between coherence and referent because the stimulus subject was more likely in causal continuations, whereas the experiencer object was more likely in consequential continuations. However, the predicted interaction was not observed. Note that Weatherford & Arnold (2021) found an effect of next mention bias in implicit causality contexts only for object referents. Since in our materials, the object was always the less likely experiencer argument in the causal condition, we refrain from comparing the findings at this point.

An effect of coherence was also found in participants’ choice of referential forms. As in Experiment 2 of Portele & Bader (2020), p-pronouns were produced more frequently following cause questions than following consequence questions. This finding replicates the higher pronominalization rate in continuations following ‘because’ than in continuations following ‘so’ obtained by Fukumura & van Gompel (2010). Fukumura & van Gompel explained their finding with reference to syntactic differences between ‘because’ and ‘so’ clauses. Their argumentation does not apply to the current study, because our participants produced main clauses following questions in both the causal and the consequential condition, so there were no syntactic differences between the two coherence conditions. Our results therefore suggest that in addition to structural biases, semantic influences due to coherence can affect the choice of referential forms.

As in prior research, the number of d-pronouns was low, preventing firm conclusions. What is most remarkable in the current context is that d-pronouns were mainly used for making reference to the subject referent, in contrast to Portele & Bader (2020), where d-pronouns were used exclusively to refer to the object referent. Since the subject was the topic in Portele & Bader (2020) whereas the object was the topic in Experiment 1, this is in accordance with the claim that d-pronouns are typically used for referring to the non-topic. Given the low numbers of demonstratives, we will explore this issue further in Experiment 2.

4 Experiment 2

The low numbers of d-pronouns produced in Experiment 1 make it difficult to reliably assess their production pattern. In order to increase the number of d-pronouns, Experiment 2 differs from Experiment 1 in two ways. First, as in Fukumura & van Gompel (2010), one referent of the context was printed in red in order to signal that the continuation sentence should contain a reference to this referent. This has the effect that less preferred referents are mentioned as often as highly preferred referents, providing an equal number of data points for all referents. Second, participants were asked to use only the p-pronoun er or the d-pronoun der when referring to the marked referent.6 Since definite NPs were no longer allowed, this will increase the number of p- and d-pronouns in participants’ continuations.

4.1 Method

4.1.1 Participants

Twenty-one students of the Goethe University Frankfurt participated in Experiment 2 for course credit. All participants were native speakers of German and naive with respect to the purpose of the experiment. None of the participants had participated in Experiment 1.

4.1.2 Materials

Experiment 2 presented the same contexts that were presented in Experiment 1, but with a different prompt. In order to elicit continuations standing either in a cause or a consequence relation to the final context sentence, Experiment 2 used discourse markers as shown in (11).

    1. (11)

The discourse marker nämlich (‘(be)cause’) establishes a causal relationship to the preceding sentence. Syntactically, nämlich is special insofar as it cannot appear sentence-initially but must appear at some later position. Since nämlich is the only German causal discourse marker for connecting main clauses, we used it in Experiment 2 and the following experiments despite this restriction. Because the finite verb is located in the second position in a German main clause, two text fields were included before nämlich in the continuation prompt, one for the pronoun and one for the finite verb. After the discourse marker, a further, longer text field was provided for the rest of the sentence. The continuation prompt for a consequence relation contained the discourse marker deshalb (‘therefore, hence’). This discourse marker can appear clause-initially but also at later positions. For reasons of parallelism, prompts with deshalb were constructed in the same way as prompts with nämlich.

4.1.3 Procedure

Experiment 2 was run as a web experiment on Ibex Farm (Drummond et al. 2016).7 Each experimental session started with an instruction page asking participants to provide a continuation sentence for each context, using either the p-pronoun er or the d-pronoun der for referring to the referent printed in red in the preceding context.

4.2 Results

The statistical analysis followed the same procedure described for Experiment 1. Figure 3 shows the percentages of trials in which participants used a d-pronoun for referring to the referent marked in the context. Overall, participants produced 27% d-pronouns and 73% p-pronouns in their continuations. A generalized mixed-effects model with choice of d-pronoun as dependent variable, the two factors Antecedent and Coherence as fixed effects, and participants and items as random effects revealed a main effect of Coherence, with the causal discourse marker resulting in a higher proportion of d-pronouns than the consequential discourse marker (31% versus 23%; β̂ = 0.536, SE = 0.256, z-value = 2.09, p < 0.05). Whether the antecedent was a subject or an object did not significantly affect the rate of d-pronoun production (26% versus 28%; β̂ = 0.139, SE = 0.255, z-value = 0.55, n.s.). The interaction between Antecedent and Coherence was also not significant (β̂ = 0.153, SE = 0.509, z-value = 0.30, n.s.).

Figure 3

Percentages of choices of the d-pronoun to refer to the stimulus subject or the experiencer object of the final context sentence in Experiment 2.

4.3 Discussion

Experiment 2 has yielded two major results. First, the syntactic function of the antecedent did not affect how often p- and d-pronouns were produced. Second, d-pronouns were used more often following a cause relation than following a consequence relation; conversely, p-pronouns were used more often following a consequence than following a cause relation. Both findings contrast with findings of Experiment 1.

With regard to the lack of an effect of syntactic function, it has to be noted that in Experiment 1 both p- and d-pronouns were produced more often for subject than for object antecedents (see Figure 2). In Experiment 2, however, percentages of p- and d-pronouns had to sum to 100%, that is, only one of the two pronouns could show the increase due to syntactic function expected from Experiment 1. The rate of d-pronouns seems to have increased more strongly for experiencer objects than for stimulus subjects, compatible with the object-orientation of d-pronouns. Since this increase necessarily decreased the p-pronoun rate, we see no effect of syntactic function at all.

The reversed coherence effect of Experiment 2 can be explained in a similar way. In Experiment 1, causal continuations showed higher pronoun rates than consequential continuations for both pronouns, with an even stronger increase for d- than for p-pronouns. In Experiment 2, d-pronouns again showed a higher rate for causal continuations, at the cost of p-pronouns, which therefore showed the opposite pattern. At this point, we do not have a clear understanding of the source of these findings. For reasons of space, we refrain from further speculations, leaving it as an open question for future research.

5 Experiment 3

Experiment 3 investigates how coherence affects the interpretation of p- and d-pronouns. This experiment uses the same contexts as the two preceding experiments, but with a pronoun prompt instead of a no-pronoun prompt. The prompts that followed the final context sentence were similar to those used in Experiment 2, with one major change. As shown in (12), the initial empty text field was replaced by either the p-pronoun er or the d-pronoun der.

(12) Ein unheimlicher Zauberer hat den Jungen dabei   geängstigt.
     a   scary        magician has the boy    thereby frightened
     ‘A scary magician frightened the boy in the course of this.’
     a. Er/Der ___ nämlich ___ ‘This was for the reason that he/he-DEM …’
     b. Er/Der ___ deshalb ____ ‘This had the consequence that he/he-DEM …’

Because Experiment 3 was run as a paper-and-pencil questionnaire, blank lines replaced the text fields for the finite verb and the rest of the sentence.

Prior work on coherence and pronoun resolution (Stevenson et al. 1994; Stevenson et al. 2000; Crinean & Garnham 2006; Järvikivi et al. 2017; Portele & Bader 2020) results in the following predictions. Both p- and d-pronouns should show a preference for the stimulus subject with a causal discourse marker but a preference for the experiencer object with a consequential discourse marker. For p-pronouns, it has often been shown that semantic biases easily override structural biases, so their final interpretation should follow the pattern predicted by the semantic bias. For d-pronouns, in contrast, it is an open question how semantic and structural biases interact, as discussed above. One possibility is that structural biases outweigh semantic biases. In this case, the preferred interpretation will depend on which structural factor is most highly weighted, independently of the particular coherence relation. If information structure is decisive, d-pronouns should preferentially refer to non-topic referents, that is, the stimulus subject. On the other hand, if syntactic function is decisive, the experiencer object should be the preferred antecedent throughout.

Alternatively, semantic and structural biases may work together as postulated by the Bayesian Theory of pronoun resolution. Table 4 summarizes the application of the Bayesian formula (repeated in (13)) to the data obtained in Experiment 1. The probabilities P(ref) and P(pro|ref) are estimated by the mean proportion of each combination of Referent, Pronoun, and Coherence (see Figures 1 and 2).

Table 4

Application of the Bayesian formula; the probabilities P(ref) and P(pro|ref) are the grand means from Experiment 1. The values given in parentheses in the column P(ref|pro) are the corresponding probabilities based on the by-item analysis described in the section testing the Bayesian Model (Section 5.3).

Pronoun    Coherence    Referent  P(ref)  P(pro|ref)  P(pro|ref) × P(ref)  P(ref|pro)
P-pronoun  Cause        Subject   .875    .65         .576                 .93 (.93)
                        Object    .070    .59         .042                 .07 (.07)
           Consequence  Subject   .198    .44         .087                 .28 (.24)
                        Object    .595    .37         .220                 .72 (.76)
D-pronoun  Cause        Subject   .875    .06         .049                 1.00 (1.00)
                        Object    .070    .00         .000                 .00 (.00)
           Consequence  Subject   .198    .01         .002                 .20 (.20)
                        Object    .595    .02         .010                 .80 (.80)
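(13) \[ P(\text{referent} \mid \text{pronoun}) = \frac{P(\text{pronoun} \mid \text{referent}) \, P(\text{referent})}{\sum_{\text{referent} \in \text{referents}} P(\text{pronoun} \mid \text{referent}) \, P(\text{referent})} \]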

Applying the Bayesian formula in (13) to the probabilities P(ref) and P(pro|ref) in Table 4 gives the predictions shown in the final column in Table 4. For p- and d-pronouns alike, the theory predicts a very strong stimulus subject preference with a cause relation and a strong experiencer object preference with a consequence relation, in accordance with the next-mention biases found in Experiment 1. In Section 5.3, we provide a formal test of the Bayesian formula taking by-item variation into account. For this analysis, some items had to be excluded due to missing data. The predictions derived from this data subset are shown in parentheses in the final column of Table 4.
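As an illustration of how the final column of Table 4 follows from the formula in (13), the Python sketch below normalizes the products P(pro|ref) × P(ref) over the two candidate referents. The function name is hypothetical, and the inputs are the rounded values printed in Table 4. The table itself was presumably computed from unrounded proportions, so small deviations are possible, especially for the d-pronoun rows, which rest on very small proportions.

```python
def bayes_posterior(prior, likelihood):
    """Return P(ref | pro) for each referent.

    prior[ref]      : P(ref), the next-mention probability
    likelihood[ref] : P(pro | ref), the pronominalization rate for that referent
    """
    joint = {ref: likelihood[ref] * prior[ref] for ref in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the pronoun was never produced for any referent")
    return {ref: value / total for ref, value in joint.items()}


# P-pronoun, cause relation (rounded values from Table 4).
p_cause = bayes_posterior(
    prior={"subject": 0.875, "object": 0.070},
    likelihood={"subject": 0.65, "object": 0.59},
)

# P-pronoun, consequence relation (rounded values from Table 4).
p_consequence = bayes_posterior(
    prior={"subject": 0.198, "object": 0.595},
    likelihood={"subject": 0.44, "object": 0.37},
)

for label, posterior in [("cause", p_cause), ("consequence", p_consequence)]:
    print(label, {ref: round(p, 2) for ref, p in posterior.items()})
```

For the p-pronoun rows, the rounded inputs reproduce the tabled predictions (.93/.07 for cause, .28/.72 for consequence). Because the pronominalization rates for subject and object are similar, the posterior is dominated by the next-mention prior P(ref).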

5.1 Method

5.1.1 Participants

Forty students of the Goethe University Frankfurt participated in Experiment 3 for course credit. All participants were native speakers of German and naive with respect to the purpose of the experiment.

5.1.2 Materials

Experiment 3 investigated the same 20 three-sentence contexts created for Experiment 1 (see Table 5 for a complete example). The only difference from Experiment 1 concerns the prompt. Instead of a question and a blank line, the contexts in Experiment 3 were followed by a pronoun prompt starting with either the p-pronoun er or the d-pronoun der. In addition, the prompt contained either the causal discourse marker nämlich or the consequential discourse marker deshalb.

Table 5

A complete stimulus item for Experiment 3 including example continuations given by participants in the lower part. Continuations investigated in Experiment 4 (acceptability rating) are printed in bold.

Context sentences
    1. [C1]
    1. Vor kurzem wurde der Neubau des Museums für moderne Kunst fertiggestellt.
    1. [C2]
    1. Bei der Eröffnungsfeier war ein international bekannter Journalist anwesend.
    1. [C3]
    1. Ein innovativer Ausstellungsmacher hat den Journalisten besonders beeindruckt.

‘Recently, the reconstruction of the museum of modern art was completed. At the opening ceremony, an internationally known journalist was present. An innovative exhibition organizer impressed the journalist especially.’

Continuation Prompt:
    1. Cause:
    1. Er/Der _______ nämlich _______ (‘This was for the reason that he/he-DEM …’)
    1. Consequence:
    1. Er/Der ______ deshalb ______ (‘This had the consequence that he/he-DEM …’)
Condition    Referent category   Completion
Cause        Stimulus subject    Er hatte nämlich etwas komplett Neues entworfen.
                                 ‘He created something completely new.’
                                 Der ist nämlich auf ihn eingegangen.
                                 ‘he-DEM responded to him.’
Consequence  Stimulus subject    Er wurde deshalb interviewt.
                                 ‘He was interviewed.’
                                 Der wurde deshalb in dessen Artikel besonders erwähnt.
                                 ‘he-DEM was particularly mentioned in his article.’
             Experiencer object  Er möchte deshalb ein Interview mit ihm machen.
                                 ‘He wants to have an interview with him.’
                                 Der schrieb deshalb eine besonders gute Kritik.
                                 ‘he-DEM wrote an extraordinarily good critique.’

Crossing the two factors Pronoun (p- versus d-pronoun) and Coherence (cause versus consequence) in the continuation prompt resulted in four different versions of each experimental item. The 20 items were distributed across four lists according to a Latin square design. Twenty filler items that also encompassed female entities and temporal connectives were added to each list in such a way that experimental items were always separated by one filler item.

5.1.3 Procedure

Participants received a written questionnaire which they completed during regular class sessions. In the instruction, they were asked to read the contexts and then to fill out the slot for the verb and the line following the discourse marker. The instruction contained two short texts including a pronoun prompt together with an example continuation. Completing a questionnaire took about 20 minutes.

5.1.4 Scoring

For nine of the 800 elicited continuations, either no continuation was given, the continuation did not fit the intended template (e.g., the discourse marker started a new clause), or it was semantically inappropriate (giving a consequence instead of a cause). The second author and a student assistant who was naive regarding our research questions coded the remaining continuations as to whether the pronoun was coreferential with the first or second NP (stimulus subject or experiencer object) of context sentence 3. In case of uncertainty, the continuation was marked as ambiguous. The agreement rate of the two raters was 94% (Krippendorff’s alpha = 0.879). The 50 continuations for which the two raters did not agree and five continuations (0.67%) that were classified as ambiguous by both raters were excluded from the analysis. This left 736 continuations for analysis. Example continuations given by participants for both discourse markers as well as both pronouns are shown in the lower part of Table 5.
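
The exclusion counts above can be retraced with a few lines of arithmetic. The sketch below assumes that the agreement rate was computed over the continuations that remained after the first exclusion step; the text does not state this explicitly. All variable names are illustrative.

```python
elicited = 800        # all continuations collected
unusable = 9          # missing, off-template, or semantically inappropriate
coded = elicited - unusable

disagreements = 50    # the two raters chose different referents
both_ambiguous = 5    # both raters marked the continuation as ambiguous

analyzed = coded - disagreements - both_ambiguous
# Assumption: raw agreement is taken over all coded continuations.
raw_agreement = (coded - disagreements) / coded

print(coded, analyzed, round(100 * raw_agreement, 1))
```

Under this assumption the raw agreement rounds to the reported 94%, and the number of analyzed continuations matches the reported 736.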

5.2 Results

Figure 4 shows percentages of observed references to the first NP/the subject of the final context sentence. When the continuation stood in a causal relation to the preceding context, a very strong subject preference was observed – about 95% of all causal continuations made reference to the stimulus subject. With a consequential relation, in contrast, only about 20% of the continuations referred to the stimulus subject, which means that about 80% contained a reference to the experiencer object. In contrast to the difference depending on coherence, Figure 4 shows almost no difference between p- and d-pronouns.

Figure 4

Percentages of references to the stimulus argument (the first NP/the subject) of the final context sentence in Experiment 3.

In order to test whether the 95% subject preference with a cause relation differs significantly from the 80% object preference with a consequence relation, we analyzed the data with proportions of semantically congruent continuations as the dependent variable, using the same statistical methods as in the preceding experiments. In the coherence condition ‘cause’, references to the stimulus subject are congruent. In the coherence condition ‘consequence’, references to the experiencer object are congruent.
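
The recoding into a congruency variable can be illustrated as follows. This is a sketch with hypothetical trial records; the names `FAVORED` and `is_congruent` are not taken from the authors' scripts.

```python
# Referent favored by the semantic bias under each coherence relation.
FAVORED = {"cause": "stimulus", "consequence": "experiencer"}


def is_congruent(coherence, referent):
    """True if the pronoun's referent matches the semantic bias."""
    return FAVORED[coherence] == referent


# Hypothetical coded continuations: (coherence, pronoun, referent).
trials = [
    ("cause", "p", "stimulus"),
    ("cause", "d", "stimulus"),
    ("consequence", "p", "experiencer"),
    ("consequence", "d", "stimulus"),
]
congruent = [is_congruent(c, r) for c, _, r in trials]
print(congruent)  # [True, True, True, False]
```

Coding the dependent variable this way puts both coherence conditions on the same scale, so that the strength of the two semantic biases can be compared directly.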

The analysis included the full factorial design (Coherence × Pronoun) as fixed effects. The generalized mixed-effects model, which is summarized in Table 6, shows a significant main effect of Coherence. With a cause relation, about 95% of all continuations were congruent with the semantic bias whereas the percentage of congruent continuations was only 80% with a consequence relation. The main effect of Pronoun and the interaction between Pronoun and Coherence were both non-significant.

Table 6

Generalized mixed model for Experiment 3, with ‘congruent with the semantic bias’ as dependent variable.

| | β̂ | SE | z-value | Pr(>\|z\|) |
| --- | --- | --- | --- | --- |
| Intercept | 2.854 | 0.313 | 9.13 | <0.001 |
| Coherence | 2.298 | 0.344 | 6.68 | <0.001 |
| Pronoun | 0.175 | 0.335 | 0.52 | n.s. |
| Coherence × Pronoun | –0.370 | 0.669 | –0.55 | n.s. |
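
The z-values in Table 6 are Wald statistics, that is, each estimate divided by its standard error. The sketch below recomputes them from the rounded values in the table and adds two-sided p-values from the standard normal distribution. It is an illustration only; because the inputs are rounded, the recomputed z-values can differ from the table in the last digit.

```python
import math

# (estimate, standard error) as reported in Table 6
TABLE_6 = {
    "Intercept": (2.854, 0.313),
    "Coherence": (2.298, 0.344),
    "Pronoun": (0.175, 0.335),
    "Coherence x Pronoun": (-0.370, 0.669),
}


def wald_test(estimate, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


results = {name: wald_test(b, se) for name, (b, se) in TABLE_6.items()}
for name, (z, p) in results.items():
    print(f"{name:22s} z = {z:6.2f}  p = {p:.3g}")
```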

5.3 Test of the Bayesian Model

In order to assess the Bayesian Model of pronoun resolution in a quantitative way, we computed the correlation between observed and predicted values following the procedure described in Rohde & Kehler (2014). First, observed and predicted sentence means in each of the four experimental conditions were computed. Observed sentence means were computed from the results yielded by Experiment 3, giving us a total of 80 cases (20 sentences, each in 4 conditions). Predicted sentence means were computed by applying the right side of the Bayesian formula to the production data yielded by Experiment 1. In 24 of the 80 combinations of sentences with Coherence Relation and Pronoun, neither of the two referents was referred to by a pronoun in Experiment 1 (i.e., P(pro|stimulus) = P(pro|experiencer) = 0, with pro = p-pro or pro = d-pro). These cases had to be excluded in order to avoid division by zero, leaving 56 cases (39 for the p-pronoun, 17 for the d-pronoun). Correlations based on participant means could not be computed because the interpretation and production data are from different experiments.
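
The computation of a predicted value can be sketched as follows. The function applies Bayes' rule, P(referent | pronoun) = P(pronoun | referent) · P(referent) / Σ P(pronoun | r) · P(r), to production estimates. The function name and the numbers in the example are hypothetical and are not taken from Experiment 1.

```python
def predict_interpretation(next_mention, pronoun_rate):
    """Apply Bayes' rule to production estimates.

    next_mention: P(referent), the probability that a referent is mentioned next.
    pronoun_rate: P(pronoun | referent), the rate at which the pronoun type is
        used when that referent is mentioned.
    Returns P(referent | pronoun), or None when no referent was ever realized
    with the pronoun (the denominator would be zero).
    """
    joint = {ref: pronoun_rate[ref] * next_mention[ref] for ref in next_mention}
    total = sum(joint.values())
    if total == 0:
        return None  # cases of this kind were excluded from the model test
    return {ref: value / total for ref, value in joint.items()}


# Hypothetical production estimates for one item in one condition.
prior = {"stimulus": 0.9, "experiencer": 0.1}
d_pronoun_rate = {"stimulus": 0.05, "experiencer": 0.10}
posterior = predict_interpretation(prior, d_pronoun_rate)
print(posterior)
```

In this made-up example the d-pronoun is used twice as often for the experiencer as for the stimulus, yet the strong next-mention bias toward the stimulus still yields a predicted stimulus interpretation of about 82%.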

Figure 5 shows observed and predicted mean percentages of references to the stimulus subject for the 56 data points that went into the by-item test of the Bayesian model. As a comparison with Figure 4 shows, the observed results for the restricted data set are very close to the observed results for the complete data set. The overall mean values predicted from the complete data set and the restricted data set are also highly similar, as shown in Table 4. Figure 5 reveals a close fit between observed and predicted references to NP1 (R² = 0.80, F = 222.5, df = (1, 54), p < 0.01). Given that the restricted data set matches the complete data set with regard to both observed and predicted mean values, it is unlikely that the good quantitative fit between observed data and data predicted by the Bayesian theory is an artifact of restricting the data set.
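
For a regression with a single predictor, the fit statistics are linked by F = R² / (1 − R²) · (n − 2). The sketch below shows how R² could be computed from paired observed and predicted values and checks that the reported F-value and degrees of freedom are consistent with the reported R². The helper names are illustrative, and the item-level data themselves are not reproduced here.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def f_statistic(r2, n):
    """F statistic for a regression with one predictor and n data points."""
    return r2 / (1 - r2) * (n - 2)


# Consistency check on the reported values: 56 data points, F = 222.5.
n_points = 56
r2_from_f = 222.5 / (222.5 + (n_points - 2))
print(round(r2_from_f, 2), round(f_statistic(r2_from_f, n_points), 1))
```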

Figure 5

Percentages of observed and predicted references to the stimulus argument (the first NP/the subject) of the final context sentence in Experiment 3. Predicted values are based on the Bayesian model of pronoun resolution.

5.4 Discussion

For the p-pronoun, Experiment 3 found a strong bias toward the stimulus argument in the presence of a cause relation and a strong bias toward the experiencer argument in the presence of a consequence relation. This finding is in agreement with prior findings on the influence of semantic bias on pronoun resolution. For the d-pronoun, Experiment 3 showed the very same preferences as for the p-pronoun. Thus, in the presence of strong semantic biases, p- and d-pronouns no longer showed complementary preferences, in contrast to the many experiments investigating contexts with no or at least no strong semantic bias.

Our finding that semantic bias affects p- and d-pronouns in the same way replicates Järvikivi et al.’s (2017) results for p- and d-pronouns in Finnish. With regard to structural biases, the results of Experiment 3 differ from Järvikivi et al.’s results in that we did not find evidence that structural biases affected the interpretation of the two pronouns. The preference toward the first-mentioned stimulus subject in the presence of a cause relation was equally strong for p- and d-pronouns, and the same holds for the preference toward the second-mentioned experiencer object in the presence of a consequence relation. We hypothesize that the lack of a difference between p- and d-pronouns in Experiment 3 is due to the fact that we constructed contexts in such a way that neither the stimulus subject nor the experiencer object was strongly favored on structural grounds. In such a situation, semantic biases may take full control over how pronouns are interpreted. Järvikivi et al., in contrast, used one-sentence contexts in which the subject was probably understood as topic by default, so all structural biases favored the subject NP as antecedent of the p-pronoun and the object NP as antecedent of the d-pronoun.

In sum, the results yielded by Experiment 3 show a very strong preference for the stimulus subject in the presence of a cause relation and a somewhat weaker but still strong preference for the experiencer object in the presence of a consequence relation, with no difference between p- and d-pronouns. Given the prior production data, this resulted in a close fit of observed data and data predicted by the Bayesian model. We postpone a more thorough discussion of these results to the general discussion because of a possible objection against Experiment 3. In accordance with prior research on implicit causality and pronoun resolution, we found a strong preference for the stimulus subject when the prompt contained a causal discourse marker. For the p-pronoun, this preference aligns with the structural bias of p-pronouns toward subject antecedents. For the d-pronoun, in contrast, the observed preference for stimulus subjects contradicts the often-found object bias of d-pronouns. Due to the experimental set-up, participants were required to write a continuation for sentences starting with a d-pronoun in contexts that exerted a strong semantic pressure to associate the d-pronoun with a subject antecedent. It is therefore possible that participants wrote continuations in which the d-pronoun referred to the stimulus subject although such continuations were not fully acceptable to them and they would not have used a d-pronoun if they were free to choose a referential expression. In order to exclude this possibility, we ran a further experiment investigating the acceptability of selected continuations from Experiment 3.

6 Experiment 4

Experiment 4 investigates whether participants in Experiment 3 produced acceptable continuations. The continuation prompts in Experiment 3 combined a d-pronoun with a strong semantic subject bias, which may have forced participants to interpret the d-pronoun as referring to the structurally dispreferred subject referent. In order to address this issue, we had a new group of participants rate the acceptability of a selection of original continuations from Experiment 3. All continuations were selected in such a way that the pronoun referred to the referent favored by the coherence relation, that is, the stimulus subject with a causal relation and the experiencer object with a consequence relation. As in Experiments 2 and 3, the coherence relation was explicitly signaled by either nämlich (cause, ‘this was for the reason that’) or deshalb (consequence, ‘this had the consequence that’).

If reference to the stimulus subject by means of a d-pronoun is an artifact of the particular way Experiment 3 was set up, the acceptability ratings of the continuations produced in this experiment are expected to show an interaction between pronoun and coherence. With a consequence relation, the d-pronoun’s structural bias and the semantic bias converge on the experiencer object. Acceptability ratings should accordingly be high for d-pronouns with a consequence relation. With a cause relation, the d-pronoun’s structural bias still favors the experiencer object but the semantic bias now favors the stimulus subject. Thus, when the d-pronoun is interpreted as coreferential with the stimulus subject, the continuation makes sense but acceptability should go down because this interpretation is in conflict with the d-pronoun’s structural object orientation. If, on the other hand, the d-pronoun is interpreted as coreferential with the object, the structural constraint is met but the continuation is implausible because it attributes causality to the experiencer instead of the stimulus. This should thus result in reduced acceptability. For continuations given to p-pronoun prompts, in contrast, either the reverse is expected in virtue of the p-pronoun’s subject orientation, or no difference at all is expected because the interpretation of p-pronouns is more flexible than that of d-pronouns.

On the other hand, if the results of Experiment 3 are not distorted and reference to the stimulus subject by means of a d-pronoun is a true option offered by the grammar given appropriate contextual conditions, neither a main effect of coherence nor an interaction between pronoun and coherence is expected. In this case, d-pronouns referring to a stimulus subject that is highly expected given the preceding context should be as acceptable as d-pronouns referring to an experiencer object in a correspondingly supportive context. Analogous considerations apply to p-pronouns.

An additional prediction, which is independent of coherence and the syntactic function of the pronouns’ antecedent, is a main effect of pronoun. According to prescriptive grammars, d-pronouns referring to humans are considered impolite. In line with this prescriptive rule, Vogel (2019) found that d-pronouns are less acceptable in formal contexts than p-pronouns. It should be noted, however, that despite this prescriptive rule, corpus data show that even in written language the majority of d-pronouns have a human antecedent (Portele & Bader 2016).

6.1 Method

6.1.1 Participants

Eighteen students of the Goethe University Frankfurt, all native speakers of German, participated in Experiment 4. No participant had already participated in any of the prior experiments.

6.1.2 Materials

Experiment 4 investigates the two factors Pronoun (p- versus d-pronoun) and Coherence (cause versus consequence). The material for Experiment 4 consists of the 20 contexts from Experiment 3 together with selected continuations given by participants in Experiment 3. For each context, four continuations according to the two factors Coherence (cause versus consequence) and Pronoun (p-pronoun versus d-pronoun) were randomly selected, subject to the following constraints. First, the continuations for each of the two coherence relations had to make reference to the referent semantically favored by the coherence relation (stimulus subject/cause, experiencer object/consequence). Second, the length of the continuations was restricted to 5–9 words and 30–50 characters in order to avoid an overly long questionnaire. Example items for all four conditions are included (in boldface) in Table 5. The 20 items were distributed across four lists according to a Latin square design. Twenty filler items were added to each list.

6.1.3 Procedure

Like Experiment 2, Experiment 4 was run on Ibex Farm (Drummond et al. 2016). Participants saw each item on a separate browser page with the numbers 1 to 7 displayed beneath the item. A short instruction told participants that 1 corresponds to “totally unacceptable” and 7 to “totally acceptable”. In order to ease the association between the numbers and their intended meaning, each trial included the label “totally unacceptable” to the left of the 1–7 scale and the label “totally acceptable” to the right. Participants were asked to judge the acceptability of short texts consisting of two or three sentences by clicking on one of the numbers. The instructions asked participants to judge the acceptability of the last sentence in connection to its preceding sentence. We refrained from including example sentences.

6.2 Results

The mean acceptability values given by participants in Experiment 4 are shown in Figure 6. Because the rating scale was an ordinal scale, we analyzed the judgment data using the ordinal package (Christensen 2019) in R. An ordinal mixed-effects model with Pronoun and Coherence as fixed effects and participants and items as random effects revealed a significant effect of Pronoun (β̂ = 1.09138, SE = 0.42114, z-value = 2.591, p < 0.01), reflecting the finding that sentences with p-pronouns were rated higher than sentences with d-pronouns. The factor Coherence (β̂ = 0.16868, SE = 0.36889, z-value = 0.457, n.s.) and the interaction between Coherence and Pronoun (β̂ = 0.01195, SE = 0.40398, z-value = 0.030, n.s.) were not significant.
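
To make the size of the Pronoun effect more tangible, the sketch below shows how a coefficient on the logit scale shifts the distribution of ratings in a cumulative logit model, using the parameterization logit P(Y ≤ k) = θ_k − η. The thresholds are hypothetical, and the assumption that the d-pronoun is the baseline level (η = 0) is made only for illustration; neither is taken from the fitted model.

```python
import math


def category_probabilities(thresholds, eta):
    """Category probabilities under a cumulative logit model.

    Uses logit P(Y <= k) = theta_k - eta, so a larger eta shifts
    probability mass toward higher ratings.
    """
    cumulative = [1 / (1 + math.exp(-(theta - eta))) for theta in thresholds]
    cumulative.append(1.0)  # P(Y <= highest category) = 1
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


def mean_rating(probs):
    """Expected rating on a scale starting at 1."""
    return sum(rating * p for rating, p in enumerate(probs, start=1))


# Hypothetical thresholds for a 7-point scale (six cut points); not from the paper.
THETAS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]
BETA_PRONOUN = 1.09138  # reported estimate; the contrast coding is assumed here

d_probs = category_probabilities(THETAS, eta=0.0)           # assumed baseline
p_probs = category_probabilities(THETAS, eta=BETA_PRONOUN)  # baseline + beta

print(round(mean_rating(d_probs), 2), round(mean_rating(p_probs), 2))
```

Under these assumptions, the positive coefficient moves probability mass toward the upper end of the scale, which corresponds to the higher ratings for p-pronouns.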

Figure 6

Mean acceptability in Experiment 4 for selected continuations given by participants in Experiment 3. Reference was established toward the semantically favored referent – the stimulus subject in the cause condition and the experiencer object in the consequence condition.

6.3 Discussion

The most important finding of Experiment 4 is that coherence had no effect on acceptability, neither as a main effect nor in interaction with the factor Pronoun. Because reference was always made to the semantically preferred antecedent – the stimulus subject with a cause relation and the experiencer object with a consequence relation – this means that references to the subject and references to the object were equally acceptable for both p- and d-pronouns. There is thus no evidence that the particular task requirements of Experiment 3 induced participants to come up with continuations that are unacceptable.

In contrast to the factor Coherence, the factor Pronoun had an effect in Experiment 4, with higher ratings for sentences containing p-pronouns than sentences with d-pronouns. This effect was expected given the prescriptive advice to avoid d-pronouns when referring to human antecedents. With mean ratings of about 4.5, the d-pronoun sentences in Experiment 4 received substantially higher ratings than the d-pronoun sentences of Vogel (2019), for which mean ratings of about 3.4 were found. This indicates that our contexts were appropriate for the use of d-pronouns. This conclusion is corroborated by the finding that the difference between the rating for sentences with p-pronouns and sentences with d-pronouns was not large, in accordance with the high rate of human referents for d-pronouns in written language use (Portele & Bader 2016). In addition to possible prescriptive influences, one reason why acceptability was still higher for p-pronouns than for d-pronouns could be that there was no (large) prominence difference between the two referents – the stimulus was the subject in first position whereas the experiencer was the sentence topic – and d-pronouns may be fully acceptable only when used for referents low in prominence.8

7 General discussion

The four experiments reported in this paper have yielded several new findings on the interpretation and production of referentially ambiguous p- and d-pronouns. With regard to interpretation, the main finding is that p- and d-pronouns are influenced by semantic bias in the same way. Experiment 3 found that both pronouns were resolved almost without exception toward the stimulus argument in the presence of a cause relation whereas the preferred antecedent of both pronouns was the experiencer argument with a consequence relation. In this case, the preference was somewhat weaker than the preference found with a cause relation, but it was still strong. Although prior work by Järvikivi et al. (2017) and Portele & Bader (2020) suggested that semantic bias influences both pronouns in the same direction, it was not known whether the strength of the semantic bias effect is the same for the two pronouns.9 For p-pronouns, a strong effect of semantic bias was expected from the prior literature. For d-pronouns, it was an open question whether semantic pressures could turn the often observed object preference into a subject preference. Experiment 3 found a clear answer to this question. The effect of semantic bias on d-pronouns was as strong as the effect on p-pronouns. Thus, semantic bias is clearly different from structural bias by affecting p- and d-pronouns in equal, not in complementary ways.

For production, Experiment 1 replicated previous work (e.g., Stevenson et al. 1994; Fukumura & van Gompel 2010) by finding that continuations given in causal contexts most often included the stimulus whereas the experiencer was mentioned more frequently in consequential contexts. Regarding the choice to pronominalize, our results support the importance of syntactic factors. Because subjecthood and topichood were not confounded in our materials, the results of Experiments 1 and 2 show that subjecthood is the major determinant of pronoun choice, which contradicts the strongest version of the Topichood Hypothesis (Rohde & Kehler 2014) according to which topichood alone governs the choice of pronouns. However, topichood still seems to have affected pronoun choice in Experiments 1 and 2, as witnessed by the finding that the pronoun rate was only slightly higher for subjects (=non-topic) than for objects (=topic), in contrast to experiments where subjecthood and topichood are aligned and pronoun rates for subject/topic antecedents are much larger than those for object/non-topic antecedents (for German, see Portele & Bader 2020). Our results therefore argue in favor of a weakened Topichood Hypothesis, with subjecthood being an additional factor influencing pronoun production. Given that the experiments on Chinese by Zhan et al. (2020) applied a manipulation similar to that of Rohde & Kehler (2014) but did not find a corresponding effect of topichood, it is likely that subjecthood and topichood have different weights in different languages.

As for the production of demonstratives, only tentative conclusions are possible because they were produced rarely in Experiments 1 and 2. In Experiment 1, the highest number of demonstratives occurred for stimulus subjects in cause relations (n = 19). This finding argues against the claim that demonstratives are mainly used for referents bearing the grammatical function of object. Since the object was always the topic of the previous sentence in our experiments, references to the subject are in accordance with the anti-topic bias of demonstratives. On the other hand, demonstratives were also produced for making reference to the object and thus the topic. In sum, our results for demonstratives argue against approaches capitalizing on a single factor (e.g., syntactic function, see Patil et al. 2020) and instead for a multifactorial approach as proposed in Bader & Portele (2019).

With regard to semantic bias due to coherence, the interpretive preferences observed in Experiment 3 are in close correspondence with the next-mention preferences found in Experiment 1. More specifically, the results presented in this paper provide supportive evidence for the Bayesian Theory of Pronoun Resolution proposed by Kehler and colleagues according to which next-mention biases and pronominalization biases make independent contributions to pronoun resolution. The fit between observed and predicted values found for Experiment 3 was close for both p- and d-pronouns, showing that the Bayesian Theory accounts for d-pronouns as well. A similar close fit was found by Bader & Portele (2019) in the first application of the Bayesian Theory to p- and d-pronouns in German, but there semantic biases were rather weak so structural factors had to bear the burden of resolving the referential ambiguity. Taken together, the results presented here and the results presented in Bader & Portele (2019) show that the Bayesian Theory accounts for the interpretation of German p- and d-pronouns under widely different combinations of structural and semantic biases (see also Portele & Bader 2020, and Patterson et al. 2022).

The finding that semantic bias can cause d-pronouns to have the same interpretative preferences as p-pronouns argues against the often-made claim that p-pronouns preferentially refer to prominent referents whereas d-pronouns are confined to referents that are least prominent. According to this claim, a referent that is the preferred target of a p-pronoun must be prominent. If so, such a referent should repel reference by a d-pronoun. This is clearly not the pattern that we observed for the coherence manipulation in Experiment 3. The referent that was highly expected due to the combined effect of verb semantics and coherence relation, and thus can be considered prominent, was the preferred antecedent of both the p- and the d-pronoun. As shown by prior work, when neither referent is (strongly) favored on semantic grounds, p- and d-pronouns typically show complementary behavior, but even then identical preferences may arise for the two pronouns under certain structural configurations. A configuration of this kind is provided by object-before-subject sentences, as first shown by Kaiser & Trueswell (2008) for Finnish and later replicated for German by Bader & Portele (2019). Because p-pronouns are biased toward the subject referent whereas d-pronouns are biased toward the final referent, the same referent is preferred when the subject is the final NP in a sentence. Restricting the notion of prominence to structural factors would thus not rescue the hypothesis that p-pronouns go for prominent and d-pronouns for non-prominent antecedents.

In sum, the results presented in this paper add to the growing body of evidence suggesting that, instead of identifying different prominence factors influencing the interpretation of different types of pronouns, the perspective taken by research on pronoun resolution should shift to production biases. The success of the Bayesian Theory testifies to the fruitfulness of this perspective. For example, since P(referent), the term of the Bayesian formula that is (mainly) governed by semantic biases, is independent of the particular pronoun under consideration, it follows immediately that coherence affected p- and d-pronouns in the same way in Experiment 3. With regard to the second term of the Bayesian formula, P(pronoun|referent), we see a close link between the Bayesian theory of Kehler et al. (2008) and the form-specific approach of Kaiser & Trueswell (2008). We think that these approaches complement each other. With the term P(pronoun|referent), the Bayesian theory provides the means to capture pronoun-specific preferences. Since P(pronoun|referent) captures the probability that a speaker uses a pronoun for a given referent, pronoun-specific interpretation preferences are the result of different production biases associated with particular pronouns according to the Bayesian theory. Under this perspective, additional research into the production of p- and d-pronouns will be crucial for understanding why these two pronoun types show different interpretive preferences.

Notes

  1. For readability, we use the term “speaker” as an umbrella term for language producers in all modalities, including speaking, writing, and signing. The same holds for the term “hearer”.
  2. Kaiser (2011) investigated the reverse question whether p- and d-pronouns induce different coherence relations in participants’ sentence continuations.
  3. A second usage of nämlich is equivalent to English namely, as in I’ve been to the US, namely to New York and Los Angeles (example provided by an anonymous reviewer). This usage is excluded for the prompts in (6) due to the position of nämlich directly after the finite auxiliary.
  4. This does not hold without exceptions (e.g., Bader & Portele 2019), but exceptions are rare and occur only under special conditions.
  5. Like the current paper, Rohde & Kehler (2014) understand topic as sentence topic in the sense of Reinhart (1981), Lambrecht (1996), and others. However, as pointed out by one of the reviewers, Rohde & Kehler (2014) operationalize this notion via the first position in isolated sentences, whereas we operationalize topic via the preceding discourse. For reasons of space, we cannot discuss this issue in more detail.
  6. As pointed out by one of the anonymous reviewers, this setup might make it rather obvious to participants that we are interested in the production of p- vs. d-pronouns. Although we agree, we do not think that our specific setup increased this reasoning compared to other forced-choice variants, such as choosing from a set of options.
  7. Paper-pencil questionnaires were used before the Covid-19 pandemic, web-based questionnaires during the pandemic.
  8. We thank an anonymous reviewer for pointing out this possibility to us.
  9. While this paper was under review, the study by Patterson et al. (2022) appeared. Like our study, this study found that semantic bias affects p- and d-pronouns in similar ways, and that the Bayesian model of Kehler and colleagues accounts for the interpretation of both p- and d-pronouns.

Data accessibility statement

Materials, data and R-scripts can be found on OSF: https://osf.io/ft256/?view_only=3e7d05963a3c49569ff2fd856b05b292.

Acknowledgements

The authors would like to thank Yannick Naegelen and Sebastian Walter for help with data annotation and scoring.

Competing interests

The authors have no competing interests to declare.

References

Ahrenholz, Bernt. 2007. Verweise mit Demonstrativa im gesprochenen Deutsch: Grammatik, Zweitspracherwerb und Deutsch als Fremdsprache. Berlin & New York: Walter de Gruyter. DOI:  http://doi.org/10.1515/9783110894127

Arnold, Jennifer E. 2001. The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes 31(2). 137–162. DOI:  http://doi.org/10.1207/S15326950DP3102_02

Au, Terry Kit-fong. 1986. A verb is worth a thousand words: The causes and consequences of interpersonal events implicit in language. Journal of Memory and Language 25(1). 104–122. DOI:  http://doi.org/10.1016/0749-596X(86)90024-0

Bader, Markus & Portele, Yvonne. 2019. The interpretation of German personal and d-pronouns. Zeitschrift für Sprachwissenschaft 38(2). 155–190. DOI:  http://doi.org/10.1515/zfs-2019-2002

Bates, Douglas & Mächler, Martin & Bolker, Ben & Walker, Steve. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. DOI: http://doi.org/10.18637/jss.v067.i01

Bittner, Dagmar. 2019. Implicit Causality in younger and older adults. Linguistics Vanguard 5(s2). DOI:  http://doi.org/10.1515/lingvan-2018-0023

Bosch, Peter & Rozario, Tom & Zhao, Yufan. 2003. Demonstrative pronouns and personal pronouns. German ‘der’ vs. ‘er’. In Proceedings of the EACL 2003 workshop on the Computational Treatment of Anaphora. Budapest.

Bosch, Peter & Umbach, Carla. 2007. Reference determination for demonstrative pronouns. ZAS Papers in Linguistics 48. 39–51. DOI:  http://doi.org/10.21248/zaspil.48.2007.353

Bott, Oliver & Solstad, Torgrim. 2014. From Verbs to Discourse: A novel account of implicit causality. In Hemforth, Barbara & Mertins, Barbara & Fabricius-Hansen, Cathrine (eds.), Meaning and understanding across languages, 213–251. Cham: Springer. DOI: http://doi.org/10.1007/978-3-319-05675-3_9

Christensen, Rune Haubo B. 2019. ordinal—Regression models for ordinal data. R package version 2019.4-25. http://www.cran.r-project.org/package=ordinal/.

Comrie, Bernard. 1997. Pragmatic binding: Demonstratives as anaphors in Dutch. In Proceedings of the Annual Meeting of the Berkeley Linguistics Society, vol. 23, 50–61. DOI:  http://doi.org/10.3765/bls.v23i1.1281

Crawley, Rosalind A. & Stevenson, Rosemary J. 1990. Reference in single sentences and in texts. Journal of Psycholinguistic Research 19(3). 191–210. DOI:  http://doi.org/10.1007/BF01077416

Crawley, Rosalind A. & Stevenson, Rosemary J. & Kleinman, David. 1990. The use of heuristic strategies in the interpretation of pronouns. Journal of Psycholinguistic Research 19(4). 245–264. DOI:  http://doi.org/10.1007/BF01077259

Crinean, Marcelle & Garnham, Alan. 2006. Implicit causality, implicit consequentiality and semantic roles. Language and Cognitive Processes 21(5). 636–648. DOI:  http://doi.org/10.1080/01690960500199763

Drummond, Alex & Von Der Malsburg, Titus & Erlewine, Michael Y. & Yoshida, Fumo & Vafaie, Mahsa. 2016. Ibex Farm. https://github.com/addrummond/ibex.

Ellert, Miriam. 2013. Information structure affects the resolution of the subject pronouns er and der in spoken German discourse. Discours. Revue de linguistique, psycholinguistique et informatique 12(12). 3–24. DOI:  http://doi.org/10.4000/discours.8756

Fukumura, Kumiko & van Gompel, Roger P. G. 2010. Choosing anaphoric expressions: Do people take into account likelihood of reference? Journal of Memory and Language 62(1). 52–66. DOI:  http://doi.org/10.1016/j.jml.2009.09.001

Garvey, Catherine & Caramazza, Alfonso. 1974. Implicit causality in verbs. Linguistic Inquiry 5(3). 459–464.

Gernsbacher, Morton Ann & Hargreaves, David J. 1988. Accessing sentence participants: The advantage of first mention. Journal of Memory and Language 27(6). 699–717. DOI:  http://doi.org/10.1016/0749-596X(88)90016-2

Grosz, Barbara J. & Joshi, Aravind K. & Weinstein, Scott. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21. 203–225. DOI:  http://doi.org/10.21236/ADA324949

Hinterwimmer, Stefan. 2015. A unified account of the properties of German demonstrative pronouns. In Grosz, Patrick & Patel-Grosz, Pritty & Yanovich, Igor (eds.), Proceedings of the workshop on pronominal semantics at NELS 40, 61–107. Amherst, MA: GLSA Publications.

Hobbs, Jerry R. 1979. Coherence and coreference. Cognitive Science 3(1). 67–90. DOI:  http://doi.org/10.1207/s15516709cog0301_4

Hoek, Jet & Kehler, Andrew & Rohde, Hannah. 2021. Pronominalization and expectations for re-mention: Modeling coreference in contexts with three referents. Frontiers in Communication 6. DOI:  http://doi.org/10.3389/fcomm.2021.674126

Holler, Anke & Suckow, Katja. 2016. How clausal linking affects noun phrase salience in pronoun resolution. In Holler, Anke & Suckow, Katja (eds.), Empirical perspectives on anaphora resolution, 61–85. Berlin & Boston: de Gruyter. DOI:  http://doi.org/10.1515/9783110464108-005

Järvikivi, Juhani & van Gompel, Roger P. G. & Hyönä, Jukka. 2017. The interplay of implicit causality, structural heuristics, and anaphor type in ambiguous pronoun resolution. Journal of Psycholinguistic Research 46(3). 525–550. DOI:  http://doi.org/10.1007/s10936-016-9451-1

Kaiser, Elsi. 2011. On the relation between coherence relations and anaphoric demonstratives in German. In Proceedings of the 2010 Annual Conference of the “Gesellschaft für Semantik”. Sinn und Bedeutung, vol. 15, 337–352.

Kaiser, Elsi & Fedele, Emily. 2019. Reference resolution: A psycholinguistic perspective. In Gundel, Jeanette & Abbott, Barbara (eds.), The Oxford Handbook of Reference, 309–336. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780199687305.013.15

Kaiser, Elsi & Trueswell, John C. 2008. Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes 23(5). 709–748. DOI:  http://doi.org/10.1080/01690960701771220

Kehler, Andrew & Kertz, Laura & Rohde, Hannah & Elman, Jeffrey L. 2008. Coherence and coreference revisited. Journal of Semantics 25(1). 1–44. DOI:  http://doi.org/10.1093/jos/ffm018

Kehler, Andrew & Rohde, Hannah. 2013. A probabilistic reconciliation of coherence-driven and centering-driven theories of pronoun interpretation. Theoretical Linguistics 39(1–2). 1–37. DOI:  http://doi.org/10.1515/tl-2013-0001

Kehler, Andrew & Rohde, Hannah. 2017. Evaluating an expectation-driven question-under-discussion model of discourse interpretation. Discourse Processes 54(3). 219–238. DOI:  http://doi.org/10.1080/0163853X.2016.1169069

Lambrecht, Knud. 1996. Information structure and sentence form: Topic, focus, and the mental representations of discourse referents. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511620607

Patil, Umesh & Bosch, Peter & Hinterwimmer, Stefan. 2020. Constraints on German diese demonstratives: Language formality and subject-avoidance. Glossa: a journal of general linguistics 5(1). 1–22. DOI:  http://doi.org/10.5334/gjgl.962

Patterson, Clare & Schumacher, Petra B. & Nicenboim, Bruno & Hagen, Johannes & Kehler, Andrew. 2022. A Bayesian approach to German personal and demonstrative pronouns. Frontiers in Psychology 12. DOI:  http://doi.org/10.3389/fpsyg.2021.672927

Portele, Yvonne & Bader, Markus. 2016. Accessibility and referential choice: Personal pronouns and d-pronouns in written German. Discours. Revue de linguistique, psycholinguistique et informatique 18. 1–41. DOI:  http://doi.org/10.4000/discours.9188

Portele, Yvonne & Bader, Markus. 2020. Coherence and the interpretation of personal and demonstrative pronouns in German. In Holler, Anke & Suckow, Katja & de la Fuente, Israel (eds.), Information structuring in discourse, 24–55. Leiden: Brill. DOI:  http://doi.org/10.1163/9789004436725_003

R Core Team. 2020. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Reinhart, Tanya. 1981. Pragmatics and linguistics: An analysis of sentence topics. Philosophica 27(1). 53–94. DOI:  http://doi.org/10.21825/philosophica.82606

Rohde, Hannah. 2019. Pronouns. In Cummins, Chris & Katsos, Napoleon (eds.), The Oxford Handbook of Experimental Semantics and Pragmatics, 452–473. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780198791768.013.21

Rohde, Hannah & Kehler, Andrew. 2014. Grammatical and information-structural influences on pronoun production. Language, Cognition and Neuroscience 29(8). 912–927. DOI:  http://doi.org/10.1080/01690965.2013.854918

Rosa, Elise C. & Arnold, Jennifer E. 2017. Predictability affects production: Thematic roles can affect reference form selection. Journal of Memory and Language 94. 43–60. DOI:  http://doi.org/10.1016/j.jml.2016.07.007

Rudolph, Udo & Försterling, Friedrich. 1997. The psychological causality implicit in verbs: A review. Psychological Bulletin 121(2). 192–218. DOI:  http://doi.org/10.1037/0033-2909.121.2.192

Schumacher, Petra B. & Backhaus, Jana & Dangl, Manuel. 2015. Backward- and forward-looking potential of anaphors. Frontiers in Psychology 6. 1–14. DOI:  http://doi.org/10.3389/fpsyg.2015.01746

Smyth, Ron. 1994. Grammatical determinants of ambiguous pronoun resolution. Journal of Psycholinguistic Research 23(3). 197–229. DOI:  http://doi.org/10.1007/BF02139085

Stevenson, Rosemary & Knott, Alistair & Oberlander, Jon & McDonald, Sharon. 2000. Interpreting pronouns and connectives: Interactions among focusing, thematic roles and coherence relations. Language and Cognitive Processes 15(3). 225–262. DOI:  http://doi.org/10.1080/016909600386048

Stevenson, Rosemary J. & Crawley, Rosalind A. & Kleinman, David. 1994. Thematic roles, focus and the representation of events. Language and Cognitive Processes 9(4). 519–548. DOI:  http://doi.org/10.1080/01690969408402130

Vogel, Ralf. 2019. Grammatical taboos: An investigation on the impact of prescription in acceptability judgement experiments. Zeitschrift für Sprachwissenschaft 38(1). 37–79. DOI:  http://doi.org/10.1515/zfs-2019-0002

Vogels, Jorrig. 2019. Both thematic role and next-mention biases affect pronoun use in Dutch. In Goel, Ashok K. & Seifert, Colleen M. & Freksa, Christian (eds.), Proceedings of the 41st Annual Conference of the Cognitive Science Society, 3029–3035. Montreal, QC: Cognitive Science Society.

Weatherford, Kathryn C. & Arnold, Jennifer E. 2021. Semantic predictability of implicit causality can affect referential form choice. Cognition 214. 104759. DOI:  http://doi.org/10.1016/j.cognition.2021.104759

Weinert, Regina. 2011. Demonstrative vs personal and zero pronouns in spoken German. German as a Foreign Language 2011(1). 71–98.

Zhan, Meilin & Levy, Roger & Kehler, Andrew. 2020. Pronoun interpretation in Mandarin Chinese follows principles of Bayesian inference. PLoS ONE 15(8). e0237012. DOI:  http://doi.org/10.1371/journal.pone.0237012