1 Introduction

Language is rife with anaphoric expressions, that is, expressions which receive their interpretation through reference to some entity in the mental model of discourse. Reference to objects,1 such as that commonly associated with personal pronouns and demonstratives (e.g., it/they/that/those),2 has frequently been the subject of study (Prince 1992; Gundel et al. 1993; Kehler & Rohde 2013; inter alia), but research on event anaphora is less common (though not non-existent; see Webber 1986; 1987; 1991; Landman & Morzycki 2002; Kehler & Ward 2007; Miller 2011; Luce et al. 2018). Do thus is one such event anaphoric construction, which has until now gone unnoticed.

(1) (Forum post)
  Most firms that pay dividends do thus on either a period of time or quarterly basis.

In the context of this paper, anaphoric reference to events is viewed as the establishment of a connection between a linguistically present referring expression (such as it or thus) and a referent in a hearer’s mental model of the unfolding discourse (Webber 1987). For the reference to be event reference, the referent must be an event (or type of event). Event referents may be introduced through linguistic means (e.g., explicit mention of an event with a verb phrase), or may be salient in the extra-linguistic context. Care should be taken not to conflate verbs or verb phrases themselves with the events that they introduce. Neo-Davidsonian approaches to event semantics, for example, treat verbs as predicates of an event argument which is existentially quantified over (Parsons 1990). Thus, verbs denote sets of events, i.e., types of events, in the same way that common nouns denote sets of objects; it is the existential closure over the event argument which introduces the event referent, in the same fashion that the indefinite determiner a introduces the object referent in a phrase like a house. Adverbials, as well, are seen as denoting sets of events, and as such, adverbial referring expressions are taken to refer to types of events. When used in conjunction with a fully lexical verb, an adverbial referring expression serves to further restrict the type of event (2).

(2) (Kehler & Ward 1999: ex. 25b)
  If you thought that the questions could be answered courteously, why didn’t you answer them so?

In (2), so refers back to the type of event denoted by courteously, further restricting the directly provided type of event denoted by answer, leading to a representation along the lines of that shown in (3).

(3) ∃e.answer(e) & courteous(e)

When adverbial referring expressions are used with main verb do, however, they provide almost all of the relevant restrictions on the event.3 This is because do itself is semantically bleached, providing little information other than that the type of event in question is in some manner agentive or volitional (Ross 1972; Culicover & Jackendoff 2005; inter alia). We can see how this works if we assume a denotation for do along the lines of (4), adapted from Hallman (2004), and apply it to (1) as in (5).

(4) ‖do‖ = λPλxλe.P(e) & agent(e,x)
(5) a. ‖pay dividends‖ = λe.pay(e) & theme(e,dividends)
  b. ‖thus‖ = λPλe.P(e) ➔ λe.pay(e) & theme(e,dividends)
  c. [‖do‖](‖thus‖) = [λPλxλe.P(e) & agent(e,x)](λe.pay(e) & theme(e,dividends)) = λxλe.pay(e) & agent(e,x) & theme(e,dividends)

In (4), do is shown to be looking for some predicate of events (denoting a type of event) for its interpretation that it will restrict to being agentive. In (5a), pay dividends denotes the set of events that are paying events and that have dividends as their theme. In (5b), thus is shown to be looking for some predicate of events for its interpretation, and it finds this predicate in the antecedent pay dividends. Thus is therefore interpreted as ‘pay dividends’. In (5c), in composing with thus, do finds the predicate of events that it was looking for via the interpretation of thus, leading to do thus itself being interpreted as ‘pay dividends’.4
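The composition in (4) and (5) can be mimicked with ordinary higher-order functions. The following is only a minimal sketch, not part of the analysis itself: the Event record, its fields, and the sample events are hypothetical, and resolving thus to its antecedent is simply done by passing the antecedent predicate in by hand.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    """A toy event record; the fields are illustrative assumptions."""
    kind: str
    theme: Optional[str] = None
    agent: Optional[str] = None


# A predicate of events, i.e., a type of event.
EventPred = Callable[[Event], bool]


def pay_dividends(e: Event) -> bool:
    # (5a) ‖pay dividends‖ = λe.pay(e) & theme(e,dividends)
    return e.kind == "pay" and e.theme == "dividends"


def thus(antecedent: EventPred) -> EventPred:
    # (5b) ‖thus‖ = λPλe.P(e), with P resolved to the antecedent predicate
    return lambda e: antecedent(e)


def do(p: EventPred) -> Callable[[str], EventPred]:
    # (4) ‖do‖ = λPλxλe.P(e) & agent(e,x)
    return lambda x: lambda e: p(e) and e.agent == x


# (5c) do composed with thus: λxλe.pay(e) & agent(e,x) & theme(e,dividends)
do_thus = do(thus(pay_dividends))

e1 = Event(kind="pay", theme="dividends", agent="firm")
e2 = Event(kind="pay", theme="salaries", agent="firm")
e3 = Event(kind="pay", theme="dividends", agent="bank")

print(do_thus("firm")(e1))  # True: a dividend-paying event with the firm as agent
print(do_thus("firm")(e2))  # False: wrong theme
print(do_thus("firm")(e3))  # False: wrong agent
```

In this sketch do contributes nothing beyond the agent requirement, mirroring the claim that the adverbial supplies almost all of the restrictions on the event.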

Through a corpus-based analysis of naturally occurring data, this paper probes the constraints on the felicitous usage of do thus, which in turn allows for a more accurate description not only of when do thus may be used, but also of what the differences are between do thus and other superficially similar event anaphora (i.e., do so). I show that do thus and the more familiar construction do so represent distinct anaphoric constructions, and that they stand in a similar relationship to one another as that found between personal pronouns and demonstratives.

This paper begins with a brief introduction to do thus (Section 2.1), followed by a discussion of the semantics and compositionality of the phrase (Section 2.2), as well as a discussion of superficial similarities and differences between it and the better studied expression do so (Section 2.3). I then discuss differences in the referential behavior of personal pronouns and demonstratives (Section 2.4). Next, I detail how the corpus data was compiled, and the annotation scheme used (Section 3). This is followed by the results of the analysis (Section 4) and a discussion (Section 5). The paper concludes with Section 6.

2 Background

2.1 About do thus

Do thus is an event anaphoric construction composed of main verb do and the adverbial referring expression thus, which can be roughly paraphrased as ‘in this way’.5,6 In example (1), do thus refers back to a paying-dividends type of event (note the similarity here between thus and the demonstrative this; this similarity will re-emerge later). That do is the main verb, as opposed to auxiliary do, can be demonstrated by showing the necessity of an additional do (the auxiliary do instantiating do-support) when (1) is negated, as in (6).

(6) a. *most firms that pay dividends do not thus
  b.   most firms that pay dividends do not do thus

The adverbial nature of thus can best be seen by noting that it may appear either preverbally (7) or postverbally (1), a behavior that it has in common with many other adverbials, as well as with so in do so (Kehler & Ward 1999).

(7) (Personal website)
  As you’ve read it, this article proposes to retrace the history of the invention of the sewing machine. And, thus doing, it will unveil the name of its true creator.

As mentioned above, do thus is an understudied, or rather unstudied, event anaphoric construction. To the best of my knowledge, this construction has never been discussed in the literature. Grammars of English, such as Quirk et al. (1985) and Huddleston & Pullum (2002), make no mention of do thus, and only very briefly mention thus in its event anaphoric capacity,7 with fewer than 10 sentences between them (always as a more formal alternative to so). Similarly, a scan of the literature reveals only one journal article on thus (Granath 2007), and none on do thus. By comparison, both grammars, as well as numerous journal articles, discuss other event anaphoric constructions, such as do it, do this, do that and do so (Lakoff & Ross 1966; Hankamer & Sag 1976; Sag & Hankamer 1984; Kehler & Ward 1999; Houser 2010; inter alia).

One possible reason for this discrepancy is the relative scarcity of do thus. For the entire history of Modern English,8 this construction has been attested, but it has seemingly never been very frequent, at least in published material. Using Google’s Ngram Viewer (Michel et al. 2010) to compare the frequency of do thus to that of do so (Figure 1), one can see that do so has, at least for the last 400 years, been considerably more frequent than do thus, usually by several orders of magnitude. Though it may appear that do thus dropped out of use around 1750, this is only a result of comparing it to do so on the same graph. An Ngram of only do thus shows that, though low, its frequency remains above zero, a fact that is supported by the large number of naturally occurring examples that I was able to find from the 21st century (Figure 2).9

Figure 1

Frequency of do thus (bottom line) versus do so (top line) since 1600, from Google’s Ngram Viewer.

Figure 2

Frequency of do thus since 1800, from Google’s Ngram Viewer.

2.2 Compositionality

Constructions like do thus can be analyzed in two separate ways: compositionally or constructionally. A compositional analysis would treat do and thus each as terminal nodes in the syntactic VP whose meaning composes together to yield the meaning of the VP (8). A constructional analysis would treat the entire construction as a pro-VP, i.e., as being the only terminal node in the VP, and as not capable of being broken down into its constituent parts (9). Both types of analyses have been made for do so (see Culicover & Jackendoff (2005) and Lakoff & Ross (1966) for constructional analyses; Houser (2010) and Kehler & Ward (1999) for compositional analyses), but to the best of my knowledge, no one has claimed that do it/this/that is constructional. In the following analysis of do thus, I will be taking a compositional approach. The reasoning behind my doing so is delayed until after the semantic analyses.

(8) VP[V[do] ADV[thus]]
(9) VP[do thus]

2.2.1 The semantics of do

Do is a maximally general event-denoting predicate (Kehler & Ward 1999). What this means is that do has essentially no semantic content of its own; its meaning comes almost entirely from the context of the utterance (either linguistic or extralinguistic) and world knowledge.10 The only requirement placed on the felicitous usage of do is that the type of event that it is interpreted as be capable of being conceptualized as agentive (Ross 1972).

It is this generality that is responsible for the seemingly obligatory presence of an adverbial with intransitive do (one can do well, or do poorly, or do without, but it is odd to say simply that one does), but it affects transitive do as well. Most transitive verbs require an object simply to satisfy the demands of their argument structure, but the choice of argument does not affect the meaning of the verb, e.g., in watch a dance and watch a party, the type of event denoted by watch is the same, one in which the subject is directing their gaze towards the entity denoted by the object. The same is not true for do a dance and do a party. In these examples, the event denoted is entirely dependent on the object: do a dance essentially means dance, while do a party likely means throw a party. This latter meaning could easily have been different due to contextual factors (e.g., DJ a party), but it would still be constrained to being a type of event made salient by the noun party. Since the object of transitive do is essential in guiding the hearers to the intended interpretation, it is necessary in a way that goes beyond that found with most transitive verbs.

Do’s need of further specification can be met in one of several ways, or through a combination thereof.

(10) a. Larry does some digging = digs (gerund)
  b. Matilda does a dance = dances (zero-derived noun)
  c. Neville does the dishes = washes the dishes (conventional meaning)
  d. Anne said not to eat the cookies, but I did so anyway = ate the cookies (anaphoric reference)
  e. [Ted sees Louise getting ready to feed his dog chocolate]  
    Don’t do that! = feed the dog chocolate (exophoric reference)

The most straightforward way for this to occur is by having as an object some event denoting nominal (e.g., a gerund, or a zero-derived noun), where the interpretation of do is just the event denoted by the object (10a & 10b). Other times, when do appears with an object, there may be a conventionalized meaning associated with it, or the object + context/world knowledge may serve to constrain the available interpretations (10c). Additionally, for both transitive and intransitive do, the interpretation of do may result from composition with a referring expression that refers to an event, either through anaphoric/cataphoric (10d) or exophoric reference (10e). It is this last means of specification in which thus participates.

2.2.2 The semantics of thus

In determining the semantics of thus, I follow the work of Landman & Morzycki (2002), which deals with similar referring expressions in Polish, Russian, German and Dutch. Essentially, Landman and Morzycki propose that the referring expressions that they are dealing with are anaphoric to kinds, specifically event kinds.11 Under this analysis, adverbials denote properties of events that realize a particular contextually supplied kind. The semantics that they propose are shown in (11), where ‘≤’ represents the part-of relation, meaning that the event e realizes the kind of event denoted by g(i).

(11) (Landman & Morzycki 2002: ex. 1a)
  a. On tańczył tak
    ‘He danced like that.’
  b. ‖tak_i‖^g = λe_{D_s∩D_r}. e ≤ g(i)_{D_s∩D_k}
    ‖tańczył‖^g = λxλe_{D_s∩D_r}. dance(e)(x)
    ‖tańczył tak_i‖^g = λxλe_{D_s∩D_r}. dance(e)(x) ∧ e ≤ g(i)_{D_s∩D_k}

The idea is that dancing denotes an event (a dancing event), an adverb like wildly denotes a kind of event (a wild event-kind), and dancing wildly denotes an event of the first type that also realizes the event-kind denoted by the second type (a wild kind of dancing event). So, in dancing thus, where the intended meaning is dancing wildly, thus is anaphoric to the event-kind denoted by wildly, and when combined with dance, yields the desired denotation of a wild kind of dancing event.

This sort of analysis is readily applicable to the construction do thus.12 While dance denotes a particular kind of event which can be further restricted by an adverb (e.g., dance wildly, dance slowly), it can itself be seen as further restricting some other kind of event whose meaning is broader yet. For example, move is less specific than dance, and one could say move thus, where thus refers to a dancing type of event, and therefore leads to the interpretation of move thus as move in a dancing manner, or more succinctly, as dance. Taking this even further, since do is maximally underspecified, the phrase do thus can use thus to supply whatever type of event is intended, provided that said event can be conceived of as agentive (following ideas laid out in Miller (1990) and Kehler & Ward (1999)).

This framework can be seen as underlying the derivation of meaning sketched in (4) and (5) from Section 1. By using this framework, I arrive at a straightforward way of deriving the meaning of do thus through the composition of the meaning of its parts.

2.2.3 Arguments for compositionality

In the above discussion of do thus, I have assumed a compositional analysis over a constructional one. The reason for this is threefold. First, on the constructional analysis, do thus would be standing in for an entire VP. This is problematic because it is not always clear what that VP would be.

(12) (Forum post)
  Break your chains of Starbucks domination, grow your own coffee beans, harvest, roast them and then grind them with your teeth… brew in an old sock. After doing thus, you will really appreciate a good cup of Joe.

In (12), where there are six different verbs (break, grow, harvest, roast, grind, brew), would these all get conjoined into one VP? Notice that these six verbs do not all share the same object. While grow, harvest, roast and grind all have coffee beans as their objects, and could in principle be conjoined into one VP and then subsequently replaced with do thus, break and brew have different objects. The object of break is your chains, and the object of brew, though not overtly stated, cannot be the coffee beans, but the grounds that are the result of grinding the coffee beans. As such, it would appear that, even after conjoining VPs where possible, we are left with more than one VP. This speaks against treating do thus as simply a pro-VP.

The second reason to prefer a compositional analysis has to do with anaphoric uses of thus which are distinct from the phrase do thus.

(13) (Religious book)
  In the meantime, the entire force of the Hierarchy is thrown on the side of the nations struggling to free humanity, and on the side of those in any nation who thus work.

In (13), thus is anaphoric to a freeing-humanity type of event, or possibly a struggling-to-free-humanity type of event. When modifying work, this results in an interpretation along the lines of ‘those in any nation who (strugglingly) work to free humanity’. This process is remarkably similar to that discussed for do thus. Since one is unlikely to want to say that thus work is a construction, along the lines of do thus, the similarity would go unexplained under a constructional analysis.

The third problem with a constructional analysis is perhaps less severe, and is mentioned here mainly for completeness. The fact that thus can appear both pre- and postverbally, at the very least, would complicate a constructional analysis in a way that a compositional analysis avoids. Since do thus and thus do appear to have essentially the same meaning and function, a constructional analysis would find itself with two separate, though similar, constructions that did the same work. One could say that there is only one underlying construction (perhaps do thus) and that there are two alloconstructions, do thus and thus do, but they would need to be in free variation, unless some reason for choosing one over the other based on the environment/context could be found. Though such an explanation is workable, it is not especially satisfying. A compositional analysis, however, needs no new machinery or assumptions to explain the availability of both a pre- and postverbal position for thus; thus is a VP-level adverb, and these are exactly the places that VP-level adverbs adjoin to the syntactic structure. Any explanation for why this is so belongs to a theory of adverbial adjunction more generally, and has no particular bearing on how do thus works. Given this, and the other two problems already mentioned, a compositional analysis of do thus makes the most sense, both from a theoretical and a practical point of view.13

2.3 Do thus and do so

Since grammars only mention thus as a more formal version of so, one might suspect that thus and so have identical patterns of usage, outside of possible register differences, and therefore that the same could be said about do thus and do so. This might seem a reasonable suspicion given examples like (1), where do thus can felicitously be replaced with do so (14).

(14) Most firms that pay dividends do so on either a period of time or quarterly basis.

However, examples like (15), where substitution with so is not felicitous, call this hypothesis into question.

(15) (Online comment)
  a.   My two cents: had I designed the stage, I would have done thus: first two targets engaged with a loaner small caliber BUG, then at position B your pistol would be staged, simulating using your BUG to fight your way to the “real” gun in your car.
  b. #my two cents: had I designed the stage, I would have done so: first …

In fact, there seem to be a number of ways in which the behavior of do thus differs from that of do so, namely regarding the complexity of the antecedent, the distance from the antecedent, and the type of reference being made (i.e., anaphoric, cataphoric or exophoric).

Do so generally takes simple antecedents, i.e., antecedents consisting of only one discourse segment (DS).14,15 Though there are attested instances of split antecedents for do so (Kehler & Ward 1999; Ward & Kehler 2005; Kehler & Ward 2007; Houser 2010), they are uncommon.16 In Houser’s corpus of 994 examples of do so, no instance has an antecedent made up of more than three DSs, and the average number of DSs per do so for the entire corpus is 1.05.17 By contrast, the antecedent of thus done in (16) has nine DSs.18

(16) (Blog post)
  When you carry out the above three simple steps, in background the wizard would
  -[automatically discover available options for you,] [say available logical hostnames, HA mountpoints, share options…]
  -[generate unique resource names.]
  -[create/][update required configuration files.]
  -[validate input given at each step,] [thereby reducing any fault in configuration.]
  -[automatically generate] [and execute Solaris Cluster commands.]
  -even rollback all the changes thus done, if the newly created resources fail to come up.

The antecedent of do so has also been noted as necessarily being very near to the anaphor – very often within the same sentence, though at times in the preceding sentence (Miller 2011). On average, in Houser’s 2010 data, do so was 1.68 DS away from its antecedent.19 In (17), however, the antecedent (writing an act) is at a distance of nine DSs from the clause containing do thus, suggesting that do thus is not under the same distance restrictions as do so.20

(17) (Government website)
  [I view writing an act without a definition of the words ‘suitable’ and ‘available,’] [leaving all of that discretionary power to jurisprudence or that board or a combination of both,] [extremely frustrating.] [Perhaps it is unfair on my part, Professor,] [however, I find it most frustrating] [because our history of relations with this board has not been the type of history] [that would lend itself to this kind of discretionary power.] [I honestly think] [that injured workers and those of us] [who have had dealings with them] [are going to find it very frustrating.] How can we do thus without a definition of ‘suitable’ and ‘available’?

Finally, (15) and (18) show examples of cataphoric and exophoric reference, respectively.21 Do so has generally been taken to require some form of previously mentioned linguistic antecedent (Kehler & Ward 1999; Bruening 2019), and therefore should be infelicitous with both cataphoric and exophoric reference.22

(18) (Research database)
  This job has to be done piecemeal, separately for each of the five vowels. It is done thus for the vowel A:
              REJECT IF LETTERS (END) NE ‘E’
              REJECT IF PHONES (END-1) EQ ‘EI’
              REJECT IF LETTERS (END-2) NE ‘A’

To further investigate the behavior of do thus, and to get a better idea of how robust these seeming differences with do so actually are, a corpus of nearly 2000 examples of do thus was created, with roughly a quarter of the examples selected for analysis. This is further described in Section 3. By comparing the patterns extracted from the do thus corpus with data already available on do so, we will be in a position to better understand their similarities and differences.

2.4 Personal pronouns and demonstratives

Before moving on to discussion of the corpus data for do thus, it is worth taking a moment to discuss the behavior of personal pronouns and demonstratives, as the distinction between the two will be relevant for the analysis of the data.

Personal pronouns and demonstratives have been claimed to have distinct behaviors. Many researchers claim that personal pronouns, such as it, require their referent to be highly salient (in focus for Gundel et al. (1993), highly accessible for Ariel (2001)). Contrasted with this, demonstratives are taken to have less salient referents (they are activated for Gundel et al., and of medium accessibility for Ariel). That is, personal pronouns require their referent to be at the center of a hearer’s attention, while demonstratives only require that the referent be active in short-term memory. Regarding the type of data used in this study, the measure of distance between the anaphor and the antecedent can be seen as a proxy for salience. It stands to reason that, when an antecedent immediately precedes the anaphor, the referent established by the antecedent is still clear and present in the hearer’s attention, and is thus quite salient and accessible (what Ariel refers to as anaphor-antecedent unity). By the same token, if the antecedent is farther back in the text, it stands to reason that its referent is less salient and accessible, and that salience and accessibility reduce as distance increases.

Another aspect on which personal pronouns and demonstratives differ has to do with the complexity of their referent. Personal pronouns have been shown to prefer simpler referents, while demonstratives have been shown to prefer more complex referents. Brown-Schmidt et al. (2005) demonstrate this difference experimentally. Participants were asked to perform a task such as placing a teacup onto a saucer. Afterwards, if they were instructed to place it on the floor, they tended to place only the cup on the floor. If they were instructed to place that on the floor, they tended to place both the cup and the saucer on the floor. Brown-Schmidt et al. interpret this as having to do with referent complexity. They suggest that there are two possible referents available: a simple referent (the cup), and a composite/complex referent (the cup + the saucer). When participants heard it, they were more likely to resolve reference to the simple referent, and when participants heard that, they were more likely to resolve reference to the complex referent.

This difference in complexity preference has been demonstrated for more than just reference to objects. On the assumption that abstract referents (such as events, situations, states of affairs, propositions, etc.) are inherently more complex than object referents, Çokal et al. (2018) examined the behavior of personal pronouns and demonstratives when there was both an object referent and a propositional referent available in the immediately preceding context. Participants read the context, and then read a sentence beginning with either it or that, followed by a disambiguating phrase that made clear whether the object or the proposition was the intended referent. They found longer reading times when it was disambiguated to the propositional referent, and when that was disambiguated to the object referent. They interpret these results as showing that the reader had already resolved it to the object referent, or that to the propositional referent, and had to reanalyze the situation when they encountered the disambiguating phrase. So again, we see evidence that personal pronouns prefer simple referents, and demonstratives prefer more complex referents.

As it pertains to this study, assuming that verbs are a means for introducing event referents, a single verb should introduce as simple an event referent as possible (keeping in mind that event referents are assumed to be inherently more complex than object referents). Multiple verbs would then introduce multiple event referents, and combining multiple event referents into a composite event referent yields more complex referents (akin to the cup + saucer of Brown-Schmidt et al.). In the present study, the texts are measured in discourse segments, and as such, the number of discourse segments in an antecedent can be taken as a proxy for the complexity of the referent introduced by the antecedent.
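As a small illustration of this proxy, the number of bracketed discourse segments in an annotated antecedent can be counted mechanically. This is a sketch only: it assumes that segments are marked with non-nested square brackets, as in the presentation of (16), and the function name is hypothetical rather than part of the annotation procedure.

```python
import re

# One bracket-delimited discourse segment, with no nested brackets (assumption).
SEGMENT = re.compile(r"\[[^\[\]]*\]")


def count_segments(annotated: str) -> int:
    """Count the bracketed discourse segments in an annotated antecedent."""
    return len(SEGMENT.findall(annotated))


# The bracketed antecedent of "thus done" from example (16).
antecedent_16 = (
    "[automatically discover available options for you,] "
    "[say available logical hostnames, HA mountpoints, share options…] "
    "[generate unique resource names.] "
    "[create/][update required configuration files.] "
    "[validate input given at each step,] "
    "[thereby reducing any fault in configuration.] "
    "[automatically generate] [and execute Solaris Cluster commands.]"
)

print(count_segments(antecedent_16))  # 9, the complexity reported for (16)
```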

One last dimension on which personal pronouns and demonstratives differ is the way in which they participate in different types of reference (i.e., anaphoric, cataphoric and exophoric reference). While both personal pronouns and demonstratives are at home with anaphoric reference, Trnavac & Taboada (2016) have demonstrated that personal pronouns are capable of cataphoric reference in only a very restricted set of situations (only when syntactically subordinate to their antecedent), while the demonstrative this is capable of cataphoric reference in a variety of other environments. The felicity of demonstratives with cataphoric reference has been noted by Ariel (2001) as well, who suggests that this is “due to the fact that lower accessibility markers are better cataphoric devices” (p. 59) (recall that Ariel considers demonstratives to be lower accessibility markers than personal pronouns).

It is worth noting here the similarity between the demonstrative this, known for its cataphoric behavior, and the adverbial referring expression under investigation here, thus. Not only do they look remarkably similar, but they share a common history, as thus descends from the Old English instrumental form of this (Harper 2020). The cataphoric behavior noted in example (18) is perhaps expected if this behavior can be assumed to have been inherited from this. Flambard (2018) presents evidence that the behaviors of standalone personal pronouns and demonstratives carry over to their corresponding do constructions (i.e., do it, do this, do that), and so it would be reasonable to expect that any preferences or abilities that thus may have would be present in the expression do thus as well, including felicity with cataphoric reference.

3 Methodology

3.1 Building the corpus

The corpus was constructed by searching the internet using Google Search’s exact match functionality (i.e., typing the desired string of characters within quotation marks (“…”)).23 I chose to build the corpus in this manner, rather than using existing corpora, due to the rarity of do thus constructions (see Figures 1 and 2, and the discussion in Section 2.1).24 Because do thus is so rare, even large corpora provide few results on which to perform an analysis. For example, using BYU’s iWeb corpus (Davies 2018)25 to search for doing thus returned 128 results.26 Of these, 63 were either idiomatic or ‘therefore’ uses (see fn. 7), 14 were anomalous (see exclusions below), and 12 were repeats. In effect, iWeb returned only 39 usable examples of doing thus. By contrast, 558 usable examples of doing thus were found using Google, a rather noticeable difference. Searches were performed on the forms do, doing, and done, with thus appearing both preverbally and postverbally (e.g., doing thus, thus doing).27,28 To make the results more manageable, further context was sometimes added to the search strings. This context included negation (19), each subject pronoun (20), each auxiliary (21), and each preposition (with the present participle) (22). These contexts could also be combined (23).

(19) not do thus; not doing thus
(20) I/you/we/they do thus
(21) do/can/could/will/might/etc. do thus
(22) in/while/during/after/etc. doing thus
(23) he should not do thus

Using negation and other auxiliaries allowed me to easily filter out auxiliary do. While simply searching for “they do thus” will return many examples of auxiliary do (e.g., they do thus pose a significant problem), this is not a possibility if negation or auxiliaries are included in the search. “Not do thus” may return examples with auxiliary do, but the auxiliary will be in addition to main verb do (e.g., I do not do thus). Similarly, since auxiliary do cannot co-occur with other auxiliaries (e.g., *I can do do thus; *I can do not do thus), specifying the auxiliary in the search precludes the return of examples containing auxiliary do. Only the personal pronouns I/you/we/they were used when searching with only “do thus”, since he/she/it are not compatible with this form of do (see fn. 27). When searching with the auxiliary specified, all seven personal pronouns were used.
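The search strings in (19) through (23) follow a regular pattern, which the sketch below reconstructs for illustration. It is not the procedure actually used to run the searches: the auxiliary and preposition lists are deliberately partial samples (the full sets are not enumerated here), and all function names are hypothetical.

```python
from itertools import product

# Pronouns compatible with bare "do thus", and all seven personal pronouns.
BARE_DO_PRONOUNS = ["I", "you", "we", "they"]
ALL_PRONOUNS = ["I", "you", "he", "she", "it", "we", "they"]

# Partial, illustrative samples only; the complete lists are an open assumption.
SAMPLE_AUXILIARIES = ["could", "will", "might", "should"]
SAMPLE_PREPOSITIONS = ["in", "while", "during", "after"]


def exact(phrase):
    """Wrap a phrase in double quotes for an exact-match search."""
    return f'"{phrase}"'


def pronoun_queries():
    # Pattern (20): pronoun + bare "do thus"
    return [exact(f"{p} do thus") for p in BARE_DO_PRONOUNS]


def auxiliary_queries(auxiliaries, negated=False):
    # Patterns (21) and (23): pronoun + auxiliary (+ not) + "do thus"
    neg = " not" if negated else ""
    return [
        exact(f"{p} {aux}{neg} do thus")
        for p, aux in product(ALL_PRONOUNS, auxiliaries)
    ]


def preposition_queries(prepositions):
    # Pattern (22): preposition + present participle
    return [exact(f"{prep} doing thus") for prep in prepositions]


queries = (
    pronoun_queries()
    + auxiliary_queries(SAMPLE_AUXILIARIES)
    + auxiliary_queries(SAMPLE_AUXILIARIES, negated=True)
    + preposition_queries(SAMPLE_PREPOSITIONS)
)
print(len(queries))
```

Because every auxiliary query already contains a non-do auxiliary, any do it matches must be main verb do, which is the filtering effect described above.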

When collecting the examples, care was taken to retain a great deal of information. Since distance from do thus to its antecedent is a central focus of this study, a large amount of context was retained. I always retained the entire paragraph containing do thus, and if do thus began a paragraph, I retained the preceding paragraph as well. When the example was a blog post, I retained the thread of replies in which it was located. Further, all of the URLs for the examples were retained. Retaining the URLs helps to guard against having retained too little context. When annotating the examples, if it was ever unclear what the antecedent was, or if it was unclear if enough information had been retained, it was easy to quickly go back to the full source. In this way, it was possible to look back over several paragraphs of text to make sure that nothing relevant had been missed. Lastly, I also retained information related to authorship and publication dates. Great pains were taken to obtain this information (especially the publication date) for all of the examples. This often entailed a significant amount of time doing research on just publication details. For examples that came from a book, I sought out the date of the first edition. For examples that came from forum or blog posts, date information was much easier to acquire, as these types of posts are generally accompanied by a timestamp.

A number of criteria were used for exclusion. Any instances of do or thus that were not of interest were omitted (e.g., auxiliary do, “therefore” uses of thus). To keep the data clear and uncontroversial, examples that were recorded before the fifteen hundreds, as well as any instances that were ambiguous between an anaphoric and a consequential reading, were also excluded. Beyond this, any instances whose context was anomalous in such a way as to cast doubt on the authenticity of the data were also excluded. Such anomalous contexts included thus followed by a bare NP (a likely typo for this), typos in other words (though poor punctuation was ignored), and any apparent non-fluency, be it due to non-native use, apparent machine translation, or machine generation (i.e., generated by a bot). The total numbers for the corpus are given in Table 1.29

Table 1

Total number of instances of do thus in the corpus, broken down by form and position.

Preverbal Postverbal Totals
thus do 12 do thus 530 542
thus doing 730 doing thus 558 1288
thus done 69 done thus 99 168
Totals 811 1187 1998

3.2 Annotation

Of the 1998 total examples collected, a sample of 389 were selected for annotation. This selection proceeded as follows: First, all of the examples were sorted by their respective forms (i.e., thus do, do thus, thus doing, doing thus, thus done, done thus). Then, using the date information discussed in the previous subsection, these six categories were sorted by century. From each form-century pairing, I randomly selected 25% for annotation.

Throughout the corpus, there are some authors who appear multiple times. There were also a number of examples which were English translations of a non-English original. I did not want to let a single author overly influence the results, and I also thought it advisable to avoid any undue influence on style that might happen during translation. I therefore avoided annotating a single author more than once, or translations at all. When an example was selected for annotation, I checked to see if 1) I had already selected an example from the same author, and 2) the example was a translation. If the answer to either of these was yes, the example was discarded and replaced with a new randomly selected example. This was done independently for all form-century pairings, i.e., a single author could supply a doing thus example and a thus done example, but not two doing thus examples. The total numbers of annotated examples can be found in Table 2.30

Table 2

Total number of annotated instances of do thus, broken down by form and position.

Preverbal Postverbal Totals
thus do 0 do thus 109 109
thus doing 109 doing thus 90 199
thus done 25 done thus 56 81
Totals 134 255 389
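The selection procedure just described (a 25% random sample per form-century pairing, with repeated authors and translations discarded and replaced) can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: the dictionary keys, the fixed seed, and the rounding rule for the 25% target are all assumptions made for the example.

```python
import random
from collections import defaultdict


def select_for_annotation(examples, fraction=0.25, seed=0):
    """Sample `fraction` of each form-century stratum.

    Each example is a dict with the hypothetical keys
    'form', 'century', 'author', and 'is_translation'.
    """
    rng = random.Random(seed)

    # Group the examples into form-century pairings.
    strata = defaultdict(list)
    for ex in examples:
        strata[(ex["form"], ex["century"])].append(ex)

    selected = []
    for key in sorted(strata):
        pool = strata[key][:]
        rng.shuffle(pool)  # walking a shuffled pool = repeated random draws
        target = round(len(pool) * fraction)  # rounding rule is an assumption
        seen_authors = set()
        chosen = []
        for ex in pool:
            if len(chosen) == target:
                break
            # Discard translations and repeat authors, then try the next draw.
            if ex["is_translation"] or ex["author"] in seen_authors:
                continue
            seen_authors.add(ex["author"])
            chosen.append(ex)
        selected.extend(chosen)
    return selected
```

Because the author check is reset for each pairing, a single author can contribute to several pairings but never twice to the same one. A stratum may also end up below its 25% target if too few eligible examples remain.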

All 389 examples were annotated for century, genre, form, antecedent complexity, the distance between the anaphor and the antecedent, and the type of reference used. What follows is a brief justification and description of these annotations.

Century was determined in the manner discussed earlier. This information was tracked in order to see if there have been any substantial changes in the usage of do thus over time. Century was coded as: 1500s; 1600s; 1700s; 1800s; 1900s; 2000s.

Since thus is often treated as a more formal variant of so in the grammars (Section 2.1), it seemed reasonable to wonder if usage of do thus varied according to genre. I took academic writing, religious writing, legal writing, journalism (not including quotations) and instructional texts to be formal. I took blogs, forums, fiction (mostly fanfiction and drama, mainly dialog), and spoken texts (generally in quotations from journalism) to be casual. Genre was coded as: Formal; Casual.

Form was annotated in order to track any differences in usage that correlate with the form of the construction used. This seemed worth looking into, especially considering that there are claims in the literature that restrictions on do so vary with finiteness (Houser 2010, but see fn. 28). Form was coded as: Finite; Non-finite.

As has been noted, do so prefers simple antecedents, i.e., antecedents consisting of one discourse segment (DS); do thus does not appear to be so constrained (see Section 2.3). To see how robust this difference between the two anaphors is, the complexity of the antecedents of do thus was recorded. Antecedent complexity was determined by counting the number of discourse segments making up the antecedent.31 Antecedent complexity was coded as: Positive integers ≥ 1 for anaphoric and cataphoric examples; N/A for exophoric examples.

As discussed in Section 2.3, the antecedent for do so is generally located within one or two DSs of the anaphor, while do thus seems to be able to have a more distant antecedent. To see how strong this apparent difference between the two anaphors is, the distance intervening between the anaphor and its antecedent was measured by counting DSs. Distance was coded as: Positive integers ≥ 1 for anaphoric and cataphoric examples, N/A for exophoric examples.32

As has been noted (Section 2.2), reference with do so needs to be to a previously mentioned linguistic object, and as such, must be anaphoric (though note fn. 22); reference with do thus appears able to be both cataphoric and exophoric. Accordingly, to compare do thus and do so along this dimension, the type of reference for each example was noted. Type of reference was coded as: Anaphoric; Cataphoric; Exophoric.33
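The coding scheme laid out above can be summarized as a single record type. The sketch below is illustrative only: the class name `Annotation` and its field names are assumptions, and `None` stands in for the N/A value used with exophoric examples.

```python
from dataclasses import dataclass
from typing import Optional

CENTURIES = {"1500s", "1600s", "1700s", "1800s", "1900s", "2000s"}
GENRES = {"Formal", "Casual"}
FORMS = {"Finite", "Non-finite"}
REFERENCE_TYPES = {"Anaphoric", "Cataphoric", "Exophoric"}


@dataclass(frozen=True)
class Annotation:
    century: str
    genre: str
    form: str
    reference: str
    complexity: Optional[int] = None  # DSs making up the antecedent; None = N/A
    distance: Optional[int] = None    # DSs between anaphor and antecedent; None = N/A

    def __post_init__(self):
        # Categorical fields must use one of the coded values.
        for value, allowed in (
            (self.century, CENTURIES),
            (self.genre, GENRES),
            (self.form, FORMS),
            (self.reference, REFERENCE_TYPES),
        ):
            if value not in allowed:
                raise ValueError(f"unexpected code: {value!r}")

        if self.reference == "Exophoric":
            # Exophoric examples have no linguistic antecedent to measure.
            if self.complexity is not None or self.distance is not None:
                raise ValueError("exophoric examples take N/A for both counts")
        else:
            # Anaphoric and cataphoric examples take positive integers.
            for name in ("complexity", "distance"):
                value = getattr(self, name)
                if not isinstance(value, int) or value < 1:
                    raise ValueError(f"{name} must be an integer >= 1")
```

For instance, an anaphoric example with a nine-DS antecedent one DS away would be `Annotation("1800s", "Formal", "Non-finite", "Anaphoric", complexity=9, distance=1)`, where the century, genre, and form values are invented for illustration.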

4 Results

4.1 Antecedent complexity

Only anaphoric and cataphoric examples were used in determining antecedent complexity. This is because, as exophoric examples have no linguistic source of reference, they do not properly have antecedents. A total of 16 examples were excluded from the antecedent complexity analysis because they were exophoric (4.11% of the total data). The overall results for antecedent complexity can be seen in Figure 3. On average, antecedents of do thus contained 4.68 discourse segments (DSs). 71.85% of the analyzed examples had antecedents of two or more DSs, and 42.36% had antecedents of four or more DSs. Only 28.15% had an antecedent of a single DS. Example (16) from Section 2.3 contained an antecedent made up of nine DSs. That would put (16) in the 42.36% of the sample with four or more DSs per antecedent. A closer look shows that examples like (16), with nine or more DSs per antecedent, account for 13.94% of the sample.

Figure 3

Proportion of anaphoric and cataphoric examples with antecedents consisting of 1, 2–3, and 4 or more discourse segments (DSs).

The distribution of antecedent complexity is shown by form (Finite vs. Non-finite) in Table 3, by genre (Formal vs. Casual) in Table 4, and by century (1500s–2000s) in Table 5.

Table 3

Distribution of antecedent complexity, in discourse segments (DSs), for finite and non-finite forms of do thus.

Form 1 DS 2 or 3 DSs 4 or more DSs Total Mean
Finite 6 (20%) 8 (26.67%) 16 (53.33%) 30 4.3
Non-finite 99 (28.86%) 102 (29.74%) 142 (41.4%) 343 4.71
Total 105 (28.15%) 110 (29.49%) 158 (42.36%) 373 4.68

Table 4

Distribution of antecedent complexity, in discourse segments (DSs), for formal and casual usage of do thus.

Genre 1 DS 2 or 3 DSs 4 or more DSs Total Mean
Formal 73 (24.33%) 90 (30%) 137 (45.67%) 300 5.03
Casual 32 (43.84%) 20 (27.4%) 21 (28.77%) 73 3.21
Total 105 (28.15%) 110 (29.49%) 158 (42.36%) 373 4.68

Table 5

Distribution of antecedent complexity, in discourse segments (DSs), over centuries.

Century 1 DS 2 or 3 DSs 4 or more DSs Total Mean
1500s 3 (8.57%) 11 (31.43%) 21 (60%) 35 6.71
1600s 12 (19.67%) 11 (18.03%) 38 (62.3%) 61 6.16
1700s 15 (26.32%) 16 (28.07%) 26 (45.61%) 57 4.19
1800s 19 (28.36%) 24 (35.82%) 24 (35.82%) 67 4.81
1900s 21 (29.58%) 25 (35.21%) 25 (35.21%) 71 4.27
2000s 35 (42.68%) 23 (28.05%) 24 (29.27%) 82 3.29
Total 105 (28.15%) 110 (29.49%) 158 (42.36%) 373 4.68
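The percentages in these tables follow directly from the raw counts. As a worked check, the short sketch below recomputes the Table 3 rows; the helper name `row_percentages` is introduced here only for illustration.

```python
def row_percentages(counts):
    """Convert a row of counts into percentages of the row total."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


# Counts from Table 3: antecedents of 1 DS, 2 or 3 DSs, 4 or more DSs.
table3 = {
    "Finite": [6, 8, 16],
    "Non-finite": [99, 102, 142],
}

# Column-wise sums give the Total row.
totals = [sum(column) for column in zip(*table3.values())]

print(totals)                             # [105, 110, 158]
print(row_percentages(table3["Finite"]))  # [20.0, 26.67, 53.33]
print(row_percentages(totals))            # [28.15, 29.49, 42.36]
```

The same function applies unchanged to the genre and century breakdowns.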

4.2 Distance from do thus to the antecedent

As with antecedent complexity, and for the same reason, only anaphoric and cataphoric examples were used in determining the distance intervening between the antecedent and the anaphor. This again resulted in 16 examples being excluded from the distance-to-the-antecedent analysis because they were exophoric (4.11% of the total data). The results for distance can be seen in Figure 4. On average, antecedents of do thus were 1.98 DSs away from the anaphor. The antecedent was within two DSs of the anaphor 80.43% of the time. For 13.14% of the examples the antecedent was three to five DSs away from the anaphor, and for 6.43% the antecedent was six or more DSs away. Example (17) from Section 2.3 contained an antecedent at a distance of nine DSs from do thus. That would put (17) in the 6.43% of the sample whose antecedents are six or more DSs away. A closer look shows that examples like (17), at a distance of nine or more DSs from do thus, account for 2.68% of the sample.

Figure 4

Proportion of anaphoric and cataphoric examples whose antecedents are 1–2, 3–5, and 6 or more discourse segments (DS) away from the anaphor.

The distribution of the distance between do thus and the antecedent is shown by form (Finite vs. Non-finite) in Table 6, by genre (Formal vs. Casual) in Table 7, and by century (1500s–2000s) in Table 8.

Table 6

Distribution of the distance between do thus and the antecedent, in discourse segments (DSs), for finite and non-finite forms of do thus.

Form 1 to 2 DSs 3 to 5 DSs 6 or more DSs Total Mean
Finite 28 (93.33%) 1 (3.33%) 1 (3.33%) 30 1.57
Non-finite 272 (79.3%) 48 (13.99%) 23 (6.71%) 343 2.02
Total 300 (80.43%) 49 (13.14%) 24 (6.43%) 373 1.98

Table 7

Distribution of the distance between do thus and the antecedent, in discourse segments (DSs), for formal and casual usage of do thus.

Genre 1 to 2 DSs 3 to 5 DSs 6 or more DSs Total Mean
Formal 245 (81.67%) 40 (13.33%) 15 (5%) 300 1.85
Casual 55 (75.34%) 9 (12.33%) 9 (12.33%) 73 2.53
Total 300 (80.43%) 49 (13.14%) 24 (6.43%) 373 1.98

Table 8

Distribution of the distance between do thus and the antecedent, in discourse segments (DSs), over centuries.

Century 1 to 2 DSs 3 to 5 DSs 6 or more DSs Total Mean
1500s 29 (82.86%) 2 (5.71%) 4 (11.43%) 35 2.03
1600s 50 (81.97%) 7 (11.48%) 4 (6.56%) 61 2.03
1700s 43 (75.44%) 9 (15.79%) 5 (8.77%) 57 2.3
1800s 54 (80.6%) 12 (17.91%) 1 (1.49%) 67 1.69
1900s 58 (81.69%) 8 (11.27%) 5 (7.04%) 71 2.09
2000s 66 (80.49%) 11 (13.42%) 5 (6.1%) 82 1.87
Total 300 (80.43%) 49 (13.14%) 24 (6.43%) 373 1.98

4.3 Type of reference

Results for type of reference are shown in Figure 5. Anaphoric reference is seen in 84.32% of the examples, 11.57% show cataphoric reference, and 4.11% show exophoric reference. That means that 15.68% of the examples show reference to something other than a previously mentioned linguistic object. It is important to note, as well, that exophoric reference is much rarer in written text than in spoken text – written text does not take place in the world the way that spoken text does. When speaking, I may point to something, or in some other way demonstrate what I am talking about, and thus achieve exophoric reference. With written text, even when we point to something, what we are usually pointing to is still written language, and thus not exophoric. As such, the fact that there were 16 examples of exophora (i.e., reference to a non-linguistic item) in the 389 randomly selected examples is especially striking.

Figure 5

Proportion of examples whose reference was anaphoric, cataphoric, and exophoric.

The distribution of the type of reference is shown by finiteness (Finite vs. Non-finite) in Table 9, by genre (Formal vs. Casual) in Table 10, and by century (1500s–2000s) in Table 11.

Table 9

Distribution of the type of reference for finite and non-finite forms of do thus.

Form Anaphoric Cataphoric Exophoric Total
Finite 30 (96.77%) 0 (0%) 1 (3.23%) 31
Non-finite 298 (83.24%) 45 (12.57%) 15 (4.19%) 358
Total 328 (84.32%) 45 (11.57%) 16 (4.11%) 389

Table 10

Distribution of the type of reference for formal and casual usage of do thus.

Genre Anaphoric Cataphoric Exophoric Total
Formal 262 (84.52%) 38 (12.26%) 10 (3.23%) 310
Casual 66 (83.54%) 7 (8.86%) 6 (7.6%) 79
Total 328 (84.32%) 45 (11.57%) 16 (4.11%) 389

Table 11

Distribution of the type of reference over centuries.

Century Anaphoric Cataphoric Exophoric Total
1500s 29 (82.86%) 6 (17.14%) 0 (0%) 35
1600s 55 (87.3%) 6 (9.52%) 2 (3.18%) 63
1700s 53 (92.98%) 4 (7.02%) 0 (0%) 57
1800s 56 (83.58%) 11 (16.42%) 0 (0%) 67
1900s 62 (83.78%) 9 (12.16%) 3 (4.05%) 74
2000s 73 (78.5%) 9 (9.68%) 11 (11.83%) 93
Total 328 (84.32%) 45 (11.57%) 16 (4.11%) 389

5 Discussion

In the following sections, I compare the behavior of do thus, reported above, with that of do so. The data for do so is drawn from the corpus provided in Houser (2010). Houser collected 994 examples of do so from the American National Corpus (ANC), a collection of more than 22 million written and spoken words of American English, going back to 1990. The ANC includes diverse genres of text, ranging from very formal (e.g., biomedical reports) to very casual (e.g., tweets). Houser reports that 96.7% of his examples were from written texts and 3.3% were from spoken texts, and included the forms do so, does so, did so, doing so and done so. Though Houser does not provide any genre information, a scan of the data reveals examples that are both very formal (e.g., “Physics-based functions based on electrostatics and van der Waals interactions do not discriminate well on their own…”) and very casual (e.g., “You all are incredible. Every mother-f****** last one of you.”).

Since I will be comparing the data reported herein with the data from Houser’s corpus, it is worth noting both the similarities and the differences between the two corpora. First, some major similarities. Both corpora draw from a wide range of genres, and both contain web-based content such as blogs and forum posts; both corpora contain examples with preverbal and postverbal adverbial referring expressions (i.e., so and thus); and both corpora contain examples of finite and non-finite do, with non-finite examples comprising infinitives, participles, auxiliary+do constructions and negation+do constructions.

Differences between the corpora have mainly to do with the distributions of the forms of do, and with the timespan used. Though both Houser’s examples and my own include many similar forms of do, Houser’s data contains instances of does and did, which my data does not. Such forms account for about 15% of Houser’s examples. Though both corpora contain instances of finite and non-finite do, 21% of Houser’s examples are finite, while only 8% of the do thus examples are finite. However, if we set aside the 15% of Houser’s data that contains finite does and did, this leaves only 6% for finite do, which is very close to the 8% for finite do in the do thus data. Nevertheless, in the discussion that follows, when comparing the do thus data to Houser’s do so data, I present both the total numbers for do so and the numbers after does so and did so are removed.

Regarding the timespan, the do thus data goes back to 1500, while Houser’s data can only possibly go back to 1990 (due to the make-up of the ANC). As such, only about a quarter of the do thus data overlap temporally with Houser’s do so data. With some exceptions, the data for do thus show relatively stable behavior over time, so that this difference in temporal coverage is not especially concerning. Still, in line with the treatment above regarding forms of do, for the following discussion, I present both the total numbers for do thus and the numbers for just the 21st century.

5.1 Antecedent complexity

Based on the corpus data, it appears that do thus does behave differently from do so. Table 12 shows a comparison of antecedent complexity between do thus (reported here) and do so (from Houser 2010).

Table 12

Distribution of antecedent complexity, in discourse segments (DSs), for do thus (total), do so (total), do thus (2000s) and do so (minus does/did so).

Referring expression 1 DS 2 or 3 DSs 4 or more DSs Total Mean
Do thus (total) 105 (28.15%) 110 (29.49%) 158 (42.36%) 373 4.68
Do so (total) 944 (94.97%) 50 (5.03%) 0 (0%) 994 1.05
Do thus (2000s) 35 (42.68%) 23 (28.05%) 24 (29.27%) 82 3.29
Do so (minus does/did so) 812 (96.32%) 31 (3.68%) 0 (0%) 843 1.03

With respect to antecedent complexity, Houser (2010)’s data showed that do so averaged 1.05 DS per antecedent (or 1.03 without does/did so). Compare this to the 4.68 average found in the present study for do thus (or 3.29 for just the 2000s). In Houser’s data, the vast majority of examples had antecedents consisting of only one DS. In the present study, however, do thus took such simple antecedents only 28.15% of the time (or 42.68% of the time for just the 2000s). Examples like (16), with a rather complex antecedent made up of nine DSs, were the norm, not the exception, though somewhat less so for the data from just the 2000s. While do so prefers simple antecedents, do thus seems to be felicitous with antecedents of essentially any size.

Comparing finite to non-finite forms, little difference is seen regarding antecedent complexity. Finite forms of do thus had an average complexity of 4.3 DS/antecedent, while non-finite forms had an average of 4.71 DS/antecedent (Table 3). There was a bigger difference seen between formal and casual genres, with formal genres having more complex antecedents than casual genres (with averages of 5.03 DS/antecedent and 3.21 DS/antecedent, respectively) (Table 4). There appears to be some change over time, as well, with older examples having more DS/antecedent on average than more recent examples (going from 6.71 DS/antecedent in the 1500s to 3.29 DS/antecedent in the 2000s) (Table 5). This change in average complexity is driven by a rise in single-DS antecedents and a fall in antecedents with four or more DSs. The percentage of antecedents containing two or three DSs has remained relatively constant (fluctuating between 18% and 36%).

5.2 Distance to the antecedent

Table 13 shows a comparison of the distance from the referring expression to the antecedent between do thus (reported here) and do so (from Houser 2010).

Table 13

Distribution of distance between the referring expression and the antecedent, in discourse segments (DSs), for do thus (total), do so (total), do thus (2000s) and do so (minus does/did so).

Referring expression 1 to 2 DSs 3 to 5 DSs 6 or more DSs Total Mean
Do thus (total) 300 (80.43%) 49 (13.14%) 24 (6.43%) 373 1.98
Do so (total) 836 (84.11%) 157 (15.8%) 1 (0.1%) 994 1.68
Do thus (2000s) 66 (80.49%) 11 (13.42%) 5 (6.1%) 82 1.87
Do so (minus does/did so) 697 (82.68%) 145 (17.2%) 1 (0.12%) 843 1.74

Though not as great as the difference between do thus and do so regarding antecedent complexity, there is still a noticeable difference between the two constructions regarding the distance to the antecedent. Both constructions average between one and two DSs of distance. This is because both constructions occur most frequently within two DSs of the antecedent. Where the difference between the two constructions is more obvious is in how distant the antecedent may be. In Houser’s data, the greatest distance of any example was six DSs (0.1%, or 0.12% without does/did so), and there was only one instance of this. By comparison, the do thus data shows 24 examples with a distance of six or more DSs (6.43%), the most distant of which was 15 DSs (the percentage is reduced to 6.1% when looking at only the data from the 2000s). This difference is all the more noteworthy when one considers the fact that Houser’s data contains about two and a half times as many examples as that reported here.

Comparing finite to non-finite forms, non-finite forms appear to be more distant from the antecedent than finite forms. Finite forms of do thus had an average distance to the antecedent of 1.57 DSs, while non-finite forms had an average distance of 2.02 (Table 6). While both finite and non-finite forms were most likely to be within two DSs of the antecedent, this begins to look almost like a requirement for the finite forms, with only 6.67% of the finite examples being more distant than this. Compare this with 20.7% of non-finite examples being three or more DSs away from the antecedent. A similar difference is seen between formal and casual genres, with casual genres having more distant antecedents than formal genres (with averages of 2.53 DSs and 1.85 DSs, respectively) (Table 7). There appears to be little change over time, with average distances fluctuating between 1.69 DSs (1800s) and 2.3 DSs (1700s) (Table 8). Unlike with antecedent complexity, no clear pattern emerges.

5.3 Type of reference

Table 14 shows a comparison of the type of reference between do thus (reported here) and do so (from Houser 2010).

Table 14

Distribution of type of reference for do thus (total), do so (total), do thus (2000s) and do so (minus does/did so).

Referring expression Anaphoric Cataphoric Exophoric Total
Do thus (total) 328 (84.32%) 45 (11.57%) 16 (4.11%) 389
Do so (total) 986 (99.2%) 8 (0.8%) 0 (0%) 994
Do thus (2000s) 73 (78.5%) 9 (9.68%) 11 (11.83%) 93
Do so (minus does/did so) 835 (99.05%) 8 (0.95%) 0 (0%) 843

Do thus and do so appear to have different restrictions on the type of reference they can participate in. As mentioned earlier, there is general agreement that do so requires its referent to have been a linguistic object in the preceding discourse, and that, as a consequence, the reference must be anaphoric. Though Houser’s data shows several instances of cataphoric reference with do so (which is in itself somewhat surprising, given the general consensus that this is not possible), the proportion of this is vanishingly small (0.8%, or 0.95% without does/did so). There are no examples of exophoric reference in Houser’s data. Though the majority of do thus examples were anaphoric, 15.68% were not (this percentage increases to 21.51% when looking at only the data from the 2000s). This number, though small, is much larger than that shown for do so.

5.4 General discussion

The preceding data provide a strong argument against any attempt to say that the referent of do thus is under the same restrictions as that of do so. Rather, it almost seems that do thus is under essentially no restrictions. It may take quite complicated antecedents, though it can also take extremely simple ones; its antecedent is often very near, but may be quite distant; it usually refers anaphorically, but is shown to occur with both cataphoric and exophoric reference as well.

Given these observations, it is clear that do thus is not merely a formal variant of do so. While both constructions allow speakers to make reference to events, the conventions regulating the usage of do thus show that it is a distinct and separate form.

The differences noted between do thus and do so are not just random, but are reminiscent of distinctions known to exist between personal pronouns and demonstratives (Section 2.4). Personal pronouns have been noted to prefer simple, highly salient/accessible referents (Gundel et al. 1993; Brown-Schmidt et al. 2005; Çokal et al. 2018), and to be highly restricted in making cataphoric reference (Trnavac & Taboada 2016). Demonstratives are compatible with more complex, less salient/accessible referents, and the demonstrative this participates in cataphoric reference more freely. It appears, based on the data, that do so patterns with nominal personal pronouns, such as it, and do thus patterns with nominal demonstratives such as this/that. Noticing this allows us to complete a paradigm for English that has until now persistently had an empty cell (Table 15).

Table 15

Paradigm of English event referential constructions.

Do + nominal referring expression Do + adverbial referring expression
Personal proform do it do so
Demonstrative do this/that do thus

Previously do so had seemed unique among English referring constructions. Though bearing a resemblance to other constructions such as do it and do that, it did not pattern exactly like them – a state of affairs which has led to a fair amount of controversy as to the exact status of this construction (Lakoff & Ross 1966; Hankamer & Sag 1976; Kehler & Ward 1999; Culicover & Jackendoff 2005; inter alia).34 Now that do thus has been recognized as a construction similar to, but distinct from, do so, the reason for earlier problems of classification seems clear: Previous researchers were trying to spread do so across two categories (personal proform and demonstrative adverbial), when it actually only represents the personal proform adverbial category. Do thus is its demonstrative counterpart.

There is reason to suspect that this distinction between do so and do thus has its roots in the history of the words so and thus. According to the Online Etymology Dictionary (Harper 2020), so derives from a Proto-Indo-European reflexive pronoun stem, and is akin to Latin se ‘himself’. Thus, on the other hand, is derived from the instrumental form of Old English þis ‘this’, which itself derives from a Proto-Indo-European demonstrative stem. The distinction observed between do so and do thus may ultimately derive from an ancient Proto-Indo-European distinction between personal proforms and demonstratives that has survived to the present day.

6 Conclusion

I have shown that do thus represents a distinct event referential construction from the more well-known do so. Do thus may take complex antecedents, which suggests that its referents may be complex as well; do thus does not require its antecedent to be especially nearby, which suggests that its referent need not be especially salient/accessible (cf. Ariel (2001) and her discussion of accessibility and anaphor-antecedent unity); do thus is compatible and felicitous with all types of reference, be it anaphoric, cataphoric, or exophoric. These characteristics are in contrast with those known for do so, but are very similar to those known for demonstratives in general, suggesting that the familiar personal proform/demonstrative distinction known from nominal referring expressions can be found between adverbial referring expressions as well.

The foregoing suggests that referential mechanisms and restrictions known from reference to objects are operative for reference to events as well, which in turn can help to guide future research into event reference, in particular psycholinguistic research into event reference production and resolution. It furthermore suggests that, if such a distinction among referring expressions can be found in English, it may be found in other languages as well, which would possibly allow for more universal cross-linguistic generalization about the interaction of language, event representations, and information structure.

Appendix: Discourse segmentation in Wolf et al. (2003)

Throughout this paper, I have relied on the method of discourse segmentation laid out in Wolf et al. (2003). Here I provide a brief overview of the procedure. The interested reader is referred to their paper, from which I have substantially borrowed here, for more in-depth discussion and elaboration.

Clauses delimited by commas or periods are usually discourse segments (DSs). Commas do not indicate a new DS if they separate elements of a complex NP, or in cases like the following:

[When, if ever, are you coming over?] (One DS)

Infinitival clauses that are not verbal complements are separate DSs, that is, if to may be replaced with in order to, the infinitival clause is a new DS:

[The code can be modified][(in order) to handle your needs.] (Two DS)

Infinitival clauses that are complements of verbs are not treated as separate DSs:

[I asked the mailman to knock twice.] (One DS)

Gerundive complements of verbs are not treated as separate DSs:

[He tricked me into buying Amway.] (One DS)

Gerundive clausal modifiers are treated as separate DSs:

[Keeping my eyes open,] [I was ready for anything.] (Two DS)

Prepositional phrases that are clausal modifiers are treated as separate DSs:

[Before going to bed,][I took off my watch.] (Two DS)

DSs can contain ellipsis. This is especially relevant for multiple VPs in one sentence:

[The dogs ate all the chicken][and <the dogs> destroyed the back yard.] (Two DS)

Elaborations are separate DSs:

[Mr. Jones,][spokesman for IBM,][said…] (Two DS)

Attributions are separate DSs:

[Mary said][the party was great.] (Two DSs)


  1. I have chosen the term “object” to talk about things, as opposed to events. The choice of terminology is a difficult one. Researchers often use terms like “entity” with the meaning that I intend, but on the view that all referents in the discourse model are entities, whether objects, events, facts, etc., this term seems problematic. A similar problem arises if one chooses the term “individual.” Though this term is commonly used to mean objects in the sense intended here, talk about the individuation of events (Davidson 1969) would suggest that events may be seen as individuals as well. One reviewer suggests the term “first-order referent” (found in, e.g., Cornish 2002) and this may indeed be a better description, but its meaning may be less than transparent to most readers. I therefore have chosen the term “object” for this concept, and will use it throughout.
  2. Note that personal pronouns and demonstratives can also refer to events and other abstract referents. I do not claim that these referring expressions only refer to objects, but that the majority of the research around them has focused on their behavior with regard to object reference. Adverbial referring expressions, such as the thus in do thus, cannot refer to objects. On the view that (VP-level) adverbials are predicates of events (Parsons 1990), adverbial referring expressions must refer to an event, or more specifically to a type of event (see Section 2.2 for more discussion of this).
  3. Note that the following discussion relies on treating do thus as compositional. Arguments for such a view of do thus will be presented in greater detail in Section 2.2.
  4. The semantics of do thus will be more fully discussed in Section 2.2.
  5. Note that this rough paraphrase is meant to be just that: a loose way of thinking about the meaning of thus. I do not intend to say that the semantics of the lexical item thus and the phrase in this way should be taken to be the same.
  6. Do thus is similar in form to other do+adverbial constructions, such as do likewise/otherwise/without/as (one does). Though these constructions are worthy of investigation in their own right, they do not factor into the present discussion.
  7. Thus has a number of non-event anaphoric uses, which are not the topic of this paper. Such uses include thus as a discourse or sentential connective, generally indicating consequence or narrative progression, as well as a number of idiomatic constructions (e.g., thus far, thus and such).
  8. Early Modern English is generally given a start date of sometime in the 16th century. Dates given can range from the late 1400s to the early 1600s. I follow Nevalainen (2006) in treating Modern English as beginning in 1500. In constructing the corpus for this study, examples were found that predated this date. Such examples were not included, in order to keep the focus of the study on Modern English.
  9. The time span has been shortened for Figure 2, relative to Figure 1, in order to clearly show the non-zero value for modern usage.
  10. Do is in this regard very much a light verb, though more light than most, since most light verbs, like take, have both a fully lexical meaning (take a penny) and a bleached light verb meaning (take a shower). It is hard to determine what exactly the meaning of do would be without contextual support. For more on light verb constructions, see Jespersen (1949); Brugman (2001); Butt (2010); Wittenberg (2016); inter alia.
  11. Their theory is built out of a suggestion in Hinrichs (1985) that event kinds are possible on a conceptual level. They assume an ontology of entities consisting of both kinds and eventualities, where the domain of kinds and the domain of eventualities has a non-empty intersection. This is formalized by partitioning the domain of entities (De) into two sorts, a domain of non-event individuals (Do) and a domain of eventualities (Ds). De is also partitioned into two sorts along another dimension, a domain of non-kinds (Dr) and a domain of kinds (Dk).
  12. One point of departure between the theory of Landman & Mozrycki and myself is that I take all event predicates (e.g., verbs and adverbials) as denoting kinds, or types, of event. It is the existential closure over the event variable which affects the transition from a type of event to a particular event. As such, in (5), I do not claim that do must be a part of a paying-dividends kind, but that do is looking for an event predicate to feed its interpretation. It finds this predicate, through functional application, in the referent of thus. [^]
  13. A reviewer notes that do thus appears to have both a compositional and a non-compositional use, citing the ability to replace thus with in this way in only some contexts. The contexts where replacing thus with in this way works are those where there is a salient manner available:
    (i) most firms that pay dividends quickly do thus… ≈ most firms that pay dividends quickly pay dividends in this way…
    Whereas such replacement is infelicitous when the context is lacking such a manner:
    (ii) most firms that pay dividends do thus… ≈ #most firms that pay dividends pay dividends in this way…
    Note, though, that in (ii) do has been replaced with pay dividends. This meaning must have come from somewhere. If thus can only refer to a manner that further restricts an event, then it must have been do itself that was anaphoric to the dividend-paying. I do not think this is correct. Do is not anaphoric, but is relatively devoid of meaning, and tries to find its interpretation in the denotation of its complement (here, thus). In (i), thus is anaphoric to a paying-dividends-quickly type of event, and in (ii), it is anaphoric to a paying-dividends type of event. The reason (ii) is infelicitous is that there is essentially a double spell-out of the referent. On my theory of the meaning of do thus, such double spell-out would not arise. [^]
  14. I conceive of antecedent complexity as a continuum, so an antecedent containing only one DS is as simple an (event) antecedent as is possible; an antecedent containing three DSs is simpler than an antecedent containing seven DSs, etc. In every felicitous example given in Miller (2011) and Miller (2013), the antecedent of do so contains only one DS. [^]
  15. The notion of discourse segment used throughout this paper is that presented in Wolf et al. (2003). A fuller account of Wolf et al.’s method for discourse segmentation is provided in the appendix, but for now, it should suffice to say that the majority of the time, counting DSs is very similar to counting verb phrases. Thus Mary went to school to learn French would be counted as two DSs: [Mary went to school][to learn French]. VPs that are infinitival complements of another verb, however, are not considered to be a separate DS. Thus, Mary was told to go home consists of only one DS. The reason for using Wolf et al.’s method is twofold. First, in research that is separate from that presented in this paper, there was a desire to annotate coherence relations between do thus and its antecedent. Wolf et al.’s segmentation method was specifically designed for such annotation, and so seemed ideal. Second, it provides a very fine-grained and algorithmic method for discourse segmentation, and as such provides a very consistent method for making comparisons between different examples, both within the do thus corpus and between the do thus corpus and the do so corpus provided by Houser (2010). [^]
  16. Kehler & Ward (1999) provide two examples, each with two DSs in the antecedent. Ward & Kehler (2005) provide one new example, with three DSs in the antecedent. Kehler & Ward (2007) provide no new examples. [^]
  17. This number was not included in Houser’s thesis. The average was computed independently, based on the corpus data provided in his appendix. There were 44 examples with two DSs in the antecedent (4.43% of the sample) and six examples with three DSs in the antecedent (0.6%), for a total of 50 examples with more than one DS in the antecedent (5.03%). [^]
  18. Separate DSs are shown with square brackets. Examples like those presented here (i.e., (16–17)) will be discussed in more detail in the Results section. [^]
  19. This number was, like that for antecedent complexity, not included in Houser’s thesis, but computed independently. There were 513 examples where the distance was one DS (51.61% of the sample), 323 examples of two DSs (32.5%), 125 examples of three DSs (12.58%), 29 examples of four DSs (2.92%), three examples of five DSs (0.3%), and one example with six DSs (0.1%). No examples were more than six DSs distant from the antecedent. [^]
  20. Note that, if one simply counts the pairs of square brackets in (17), the total reached is eleven, not nine. This is because two of the DSs are relative clauses embedded within another DS. Thus [injured workers and those of us] [who have had dealings with them] [are going to find it very frustrating] looks like three DSs, but is actually two: one larger DS that contains a smaller embedded DS. [^]
  21. Though the computer code in (18) is technically not referenced exophorically, since it is present in the text, it is clearly no longer English. The code is itself the demonstratum, and is functioning more like a picture being pointed to than actual text. It is for this reason that I treat this reference as exophoric. [^]
  22. NB: Bruening (2019) has noted that cataphoric examples of do so do indeed exist. There are even several instances of cataphoric do so in Houser (2010), accounting for 0.8% of the total number of examples. [^]
  23. Note that Google’s exact match functionality features the ability to use a wildcard character, “*”, which allows one to obtain results that include material not specifically searched for. For example, searching for “do * thus” returns results such as “when do we use thus”, or “we all do, and thus share responsibility”. Though such searches were performed, they did not return useful examples. The great majority of the results obtained in this way were instances of auxiliary do (either representing questions or negation), or were instances of do and thus appearing in separate clauses. None of the examples obtained in this way were clearly of the type being sought, and many were clearly not. As such, only examples in which do and thus were adjacent were used in the analysis. [^]
  24. For comparison, just looking at the frequency of collocations, BYU’s iWeb corpus returns a total of 1,471,011 hits for all forms of do, either preceded or followed by so. The total hits for all forms of do, either preceded or followed by thus, is 10,621. This means that, in the iWeb corpus, DO SO collocations are 138.5 times more frequent than DO THUS collocations. [^]
  25. The iWeb corpus has more than 14 billion English words. [^]
  26. For comparison with the other doing X constructions, BYU’s iWeb returns 293,908 results for doing it, 250,598 results for doing this, 112,474 results for doing that, and 272,600 results for doing so. [^]
  27. Searches were not performed with does or did. Such searches were initially attempted, but the obtained results were overwhelmingly instances of auxiliary do. Note that this does not mean that there are no 3rd person singular examples in the corpus. When using other auxiliaries or negation to filter out examples of auxiliary do, all 3rd person singular examples appear with the form do, e.g., he did not do thus, she should do thus. [^]
  28. Houser (2010) notes that the restrictions on the felicitous use of do so appear to be relaxed with non-finite forms, and this is followed in Miller (2011), where only finite forms are investigated. I have chosen to investigate both finite and non-finite forms in this study, because 1) I see no a priori reason to assume that such a relaxation also occurs with do thus, 2) the loosened restrictions which Houser discusses are those having to do with agentivity or stativity of the antecedent, topics which are not at issue here, and 3) I see reason to doubt Houser’s analysis. I do not think that it is the fact that do so is non-finite which leads to this relaxation of restrictions. Rather, I think it is the context in which the non-finite form is found which does. For example, in (i), I believe that the ability of do so to have a stative antecedent results from the construction being embedded under the verb manage.
    (i) (Houser 2010, ex. 8a)
      My grandfather knows all his grandchildren’s names, and he manages to do so despite his Alzheimer’s.
    Manage already sets up a context in which, whatever its complement is, that complement must denote some activity that is under the control of the subject. Since manage establishes this context, use of do so is licensed, despite the fact that the antecedent is stative. It is only epiphenomenal that do so is non-finite. It must be non-finite in order to be the complement of manage. [^]
  29. Regarding the imbalance in numbers, instances with the past participle done were much less frequent than instances with do or doing. Also, most instances of thus do were ambiguous between an anaphoric and a consequential reading. [^]
  30. The reader may notice that the numbers in Table 2 are not exactly 25% of those given in Table 1. This is due to the method of selection. For example, all 12 instances of thus do came from translated material, and so 0, not 3, were annotated. Similarly, some centuries only provided a few instances of a particular form; if there were only two instances of a form in a given century, then the choice was between 0%, 50%, or 100%, meaning that more than 25% would be annotated from that century. [^]
  31. It should be noted that this notion of complexity does not attempt to account for the varying complexity of individual lexical items. For example, it could be argued that shatter may be more complex than break, or that coerce is more complex than force. I know of no reliable way to measure this sort of lexical complexity, and so leave it aside for now. [^]
  32. A value of 1 means that the discourse segments containing the antecedent and the anaphor were adjacent to one another. A value of 0 would mean that the antecedent and the anaphor were in the same discourse segment, but this did not occur in any of the annotated examples. [^]
  33. Exophora includes the lack of occurrence of any source material in the text, as well as included but non-linguistic material in the text (e.g., images, computer code) and, in the case of drama, stage directions that would be left unspoken during performance. [^]
  34. One reason that do so may have been so hard to categorize may lie in the type of referent that it refers to. When talking about simple and complex referents, or accessibility of referents, the notions of complexity and accessibility must be relativized. Abstract referents such as events, facts, situations, etc. are necessarily more complex than object referents, a fact reflected in terminology mentioned in fn. 1, where object referents are first-order referents, and abstract referents are higher-order referents. This inherent complexity also leads to them being inherently less accessible, in that, all things being equal, simple referents are more accessible than complex referents. Since do so must refer back to some predicate of events, and since events are necessarily more complex and less accessible than objects, there is a tension between the type of referent that it must find (a complex one) and the type of referent that it prefers (a simple one). This leads to do so preferring the least complex/most accessible referent from the more complex/less accessible event referents available. Do so is forced into a middle ground. It may be that this state of affairs has obscured the true nature of do so. [^]
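
The percentages and ratios reported in fns. 17, 19 and 24 can be rechecked with a few lines of arithmetic. The following is only a minimal sketch, not the procedure actually used for the paper: the counts are copied from those footnotes, while the variable and function names are hypothetical, the sample size is assumed to be the sum of the distance counts in fn. 19, and every example not listed in fn. 17 is assumed to have a one-DS antecedent.

```python
# Distance (in DSs) between do so and its antecedent -> number of examples,
# copied from fn. 19.
distance_counts = {1: 513, 2: 323, 3: 125, 4: 29, 5: 3, 6: 1}

# Assumption: the full sample is the sum of the counts in fn. 19.
total = sum(distance_counts.values())

# Number of DSs in the antecedent -> number of examples, copied from fn. 17.
complexity_counts = {2: 44, 3: 6}
# Assumption: all remaining examples have an antecedent of exactly one DS.
complexity_counts[1] = total - sum(complexity_counts.values())


def shares(counts):
    """Return each category's share of the sample as a percentage."""
    n = sum(counts.values())
    return {k: round(100 * v / n, 2) for k, v in sorted(counts.items())}


def weighted_mean(counts):
    """Return the mean category value, weighted by number of examples."""
    n = sum(counts.values())
    return sum(k * v for k, v in counts.items()) / n


# iWeb collocation hits, copied from fn. 24.
do_so_hits = 1_471_011
do_thus_hits = 10_621
ratio = do_so_hits / do_thus_hits

print("sample size:", total)
print("distance shares (%):", shares(distance_counts))
print("mean distance (DSs):", round(weighted_mean(distance_counts), 2))
print("complexity shares (%):", shares(complexity_counts))
print("mean antecedent complexity (DSs):", round(weighted_mean(complexity_counts), 3))
print("DO SO / DO THUS ratio:", round(ratio, 1))
```

Under these assumptions the sample size comes to 994, and the recomputed shares agree with the footnotes up to rounding.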


Acknowledgements

Many thanks and much appreciation go to Leon Bergen, Ivano Caponigro, Grant Goodall, Andy Kehler, Eva Wittenberg, and the members of UC San Diego’s Semantics Babble group for insightful comments and discussion. That said, any claims, as well as errors, in this paper are entirely the author’s own.

Competing interests

The author has no competing interests to declare.

References


Ariel, Mira. 2001. Accessibility theory: An overview. In Ted Sanders, Joost Schilperoord & Wilbert Spooren (eds.), Text representation: Linguistic and psycholinguistic aspects, 29–87. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/hcp.8.04ari

Brown-Schmidt, Sara, Donna K. Byron & Michael K. Tanenhaus. 2005. Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language 53. 292–313. DOI:  http://doi.org/10.1016/j.jml.2005.03.003

Bruening, Benjamin. 2019. Passive do so. Natural Language and Linguistic Theory 37(1). 1–49. DOI:  http://doi.org/10.1007/s11049-018-9408-1

Brugman, Claudia. 2001. Light verbs and polysemy. Language Sciences 23(4–5). 551–578. DOI:  http://doi.org/10.1016/S0388-0001(00)00036-X

Butt, Miriam. 2010. The light verb jungle: Still hacking away. In Mengistu Amberber, Brett Baker & Mark Harvey (eds.), Complex predicates: Cross-linguistic perspectives on event structure, 48–78. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511712234.004

Çokal, Derya, Patrick Sturt & Fernanda Ferreira. 2018. Processing of it and this in written narrative discourse. Discourse Processes 55. 272–289. DOI:  http://doi.org/10.1080/0163853X.2016.1236231

Cornish, Francis. 2002. Anaphora: Lexico-textual structure, or means for utterance integration within a discourse? A critique of the Functional Grammar account. Linguistics 40(3). 469–493. DOI:  http://doi.org/10.1515/ling.2002.020

Culicover, Peter W. & Ray Jackendoff. 2005. Simpler syntax. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199271092.001.0001

Davidson, Donald. 1969. The individuation of events. In Nicholas Rescher (ed.), Essays in honor of Carl G. Hempel, 295–309. Dordrecht: Springer. DOI:  http://doi.org/10.1007/978-94-017-1466-2_11

Davies, Mark. 2018–. The 14 Billion Word iWeb Corpus. Available online at https://www.english-corpora.org/iWeb/.

Flambard, Gabriel. 2018. English VP anaphors: do it, do this, do that. Paris: Université Sorbonne Paris Cité dissertation.

Granath, Solveig. 2007. Size matters – or thus can meaningful structures be revealed in large corpora. In Roberta Facchinetti (ed.), Corpus linguistics 25 years on, 169–185. Amsterdam: Rodopi. DOI:  http://doi.org/10.1163/9789401204347_011

Gundel, Jeanette K., Nancy Hedberg & Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69. 274–307. DOI:  http://doi.org/10.2307/416535

Hallman, Peter. 2004. Constituency and agency in VP. West Coast Conference on Formal Linguistics (WCCFL) 23. 304–317.

Hankamer, Jorge & Ivan Sag. 1976. Deep and surface anaphora. Linguistic Inquiry 7. 391–426.

Harper, Douglas. 2020. Online Etymology Dictionary. Available online at https://www.etymonline.com/.

Hinrichs, Erhard. 1985. A compositional semantics for Aktionsarten and NP reference in English. Columbus, OH: The Ohio State University dissertation.

Houser, Michael John. 2010. The syntax and semantics of do so anaphors. Berkeley, CA: University of California dissertation.

Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/9781316423530

Jespersen, Otto. 1949. A Modern English grammar on historical principles. Denmark: Aalborg Stiftsbogtrykkeri.

Kehler, Andrew & Gregory Ward. 1999. On the semantics and pragmatics of ‘identifier so’. In Ken Turner (ed.), The semantics/pragmatics interface from different points of view, 233–256. Amsterdam: Elsevier.

Kehler, Andrew & Gregory Ward. 2007. Event reference and semantic transparency. Western Conference on Linguistics (WECOL) 27. 115–127.

Kehler, Andrew & Hannah Rohde. 2013. A probabilistic reconciliation of coherence-driven and centering-driven theories of pronoun interpretation. Theoretical Linguistics 39. 1–37. DOI:  http://doi.org/10.1515/tl-2013-0001

Lakoff, George & John Ross. 1966. Criterion for verb phrase constituency. Mathematical linguistics and automatic translation.

Landman, Meredith & Marcin Morzycki. 2002. Event-kinds and the representation of manner. West Coast Conference on Formal Linguistics (WCCFL) 11. 1–12.

Luce, Kanan, Jeffery Geiger, Christopher Kennedy & Ming Xiang. 2018. Interpretations of VP anaphora through reference to salient events. Proceedings of the Linguistic Society of America (PLSA) 3(1). 1–38. DOI:  http://doi.org/10.3765/plsa.v3i1.4326

Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak & Erez Lieberman Aiden. 2010. Quantitative analysis of culture using millions of digitized books. Science 331. 176–182. DOI:  http://doi.org/10.1126/science.1199644

Miller, Philip. 1990. Pseudo gapping and do so substitution. Chicago Linguistics Society (CLS) 26. 293–305.

Miller, Philip. 2011. The choice between verbal anaphors in discourse. Lecture Notes in Computer Science 7099 (LNAI). 82–95. DOI:  http://doi.org/10.1007/978-3-642-25917-3_8

Miller, Philip. 2013. Usage preferences: The case of the English verbal anaphor do so. International Conference on Head-Driven Phrase Structure Grammar 20. 121–139.

Nevalainen, Terttu. 2006. Introduction to Early Modern English. Edinburgh: Edinburgh University Press.

Parsons, Terence. 1990. Events in the semantics of English: A study in subatomic semantics. Cambridge, MA: MIT Press.

Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In William C. Mann & Sandra A. Thompson (eds.), Discourse description: Diverse linguistic analyses of a fund-raising text, 295–325. Amsterdam: John Benjamins Publishing Company. DOI:  http://doi.org/10.1075/pbns.16.12pri

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.

Ross, John. 1972. Act. In Donald Davidson & Gilbert Harman (eds.), Semantics of natural language, 70–126. Dordrecht: D. Reidel Publishing Company. DOI:  http://doi.org/10.1007/978-94-010-2557-7_4

Sag, Ivan & Jorge Hankamer. 1984. Toward a theory of anaphoric processing. Linguistics and Philosophy 7. 325–345. DOI:  http://doi.org/10.1007/BF00627709

Trnavac, Radoslava & Maite Taboada. 2016. Cataphora, backgrounding and accessibility in discourse. Journal of Pragmatics 93. 68–84. DOI:  http://doi.org/10.1016/j.pragma.2015.12.008

Ward, Gregory & Andrew Kehler. 2005. Syntactic form and discourse accessibility. In António Branco, Tony McEnery & Ruslan Mitkov (eds.), Anaphora processing: Linguistic, cognitive and computational modelling, 365–384. Amsterdam: John Benjamins Publishing Company. DOI:  http://doi.org/10.1075/cilt.263.21war

Webber, Bonnie. 1986. Two steps closer to event reference. Technical report CIS-86-75, Department of Computer and Information Science, University of Pennsylvania.

Webber, Bonnie. 1987. Position Paper: Event Reference. Proceedings of the 1987 Workshop on Theoretical Issues in Natural Language Processing, 158–163. DOI:  http://doi.org/10.3115/980304.980341

Webber, Bonnie. 1991. Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes 6(2). 107–135. DOI:  http://doi.org/10.1080/01690969108406940

Wittenberg, Eva. 2016. With light verb constructions from syntax to concepts (Potsdam Cognitive Science Series 7). Potsdam: University of Potsdam.

Wolf, Florian, Edward Gibson, Amy Fisher & Meredith Knight. 2003. A procedure for collecting a database of texts annotated with coherence relations. Database documentation.