1 Projection of co-speech gestures

Spoken utterances are frequently accompanied by manual gestures. In this paper, we focus on the meaning contributions of co-speech gestures, which co-occur simultaneously with spoken language expressions, and enrich the spoken language utterance by depicting some aspect of the denoted situation (Schlenker 2018a). For example, if the speaker points upwards while uttering the sentence, “Jane took the stairs”, the contribution of the upwards-oriented gesture is intuitively quite clear, namely we infer that Jane went up the stairs. As another example, imagine the speaker produces the co-speech gesture LARGE in Figure 1, while uttering the sentence in (1).1

(1) The philosopher brought [a bottle of beer]_LARGE to the party.

Figure 1 

The co-speech gesture LARGE.

Here too, the meaning contribution of the gesture is fairly evident, namely we infer that the philosopher brought a large bottle of beer to the party. On the other hand, consider a slightly more complex example like (2), drawn from Schlenker (2018a), wherein the LARGE gesture is embedded under the quantified expression “exactly one.”

(2) Exactly one philosopher found [a bottle he liked]_LARGE.

Schlenker (2018a) suggests that (2) gives rise to three inferences: (i) that a philosopher found a bottle that he liked, (ii) that no other philosopher found a bottle that he liked, and (iii) that the bottle the philosopher found was large. Strikingly, the gesture modifies the positive part of the meaning of the sentence (we infer that the bottle that the philosopher found was large) but not the negative part (the negative inference is stronger than “no philosopher found a large bottle that he liked”). As discussed in Schlenker (2018a) and Tieu et al. (2017), the question then is how precisely co-speech gestures interact with the logical structure of the sentences they co-occur with.

1.1 Theoretical background

As discussed in Schlenker (2018a) and Tieu et al. (2017), there are three possible theories of co-speech gestures one might consider: an at-issue theory, a cosuppositional theory, and a supplemental theory. We will briefly summarize the three approaches here (but see Schlenker 2018a and Tieu et al. 2017 for more detailed discussion).

According to an at-issue analysis, co-speech gestures would make at-issue contributions to the meanings of the sentences they modify. A sentence like (3a) would effectively be interpreted along the lines of (3b).

(3) a. The philosopher brought [a bottle of beer]_LARGE to the party.
  b. The philosopher brought a bottle of beer that was [this]_LARGE large to the party.

An at-issue analysis runs into problems when we consider embedded environments, for instance, when the gesture is embedded under negation, as in (4). Here, the co-speech enrichment unexpectedly projects through the negation. While (4a) conveys that the philosopher didn’t bring any bottle of beer to the party, it arguably also triggers the inference that if the philosopher had brought one to the party, it would have been a large one.

(4) a. The philosopher didn’t bring [a bottle of beer]_LARGE to the party.
  b. The philosopher didn’t bring a bottle of beer that was [this]_LARGE large to the party.

In contrast to (4a), (4b) seems to trigger the (defeasible) implicature that the philosopher did actually bring a bottle of beer to the party. This is because (4b) evokes the more informative alternative “The philosopher didn’t bring a bottle of beer to the party”, hence the implicature that the philosopher did bring a bottle of beer to the party. As Tieu et al. (2017) discuss, an at-issue analysis would need to account for why (4a) and (4b) give rise to distinct inferences.

Ebert & Ebert (2014) posit instead that co-speech gestures contribute supplement-like meanings, in the way that appositive relative clauses do. On their account, a sentence like (3a), repeated below in (5a), would instead be interpreted along the lines of (5b); specifically, the size of the beer bottle is not at issue.

(5) a. The philosopher brought [a bottle of beer]_LARGE to the party.
  b. The philosopher brought a bottle of beer, which (by the way) was [this]_LARGE large, to the party.

For our purposes, on the supplemental theory, co-speech gestures essentially give rise to readings that can be paraphrased with appositive relative clauses. As with the at-issue analysis, however, things are less straightforward in cases of embedding under an operator. For instance, (6a) and (7a) both support the presence of the co-speech gesture LARGE, whereas their appositive counterparts in (6b) and (7b) are generally unacceptable (see Schlenker 2018a; b for discussion; see also Ebert 2017 for experimental work aimed at investigating judgments of sentences like (6a) and (7a)).

(6) a. The philosopher didn’t bring [a bottle of beer]_LARGE to the party.
  b. ? The philosopher didn’t bring a bottle of beer, which (by the way) was [this]_LARGE large, to the party.

(7) a. No philosopher brought [a bottle of beer]_LARGE to the party.
  b. ? No philosopher brought a bottle of beer, which (by the way) was [this]_LARGE large, to the party.

As discussed by Schlenker (2018a) and Tieu et al. (2017), a supplemental theory moreover predicts that one might find universal-like inferences in quantified environments, but not existential ones. With certain assumptions about the mood of the appositive,2 (8a) might be expected to be interpreted along the lines of (8b), in which case we might obtain an inference that for each of the relevant philosophers, the bottle of beer s/he brought would have been large; however, there is no reading on which this condition would have to be satisfied for only some of the philosophers.

(8) a. None of these ten philosophers brought [a bottle of beer]_LARGE to the party.
  b. None of these ten philosophers brought a bottle of beer, which (by the way) would have been [this]_LARGE large, to the party.

An alternative to the at-issue and supplemental views is the cosuppositional theory (Schlenker 2018a; b), according to which co-speech gestures trigger presuppositions, and more specifically conditionalized presuppositions (called “cosuppositions”). Presuppositions of verbal expressions like those in (9) and (10) have been shown to project universally from “none”-NPs (Chemla 2009). On the cosuppositional view, the inferences of co-speech gestures would likewise project universally.3

(9) None of my students knew that he was incompetent.
  Inference: Each of my students was incompetent (and male).

(10) None of these ten students takes good care of his computer.
  Inference: Each of these ten students has a computer (and is male).

Schlenker (2018a; b) formalizes these intuitions within a dynamic semantics (see Heim 1983; Schlenker 2009), according to which presuppositions must be satisfied in their local contexts. In essence, co-speech gestures trigger presuppositions that their content is entailed by that of the expressions they modify:

(11) Cosuppositions triggered by co-speech gestures
  Let G be a co-speech gesture co-occurring with an expression d, and let g be the content of G. Then G triggers a presupposition d ⇒ g, where ⇒ is generalized entailment (among expressions whose type ends in t).
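
To make the condition in (11) concrete, the following is a minimal sketch that assumes a finite domain and treats d and g as one-place predicates, so that generalized entailment reduces to pointwise implication. The domain, the predicate names, and the toy facts are hypothetical illustrations; the sketch checks the condition in a single model and abstracts away from local contexts.

```python
from typing import Callable, Iterable

Predicate = Callable[[str], bool]


def generalized_entails(d: Predicate, g: Predicate, domain: Iterable[str]) -> bool:
    """d => g for one-place predicates: every individual satisfying d satisfies g."""
    return all((not d(x)) or g(x) for x in domain)


# Hypothetical toy model: which objects are bottles of beer, and which are large.
DOMAIN = ["bottle_a", "bottle_b", "glass_c"]
BOTTLES = {"bottle_a", "bottle_b"}
LARGE = {"bottle_a", "bottle_b", "glass_c"}


def is_bottle_of_beer(x: str) -> bool:
    return x in BOTTLES


def is_large(x: str) -> bool:
    return x in LARGE


# The cosupposition of [a bottle of beer]_LARGE in this model:
# every bottle of beer is large.
cosupposition_holds = generalized_entails(is_bottle_of_beer, is_large, DOMAIN)
```

In this toy model the cosupposition holds, whereas the converse entailment (everything large is a bottle of beer) does not, which illustrates that the condition is directional.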

The presuppositions triggered by co-speech gestures are conditionalized on the assertive content of the expressions they modify. This view predicts that gestural inferences, like verbal presuppositions, will display projective behavior in various linguistic environments, including questions, negation, and quantified environments. As for how these inferences are predicted to project, this depends on whether one combines the cosuppositional view with a universal projection or an existential projection theory of presuppositions. On the universal projection theory (e.g., Heim 1983; Schlenker 2009), all quantifiers trigger a universal presupposition or something close to it (as in (9) and (10)), so we would predict that a sentence like (8a) triggers the inference that for each of the relevant ten philosophers, if s/he were to bring a bottle of beer, it would be large. On the other hand, if the cosuppositional analysis were combined with an existential projection theory (Beaver 2001), we would expect presuppositions to project existentially. (9) would trigger the inference that at least one of the speaker’s students was incompetent, and likewise (8a) would trigger the inference that for at least one of the relevant ten philosophers, if s/he were to bring a bottle of beer, it would be large.4

While the cosuppositional theory could in principle be reconciled with either an existential or a universal theory of presupposition projection, only the latter pattern of projection could be compatible with the supplemental theory. As discussed in Schlenker (2018a) and Tieu et al. (2017), certain assumptions about the size of the antecedent of the non-restrictive pronoun and the mood of the appositive could lead the supplemental theory to predict universal-like inferences, whereas no plausible version of the theory would predict existential projection.

A second difference between the cosuppositional theory and the supplemental theory is worth emphasizing. In the sentences we are interested in, it is very difficult to understand the supplement as making an at-issue contribution within the scope of an operator. (6b), repeated below in (12a), does not seem to be easily paraphraseable as the explicitly at-issue (4b), repeated below in (12b). Similarly, it’s not so obvious that (7b), repeated below in (13a), should be paraphrased as in (13b).

(12) a. ? The philosopher didn’t bring a bottle of beer, which (by the way) was [this]_LARGE large, to the party.
  b. The philosopher didn’t bring a bottle of beer that was [this]_LARGE large to the party.

(13) a. ? No philosopher brought a bottle of beer, which (by the way) was [this]_LARGE large, to the party.
  b. No philosopher brought a bottle of beer that was [this]_LARGE large to the party.

The source of this observation is complex and debated, as some authors have argued that under restricted conditions appositive relative clauses can take scope under other operators (Schlenker 2010). What matters for present purposes is that in the examples under investigation, at-issue, narrow scope readings of appositives seem to be very difficult to access. By contrast, all standard theories of presupposition have a mechanism of local accommodation (Heim 1983) that allows presupposition triggers to make, at some cost, an at-issue rather than a presuppositional contribution. In this connection, Schlenker (2018a) specifically argues that co-speech gestures trigger weak presuppositions (more specifically, weak cosuppositions), comparable to those of so-called soft triggers, which by definition are easily locally accommodated. To illustrate, in (14a), “realize” behaves like a bona fide presupposition trigger, and yields the inference that the President is not telling the truth. But the facts are different in (14b) (from Karttunen 1971), which means something like: If at some point I have not told the truth and I realize this, I will tell you. In this case, the factive inference appears to make an at-issue contribution within the scope of the if-clause; this possibility is what makes “realize” a soft trigger (see Karttunen 1971; Heim 1990).

(14) a. Does the President realize he is not telling the truth?
  b. If I realize later that I have not told the truth, I will confess it to everyone.

To summarize, the predictions of the at-issue analysis differ sharply from those of the supplemental theory and of the cosuppositional theory. The latter two are harder to tease apart; one reason is that there are many choice points in the supplemental theory (see Schlenker 2018a for discussion), and some combinations of them yield predictions that are rather similar to those of the cosuppositional theory (which itself comes in multiple versions depending on one’s preferred theory of presupposition projection). But there are two salient differences. First, no plausible version of the supplemental theory predicts existential projection under quantificational expressions (whereas some versions of the supplemental theory can predict universal projection). Second, in the sentences under study here, there is little plausibility to the claim that bona fide supplements such as appositive relative clauses can make an at-issue contribution within the scope of operators. By contrast, all theories of presupposition have a mechanism of local accommodation that should extend to cosuppositions, especially if co-speech gestures are analyzed as weak presupposition triggers. Finally, it bears emphasizing that several versions of the supplemental analysis would lead one to expect that co-speech gestures should be deviant in some negative environments, whereas no version of the cosuppositional analysis makes such a prediction.

1.2 Experimental background

While researchers have examined gestures from neurolinguistic and developmental perspectives (e.g., Kelly & Church 1998; McNeil et al. 2000; Mayberry & Nicoladis 2000; O’Neill et al. 2002; Holle & Gunter 2007; Özyurek et al. 2007; Alibali et al. 2009; Gullberg 2009; Kelly et al. 2009; Kidd & Holler 2009; Botting et al. 2010; Göksun et al. 2010; Cartmill et al. 2012; Dick et al. 2012; Özçalişkan & Dimitrova 2013; Özyürek 2014; Emmorey & Özyürek 2014; Hrabic et al. 2014), few studies have investigated the ways in which co-speech gestures interact with the logical structure of the sentences in which they are found.

Tieu et al. (2017) used a Truth Value Judgment Task and a Picture Selection Task to investigate the projection properties of inferences arising from the iconic co-speech gestures UP and DOWN in six different linguistic environments (plain affirmative and negative sentences, modal sentences containing “might”, and quantified sentences containing “each”, “none”, and “exactly one”), as in (15a)–(15f).

(15) a. The girl will [use the stairs]_UP.
  b. The girl will not [use the stairs]_UP.
  c. The girl might [use the stairs]_UP.
  d. Each of these three girls will [use the stairs]_UP.
  e. None of these three girls will [use the stairs]_UP.
  f. Exactly one of these three girls will [use the stairs]_UP.

In the first experiment, participants were presented with videos of a speaker producing sentences like (15a)–(15f), paired with cartoon scenarios depicting characters who might go up or down a set of stairs. Participants had to judge whether the video descriptions were true or false of the depicted scenarios. Crucially, some of the characters could be blocked from using the stairs (by a barricade); hence the scenarios could be made compatible with the kinds of conditionalized inferences predicted by the cosuppositional theory, e.g., “If the girl were to use the stairs, she would go up the stairs.” In a second experiment, a different group of participants was presented with the same videos paired with two pictures at a time, and participants were asked to select the picture they felt best matched the speaker’s description. Applying a reading detection analysis, Tieu et al. found evidence for existential projection of the gestural inferences from the scope of “each”, “none”, and “exactly one”, and, to some degree, local accommodation of the inferences.

As we saw in Section 1.1, both the finding of existential projection and the finding of local accommodation are rather difficult to reconcile with a supplemental analysis of co-speech gestures. On the other hand, as Tieu et al. conclude, the results can be derived by the cosuppositional analysis, in combination with an existential theory of presupposition projection along the lines of Beaver (2001).

While the results presented in Tieu et al. (2017) afford us some progress in adjudicating between possible theoretical analyses of co-speech gestures, the finding of existential projection is somewhat surprising in light of earlier experimental work showing that verbal presuppositions like those in (9) and (10) project universally from the scope of quantifiers like “none” (Chemla 2009). In the present study, we pursue the same question (how do the inferences of co-speech gestures project?) using a different methodology that more closely tracks the introspective judgments used by linguists in the recent literature on gesture projection. To anticipate, the present experiment will confirm that there are non-trivial patterns of projection, specifically of universal rather than existential projection, and also that local accommodation is an option. As we will discuss, both findings are expected on standard theories of presupposition projection such as Heim (1983).

2 Experiment

We tested the same set of gesture-speech combinations (involving the directional gestures UP and DOWN) as those described in Tieu et al. (2017).5 Rather than asking for truth value judgments or picture selections, however, we elicited inferential judgments by asking participants directly to rate how strongly the target sentences would lead them to infer the target inferences. As in Tieu et al. (2017), we tested versions where the direction was contributed only by the gesture, e.g., (16a), and at-issue counterparts where the gesture was supported by a verbally asserted phrase “in this direction”, e.g., (17).

(16) a. The boy will not [use the stairs]_DOWN.
  b. Conditional inference: If the boy were to use the stairs, he would go down the stairs.

(17) The boy will not use the stairs [in this direction]_DOWN.

A high endorsement rate would suggest that participants could indeed draw the target inference from the test sentence; but if the inference was even more strongly endorsed for the GESTURE target (16) than for the AT-ISSUE counterpart (17), we could be that much more confident that the inference was indeed a specific contribution of the co-speech gesture itself.

2.1 Methods

2.1.1 Participants

Participants were recruited through Amazon Mechanical Turk, and were paid 1.20 USD for their participation. Two participants were excluded from analysis as they did not report English as one of their native languages, leaving a total of 125 participants.

2.1.2 Procedure

Participants were directed to a web-based Inferential Judgment Task, created and hosted on the Qualtrics platform. They were presented with a series of videos, each containing a native speaker producing a test sentence. Each video was accompanied by a sentence appearing below the video, and a slider scale. Participants were told to rate the degree to which the video suggested the inference that appeared beneath the video, by dragging the cursor to fill the bar as much as needed. The ends of the scale were labelled as “Not at all” and “Very strongly”, but ultimately the ratings were linearly mapped to a scale from 0 to 100% endorsement. Figure 2 presents a screenshot from the Negation trial corresponding to (16)/(17).
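
The linear mapping from slider position to percentage endorsement can be sketched as follows. This is only an illustration: the raw range of the slider is not reported, so the default bounds (0 to 1) and the function name are assumptions.

```python
def to_endorsement(position: float, scale_min: float = 0.0, scale_max: float = 1.0) -> float:
    """Linearly map a raw slider position onto a 0-100% endorsement scale.

    scale_min corresponds to "Not at all" and scale_max to "Very strongly";
    both bounds are hypothetical, since the raw slider range is not reported.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= position <= scale_max:
        raise ValueError("position lies outside the slider range")
    return 100.0 * (position - scale_min) / (scale_max - scale_min)
```

For instance, a bar filled to one quarter of its length would correspond to 25% endorsement, whatever the underlying raw units.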

Figure 2 

Screenshot of a Negation trial. The sentence produced by the speaker was “The boy will not [use the stairs]_DOWN” for the GESTURE target and “The boy will not use the stairs [in this direction]_DOWN” for the AT-ISSUE counterpart.

The task took on average 6–7 minutes to complete. The instructions that participants saw are provided in Appendix A.

2.2 Materials

One group of participants saw the GESTURE targets and the other group saw the AT-ISSUE counterparts. Each group saw 11 trials in total, corresponding to nine target trials and two gestureless controls. The nine target inferences were distributed as follows: the Unembedded, Might, and Negation targets each had a single target inference, and the three quantificational environments (Each, None, and Exactly-one) were each associated with two target inferences (an existential one and a universal one), presented on separate trials. Examples of the target sentences and their corresponding inferences are provided in Table 1. The subject NP (i.e. “the girl(s)” vs. “the boy(s)”) and the direction of the gesture (i.e. UP vs. DOWN) in the test sentences were automatically randomized across trials. In all, participants saw one repetition of each target inference, and all participants saw all six linguistic environments.

Environment | Example sentence | Target inference(s)
Unembedded | The boy will [use the stairs]_UP. | Directional: The boy will go up the stairs.
Might | The boy might [use the stairs]_UP. | Conditional: If the boy were to use the stairs, he would go up the stairs.
Negation | The boy will not [use the stairs]_UP. | Conditional: If the boy were to use the stairs, he would go up the stairs.
Each | Each of these three boys will [use the stairs]_UP. | Existential: For at least one of these three boys, if he were to use the stairs, he would go up the stairs. Universal: For each of these three boys, if he were to use the stairs, he would go up the stairs.
None | None of these three boys will [use the stairs]_UP. | Existential: For at least one of these three boys, if he were to use the stairs, he would go up the stairs. Universal: For each of these three boys, if he were to use the stairs, he would go up the stairs.
Exactly one | Exactly one of these three boys will [use the stairs]_UP. | Existential: For at least one of these three boys, if he were to use the stairs, he would go up the stairs. Universal: For each of these three boys, if he were to use the stairs, he would go up the stairs.

Table 1

Target sentences and their corresponding inferences, for each environment, using “boy(s)” and UP as an example. There were four possible versions of each test sentence, created by alternating “boy(s)”/“girl(s)” and UP/DOWN.

Given our focus on directional gestures specifically involving the predicate “use the stairs”, we included two gesture-less controls that allowed us to ensure that the predicate “use the stairs” wasn’t inherently associated with just one of the two directions, for example, using the stairs to go up. On these control trials, participants saw the speaker produce a regular unembedded sentence without any gesture at all, e.g., (18a), and had to rate the strength of a directional inference, e.g., (18b) or (18c).

(18) a. Speaker: The boy will use the stairs.
  b. UP inference: The boy will go up the stairs.
  c. DOWN inference: The boy will go down the stairs.

If the predicate “use the stairs” was not inherently associated with one particular direction, we expected low and roughly equal rates of endorsement for the UP and DOWN inferences. Each participant received one UP control and one DOWN control (the subject NP “the boy”/“the girl” was again randomized).
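
The composition of a single participant's trial list, as described above, can be sketched as follows. The field names are hypothetical, and the shuffling of presentation order is an assumption of the sketch (the order of trials is not described above); only the counts and the randomized factors come from the design itself.

```python
import random
from typing import Dict, List

# The nine target inferences: one each for Unembedded, Might, and Negation,
# and two (existential, universal) for each quantified environment.
TARGET_INFERENCES = [
    ("Unembedded", "Directional"),
    ("Might", "Conditional"),
    ("Negation", "Conditional"),
    ("Each", "Existential"),
    ("Each", "Universal"),
    ("None", "Existential"),
    ("None", "Universal"),
    ("Exactly one", "Existential"),
    ("Exactly one", "Universal"),
]


def build_trials(rng: random.Random) -> List[Dict[str, str]]:
    """Build the 11 trials seen by one participant (9 targets + 2 controls)."""
    trials = []
    for environment, inference in TARGET_INFERENCES:
        trials.append({
            "kind": "target",
            "environment": environment,
            "inference": inference,
            "subject": rng.choice(["boy", "girl"]),    # randomized subject NP
            "direction": rng.choice(["UP", "DOWN"]),   # randomized gesture direction
        })
    # One gesture-less control per inferred direction.
    for direction in ["UP", "DOWN"]:
        trials.append({
            "kind": "control",
            "environment": "Unembedded (no gesture)",
            "inference": "Directional",
            "subject": rng.choice(["boy", "girl"]),
            "direction": direction,
        })
    rng.shuffle(trials)  # assumption: randomized presentation order
    return trials


trials = build_trials(random.Random(0))
```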

2.3 Results

The data and R script for this experiment are available online at https://semanticsarchive.net/Archive/jBiMmUwM/TieuPasternakSchlenkerChemla-GestureInferences.html.

2.3.1 Gesture-less controls

In response to the gesture-less control video, participants gave a mean endorsement of 35% (SE = 2.3%) for the DOWN inference and a mean endorsement of 37% (SE = 2.5%) for the UP inference. A linear mixed effects model was fitted to the responses using the lme4 package in R (R Core Team 2016; Bates et al. 2015), with inferred direction (UP vs. DOWN) as a fixed effect and random by-participant intercepts. A model comparison revealed that the model with inferred direction as a predictor did not fare significantly better than the model without it (χ²(1) = 2.2, p = .14). Given that inferred direction had no significant effect, we can be reassured that our group of participants did not display any inherent bias for associating “use the stairs” with one particular direction; that is, using the stairs could apply equally well to going up or going down the stairs.
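
As a consistency check on the reported model comparison, the p-value associated with a chi-square statistic on one degree of freedom can be recomputed from the statistic alone. The sketch below is not the analysis script: it only uses the rounded statistic reported above, and the function name is hypothetical. It relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If X = Z**2 with Z standard normal, then
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Rounded statistic reported for the comparison of the models
# with and without inferred direction.
p_value = chi2_sf_df1(2.2)
```

Rounded to two decimal places, the resulting value agrees with the reported p = .14.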

2.3.2 GESTURE targets and AT-ISSUE controls

Figure 3 presents the percentage of endorsement for each target inference, across the six linguistic environments. As discussed earlier, high endorsement rates would suggest that participants could indeed draw the target inferences from the test sentences. From Figure 3, we can see impressionistically that some inferences were undoubtedly endorsed, namely the ones associated with Unembedded and Might, both the universal and existential inferences for Each, and the existential inference for Exactly-one. But it would be more informative to know when the target inferences were more strongly endorsed for the gesture target than for its associated at-issue control, because this would indicate that the inference was specifically a contribution of the co-speech gesture, rather than merely a default inference triggered for irrelevant reasons.

Figure 3 

% endorsement of target inferences in each linguistic environment. Error bars represent standard error of the mean across participants. Each dot represents an individual participant’s endorsement of the given inference in the given environment (a horizontal jitter of .7 and vertical jitter of .02 were added for easier visualization).

Table 2 provides the mean endorsement for each target inference; asterisks (*) indicate inferences that were significantly more endorsed for the gesture targets than for their at-issue counterparts. For each of the target inferences, we used linear regression models to determine whether there was a significant effect of Condition (GESTURE vs. AT-ISSUE) on the inferential judgment responses; details about these models are provided in Appendix B.

Environment | Target inference | GESTURE endorsement | AT-ISSUE endorsement | GESTURE/AT-ISSUE comparison
Unembedded | Directional | 87% | 98% | F = 19, p < .001
Might | Conditional* | 78% | 69% | F = 5.6, p < .05
Negation | Conditional* | 41% | 6% | F = 34, p < .001
Each | Existential | 91% | 95% | F = 2.6, p = .11
Each | Universal | 90% | 98% | F = 21, p < .001
None | Existential* | 38% | 14% | F = 14, p < .001
None | Universal* | 35% | 8% | F = 20, p < .001
Exactly one | Existential | 87% | 94% | F = 6.5, p < .05
Exactly one | Universal* | 64% | 44% | F = 8.9, p < .01

Table 2

% endorsement of target inferences in each linguistic environment. Asterisks (*) indicate that the inference associated with the GESTURE target was endorsed significantly more than its AT-ISSUE counterpart.

Before discussing the results for each environment, a general remark will be useful. In most theories, standard presupposition projection in the sentences under consideration gives rise to readings that are at least as strong as those that would be obtained if the presupposition were just a normal entailment of the expression that triggers it.6 Technically, this holds in particular if the presupposition of an expression is also entailed by it (Klinedinst 2016 and Sudo 2013 assume that some presupposition triggers do not satisfy this assumption; we discuss the consequences of this for the cases at hand in Appendix C).

For instance, “John knows that he is incompetent” uncontroversially and strongly triggers the inference that John is incompetent, as does the sentence “John correctly believes that he is incompetent”. Such a reading can also be thought of as the one obtained when local accommodation is applied to the presupposition trigger. In a target sentence such as “x will [use the stairs]_UP”, the purported presupposition is that if x uses the stairs, x will go up the stairs, and when this conditional gets added to the at-issue component, we get the inference that x will go up the stairs. This is precisely the content of our at-issue controls. For this reason, whenever a target inference is entailed by our at-issue controls, it can be expected to be strongly present in the gesture targets as well.

Let us now consider our various linguistic environments, starting with the non-quantified cases.

2.3.2.1 Non-quantified cases: Unembedded, Might, Negation

We observe that the conditional inferences are more strongly endorsed for the Might and Negation gesture targets than for their respective at-issue controls. This suggests that at least part of these conditional inferences is due to the co-speech gestures. In the Unembedded condition, there is also a significant difference between the gesture targets and at-issue controls, but going in the opposite direction: endorsement of the inference is now stronger for the at-issue control. This is rather unsurprising, for two reasons. First, the target inference is entailed by the at-issue sentence, and as we noted at the outset, we therefore expect it to be strongly present in the gesture target as well. Second, however, the at-issue control contains a demonstrative (“in this direction”) which is impossible to interpret without taking the gesture into account, whereas in the target the co-speech gesture can be disregarded without affecting the grammaticality of the sentence. This plausibly explains why the relevant inference is slightly weaker in the gesture target.

2.3.2.2 Each

Turning to the quantified cases, the same observations can be made about the Each environment: just as in the Unembedded environment, the inferences are triggered from a plain positive environment, and hence they should be strongly present in both the gesture target and the at-issue control. But they are a bit weaker for the gesture target, possibly because the gesture can be ignored more easily than in the at-issue control.

2.3.2.3 None

More interestingly, the results from the None environment provide evidence that, at least in part, both the Universal and the Existential inferences are specific to the gesture target. Note that the strength of this effect is the same in the two cases (there is no interaction between the two differences that we see here; see Appendix B for details). This suggests that the existential inference is best explained as a consequence of the stronger universal inference, for if there were an independent reason to derive it, it should strictly speaking be more strongly endorsed than the universal inference. In short, the results from the None environment provide evidence for universal projection rather than existential projection.

2.3.2.4 Exactly one

Finally, the Exactly-one environment is the most complicated, and perhaps the most striking. First, note that in this case the Existential inference follows from the meaning of the sentence in the gesture target and in the at-issue control (assuming the cosuppositional inference is indeed an entailed presupposition). Accordingly, we observe the same pattern of results as in the Unembedded and Each environments: strong endorsement, but weaker in the gesture condition than in the at-issue control, which, again, could be due to the gesture being more easily ignorable in the gesture target than in the at-issue control. The case of the universal inference is particularly interesting, however, since in this case it does not follow from the control, and can be present in the targets only by way of a mechanism of projection.7 It is thus striking that it is present to a greater degree in the gesture target than in the at-issue control.8 Overall, this provides strong evidence for the conclusion that the gesture specifically triggers a universal inference under Exactly-one.

2.3.3 Presence of local accommodation

While we did not explicitly gather inferential judgments regarding local accommodation, the contrast between responses to the positive environments and responses to the negative environments is strongly suggestive of the presence of local accommodation, at least in negative environments. In particular, locally accommodating the directional inference under Negation and None should lead one to reject the target inference, which in turn could explain the lower endorsement rates observed in both conditions.9

To be cautious, one might wish to rule out alternative reasons why participants might not have endorsed the inferences as strongly in the negative environments compared to the positive ones. As an alternative to local accommodation, one might posit that participants in our experiment simply ignored the gestures produced by the speaker. Consider for instance the Negation target “The boy will not [use the stairs]_UP.” By its very meaning, the at-issue control “The boy will not use the stairs [in this direction]_UP” blocks the target inference “if the boy uses the stairs, he will go up the stairs.” Thus the higher endorsement rate found for the gesture target might simply be due to the fact that the gesture is more easily ignored in that sentence. In fact, the possibility of ignoring the gesture can explain why, when the inference is entailed by the at-issue control and is therefore expected to be strongly present in both gesture targets and at-issue controls, it can show up more strongly in the controls, since the demonstrative “this” makes explicit reference to the gesture. For this reason, we can use the observed difference between the gesture targets and the at-issue controls in positive environments to evaluate the strength of the possibility of ignoring the gesture. This difference is largest in the Unembedded environment, where it reaches 11%. But the drop corresponding to the possibility of local accommodation is roughly four times as large: the difference between the endorsement of the relevant inference in the Unembedded environment and that in the Negation environment is 46%, and hence is likely not due solely to the possibility of ignoring the gesture; rather it is suggestive of the possibility of local accommodation.10
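
The arithmetic behind this comparison can be spelled out using the mean endorsement rates reported in Table 2. The variable names are ours, and the sketch simply recomputes the two differences and their ratio.

```python
# Mean endorsement (%) as reported in Table 2.
unembedded_gesture = 87
unembedded_at_issue = 98
negation_gesture = 41

# Upper bound on the effect of ignoring the gesture: the largest
# GESTURE vs. AT-ISSUE gap observed in a positive environment.
ignoring_gesture_gap = unembedded_at_issue - unembedded_gesture

# Drop in endorsement for GESTURE targets from Unembedded to Negation.
negation_drop = unembedded_gesture - negation_gesture

ratio = negation_drop / ignoring_gesture_gap
```

The gap is 11 percentage points and the drop is 46, so the drop is a little over four times the gap.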

A simpler possibility is that participants in our experiment didn’t like to endorse inferences associated with negative sentences, or, along similar lines, that negative sentences somehow primed negative ratings. This explanation seems entirely ad hoc, but given that we did not test other types of negative sentences (with or without gestures), it is hard to exclude. Still, if such an analysis is on the right track, we would expect it not to apply to the Exactly-one environment, which crucially is neither positive nor negative. This environment would therefore be the place to look, in order to assess the import of local accommodation. Consider the example in (19).

(19) Exactly one of these three girls will [use the stairs]_UP.
  a. Existentially project directional inference: Exactly one of these three girls will use the stairs, and for at least one of these three girls, if she were to use the stairs it would be in an upwards direction.
  b. Universally project directional inference: Exactly one of these three girls will use the stairs, and for each of these three girls, if she were to use the stairs it would be in an upwards direction.
  c. Locally accommodate directional inference: Exactly one of these three girls will use the stairs in an upwards direction.

We observe that for the gesture targets, the universal inference is endorsed less strongly than the existential inference. This might be because the existential inference is weaker, and thus more likely to be true than the universal inference (if only for probabilistic reasons). But this difference, which is quite large, could also be revealing the extent to which local accommodation is possible: local accommodation validates the existential but not the universal inference, and therefore would contribute to such a difference. (Another possibility is to claim that both existential and universal inferences are possible in general, an option we return to below.11)
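The entailment pattern appealed to here (local accommodation validates the existential but not the universal inference) can be checked mechanically in a toy model. The following is a minimal illustrative sketch and not part of the experiment. It assumes, as a simplification, that each girl has a fixed direction she would take if she were to use the stairs, so that the counterfactual conditional reduces to a per-individual disposition; the names `worlds`, `local_accommodation`, `existential`, `universal` and `entails` are hypothetical.

```python
from itertools import product

# Toy model (an illustrative assumption, not part of the experiment):
# each girl is a pair (uses_stairs, would_go_up), where would_go_up stands in
# for the counterfactual "if she were to use the stairs, she would go up".
GIRL_STATES = list(product([True, False], repeat=2))
worlds = list(product(GIRL_STATES, repeat=3))  # 4**3 = 64 scenarios

def local_accommodation(world):
    # Reading (19c): exactly one girl uses the stairs in an upwards direction.
    return sum(uses and up for uses, up in world) == 1

def existential(world):
    # At least one girl would go up if she were to use the stairs.
    return any(up for _, up in world)

def universal(world):
    # Every girl would go up if she were to use the stairs.
    return all(up for _, up in world)

def entails(premise, conclusion):
    # True if the conclusion holds in every scenario where the premise holds.
    return all(conclusion(w) for w in worlds if premise(w))

lc_entails_existential = entails(local_accommodation, existential)
lc_entails_universal = entails(local_accommodation, universal)
print(lc_entails_existential, lc_entails_universal)  # prints: True False
```

Under these assumptions, a scenario in which one girl goes up the stairs while another girl stays put but would have gone down makes (19c) true and the universal inference false, which is why a participant who locally accommodates could accept the existential inference while rejecting the universal one.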

A further issue, raised by an anonymous reviewer, concerns the interpretation of the conditionals in the Negation and None conditions. The reviewer notes that some participants might have reported their confidence in whether the girl(s) would actually use the stairs in the mentioned direction; in the negative test sentences, these participants would then report low confidence that the girls would go up or down the stairs. Given that the inferences for the Each and Exactly-one conditions used a conditional despite the affirmative aspect of the quantifiers, some participants might not have interpreted the conditionals in the Negation and None conditions as purely counterfactual.

The reviewer is right that one must be careful about the possible interpretations the participants might have had for the conditional in the target inference, and whether these could explain the results without appealing to local accommodation. One possibility one might consider, for example, is that the antecedent of the conditional was sometimes ignored (i.e. “if x, y” was interpreted as “y”), or that the conditional was interpreted conjunctively (i.e. “if x, y” was interpreted as “x and y”). In our cases, this would boil down to the same thing because y (e.g., “… will go up the stairs”) entails x (“… uses the stairs”). This hypothesis would indeed explain the results in negative contexts, but it seems to us to be implausible. Our inferences had the form:

(20) If the boy were to use the stairs, he would go up the stairs.

Even if one were to ignore the antecedent, the consequent simply wouldn’t have the right form to stand as an independent sentence, since the mood (“would”) is ill-suited for that purpose. A conjunctive interpretation of the entire conditional seems even less plausible.

Another possibility is that the conditional was not interpreted as subjunctive, but rather as indicative, i.e. “If the boy uses the stairs, he will go up the stairs.” It could then be rejected because the indicative triggers the presupposition that the boy might use the stairs, which in turn is contradicted by the negative statement. This seems implausible in view of the form we chose for the conditional: “If the boy were to use the stairs…”, and not “If the boy uses the stairs…” But one could argue that the use of the conditional in the Each condition could have led participants to make this mental correction. Even with this correction, however, we would need something non-standard, to the effect that “If the boy uses the stairs, he will go up the stairs” triggers the inference that the boy might use the stairs (to explain why this is rejected in negative conditions), but does not trigger the inference that the boy might also not use the stairs. The reason for this is that if the latter inference is equally triggered, we predict that under Each the target inference should be rejected too, contrary to fact. While it is not entirely impossible to entertain this possibility (as the “might use the stairs” inference could have a different source from the “might not use the stairs” inference), this alternative analysis of the data does not seem very plausible to us.

On the other hand, if the conditional was interpreted as expected, namely as subjunctive (as suggested by the form “if the boy were to use the stairs, he would go up the stairs”), then the decrease in endorsement rates in the critical cases could not be explained by a reinterpretation of the conditional, and the local accommodation hypothesis would be needed.

Overall then, it seems that alternative explanations for the results that do not appeal to local accommodation but to an alternative reading of the conditional inference are possible, but not very plausible. Although we did not explicitly gather inferential judgments regarding local accommodation, the overall data reveal various pieces of evidence that are suggestive of the presence of local accommodation; a future study might investigate the presence of local accommodation more directly.

Before closing, we would like to address a possible prediction that could be more systematically investigated in a future study. An anonymous reviewer suggests that some form of correlation between the judgments in the Negation, None, and Exactly-one conditions would support the existence of local accommodation, while a correlation between Negation and None independently of Exactly-one would instead favor a supplemental analysis. While evaluating the nature of the correlation may turn out to be rather complex (as we discuss below), it is indeed worth considering the predictions that the different theories make.

2.3.3.1 Cosuppositional + local accommodation theory

First, let’s assume that local accommodation is available and stable, in the sense that participants who apply local accommodation in one case are likely to do it across environments (and across the whole experiment). We should then expect to find a group of people, “local accommodators”, who: (i) reject the inference in the Negation condition, (ii) reject the universal inference in the None condition (or possibly accept the existential inference through some independent mechanism of competition), and (iii) accept the existential inference in the Exactly-one condition. We thus expect a complex, inverse correlation: people who reject the universal inference in the negative conditions accept the existential inference in the Exactly-one condition. Note that it is not completely straightforward to identify which individuals derive the existential inference specifically: one would have to construct an index that captures the fact that they accept the existential inference and reject the universal inference. We agree with the reviewer that there could be a rich prediction to explore here, but it would be a bit complicated to test: it would involve correlating yes-responses with a mixed bag of no-responses, which could moreover be obscured if yes-/no-biases turn out to be more “stable” (in the sense above) than the expected pattern.
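To make the shape of this prediction concrete, here is a minimal sketch of how such an index might be constructed and cross-tabulated against responses in the negative conditions. Everything in it is an assumption made for illustration: the field names, the helper functions `accommodator_index`, `rejects_in_negatives` and `cross_tabulate`, and the three example participants are invented and are not data from the study.

```python
# Hypothetical per-participant summary: True means the participant endorsed
# the inference in that condition. Field names and the three participants are
# invented for illustration only; they are NOT data from the experiment.
participants = {
    # Pattern expected of a "local accommodator".
    "P1": {"negation": False, "none_universal": False,
           "exactly_one_existential": True, "exactly_one_universal": False},
    # Endorses every inference (e.g. consistent universal projection).
    "P2": {"negation": True, "none_universal": True,
           "exactly_one_existential": True, "exactly_one_universal": True},
    # General no-bias: rejects everything, which can obscure the pattern.
    "P3": {"negation": False, "none_universal": False,
           "exactly_one_existential": False, "exactly_one_universal": False},
}

def accommodator_index(r):
    # Accepts the existential inference but rejects the universal inference
    # in the Exactly-one condition.
    return r["exactly_one_existential"] and not r["exactly_one_universal"]

def rejects_in_negatives(r):
    # Rejects the target inference under both Negation and None.
    return (not r["negation"]) and (not r["none_universal"])

def cross_tabulate(data):
    # Count participants for each (index, rejects-in-negatives) combination.
    table = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for r in data.values():
        table[(accommodator_index(r), rejects_in_negatives(r))] += 1
    return table

table = cross_tabulate(participants)
for key, count in sorted(table.items(), reverse=True):
    print(key, count)
```

In this toy table, the cell `(True, True)` holds the participants who fit the local accommodation profile, while the cell `(False, True)` holds participants such as the invented P3, whose rejections in the negative conditions reflect a general no-bias. A large `(False, True)` cell is one way in which stable yes-/no-biases could mask the predicted correlation.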

2.3.3.2 Supplemental theory

Turning to the supplemental theory, we would first need to explain the low endorsement rate of inferences in the negative sentences; perhaps supplements are degraded under negative expressions, or people have some general tendency to reject inferences when the sentence is negative. We might further suppose that this is a stable property, such that people who show this tendency in the Negation condition also do so in the None condition. This makes the same prediction as the local accommodation approach above, as far as these two conditions are concerned. But let us note that the “rejection” rates in the negative conditions are as high as 60–65%. It seems unlikely that a general tendency to reject negative sentences would be that strong; for example, although we did not test such cases, it seems likely that participants asked to judge whether it follows from “John will not [use the stairs]_UP” that John won’t use the stairs will say “yes” at rates higher than 35%.

Moving on to the Exactly-one condition, a sophisticated supplemental theory might posit that people can freely choose between indicative and subjunctive mood for the appositive (this would not be the case in the negative conditions, where the indicative mood would be ungrammatical, see Footnote 2 and references therein). We can assume then that some people might choose the indicative mood and get an existential inference, while others might choose the subjunctive mood and obtain a universal inference, and again, this choice might be stable across the experiment. This does not lead to any predictions in terms of the relationship with the negative conditions: the reason why people reject inferences in the negative conditions would be due to some general negative bias, while the choice of answers in the Exactly-one condition would be driven by the choice of mood, which is independent from (and not relevant to) the negative conditions.

Overall, it would be quite striking to observe the correlation that is predicted by the local accommodation approach, but we think it is unlikely to come out given its shape and complexity (i.e. the difficulty of constructing the right index, the need to evaluate yes/no comparisons across conditions, and uncertainty about the stability of the relevant properties). On the other hand, the supplemental theory does not make particularly strong predictions, and is thus hard to validate. Nonetheless, the above highlights an alternative explanation for the data: a supplemental theory with (i) free choice of mood for non-negative sentences, and (ii) a 65% no-bias towards negative sentences. This is a striking possibility, but one that would require further investigation.

2.4 Discussion

We designed an inferential judgment task to assess the presence of certain directional inferences arising from the use of co-speech gestures in different linguistic environments. The results of the experiment provide evidence for the projection of the inferences of co-speech gestures. In particular, we observe that the conditional inference projects from the scope of negation, and moreover that it projects universally from the quantificational environments “none” and “exactly one”. In addition, we observe some suggestive evidence that the inferences of co-speech gestures can be locally accommodated under negation and “none”.

How do these results bear on the existing theories of co-speech gestures? According to the at-issue analysis, a co-speech gesture behaves like an at-issue modifier that conjunctively enriches the meaning of the expression it attaches to. This theory predicts the same types of inferential patterns for our target sentences and at-issue controls. This prediction is clearly not borne out by our data: co-speech gestures modifying verbal expressions give rise to non-trivial patterns of projection.

According to the supplemental analysis, a co-speech gesture plays the same kind of semantic role as an appositive relative clause, whereas for the cosuppositional analysis, a co-speech gesture introduces a presupposition that is conditionalized on the at-issue meaning of the expression it modifies. The present results are in line with the predictions of the cosuppositional analysis, combined with “conservative” theories of presupposition projection such as Heim (1983), and the auxiliary assumption (made in Schlenker 2018a) that co-speech gestures are weak presupposition triggers. First, gestural inferences project universally out of various environments. Second, gestural inferences can to a certain extent be locally accommodated, as is expected of weak triggers.

With regard to the supplemental theory, there are certain combinations of choice points in the analysis that could deliver universal patterns of projection under quantificational elements. However, the existence of local accommodation is hard to reconcile with the behavior of appositive relative clauses in the cases at hand. Thus for the supplemental analysis to be viable, it would have to explain why co-speech gestures behave differently from appositive relative clauses in our examples, or else challenge our discussion of the data concerning appositives, which have not been subjected to experimental study.12

An orthogonal finding from the current study is that co-speech gestures can to some extent be disregarded entirely, yielding no inference at all, unlike the case of our at-issue controls (as evidenced, for example, by the statistically significant difference in endorsement rates between the GESTURE and AT-ISSUE versions of the Unembedded targets). This need not be surprising, however: the at-issue controls would simply be incoherent if the gestures were ignored, as the demonstrative “this” would lack a denotation. By contrast, the target sentences can be entirely coherent when the co-speech gestures are disregarded.

Finally, the present results should be compared to those reported in Tieu et al. (2017). This earlier study investigated the same sentences, but used a truth value judgment task and a picture selection task instead of an inferential judgment task. Tieu et al. (2017) also found evidence of local accommodation, and they too noted that participants may have more easily disregarded the gestural inferences in the target sentences than in the at-issue controls. But an important difference is worth emphasizing: Tieu et al. (2017) found evidence of existential projection of gestural inferences, whereas the present experiment revealed evidence for universal projection.13

The results of Tieu et al. (2017) are particularly difficult to reconcile with the supplemental theory, since as noted the theory doesn’t have any natural way to explain the presence of existential inferences. The cosuppositional theory could, but for a theoretically unsatisfying reason, namely that existing analyses disagree about how presupposition projection works in quantified structures, and one of them in particular (Beaver 2001) happens to argue for existential projection. On the other hand, there may indeed be emerging empirical support for positing existential projection of presuppositions in certain environments (Zehr et al. 2016). A remaining goal for future work is therefore to reconcile the universal findings of the experiment discussed here with the existential findings of Tieu et al. (2017). Whether or not existential and universal projection are both available for presuppositions, as discussed in Zehr et al. (2016), it will be worth exploring why two different paradigms may differ in the readings they bring out. For example, since a truth value judgment task imposes a situation, perhaps it invites a disambiguation strategy that allows for access to weaker readings, even if they are not particularly prominent ones. By contrast, an inferential task might instead favor the strategy of simply going with the most prominent readings, including in cases where those prominent readings are the stronger ones. Such issues will be important to work out in future research. For an optimal test of the cosuppositional analysis, these investigations should be carried out in parallel for co-speech gestures and for standard presupposition triggers.14, 15

3 Conclusion

In this study, we used an inferential judgment task to investigate the projection properties of inferences arising from co-speech gestures in various linguistic environments. The present findings support a cosuppositional analysis, according to which co-speech gestures introduce presuppositions that are conditionalized on the at-issue meanings of the expressions they modify.

The present study can be seen as a complement to the earlier study by Tieu et al. (2017). On a methodological level, this new study uses a simpler inferential task that closely tracks the introspective judgments that linguists discuss in the literature. On a substantive level, it confirms both that there are non-trivial patterns of projection, and that local accommodation is an option. But it also presents clear evidence of universal projection rather than of existential projection. Both results are expected on standard theories of presupposition projection such as Heim (1983).

The long-term goal will be to unify the full set of data pertaining to the projection of gestural inferences. Either the existential/universal divergence is due to an experimental bias, or one must countenance a theory of presupposition (such as that in Zehr et al. 2016) in which both existential and universal projection yield possible readings. In the latter case, one would still need to explain why these options are brought out differently by different experimental paradigms.

Finally, it is worth noting that the present findings, like those of Tieu et al. (2017), bring closer together the speech and visual modalities, with common projection patterns arising from the inferences of gestures and verbal presuppositions (see Chemla 2009 for evidence of universal projection of verbal presuppositions). The present findings are consistent with previous experimental work indicating that speakers naturally integrate semantic information conveyed by gestures. Beyond this, however, our results indicate that speakers in fact compute inferences from these gestures, and moreover that these inferences interact in specific ways with the logical structure of their linguistic environments.16

Additional File

The additional file for this article can be found as follows:

Appendices (A-C)

Co-speech gesture projection: Evidence from inferential judgments. DOI: https://doi.org/10.5334/gjgl.580.s1