In a seminal study, Bott & Noveck (
An utterance of (1a) can be interpreted in (at least) two ways.
(1) | a. | It is warm outside. |
b. | It is hot outside. |
On its
Most current theories assume that the one-sided interpretation corresponds to the
Scalar inferences are commonly explained as a variety of
According to this implicature-based explanation, the one-sided interpretation is theoretically prior to the two-sided interpretation, since the one-sided interpretation serves as a premise in the reasoning process that ultimately leads to the scalar inference (and, consequently, the two-sided interpretation). An important question is whether the theoretical priority of the literal interpretation is reflected in listeners’ cognitive processing, i.e., whether the computation of scalar inferences is associated with a
Levinson (
Proponents of
Several more recent proposals side with relevance theory in assuming that the presence of a processing cost for scalar inferencing varies with certain methodological and contextual factors. However, they do not necessarily commit to the relevance-theoretic assumption that relevance is paramount in deciding whether or not a processing cost will be observed. Thus, e.g., these proposals have argued that the presence of a processing cost depends on the question under discussion (
Testing these different theories about the processing of scalar inferences requires operationalising the notion of a processing cost. Various proposals have been made in this respect, focusing on participants’ eye movements (e.g.,
In sentence verification studies, participants are presented with a sentence and have to decide whether that sentence is true or false in a given situation. This situation can be presented pictorially or correspond to participants’ world knowledge. To carry out the verification process, it is often assumed that participants represent both the sentence and the situation in a common format, e.g., a proposition. In addition, participants initialise a truth index that tracks the truth value of the sentence. Sentence verification then consists in systematically manipulating and comparing the representations associated with the sentence and the situation, and carrying out operations on the truth index (cf.
To examine whether the computation of scalar inferences is associated with a processing cost, Bott & Noveck (
(2) | a. | Some dogs are mammals. |
b. | Some parrots are birds. |
These sentences are true when interpreted literally, since, e.g., there are dogs that are mammals, but they are false when the scalar inference is computed and ‘some’ is interpreted as ‘some but not all’, since, in fact, all dogs are mammals. Hence, participants’ truth judgements to these underinformative sentences are indicative of whether or not they computed a scalar inference.
In Bott and Noveck’s Exp. 3, participants gave intuitive truth judgements to sentences such as (2). Many participants were ambivalent about the truth of underinformative sentences like these, varying their responses across structurally similar trials. Comparing the verification times of these ambivalent participants, Bott and Noveck found that it took participants significantly longer to answer ‘false’ (i.e., the answer suggesting a two-sided interpretation) than ‘true’ (i.e., the answer suggesting a literal interpretation). This difference in verification times was absent in a control condition with sentences that were unambiguously true or false, as in (3).
(3) | a. | Some mammals are dogs. |
b. | Some dogs are birds. |
The pattern of results that Bott and Noveck observed suggests that the computation of scalar inferences is associated with a processing cost, at least in out-of-the-blue contexts. This conclusion is in line with relevance theory and several more recent approaches (e.g.,
In what follows, we refer to Bott and Noveck’s finding that participants take significantly longer to reject underinformative sentences like (2) than to accept them as the
For example, Chevallier and colleagues (
(4) | There is a sun or a train. |
In the target condition, the display for (4) showed both a sun and a train. Here, the sentence is literally true but false if the scalar inference is computed and ‘or’ is interpreted as excluding ‘and’. As in Bott and Noveck’s study, many participants vacillated between responding with ‘true’ or ‘false’ in the target condition. Unlike Bott and Noveck’s study, however, Chevallier and colleagues did not observe a significant difference in verification times between ‘true’ and ‘false’ answers.
Even more challenging data comes from studies testing the processing of scalar words in negative sentences, as in (5) (
(5) | a. | Not all dogs are insects. |
b. | Not all parrots are mammals. |
On their literal interpretation, the sentences in (5) merely convey that there are dogs that are not insects, and that there are parrots that are not mammals. So on their literal interpretation, these sentences are true. However, the sentences in (5) may give rise to the
Cremers & Chemla (
(6) | a. | Not all mammals are dogs. |
b. | Not all dogs are mammals. |
Romoli & Schwarz (
To obtain a more comprehensive picture of the generalisability of the B&N effect, van Tiel and colleagues (
Materials used by van Tiel et al. (
Some of the socks are pink.
The battery is low.
In line with Bott and Noveck’s study, van Tiel and colleagues found that, in the case of ‘some’, participants were significantly slower to answer ‘false’ than ‘true’ in the target condition, whereas no difference in verification times was observed in the control condition. Van Tiel and colleagues also observed a B&N effect for ‘or’ (in contrast with the aforementioned study by
To explain this pattern of results, van Tiel and colleagues rely on the notion of
Polarity is a fundamental but multifarious construct that refers to the fact that some words in natural language are positive while others are negative (cf.
In linguistics, polarity is usually operationalised in terms of
Van Tiel and colleagues rely on this characterisation, which they call the
However, in addition to the scalarity criterion, there are various other diagnostics of linguistic polarity. Since adjectives are of particular interest for the current study, we focus here on two ways of diagnosing the linguistic polarity of adjectives. Both of these diagnostics build on a standard assumption in linguistics that many adjectives are members of antonym pairs where one member is positive and the other is negative (e.g.,
A first way of diagnosing adjectival polarity involves the interpretation of ‘how’ questions. In particular, ‘how’ questions involving positive adjectives tend to be neutral, whereas those involving negative adjectives tend to presuppose that the adjective holds (e.g.,
(7) | a. | How long is a day on Venus? |
b. | How short is a day on Venus? |
Whereas (7a) is neutral about whether days on Venus are long or short, (7b) intuitively suggests that the speaker believes they are short. This observation suggests that ‘long’ is positive, while ‘short’ is negative. A direct consequence of the fact that negative adjectives are biasing in ‘how’ questions is that they are less likely to occur in such questions than positive adjectives—e.g., the phrase ‘how long’ is much more frequent in the ENCOW16A corpus (a web corpus consisting of almost 17 billion tokens, cf.
A second way of linguistically delineating positive and negative adjectives looks at ratio phrases, such as ‘twice as’ and ‘half as’ (cf.
(8) | a. | She is twice as old as him. |
b. | ?He is twice as young as her. |
In line with this observation, Sassoon (
In psychology, polarity is usually defined in terms of
The psychological notion of polarity as subjective valence also reverberates in natural language in several ways (e.g.,
In most cases, the linguistic and psychological notions of polarity go hand in hand, but not always. For example, as we just saw, from a linguistic perspective, ‘old’ is positive while ‘young’ is negative. From a psychological perspective, the converse holds: ‘young’ is positive and ‘old’ is negative (e.g., in the study by
In order to explain why only positive scalar words are associated with a B&N effect, van Tiel and colleagues rely on the observation that verification times are systematically affected by the polarity of the sentence. To illustrate, consider the three sentences in (9). These sentences vary in their polarity: (9a) is positive, (9b) contains the implicitly negative word ‘below’, and (9c) contains the explicit sentential negation ‘not’. In what follows, we will conveniently refer to these three types of sentences as
(9) | a. | The star is above the cross. |
b. | The cross is below the star. |
c. | The cross is not above the star. |
Clark & Chase (
(10) | positive < implicit negative < explicit negative |
Clark and Chase’s findings have been replicated in numerous studies (e.g.,
To see how the generalisation in (10) may explain van Tiel and colleagues’ observation that only positive scalar words are associated with a B&N effect, consider their target sentence for the positive scalar word ‘some’ and its scalar inference in (11).
(11) | Some of the socks are pink. |
⟿ Not all of the socks are pink. |
Participants who interpreted the target sentence literally only had to verify a positive sentence, whereas participants who arrived at a two-sided interpretation also had to verify the explicitly negative scalar inference. Given the generalisation in (10), we may expect that verifying the explicitly negative scalar inference leads to elevated response times compared to the literal interpretation. So, even if we assume that participants who arrived at a two-sided interpretation of the target sentence verified the literal interpretation and the scalar inference in parallel, they are expected to be significantly slower than participants who interpreted the sentence literally.
Now consider the target sentence for the implicitly negative scalar word ‘scarce’ along with its scalar inference in (12).
(12) | Red flowers are scarce. |
⟿ Red flowers are not absent. |
Participants who arrived at a literal interpretation of the target sentence had to verify an implicitly negative sentence, since ‘scarce’ is implicitly negative. What about the scalar inference? Superficially, the scalar inference appears to involve a double negation in ‘not absent’. Hence, intuitively, one might suppose that the verification of the scalar inference should take longer than verifying the literal interpretation. However, van Tiel and colleagues contend that the scalar inference in this case is verified at least as fast as the literal interpretation. There are various arguments that may support this proposal.
First, it has been found that, in at least some cases, sentences containing two negative elements are processed more rapidly than the corresponding sentences with a single negative element. For example, Sherman (
In any case, if we assume that the scalar inference in (12) can be verified at least as rapidly as the literal interpretation, and if we furthermore assume that participants who compute the scalar inference verify both the literal interpretation and the scalar inference in parallel, it follows that participants who computed the scalar inference should be as fast as participants who arrived at a literal interpretation. One might even expect participants who derived the scalar inference to be faster than participants who arrived at the literal interpretation, since the positivity of the scalar inference seems to entail that it should be verified faster than the implicitly negative literal interpretation, and since the sentence may be judged false as soon as the scalar inference is verified. However, psycholinguistic studies have consistently shown that ‘false’ responses to positive sentences are generally slightly delayed compared to ‘true’ responses, which might offset the verification time advantage for positives relative to implicit negatives, and thus lead to roughly equal verification times for literal and two-sided interpretations (e.g.,
This polarity-based explanation also harmonises with the previously discussed findings on indirect scalar inferences. To illustrate, (13) shows a target sentence from Cremers and Chemla’s (
(13) | Not all dogs are insects. |
⟿ It’s not the case that not some dogs are insects. |
Again, the proposal is that the scalar inference is verified more rapidly than the literal interpretation, either because the double negation is eliminated or because ‘not all’ is statistically associated with ‘some’ (rather than the equivalent ‘not not some’). The reason that (13) gave rise to the reverse B&N effect—rather than the absence of any effect, as van Tiel and colleagues found for cases like ‘scarce’—is that the target sentence contains an explicit negation. As noted before, explicit negatives take longer to verify than implicit negatives. Hence, participants who accepted the target sentence had to verify the more time-consuming literal interpretation, whereas participants who arrived at the two-sided interpretation verified both the literal interpretation and the (positive) scalar inference in parallel. In the latter case, participants could respond with ‘false’ as soon as they realised that the scalar inference was false, which took less time than verifying the literal interpretation and responding with ‘true’.
Predictions of the polarity-based explanation about the polarity properties of the literal interpretation and the scalar inference, and about the B&N effect.
Scalar word | Literal interpretation | Scalar inference | B&N effect |
positive (e.g., ‘some’) | positive | expl. negative | present |
impl. negative (e.g., ‘scarce’) | impl. negative | positive | absent |
expl. negative (e.g., ‘not all’) | expl. negative | positive | reversed |
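The reasoning behind these three predictions can be made concrete with a toy calculation. The sketch below is an illustration only, not a model proposed in the studies discussed: the numeric costs and the size of the ‘false’ delay are hypothetical values chosen so that their ordering follows the generalisation in (10), and the function name is our own.

```python
# Toy illustration only. The costs are hypothetical; only their ordering
# (positive < implicit negative < explicit negative) follows (10).
VERIFICATION_COST = {
    "positive": 1.0,
    "implicit negative": 2.0,
    "explicit negative": 3.0,
}
# Hypothetical extra time for answering 'false' to a positive sentence.
FALSE_DELAY_POSITIVE = 1.0


def predicted_effect(literal_polarity: str, inference_polarity: str) -> str:
    """Predict the B&N effect, assuming literal meaning and inference are verified in parallel."""
    # A 'true' response is given once the literal interpretation is verified.
    time_true = VERIFICATION_COST[literal_polarity]
    # A 'false' response is given once the scalar inference is found to be false.
    time_false = VERIFICATION_COST[inference_polarity]
    if inference_polarity == "positive":
        time_false += FALSE_DELAY_POSITIVE
    if time_false > time_true:
        return "present"
    if time_false < time_true:
        return "reversed"
    return "absent"


print(predicted_effect("positive", "explicit negative"))  # 'some'
print(predicted_effect("implicit negative", "positive"))  # 'scarce'
print(predicted_effect("explicit negative", "positive"))  # 'not all'
```

With these assumed values, the function returns "present", "absent", and "reversed" for the three rows, respectively. Other cost values that respect the ordering in (10) could yield a small positive or negative difference for the implicitly negative case, which is why the explanation speaks of roughly equal verification times.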
However, the current support for the polarity hypothesis is comparatively thin, consisting solely of the data for ‘low’ and ‘scarce’ (as well as perhaps earlier data on indirect scalar inferences). Moreover, in addition to being the only negative scalar words tested by van Tiel and colleagues, ‘low’ and ‘scarce’ were also the only
In this study, we test the hypothesis that only positive scalar words give rise to the B&N effect in a more comprehensive and systematic way by investigating the processing of 16 adjectival scalar words of both positive and negative polarity. Rather than relying on one subjective diagnostic to classify scalar words in a binary way as either positive or negative, we combined the outcomes of four objectively measurable diagnostics for polarity to obtain a gradient measure of polarity. We then tested whether this gradient measure of polarity predicted the presence or absence of a B&N effect. In the next section, we describe our study in more detail.
Our study tested 16 adjectival scales: ⟨ajar, open⟩, ⟨breezy, windy⟩, ⟨chubby, fat⟩, ⟨content, happy⟩, ⟨cool, cold⟩, ⟨drizzly, rainy⟩, ⟨fair, good⟩, ⟨low, empty⟩, ⟨mediocre, bad⟩, ⟨passable, good⟩, ⟨ripe, overripe⟩, ⟨scarce, absent⟩, ⟨sleepy, asleep⟩, ⟨unlikely, impossible⟩, ⟨warm, hot⟩, and ⟨youthful, young⟩. For each scale, we constructed a simple sentence containing the weaker scalar word, and, for each sentence, we created three images: a target image where the sentence was literally true but where its scalar inference was false, and two control images where the sentence was unambiguously true or false. See the Appendix for an overview of the sentences and images that we tested.
Participants in the experiment first saw the sentence. Once they finished reading the sentence, they pressed the space bar to see the image. At that point, they had to indicate whether they felt the sentence they had just read was a good or bad description of the corresponding image. We measured their verification times (i.e., the time between image onset and the point at which one of the response buttons was pressed) to establish the presence or absence of a B&N effect, i.e., to determine whether or not verification times were slower for ‘false’ than for ‘true’ answers in the target condition, vis-à-vis the control condition.
To test the polarity hypothesis—i.e., the idea that only positive scalar words give rise to the B&N effect—we had to determine the polarity of the scalar words in our study. Here, we focus on the stronger word on the scale, since the polarity-based explanation crucially makes reference to the polarity of the negated alternative, though if the literature is right, all words on a scale should share the same polarity (i.e., the
The first three diagnostics reflect the linguistic notion of polarity as markedness; the last two diagnostics reflect the psychological notion of polarity as subjective valence.
Van Tiel and colleagues focused on the scalarity criterion in their study. However, in our study, we wanted to avoid using this criterion for two reasons. First, not all of the scalar words that we tested make reference to a clearly identifiable measurement scale. For example, in the case of ‘open’, it is unclear whether the underlying measurement scale is about openness or closedness. Second, and relatedly, the scalarity criterion crucially relies on researchers’ intuitions, which are not always reliable.
Rather than relying on one specific construal of polarity, or even one specific diagnostic measure, we made use of each of the remaining four diagnostics in the list. Unlike the scalarity criterion, these four diagnostics can be operationalised using objective data. We assume here that each of the four diagnostics offers an approximation of a fundamental latent construct of polarity, and that, by combining these diagnostics, we are able to obtain a relatively reliable estimate of that construct. Crucially, we assume that polarity is gradient rather than binary; that is, words can be positive or negative to varying degrees. In making this decision, we do not want to question the value of focusing on one specific construal of polarity, e.g., in terms of markedness. However, even in the extensive line of work focusing on linguistic polarity, it has been observed that there is no fail-proof way of establishing the polarity of a word that consistently accords with linguists’ intuitions (e.g.,
Hence, for each of the stronger scalar words in our experiment—as well as their corresponding antonyms—we obtained four measures, corresponding to the last four diagnostics in the list above: (i) their frequency in the phrase ‘how [adjective]’, (ii) their frequency in the phrases ‘twice as [adjective]’ and ‘half as [adjective]’, (iii) their valence ratings as reported by Mohammad (
Lexical scales tested in the experiment and antonyms of the stronger scalemate.
⟨ajar, open⟩ | closed | 1.52 | 2.58 | 1.09 | 1.18 | |
⟨breezy, windy⟩ | calm | 0.38 | 0.86 | 0.74 | –0.99 | |
⟨chubby, fat⟩ | skinny | 0.59 | 1.21 | 1.28 | 1.06 | |
⟨content, happy⟩ | sad | 1.10 | 2.11 | 4.44 | 1.11 | 2.18 |
⟨cool, cold⟩ | hot | 0.99 | 0.79 | 0.81 | 0.99 | –0.36 |
⟨drizzly, rainy⟩ | dry | 0.35 | 1.68 | 0.81 | –1.27 | |
⟨fair, good⟩ | bad | 1.06 | 1.13 | 7.50 | 1.11 | 2.12 |
⟨low, empty⟩ | full | 0.89 | 1.00 | 0.31 | 0.87 | –1.57 |
⟨mediocre, bad⟩ | good | 0.94 | 0.88 | 0.13 | 0.90 | –0.69 |
⟨passable, good⟩ | bad | 1.06 | 1.13 | 7.50 | 1.11 | 2.12 |
⟨ripe, overripe⟩ | unripe | 0.72* | 0.96 | –0.28 | ||
⟨scarce, absent⟩ | present | 0.69 | 0.24 | 0.80 | –1.31 | |
⟨sleepy, asleep⟩ | awake | 0.82 | 0.91 | 1.04 | –0.03 | |
⟨unlikely, impossible⟩ | possible | 1.12 | 0.00 | 0.22 | 0.88 | –1.83 |
⟨warm, hot⟩ | cold | 1.00 | 1.27 | 1.23 | 1.01 | 0.46 |
⟨youthful, young⟩ | old | 0.85 | 0.20 | 1.98 | 0.97 | –0.85 |
Next, we carried out a principal component analysis based on these ratio values.
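For readers unfamiliar with the technique, the following is a minimal sketch of how a first principal component can be extracted from a matrix of ratio values. It makes several assumptions that are not stated in the text: the columns are standardised before the analysis, only the scales with no missing cells in the table above are included (with ‘good’ entered once), and the sign of the component is oriented so that larger ratios correspond to higher scores. The function name is hypothetical, and the output is not expected to reproduce the reported polarity values.

```python
import numpy as np

# The four diagnostic columns, in the order they appear in the table above,
# for the eight stronger scalar words without missing cells.
WORDS = ["happy", "cold", "good", "empty", "bad", "impossible", "hot", "young"]
RATIOS = np.array([
    [1.10, 2.11, 4.44, 1.11],
    [0.99, 0.79, 0.81, 0.99],
    [1.06, 1.13, 7.50, 1.11],
    [0.89, 1.00, 0.31, 0.87],
    [0.94, 0.88, 0.13, 0.90],
    [1.12, 0.00, 0.22, 0.88],
    [1.00, 1.27, 1.23, 1.01],
    [0.85, 0.20, 1.98, 0.97],
])


def first_principal_component(X):
    """Return scores, loadings, and explained variance ratio of the first component."""
    X = np.asarray(X, dtype=float)
    # Assumption: standardise each diagnostic so that no single column dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes of the standardised data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal axis is arbitrary; orient it so that
    # larger ratios map onto higher (more positive) scores.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = S[0] ** 2 / np.sum(S ** 2)
    return scores, loadings, explained


scores, loadings, explained = first_principal_component(RATIOS)
for word, score in zip(WORDS, scores):
    print(f"{word:>10}: {score:+.2f}")
```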
The values on the first principal component are shown in
We used the values from the first principal component to test the polarity hypothesis, which predicts that the B&N effect should interact with polarity. Recall that the B&N effect consists of slower verification times when responding ‘false’ than ‘true’ in the target condition vis-à-vis the control condition; i.e., the B&N effect consists in an interaction effect on verification times between condition (target vs. control) and response (‘true’ vs. ‘false’). Hence, the polarity hypothesis predicts a significant three-way interaction between condition (target vs. control), response (‘true’ vs. ‘false’), and polarity, such that the relative increase in verification times for ‘false’ compared to ‘true’ responses in the target condition increases with increasing polarity of the adjective.
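To make the interaction concrete, the B&N effect for a single scalar word can be described as a difference of differences in mean logarithmised verification times. The sketch below is a descriptive illustration under our own assumptions, not the mixed-effects analysis itself: the function name, the data layout, and the sign convention (positive values indicate a B&N effect) are hypothetical, and the sign need not match the coding used in the regression models.

```python
import math
from statistics import mean


def bn_effect(trials):
    """Difference of differences on log verification times.

    trials: iterable of (condition, response, rt_ms) tuples, with
    condition in {'target', 'control'} and response in {'true', 'false'}.
    """
    cells = {}
    for condition, response, rt_ms in trials:
        cells.setdefault((condition, response), []).append(math.log(rt_ms))
    means = {cell: mean(values) for cell, values in cells.items()}
    target_diff = means[("target", "false")] - means[("target", "true")]
    control_diff = means[("control", "false")] - means[("control", "true")]
    # Positive result: 'false' is slowed down more in the target condition.
    return target_diff - control_diff


# Hypothetical verification times in milliseconds.
example = [
    ("target", "true", 1000), ("target", "false", 2000),
    ("control", "true", 1000), ("control", "false", 1000),
]
print(round(bn_effect(example), 3))  # 0.693, i.e. log(2)
```

The polarity hypothesis then amounts to the claim that this quantity grows with the polarity value of the scalar word.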
The next section describes our experiment in more detail, followed by the results. All data and analysis files can be accessed at
50 participants were recruited on Amazon’s Mechanical Turk. 20 participants were female, the remaining 30 male. Participants’ mean age was 39 (standard deviation: 11, range: 22–69). All participants indicated that they were native speakers of English. Participants were paid $1.50 for their participation.
As mentioned above, the materials consisted of 16 adjectival scales. For each scale, we created a simple sentence containing the weaker scalar word. For each sentence, we created three types of images: one image where the sentence was unambiguously true, one image where it was unambiguously false, and one image where the sentence was true on its literal interpretation but false if its scalar inference was derived.
The materials were pretested in two experiments with 25 participants each. Based on these pretests, we made several adjustments to the sentences and images to ensure that participants responded as expected, i.e., rejected the sentence in the false control condition, accepted it in the true control condition, and vacillated between accepting and rejecting the sentence in the target condition. (Here, vacillation was defined as significantly fewer ‘true’ responses than in the true control condition and significantly fewer ‘false’ responses than in the false control condition.)
The experiment presented each sentence-image pair three times, and thus comprised 16×3×3 = 144 trials in total. The order of presentation was randomised for each participant.
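The structure of the trial list can be sketched as follows. The condition labels and the function name are hypothetical, and the seed parameter is included only to make the illustration reproducible; how randomisation was implemented in the actual experiment software is not specified here.

```python
import random
from itertools import product

# Weaker scalar words of the 16 scales tested in the experiment.
WEAK_TERMS = [
    "ajar", "breezy", "chubby", "content", "cool", "drizzly", "fair", "low",
    "mediocre", "passable", "ripe", "scarce", "sleepy", "unlikely", "warm", "youthful",
]
CONDITIONS = ["target", "control_true", "control_false"]  # hypothetical labels
REPETITIONS = 3


def build_trials(seed=None):
    """Return a shuffled list of (scalar word, condition) pairs for one participant."""
    trials = [
        (term, condition)
        for term, condition, _ in product(WEAK_TERMS, CONDITIONS, range(REPETITIONS))
    ]
    random.Random(seed).shuffle(trials)
    return trials


print(len(build_trials(seed=1)))  # 144
```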
Participants were instructed to indicate whether or not the sentence was a good description of the image. They could register their judgement by pressing either ‘1’ (to answer in the positive) or ‘0’ (to answer in the negative) on their keyboard. Trials started with the presentation of the sentence. Upon pressing the space bar, the sentence was replaced by the image, whereupon participants could give their truth judgements. We measured the time from image onset to button press.
3 participants were removed from the analyses because their accuracy on control items was below 80%. In addition, we removed trials with a verification time shorter than 200 milliseconds or longer than 10 seconds, assuming that these correspond to accidental button presses or inattentiveness to the task at hand. This resulted in the removal of 14 trials (about 0.2% of the data).
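These exclusion criteria can be expressed as a simple filter. The sketch below assumes a hypothetical data layout (one dictionary per trial) and hypothetical condition labels; the order in which the two criteria are applied is also an assumption.

```python
from collections import defaultdict

# Expected responses in the two control conditions (hypothetical labels).
EXPECTED = {"control_true": "true", "control_false": "false"}


def apply_exclusions(trials, min_accuracy=0.80, min_rt_ms=200, max_rt_ms=10_000):
    """Drop participants with low control accuracy, then trials with extreme times.

    trials: list of dicts with keys 'participant', 'condition', 'response', 'rt_ms'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for trial in trials:
        expected = EXPECTED.get(trial["condition"])
        if expected is not None:  # only control trials count towards accuracy
            total[trial["participant"]] += 1
            correct[trial["participant"]] += trial["response"] == expected
    excluded = {p for p in total if correct[p] / total[p] < min_accuracy}
    return [
        trial for trial in trials
        if trial["participant"] not in excluded
        and min_rt_ms <= trial["rt_ms"] <= max_rt_ms
    ]
```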
Percentage of ‘true’ responses for each scalar word and condition. Error bars represent standard errors of the mean.
One of our reviewers rightly observed that the percentage of ‘true’ responses in the target condition for ‘mediocre’ (16%) was so close to the percentage of errors in the ‘false’ control condition (11%) for that scalar word that one might call into question our assumption that the ‘not bad’ inference is a bona fide scalar inference rather than being an aspect of the lexical meaning of ‘mediocre’. However, given that ‘mediocre’ behaved in line with our assumption in the pretest (26% ‘true’ responses in the target condition vs. 7% errors in the ‘false’ control condition), we decided to retain this item in our analyses. We want to emphasise that removing ‘mediocre’ does not have any noteworthy consequences for the analyses that we report below.
Next, we considered participants’ verification times.
Mean logarithmised verification times for positive and negative scalar words. Error bars represent standard errors of the mean.
Mean logarithmised verification times for each scalar word. Scalar words are ordered from most positive (top left) to most negative (bottom right).
To test the polarity hypothesis, we constructed a linear regression mixed effects model predicting logarithmised response times based on condition (target vs. control), response (‘true’ vs. ‘false’), polarity, and their interactions, including random intercepts for participants and scalar words. For all analyses, degrees of freedom and corresponding
To also obtain a more fine-grained picture of the scope of the polarity hypothesis, we checked, for each scalar word separately, whether a B&N effect was present or not. To this end, for each scalar word, we constructed a linear regression mixed effects model predicting logarithmised response times on the basis of condition, response, their interaction, and trial number, including random intercepts for participants with random slopes for condition and response (but not their interaction). We observed significant interaction effects for ‘content’, ‘fair’, ‘passable’, ‘ajar’, ‘chubby’, ‘warm’, and ‘youthful’ (see
Parameters of the interaction effect between condition (target vs. control) and response (‘true’ vs. ‘false’) for each lexical scale. The scales are ordered based on their estimated polarity value (
Lexical scale | Polarity | Estimate | SE | t | p |
⟨content, happy⟩ | 2.18 | –0.35 | 0.10 | –3.39 | .001** |
⟨fair, good⟩ | 2.12 | –0.43 | 0.08 | –5.14 | .000*** |
⟨passable, good⟩ | 2.12 | –0.30 | 0.09 | –3.28 | .001** |
⟨ajar, open⟩ | 1.18 | –0.19 | 0.08 | –2.48 | .014* |
⟨chubby, fat⟩ | 1.06 | –0.33 | 0.09 | –3.85 | .000*** |
⟨warm, hot⟩ | 0.46 | –0.28 | 0.09 | –3.14 | .002** |
⟨sleepy, asleep⟩ | –0.03 | –0.11 | 0.10 | –1.20 | .235 |
⟨ripe, overripe⟩ | –0.28 | 0.19 | 0.10 | 1.96 | .054 |
⟨cool, cold⟩ | –0.36 | –0.07 | 0.10 | –0.73 | .465 |
⟨mediocre, bad⟩ | –0.69 | –0.05 | 0.09 | –0.57 | .567 |
⟨youthful, young⟩ | –0.85 | –0.24 | 0.10 | –2.33 | .025* |
⟨breezy, windy⟩ | –0.99 | –0.04 | 0.08 | –0.51 | .613 |
⟨drizzly, rainy⟩ | –1.27 | –0.10 | 0.08 | –1.24 | .216 |
⟨scarce, absent⟩ | –1.31 | 0.08 | 0.08 | 0.94 | .349 |
⟨low, empty⟩ | –1.57 | –0.13 | 0.08 | –1.71 | .089 |
⟨unlikely, impossible⟩ | –1.83 | 0.05 | 0.09 | 0.56 | .577 |
Taken together, these results confirm the polarity hypothesis: the presence or absence of a B&N effect was modulated by the polarity of the scalar words. More specifically, all of the positive scalar words gave rise to a B&N effect, while almost none of the negative scalar words did. The only exception to this rule was ‘youthful’, which was associated with a B&N effect despite being assigned a (slightly) negative polarity value (–0.85).
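This summary can be checked directly against the table above. The sketch below tallies, for positive and negative polarity values separately, how many scales show a significant interaction. The polarity values and p-values are copied from the table; the threshold of .05 is an assumption that matches the asterisks in the table, and the function name is our own.

```python
ALPHA = 0.05  # assumed significance threshold

# (weaker scalar word, polarity value, p-value as reported in the table)
ROWS = [
    ("content", 2.18, 0.001), ("fair", 2.12, 0.000), ("passable", 2.12, 0.001),
    ("ajar", 1.18, 0.014), ("chubby", 1.06, 0.000), ("warm", 0.46, 0.002),
    ("sleepy", -0.03, 0.235), ("ripe", -0.28, 0.054), ("cool", -0.36, 0.465),
    ("mediocre", -0.69, 0.567), ("youthful", -0.85, 0.025), ("breezy", -0.99, 0.613),
    ("drizzly", -1.27, 0.216), ("scarce", -1.31, 0.349), ("low", -1.57, 0.089),
    ("unlikely", -1.83, 0.577),
]


def tally(rows, alpha=ALPHA):
    """Count (significant, total) scales per polarity sign."""
    counts = {"positive": [0, 0], "negative": [0, 0]}
    for _, polarity, p_value in rows:
        key = "positive" if polarity > 0 else "negative"
        counts[key][1] += 1
        counts[key][0] += p_value < alpha
    return {key: tuple(value) for key, value in counts.items()}


print(tally(ROWS))  # {'positive': (6, 6), 'negative': (1, 10)}
```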
Pragmatic theories make conflicting predictions about the processing of scalar inferences in out-of-the-blue contexts. Relevance theory predicts that, in such contexts, the literal interpretation should be easier to retrieve than an interpretation that is enriched with a scalar inference. By contrast, Levinson (
In a seminal study, Bott & Noveck (
The polarity-based explanation proceeds from the observation that verification times vary with the polarity of the sentence (e.g.,
Various explanations for this potentially controversial assumption may be given. First, it could be that certain propositions containing a double negation are in fact easier to process than propositions with a single negation. Second, it could be that participants eliminate the double negation on the fly as they encounter it. Both of these explanations can be empirically tested by asking participants to verify the doubly negated scalar inferences (e.g., ‘The battery is not empty’) and comparing the verification times with those for the literal interpretation of the target sentence (i.e., ‘The battery is low’). If either of the foregoing explanations is on the right track, ‘false’ responses to the negated scalar inference should be at least as fast as ‘true’ responses to the target sentence.
However, concerning the former explanation, prior research has shown that, in many cases, verification times increase with the number of negations (
If, finally, we assume that participants who arrive at a two-sided interpretation verify the literal interpretation and the scalar inference in parallel, the correct predictions follow straightforwardly: positive scalar words give rise to the B&N effect, inherently negative scalar words do not, and explicitly negated scalar words lead to the reverse B&N effect.
We extensively and systematically tested this polarity-based explanation by comparing the processing of 16 adjectival scalar inferences using a sentence-picture verification task. We estimated the polarity of the scalar words in our sample (and, hence, of the corresponding scalar inferences) on the basis of four diagnostics measuring their linguistic markedness and psychological valence. We found that the presence or absence of a B&N effect was strongly dependent on the polarity of the lexical scale. Indeed, of the 7 lexical scales whose inferences led to increased verification times, 6 were estimated to be positive. The sole exception was ⟨youthful, young⟩, which was associated with a B&N effect despite being classified as (somewhat) negative.
One interesting observation for this particular scale is that the valence criterion “correctly” classified this scale as positive rather than negative (i.e., in accordance with the behaviour of ‘youthful’ in the experiment, and in contrast to its estimated polarity). Hence, one may jump to the conclusion that the valence criterion generally offers a better measure of polarity than the other diagnostics. However, this does not hold true across the board. For example, ‘rainy’ also had positive valence ratings relative to ‘dry’, but the ⟨drizzly, rainy⟩ scale was not associated with a processing cost.
Another scalar word that merits some further discussion is ‘unlikely’. ‘Unlikely’ was the only scalar word in our sample that was explicitly marked for negativity by means of the negative prefix ‘un-’. Hence, one might expect to find a reverse B&N effect for this particular scalar word, i.e., one might expect that it patterns with explicitly negative scalar constructions like ‘not all’ rather than with implicitly negative scalar words like ‘low’. This prediction was not borne out. However, on closer inspection, this finding is not so surprising. To explain, consider the hierarchy of negation proposed by Fodor and colleagues (
Explicitly negative free morpheme (e.g., ‘not’).
Explicitly negative bound morpheme (e.g., ‘un-’).
Implicitly negative free morpheme (e.g., ‘low’).
Free morphemes that are defined in negative terms (e.g., ‘bachelor’ meaning someone who is
‘Unlikely’ is of class
Taken together, then, it is clear that there is a strong connection between polarity and the B&N effect. Perhaps most forcefully, it seems difficult to explain why the scalar inference of ‘warm’ but not ‘cool’ was associated with a processing cost without appealing to the notion of polarity, especially given that the sentences and images used for these scalar words were so similar (the images showed transparent drinking glasses containing water at different temperatures, from a block of ice to vigorously boiling, cf. Appendix). Hence, we view our results as strong support for the polarity-based explanation.
Crucially, if the polarity-based explanation is correct, the classic observation that certain scalar inferences lead to increased verification times is not reflective of any processing cost for scalar inferencing, but rather reflects the psychological difficulty of verifying negative information. Indeed, the polarity-based explanation leads us to conclude that scalar inferencing itself is not associated with a processing cost, even in the absence of a facilitating context, contra, e.g., relevance theory. At the same time, however, our results also fail to support the defaultist idea that scalar inferences arise temporally prior to the literal interpretation; nowhere did we observe faster processing times when people computed the scalar inference.
Rather, it seems that scalar inferences are conventionally or statistically associated with their triggering expressions. That is, when hearing an utterance containing the weaker scalar word, people may immediately activate the corresponding scalar inference at no processing cost. However, when verifying this scalar inference, a processing cost may ensue if the scalar inference is negative, since negative information takes longer to be verified (e.g.,
One might suppose that this explanation is not very “Gricean” in spirit. Note, however, that Grice (
Our proposal makes some empirically testable predictions. In particular, it is predicted that other experimental paradigms that make use of sentence verification should also be susceptible to the polarity effect: if the proposition to be verified contains negative information, its verification should be cognitively costly. One such paradigm that relies on sentence verification was introduced by De Neys & Schaeken (
(14) | Some dogs are mammals. |
De Neys and Schaeken found that participants were less likely to respond ‘false’, i.e., to derive the scalar inference, when they had to memorise complex dot patterns compared to simple ones (cf. also
De Neys and Schaeken explain the D&S effect based on the premise that the derivation of scalar inferences is associated with a processing cost. According to their explanation, participants who had to memorise complex dot patterns had fewer cognitive resources available to derive the scalar inference, and consequently were less likely to carry out the derivation process. However, if the polarity-based explanation is on the right track, the D&S effect could also be modulated by the polarity of scalar words.
Interestingly, van Tiel and colleagues (
However, in a more recent study, Marty and colleagues (
(15) | Not all of the apples are red. |
To explain this pattern of results, Marty and colleagues (following
These findings paint a complex picture that obviously calls for a more detailed discussion than we can offer here, but they clearly show that, for other measures, too, there has been debate about whether the locus of the alleged processing cost is in the process of scalar inferencing or in some other relevant cognitive process (e.g.,
It is an open question whether polarity also influences other measures of processing cost, e.g., those involving reading times and eye movements. There is at least some evidence indicating that these measures, too, are influenced by the polarity of a sentence. For example, Glenberg et al. (
To estimate the polarity of the scalar words in our sample, we combined insights from linguistics and psychology. In linguistics, polarity is usually construed in terms of markedness; in psychology, in terms of subjective valence. We obtained measures of both construals, and used those to estimate a hypothesised latent construct of polarity. Here, we depart from (and hopefully improve on) prior research, particularly in linguistics.
Much linguistic research is premised on the idea that polarity is a binary notion, i.e., in any antonym pair, one is positive and the other one negative (e.g.,
We observed many similar conflicts between diagnostics (cf.
One way of resolving such clashes between diagnostics is by incorporating the results of multiple diagnostics into the estimation of a gradient measure of polarity. We hope this approach finds a following in linguistic and psychological research on polarity.
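To make the idea of a gradient measure concrete, the following is a minimal sketch of one simple way to pool several diagnostics into a single score per word. It is an illustration under stated assumptions, not the estimation procedure used in this study: each diagnostic is standardised and the z-scores are averaged with equal weight, which stands in for a proper latent-variable model. The function names, the diagnostic labels, the item names and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise a list of scores (population SD); constant input maps to zeros."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def gradient_polarity(diagnostics):
    """Average standardised diagnostic scores per word.

    diagnostics: dict mapping a diagnostic name to a dict of word -> score,
    where higher scores are assumed to mean "more positive".
    Equal weighting is an assumption of this sketch.
    """
    words = sorted(next(iter(diagnostics.values())))
    per_word = {w: [] for w in words}
    for scores in diagnostics.values():
        for w, z in zip(words, zscores([scores[w] for w in words])):
            per_word[w].append(z)
    return {w: mean(zs) for w, zs in per_word.items()}


# Invented toy data (NOT results from the study): a binary markedness-style
# diagnostic and a graded valence-style rating that disagree on item_b.
toy = {
    "markedness": {"item_a": 1.0, "item_b": 0.0, "item_c": 0.0, "item_d": 0.0},
    "valence": {"item_a": 0.8, "item_b": 0.6, "item_c": 0.3, "item_d": 0.2},
}
polarity = gradient_polarity(toy)
for word, score in sorted(polarity.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:+.2f}")
```

In this toy example, the binary diagnostic groups item_b with the negative items, whereas the graded rating places it closer to the positive end; the pooled score puts it in between rather than forcing a binary decision.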
Perhaps above all, our results emphasise the importance of testing broader samples of scalar words in research on scalar inferences—not just to determine whether psychological effects generalise across the entire family of scalar words, but also because it offers an insight into the structural linguistic constructs that underlie language processing. Thus, our study has shown the psychological relevance of the notion of polarity. We hope our research will inspire others to revisit the many interesting findings that have been reported on the scalar inference of ‘some’ to see if they generalise and, if not, what factors may explain the observed scalar diversity, so that we may come to a better understanding of the cognitive processes that underlie the derivation of scalar inferences—and ultimately pragmatic inferences more generally.
All data and analysis files can be found at
The additional file for this article can be found as follows:
The Appendix shows the sentences and images used in the experiment. DOI:
This excellent idea was suggested to us by one of our anonymous reviewers.
This research was presented at the workshop on degree expressions and polarity effects (DegPol2020) that was held at the Leibniz-Zentrum für Allgemeine Sprachwissenschaft. We thank the audience there for raising important questions and issues. We also thank Min-Joo Kim and our three anonymous reviewers at ‘Glossa’ for extremely valuable feedback on an earlier version of this article.
This research was funded by the German Research Council (grant DFG FR 3482/2-1, KR951/14-1, SA 925/17-1) within SPP 1727 (
The authors have no competing interests to declare.