Revisiting subjunctive obviation in French: a formal acceptability judgment study

Even though the weakening of the subjunctive disjoint reference effect, also known as obviation, plays an important role in research on subjunctives in (non-)Romance languages, to the best of our knowledge it has never been verified experimentally. The goal of our paper is to use a formal acceptability judgment task to test how native speakers of (European) French evaluate sentences displaying factors that (according to Ruwet 1991) should weaken obviation. We were unable to replicate Ruwet’s observations (when averaging over multiple participants): only one out of six factors described by Ruwet seems to clearly weaken obviation, namely Coordination. We conclude that (a) French may be a language for which formal experimentation on complex data is useful, (b) idiolects should not be ignored, and (c) our results challenge theoretical accounts of obviation weakening. Finally, we relate our study to the ongoing discussion on whether informal methods of collecting acceptability judgments (such as introspection by the author) need to be verified by formal methods.
INGO FELDHAUSEN


Introduction
Acceptability judgments are one of the primary sources of data in linguistics (cf. Chomsky 1965: ch. 1; Newmeyer 1983: ch. 2; Sprouse et al. 2013). Acceptability judgments can be obtained through different methods, with Sprouse et al. (2013: 224) classifying them into informal and formal methods. Informal methods are characterized by the involvement of relatively few (expert) participants (most often the author of a given paper), by relatively few tokens per condition, and by relatively little explicit instruction. The opposite picture characterizes formal methods: many non-expert, naive participants, several tokens per condition, explicit instruction, most often a statistical analysis, and many more response options.
In this paper, we revisit the seminal study by Ruwet (1991 [1984]), in which he presents introspective acceptability judgments on subjunctive obviation in European French, suggesting that obviation can be weakened under certain conditions (see §2 and §3 for examples). To the best of our knowledge, this phenomenon has never been approached using a formal method in French (or any other language). We revisit his data and test how robust his results are by using a formal acceptability judgment method. Our results indicate that we cannot entirely replicate Ruwet's observations. The conclusions we draw from this are as follows: (a) French appears to be a language for which formal experimentation on complex data is useful. (b) Even though we found little evidence for obviation weakening, our results do not call the phenomenon of obviation weakening itself into question because (i) we only considered a subset of possible cases (see §2 and §3) and (ii) a few individuals show a pattern similar to that reported by Ruwet (1991). (c) If we are on the right track, our results have clear consequences for theories of obviation weakening. (d) Additional data must be considered, and additional experimental methods must be applied, to obtain a better overall picture of possible obviation weakening (in French).
The paper is structured as follows: In §2, we present the basic tenets of subjunctive obviation in French before we introduce obviation weakening and Ruwet's factors in §3. The acceptability judgment task and its results are presented in §4 and discussed in §5.

Subjunctive obviation in French
The subjunctive is a verbal mood in French (cf. Gsell & Wandruszka 1986; Le Goffic 1993: 122ff.; De Mulder 2010; Riegel et al. 2014: §X.2.2; Mosegaard Hansen 2016: §5.4, among others). Following Quer (2020: 6), subjunctives consistently appear when the speaker conveys a meaning related to uncertainty or counterfactuality, which raises the question of how to assess such meaning on a theoretical level. Most semanticists relate subjunctive mood to the domain of modality and conversational context (Farkas 1992; Giannakidou 1997, 2009; Giorgi & Pianesi 1997; Portner 1997; Quer 1998, 2001; Schlenker 2005). However, there are still debates as to whether such approaches apply to all occurrences of the subjunctive (Quer 2020: 23f.), which gives reason to question whether the subjunctive mood should be treated as a universal semantic category as such (see Wiltschko 2016).
Independent of this ongoing discussion, the fundamental issue for our study is that the subjunctive expresses a disjoint reference between the subject of the matrix clause and the subject of the complement subjunctive clause, cf. (1).
(2) Je₁ veux PRO₁ partir
    I want pro.1sg leave.inf
    'I want to leave.'
The phenomenon of obviation has long been known, with a vast body of literature discussing it in detail for different languages such as French, Spanish, Catalan, Italian, Hungarian, Russian, and Polish (e.g. Bouchard 1983, 1984; Ruwet 1991 [1984]; Raposo 1985; Picallo 1985; Suñer 1986; Farkas 1992; Costantini 2005, 2009, 2013, 2016; Schlenker 2005; Feldhausen 2007, 2010; Szucsich 2009; Quer 2017, and others). Quer (1998), for example, observes that obviation is a phenomenon that occurs only in a subset of subjunctive types (contra Picallo 1985), namely intensional subjunctives, i.e. subjunctive clauses that appear in the scope of intensional elements such as verbs of volition or command (Quer 2016: 957, but see also Costantini 2009: 21 for the relationship between obviation and other types of subjunctives in Italian). In the following, we adopt Quer's (1998) line of reasoning and focus in particular on complements of vouloir 'want', an intensional verb.
(3) (Ruwet 1991: 20)
    ? Je veux que je sois autorisé à partir demain.
    I want that I am.sbjv authorized to leave.inf tomorrow
    'I want for me to be allowed to leave tomorrow.'
Beginning with the pioneering work of Ruwet (1991 [1984]), obviation weakening has played an essential role in research on subjunctives and in linguistic theorizing (see, e.g., Costantini 2005, 2009; Quer 2017 for an overview). Our review of the literature reveals that the data provided either stem mainly from introspection on the part of the authors (e.g. Ruwet 1991; Costantini 2009) or are adapted from the literature (e.g. Suñer 1986; Farkas 1992; Quer 2017).³ To address the lack of formal experimentation, we developed the research question in (4) concerning the robustness of claims regarding obviation weakening made by Ruwet (1991). Since his work is fundamental to our understanding of obviation weakening, we concentrated on his factors and on European French.

(4) Can we replicate Ruwet's (1991) results on obviation weakening by using a formal acceptability judgment method?

1 The term 'obviation' was originally used in describing a grammatical category of Algonquian languages (North America; see Cuoq 1866 and Mithun 1999). Later, it was adopted for the subjunctive disjoint reference effect.

3 It is noteworthy here that Poplack (1992: 237) found some instances of coreference in her sociolinguistic interviews with native speakers of Canadian French. Since we concentrate on European French and do not investigate mood choice variability, we merely wish to mention her findings at this point.
Feldhausen and Buchczyk Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1219

Ruwet's (1991 [1984]) factors of obviation weakening
Focusing on French, Ruwet (1991) argues that the coreferential reading improves under two circumstances: (i) when the agentivity of the subject is reduced (p. 20) and (ii) when a distance is created between the expression of desire and the fulfillment of the action (p. 21).
If these two aspects hold true in an embedded subjunctive, obviation is weakened, and coreference becomes more acceptable. After presenting this general idea, Ruwet (1991) discusses several individual factors which lead to a weakening of obviation; we will focus on six of them. Before presenting them in detail, a note on the judgments indicated in this section is necessary: the judgments are those found in Ruwet (1991). As far as we can tell, most of the judgments seem to be his own, though he occasionally took examples from others. We do not know (a) how many tokens were tested per condition (though we assume only the tokens presented in his paper) or (b) how many response options were available (we assume the classical division of OK, ? and *, though he also uses ??). No statistical analysis was conducted in the original work. Thus, Ruwet (1991 [1984]) applied a classical informal method (see §1).

Passive:
In (5a) and (5b), the verbal form of the subjunctive complement is in the passive voice. While (5a) is better than (1b), it is worse than (5b). Ruwet (1991) argues that the passive voice by definition does not express an action of the subject, thus resulting in a reduction of agentivity. The fulfilment of the speaker's will must in some way "make a detour and pass by way of someone else's will" (p. 20). Moreover, the semantics of the embedded verb may affect the acceptability judgment: enterrer 'bury' suggests that the speaker is no longer present to guarantee the completion of their will (p. 20).

Periphrastic past:
According to Ruwet (1991: 23), any element that tends to imply a distance between the will and the accomplishment of the corresponding act is likely to improve acceptability (see (ii)); it can thus be enough to put the subjunctive clause in the perfective aspect, emphasizing the accomplishment of the act (6).
(6) (Ruwet 1991: 23)
    Je veux (absolument) que je sois parti dans dix minutes.
    I want absolutely that I am.sbjv left in ten minutes
    'I want (absolutely) for me to be gone in ten minutes.'

Negation:
Ruwet (1991: 29) further shows that if the matrix verb is negated and the embedded clause includes a more or less negative expression with respect to the subject, the whole sentence becomes more acceptable, as in (7). The speaker expresses their desire for an event not to happen again but cannot fully influence the future outcome (see (ii)).
(7) (Ruwet 1991: 29)
    ? Je ne veux pas que je rate une occasion pareille.
    I neg want not that I miss.sbjv a chance same
    'I do not want for me to miss a chance like that.' (Lit. 'I do not want that I miss …')

Modals:
In (8), the modal verb pouvoir 'can' appears in the complement clauses. The subjunctive form in (8a) is less acceptable than the infinitive in (8b). However, according to Ruwet (1991: 21), the acceptability of (8a) depends on the context: (8a) would appear more natural if a businesswoman were addressing her secretary, asking her to book a flight (see (ii)).

(8) b. Je veux pouvoir partir dès demain.
       I want can.inf leave from tomorrow
       'I want to be able to leave (by) tomorrow.'

Psych-verbs:
For Ruwet (1991: 27), sentences such as (9) with a psych-verb in the embedded clause are only acceptable if they are interpreted as non-agentive (see (i)). In his interpretation, the presence of the embedded subject causes the speaker to imagine a second instance of themselves observing the act from afar.
(9) (Ruwet 1991: 27)
    ? Je veux que j'amuse ces enfants.
    I want that I.amuse.sbjv these children
    'I want for me to amuse those children.'

Coordination:
In (10), the embedded clauses consist of two conjuncts. The difference between (10a) and (10b) is the order of the subjects in the conjuncts (je 'I' + tu 'you' in (10a), tu 'you' + je 'I' in (10b)). As stated by Ruwet (1991: 24), the respective order is highly relevant. The matrix subject is je 'I'. (10a) is ungrammatical because the conjunct with the coreferential subject immediately follows the matrix verb (order of subjects: je-je-tu). (10b), on the other hand, is better because the conjunct with the coreferential subject remains distant from the matrix verb (je-tu-je). Whereas the previous factors are all based on semantic interpretation, this factor is purely syntactic.

Participants
A total of 88 native speakers of French (21 male, 64 female, and three unspecified) completed our online survey. We had to exclude one speaker.⁴ The participants ranged in age from 18 to 76 years. They were enrolled at university, employed, or retired. Only four participants held a degree below A levels ("baccalauréat") at the time of the study. We included a short practice session at the beginning of the experiment with items that we did not analyze.

Material and design
The study was carried out using a formal acceptability judgment method (cf. Featherston 2006; Sprouse et al. 2013; Sprouse 2018). Our survey comprised 96 items: 48 test items (six factors × eight lexicalizations of each factor) and 48 filler clauses (out of which we chose 15 items as "control fillers"). The test items consisted of the original sentences from Ruwet (1991) and newly created lexicalizations (see Appendix). The items were pseudo-randomized (sentences of the same factor never appeared immediately after one another). Each item was displayed separately, followed by a 7-point Likert-type scale below it. The scale ranged from grammatical (right, value 6) to ungrammatical (value 0), with ni l'un ni l'autre 'neither one nor the other' in the middle of the scale (value 3; Figure 1). Each participant evaluated all 96 items, divided into two blocks, A and B, of 48 items each. One group of participants received the order AB and the other BA, with the order of the items within each block remaining the same.
4 The excluded speaker rated all sentences as grammatical (value 6), including all ungrammatical control fillers.
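The pseudo-randomization constraint described above (no two adjacent items of the same factor) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the survey: the function name `pseudo_randomize`, the factor labels, the item identifiers, and the rejection-sampling strategy are all hypothetical.

```python
import random


def pseudo_randomize(items, seed=0, max_tries=10_000):
    """Shuffle (item_id, factor) pairs until no two neighbours share a factor.

    Fillers carry the factor None and are allowed to follow one another.
    """
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        # Accept the shuffle only if every adjacent pair differs in factor.
        if all(a[1] is None or a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; increase max_tries")


# Hypothetical item list: six factors x eight lexicalizations, plus 48 fillers.
factors = ["Passive", "PeriphrasticPast", "Negation",
           "Modals", "PsychVerbs", "Coordination"]
test_items = [(f"{factor}-{i}", factor) for factor in factors for i in range(1, 9)]
fillers = [(f"filler-{i}", None) for i in range(1, 49)]

order = pseudo_randomize(test_items + fillers, seed=1)
print(order[:5])
```

Splitting such a list into two halves would give two blocks that could be presented in AB or BA order; in that case the adjacency check would also have to be applied across the block boundary of the BA order.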

The control fillers were divided into four groups. An example of each group is given in (11)

Procedure
The study was run on the online survey platform LimeSurvey in 2019. We distributed the link to our study via different social networks, email lists, and colleagues in France. Participants took between 15 and 30 minutes to complete the session. The survey began with a practice phase to familiarize the participants with the survey. After this, participants started the official survey and evaluated the 96 items. At the end, participants filled out a biographical questionnaire consisting of questions about sex, age, educational background, profession, region of origin, other native languages spoken, age of onset, whether they had lived abroad for an extended period (and, if so, where), and their parents' native language(s) and region(s) of origin.

5 Following Schütze (2016: 26f.) and his reasoning, we treat the terms acceptability judgment and grammaticality judgment as synonyms (see, e.g., Chomsky 1965: 10-11; Sprouse et al. 2013: 220; Linzen & Oseki 2018: 15; Juzek & Häussler 2020: 134f. for a different view), and we distinguish between the terms grammaticality and acceptability only in cases in which it is important (e.g. when we speak of acceptable vs. grammatical sentences). In the introductory part of our experiment (see Appendix), we explained to the participants what we meant by the term "(un)grammatical" in order to get intuitive, nonprescriptive judgments.
6 We do not have counterparts as control fillers for the other four factors (see Juzek & Häussler 2020 for arguments for testing items independently of counterparts).

Statistical analysis
To establish whether there were significant differences in acceptability ratings between the different factors and between grammatical and ungrammatical filler sentences, we conducted a one-way ANOVA with Tukey's HSD post-hoc pairwise comparisons (see Appendix for the mean differences). We ran separate analyses comparing sentences containing the factors Coordination and Modals to their respective control fillers (see §4.1.2). Following the same structure as the overall analysis, those comparisons consisted of a one-way ANOVA with Tukey's HSD post-hoc pairwise comparisons. In both analyses, all other factors and non-matching filler sentences were excluded. All analyses were conducted with R version 4.0.4 (2021-02-15) (R Core Team, 2018) in RStudio 1.4.1103, using the package tidyverse (Wickham et al. 2019). Figures were rendered in the same environment using ggplot2 (Wickham 2016).
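The two computational steps of this analysis, standardizing the ratings (the z-score transformation applied before analysis) and computing a one-way ANOVA, can be sketched as follows. The sketch uses Python's standard library rather than the R environment named above, and it assumes that z-scores are computed within each participant using the sample standard deviation; the function names and the toy ratings are hypothetical. Tukey's HSD comparisons are not reproduced here.

```python
from statistics import mean, stdev


def zscore_by_participant(ratings):
    """Standardize each participant's raw ratings (0-6) against that
    participant's own mean and sample standard deviation.

    ratings: {participant: {condition: [raw ratings]}}
    """
    standardized = {}
    for pid, by_condition in ratings.items():
        values = [v for vals in by_condition.values() for v in vals]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            # A participant who gives every item the same rating
            # cannot be standardized.
            raise ValueError(f"no variance in the ratings of {pid}")
        standardized[pid] = {
            cond: [(v - mu) / sd for v in vals]
            for cond, vals in by_condition.items()
        }
    return standardized


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data (hypothetical): two participants, three conditions.
raw = {
    "p01": {"filler_ok": [6, 5], "Coordination": [3, 4], "Modals": [1, 0]},
    "p02": {"filler_ok": [5, 6], "Coordination": [4, 2], "Modals": [0, 2]},
}
z = zscore_by_participant(raw)

# One mean per participant and condition, then one group per condition.
conditions = ["filler_ok", "Coordination", "Modals"]
groups = [[mean(z[p][c]) for p in z] for c in conditions]
f_value, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

Standardizing within participant removes individual differences in how the scale is used, so that a rating is interpreted relative to the same participant's other ratings.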
Before analysis, we z-score transformed the results of the scale task (see Appendix for the raw values). To derive participant averages (each black dot in the violin plots), we averaged within participants. For the mean ratings and standard deviations by factor (see Appendix), we calculated a grand average over the 87 participants.

Overall analysis
To test for differences between grammatical and ungrammatical fillers and the six different factors, we conducted planned comparisons using Tukey's HSD. Ratings for ungrammatical control fillers (*-type1-obvi and *-type2) were significantly lower than those for grammatical control fillers (Diff = -1.033, 95% CI [-2.053, -1.813], p < .001), supporting the assumption that ratings reflected participants' perception of acceptability, cf. Figure 3. Average ratings of grammatical control fillers were significantly higher than those for test sentences in all categories (all diffs > 0.633, ps < .001). This suggests that none of the factors is sufficient for native speakers to accept the test sentences as grammatical. As for ungrammatical control fillers, only the average ratings for sentences including the factors Modals (M = -0.430, SD = 0.292) and Psychverbs (M = -0.488, SD = 0.226) were not significantly higher than those for ungrammatical sentences (diffs ≤ 0.083, ps ≥ 0.444), suggesting that those factors did not significantly contribute to improving acceptability. In turn, the factors Coordination, Passive, PerifrasticPast, and Negation were all rated significantly higher than the ungrammatical sentences. Despite this, there is an [...] suggesting that Coordination is considered much more acceptable by the participants. Since the difference between the two factors is smaller than the difference between Coordination and the other two factors, the latter differences are also significant (all ps < .001). In line with this,
Pairwise comparisons revealed that test sentences including Modals were rated as significantly less acceptable than grammatical control fillers, including both the infinitive (type 1, Diff = -1.932, p < .001) and a switch of referent (type 2, Diff = -1.754, p < .001). They also show a smaller, albeit significant, difference between type 2 and type 1 (Diff = -0.178, p = .025), where type 1 sentences are rated as more acceptable than type 2 sentences. This suggests that including Modals in an otherwise ungrammatical sentence does not suffice to approximate the acceptability of grammatical sentences.

Figure 4
Violin plots of z-score transformed rating for factor Modals and control fillers (modal-type1 and modal-type2) across participants; including median and quartiles.

Figure 3
Violin plots of z-score transformed rating for grammatical and ungrammatical control fillers (where ungrammatical = *-type1obvi and *-type2); including median and quartiles.
Tukey's HSD pairwise comparisons reveal that sentences with Coordination were rated as significantly more acceptable than parallel ungrammatical sentences (coord-type1, je-je-tu) (Diff = -0.989, p < .001) and as significantly less acceptable than parallel grammatical sentences (coord-type2, je-tu-tu) (Diff = 0.644, p < .001). This suggests that using Coordination in an otherwise ungrammatical sentence significantly increases acceptability, though acceptability still falls short of that of parallel grammatical sentences.

General discussion
In endeavoring to answer our research question, we detected apparent differences between Ruwet's (1991) judgments of obviation weakening and our results. In particular, our study shows that (i) obviation weakening with respect to vouloir 'want' as the matrix verb does not appear to exist in the way previously suggested by Ruwet (1991), (ii) only je-tu-je-coordinations tend to weaken obviation convincingly, (iii) the role of agentivity does not seem to be as relevant to obviation as previously assumed, and (iv) dialectal variation is not attested as a contributor to obviation weakening in our (European) French data.
We do not conclude that obviation weakening per se should be called into question. Rather, our data indicate that the occurrence of obviation weakening cannot be taken for granted in the specific conditions presented here (i.e. in the context of vouloir 'want'). There are two reasons for this. First, the individual results in our data suggest that some speakers judge the data similarly to Ruwet (e.g. participants 64 and 71 rated all six conditions (relatively) high and still rated ungrammatical control fillers low). It is unclear whether these are participants who truly experience weakening, or participants responding in some other way (a kind of noise). All we can know for sure is that the sample average is not consistent with weakening. Determining how well each participant's rating reflects their grammar requires a different kind of study. The question arises as to whether Ruwet (1991) reports accurate reflections of his idiolect and whether he has a different idiolect than (most of) the participants in our study. Den Dikken et al. (2007) and Feldhausen (2016) highlight that averaging across multiple participants (as is standard in (psycho-)linguistic experiments) will obliterate individual differences. The violin plots indicate variation in the participants' evaluations and reveal some outliers, that is, evaluations that correspond to Ruwet's observations. Den Dikken et al. (2007) argue that individual judgments are highly relevant because they not only reflect the core idea of generative approaches, namely investigating the i-language of the individual speaker-hearer (p. 335), but also allow for detecting micro-variants between different members of the same 'group' of language speakers (pp. 339, 342).

Figure 5
Violin plots of z-score transformed rating for factor Coordination and control fillers (coord-type1 and coord-type2) across participants; including median and quartiles.

Following the same idea, Feldhausen (2016) suggests that the phenomenon of inter- and intra-speaker variation in the linguistic data of members of the same 'group' can be integrated into formal grammatical theory. In his approach, he proposes an underlying grammar for all speakers that reflects the sameness of the 'group' while providing options to allow for individual variation. Second, as demonstrated in §2 and §3, obviation weakening is a complex phenomenon. Since we only considered a subset of possible cases (namely the volitional verb vouloir 'want'), we cannot speak to the possibility of obviation weakening in other contexts.
If we are on the right track and if our results on obviation can be generalized, there are interesting implications for theories of obviation weakening: our study indicates that a theoretical approach to obviation weakening based on the role of agentivity is not necessary, at least for French. Since only the syntactic factor Coordination allows for obviation weakening, an analysis at the syntax-semantics interface (as proposed in Farkas 1992 or Costantini 2009, among others) no longer seems necessary, and it is therefore possible to reduce the theoretical apparatus used to account for obviation weakening. We refer the interested reader to Buchczyk & Feldhausen (2020), where we propose a preliminary, novel syntactic analysis for the patterns attested here.

Relation to the discussion on the reliability of acceptability judgment tasks
The results of our study raise the question of how reliable the introspective data from Ruwet (1991) are. This directly relates to the ongoing discussion on whether informal methods are inherently reliable and whether the results of introspection can and should be replicated by formal acceptability judgment tasks. Some studies affirm the reliability of informal methods and highlight the importance of individual linguists' judgments (e.g. Phillips & Lasnik 2003; Featherston 2009; Phillips 2010; Sprouse et al. 2013; Chen et al. 2020), while other studies deny, doubt or question that reliability (Langendoen et al. 1973; Schütze 2016 [1996]; Edelman & Christiansen 2003; Gibson & Fedorenko 2010; Gibson et al. 2013; Linzen & Oseki 2018).
At first glance, our results seem to be in line with the latter group. However, it must be highlighted that our study was not designed to test the reliability of acceptability judgments for French generally; we merely tried to replicate a specific phenomenon. There are studies, in turn, that experimentally examined and confirmed the reliability of informal methods for a given language (English: Sprouse & Almeida 2012, 2017; Sprouse et al. 2013; Korean: Song et al. 2014; Mandarin Chinese: Chen et al. 2020). We think it is worth conducting similar overarching studies on French to see whether its general pattern is like that of English, Korean, and Mandarin (especially since there are studies claiming differences between languages, e.g. Linzen & Oseki 2018). If the former scholars are right, and no language differences influence reliability, how can our results be approached? Perhaps there is a difference between the general reliability established for a given language across different phenomena and the reliability of a specific, single phenomenon. Thus, while the general convergence rate for a given language is high, there might be some phenomena for which the convergence rate is low. Marantz (2005: 433ff.) and Linzen & Oseki (2018: 3, 17f.) argue that so-called class 3 judgments form such a group of complex and controversial data. Since coreference phenomena are explicitly mentioned as part of class 3 judgments in Marantz (2005: 434), it is safe to say that obviation weakening can be put in this category. Further research on French in general and on obviation in particular might shed light on whether the phenomenon of obviation and/or French is a special case.

Conclusion
We hope to have demonstrated that our formal method of collecting acceptability judgments is a promising and fruitful strategy for questioning long-standing informal results. Our study has shown that we were by and large unable to replicate Ruwet's observations concerning the weakening of obviation since only the factor Coordination convincingly increases acceptability in an otherwise ungrammatical sentence, challenging theoretical accounts of obviation weakening. As for the other five factors, an increase was either not detected or no meaningful interpretation could be made. Despite these results, we do not call obviation weakening per se into question since we considered only one type of matrix verb. We therefore hope that our theory-driven investigation of obviation opens the door to further research on this phenomenon, be it from an experimental or theoretical point of view. Looking beyond our study towards further experimental research, we believe that in addition to employing a formal acceptability judgment task, it would be intriguing to approach obviation weakening from different experimental and empirical angles, such as corpus studies or studies based on sociolinguistic interviews (e.g. Poplack 1992). This would allow for a consolidation of the results presented here and thus contribute to a better overall picture of the phenomenon of obviation weakening in French and other languages.