In Romance, Clitic Left Dislocated (CLLD) Topics can appear in embedded contexts, such as (1), but embedding under a fronted wh-phrase has been reported to yield deviant results. Thus, the European Portuguese (EP) example (1) is fine in comparison to (2). In turn, (2) also needs to be contrasted with (3), which shows that a subject may intervene between the wh-phrase and the verb.
Similar contrasts obtain in relative clauses and in root questions introduced by a d-linked wh-constituent:
There are two main lines of analysis of CLLD in the literature. One influential approach is that of Rizzi (1997), who proposed that CLLDed Topics are introduced by a Topic head which establishes a kind of “higher predication” between the Topic in [Spec,TopP] and the rest of the clause. On this analysis, the Topic is moved to [Spec, TopP]. The other approach (Demirdache 1992; Anagnostopoulou 1997; Raposo 1998; De Cat 2007) assumes that the Topic-comment articulation is licensed by “rules of predication” (Chomsky 1977) that require that the Topic be “base-generated” in a position of adjunction to the XP that is predicated of it, namely either TP (in embedded clauses) or CP (in root clauses). The pronominal clitic provides the open position required for the clausal projection to function as a predicate. Under both accounts, minimality (Rizzi 1990) would explain the deviant cases, given that in all these examples the Topic intervenes between the wh-phrase and its trace.
However, minimality fails to capture another relevant observation, namely that the degree of acceptability of these structures with multiple dependencies depends on the height of the base position associated with the Topic (Cardinaletti 2004; Barbosa 2006). Indeed, there is a clear contrast between high dative Experiencers (6) and lower datives (4b).
Even though (6) is not optimal, it is significantly better than (4b). The following examples illustrate other contexts in which a dative Experiencer Topic is allowed with a resumptive clitic:
With psych verbs of the agradar ‘to please’, convir ‘to be convenient’ type, the Experiencer argument is base-generated higher than the Theme (Belletti & Rizzi 1988). In (6), (7) and (8), the Theme is the embedded infinitival clause, which contains the trace of wh-movement. Schematically, we have the following structure:
In (2), (4b), (5b), by contrast, the Topic is associated with a lower argument (the subject and the direct object are higher than the dative). Therefore, what appears to distinguish (7), (6), (8) from (2), (4b), (5b), is that, in the former, the Topic is associated with the highest argument in the clause. This contrast is not what is predicted under a pure minimality account given that, in both cases, a Topic intervenes in the path of wh-movement. If the presence of the Topic is the offending element, there should be no difference between the two sets of examples, contrary to fact.
There is one possible alternative that can rescue the minimality account. Cardinaletti (2004) argues that there is a SubjectP projection located below Topics and above TP. This position hosts lexical subjects and strong subject pronouns and may also host dative Experiencers in Italian. On the assumption that dative Experiencer Topics sit in [Spec, SubjectP] even when they co-occur with a dative clitic, no minimality effects are predicted to occur in (7), (6), (8) as opposed to the other cases. This account predicts, contrary to fact, that the status of these examples should be no different from that of (3), (4a), (5a).
If indeed height matters in determining whether a CLLDed Topic DP may intervene in the path of wh-movement, we predict that embedding a subject CLLDed Topic within a wh-movement domain should yield better results than embedding a CLLDed complement. This prediction cannot be easily tested in a null subject language, where the counterpart to the clitic in subject CLLD is pro, hence invisible to the naked eye. Spoken French provides a good alternative testing ground, as it features both subject clitics and frequent CLLD (De Cat 2007). The following example of subject dislocation inside a relative clause has been attested in spoken corpora (Barnes 1986: 220):
In light of (10), the question that arises is how such examples compare both with their counterparts with a CLLDed complement (which we predict will be less acceptable) and with their counterparts with a canonical preverbal subject instead of a dislocated one (which we predict will be more acceptable).
In this paper, we report on the results of a study designed to determine how native speakers of French rate examples with different types of CLLDed Topics embedded within the domain of wh-movement. Our goal is to find out which factors contribute to higher levels of acceptability and whether height of the base position associated with the Topic (and the clitic) is a strong predictor, as appears to be suggested by the EP examples discussed above.
Sixty adult native speakers of French (from Belgium and France, 45 female (75%)) took part in an on-line acceptability judgment task. Ethics approval was granted by the University of Leeds.
All critical items featured an embedded wh-structure (either a wh-question or relative clause) in which an XP intervened between the fronted wh-phrase and its trace.
There were 45 test items (20 embedded interrogatives and 25 relative clauses). We manipulated the syntactic position and the semantic role of the intervenor. In terms of syntactic position, the intervenor was either a non-pronominal XP in the canonical subject position (henceforth a “heavy subject”, as in (11) or (15)), a dislocated subject (12), a dislocated object (13) or an adjunct (14).
In terms of semantic role, the intervenor was either an Actor (as in (11), (12)), an Experiencer (as in (13), (15)), a Theme (16), a Goal (17) or a broadly defined “Locative”, i.e. a temporal or a locative (14).
Not all semantic roles can map onto all syntactic positions. It was therefore not possible to fully cross these two variables. The distribution of the experimental items along these two dimensions is shown in Table 1. Note that Locative designates both temporal and situational locatives, as the difference does not matter in our study.
|Heavy subject||X58, X59, X60, X61, X62, X63, X64||X57, X65|
|Dislocated subject||X48, X49, X51, X52, X53, X54, X67||X44, X46, X47, X55, X56||X50|
|Dislocated object||X31, X32, X33, X37, X39, X40, X45||X29, X30, X38||X34, X35, X36|
|Adjunct||X21, X22, X23, X24, X25, X26, X27, X28|
Most intervenors were DPs (39/45 items, including 6 proper nouns and 6 pronouns). The rest were PPs or AdvPs (5 items). As our hypothesis applies to all wh-chains, we also allowed the type of wh-chain and trace position of the wh-phrase to vary, as shown in Table 2.
|Subject||Complement of V||Adjunct|
|X21, X22, X31, X35, X36, X45||X24, X47, X57, X58, X59, X60||X23, X29, X30, X32, X33, X34, X48, X67|
|X26, X37, X39, X43||X25, X27, X28, X44, X49, X50, X52, X53, X54, X55, X56, X61, X62, X64, X65||X38, X40, X42, X51, X63|
The distribution of test items according to the intervenor’s position and the structural position of the wh-trace is shown in Table 3.
|Dislocated subject||Heavy subject||Dislocated object||Adjunct|
|X31, X35, X36, X37, X39, X43, X45||X21, X22, X26|
|Complement of V|
|X44, X46, X47, X49, X50, X52, X53, X54, X55, X56||X57, X58, X59, X60, X61, X62, X64, X65||X24, X25, X27, X28|
|X48, X51, X67||X63||X29, X30, X32, X33, X34, X38, X40, X42||X23|
To encourage participants to use the full rating scale, we included 20 baseline items that featured a wh-structure but no intervenor, and were either fully grammatical/acceptable (20), or included morpho-syntactic violations (shown with asterisks in (19)) or a lexical violation (18).
Each test item was preceded by a short context. This made it possible to control for the discourse status of the referents used in the test sentence (which was particularly important to license dislocated phrases, given their discourse-sensitive nature). The context was constructed in such a way as to make the following test item plausible in terms of relevance and information structure. The referent of the dislocated element was always sufficiently identifiable in the context (e.g., via direct mention or via bridging), as illustrated in (21).
To clearly demarcate the context from the test item, and to facilitate the intake of information, the context was provided in written form. After familiarising themselves with it, participants were invited to click on a button to proceed to the test item.
Dislocated Topics are much more prevalent in spoken French than in written French (De Cat 2007). For the task to be ecologically sound, the test items therefore had to be presented orally (and without transcription). This also allowed us to control for their prosodic characteristics. All the test items were pre-recorded by a native speaker of French, who also read the preceding context (which was later removed from the recordings) to maximise the chance of a natural-sounding test item.
The items were presented to all participants in the same pseudo-randomised order: after randomising the items, we checked that items from the same condition did not follow each other and made minimal changes where required.
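The ordering procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the script used in the study: the item labels are placeholders, and the repair step (swapping two items whenever this lowers the number of adjacent same-condition pairs) is one assumed way of making "minimal changes".

```python
import random

def count_clashes(order):
    """Number of adjacent pairs of items that share a condition."""
    return sum(1 for a, b in zip(order, order[1:]) if a[1] == b[1])

def pseudo_randomise(items, seed=0, max_restarts=1000):
    """Shuffle (label, condition) pairs, then repair clashes by swapping items."""
    rng = random.Random(seed)  # fixed seed: every participant gets the same order
    for _ in range(max_restarts):
        order = list(items)
        rng.shuffle(order)
        improved = True
        while count_clashes(order) > 0 and improved:
            improved = False
            current = count_clashes(order)
            for i in range(len(order)):
                for j in range(i + 1, len(order)):
                    order[i], order[j] = order[j], order[i]
                    if count_clashes(order) < current:
                        improved = True  # keep the swap
                        break
                    order[i], order[j] = order[j], order[i]  # undo the swap
                if improved:
                    break
        if count_clashes(order) == 0:
            return order
    raise ValueError("no clash-free order found")

# Hypothetical items: placeholder labels, condition names taken from the design.
conditions = ["heavy subject", "dislocated subject", "dislocated object", "adjunct"]
items = [(f"item{n:02d}", conditions[n % 4]) for n in range(12)]
fixed_order = pseudo_randomise(items)
```

Because the seed is fixed, calling the function again returns the same list, which mirrors the fact that a single order was used for all participants.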
Participants recorded their judgment by choosing one of 5 options, listed in (22). The use of qualifiers rather than a numeric scale was intended to harmonise the value attached to each choice across raters. Option (a) is taken to reflect full acceptance and option (d) full rejection. Options (b) and (c) reflect different degrees of markedness. Option (e) allows for uncertainty to be recorded.
|(22)||a.||I could say this.|
|b.||I could say this, but in another context.|
|c.||I could not say this, but I know people who could.|
|d.||Nobody would say this.|
|e.||I don’t know.|
For ease of reference, we repeat our hypothesis in (23). We report below the raw results for baseline and test items, followed by the statistical modeling analyses for the test items only (as the baseline was only intended to get the participants to use the full scale in their judgments).
|(23)||The acceptability of a CLLDed Topic intervening between a fronted wh-phrase and its trace is affected by the height of the base position associated with the clitic: subject CLLDed Topics are better than object CLLDed Topics.|
To facilitate the initial exploration of the raw results, the ratings were converted to a numeric score, shown in the second column of Table 4. Positive scores were used to reflect the degree of acceptance (with scores of 1 and 3 reflecting degrees of markedness, 1 being more marked than 3); zero was used to reflect neutral judgments; a negative score was used to reflect rejection. Importantly, the statistical analyses reported in the next section were based on the original ordinal scale (and not on the transformed scores).
|I could say this.||5|
|I could say this, but in another context.||3|
|I could not say this, but I know people who could.||1|
|Nobody would say this.||–5|
|I don’t know.||0|
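The conversion in Table 4 amounts to a simple lookup. The sketch below shows how a mean score could be computed for one item. The sample ratings are invented for illustration, and the function name is hypothetical.

```python
# Rating-to-score mapping, as given in Table 4.
SCORES = {
    "I could say this.": 5,
    "I could say this, but in another context.": 3,
    "I could not say this, but I know people who could.": 1,
    "Nobody would say this.": -5,
    "I don't know.": 0,
}

def mean_score(ratings):
    """Mean numeric score for a list of ratings (exploratory use only)."""
    return sum(SCORES[r] for r in ratings) / len(ratings)

# Invented ratings for a single item, from four imaginary participants.
sample = [
    "I could say this.",
    "I could say this.",
    "Nobody would say this.",
    "I don't know.",
]
print(mean_score(sample))  # (5 + 5 - 5 + 0) / 4 = 1.25
```

Such means treat the scale as metric, which is why they serve only for plotting and not for the statistical analyses.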
The raw results for the critical items (expressed as numerical scores) are shown in Figure 1. Mean acceptance ratings are plotted by intervenor position (on the y-axis). The semantic role of the intervenor is indicated by colour. The shape of the points on the plot indicates the syntactic position associated with the wh-trace.
We note three patterns in Figure 1: (i) intervening pronouns seem to be less disruptive than intervening XPs, (ii) dislocated objects (when not pronominal) are the most disruptive of wh-chains, and (iii) heavy subjects are the least disruptive.
The associations between the intervenor’s syntactic position and semantic role call for a special method of analysis that would be able to handle the interaction between the two variables in spite of the non-occurrence of observations in some cells (see Table 1). Conditional Inference Trees can test for interactions between factors even when some combinations of factor levels are not attested in the data (Baayen et al. 2013). They allow the direct comparison of multiple (possibly correlated) variables without jeopardising the robustness of the final model (Strobl et al. 2009). The tree algorithm establishes the optimal partitioning of the data, showing how the outcome variable is predicted by various combinations of predictor values.
Figure 2 plots the output of a conditional inference tree algorithm using the following predictors: Intervenor Position, Intervenor Semantic Role, Intervenor Nature. The distribution of acceptability scores (expressed numerically) appears at the bottom of the graph for each of the data subsets defined by the algorithm.
The data partitions evidenced by the conditional inference tree suggest a complex web of interactions between the predictors. The most influential predictor (nodes 1 and 2 in the tree) is Intervenor Position, predicting that dislocated objects are the most disruptive of wh-chains, and that dislocated subjects (while being less disruptive than dislocated objects) are more disruptive than heavy subjects and adjuncts. The type of wh-chain interacts in a complex way with the position and nature of intervenors (nodes 3, 6, 12 and 17). Pronouns also appear to be less disruptive than phrasal intervenors (nodes 10 and 15). Including the semantic role of intervenors in the model complicates the picture further, to the point that it becomes very difficult to interpret.
To evaluate the robustness of the patterns observed above and calculate the relative importance of the predictors, we constructed a random forest model (predicting the acceptability of intervenors in wh-chains, based on the predictors under consideration). Random forests (Breiman 2001; Matsuki et al. 2016) generate multiple conditional inference trees using subsets of the data and subsets of the predictor variables in order to provide test and training sets.
The resulting predictions are tested against the observed data, and the relative importance of variables is calculated. The results are shown in Figure 3.
The predictor with the strongest impact by far is the position of the intervenor. The nature of the intervenor and the position of the wh-trace also have a significant impact. The semantic role of the intervenor has very little impact.
Random forests can identify the relative impact of predictors, but not their specific effect. To investigate the latter, and avoid transforming the dependent variable in any way, we turn to a regression analysis of the data.
The judgment data in this study was collected using an ordinal scale (repeated below as (24)).
|(24)||a.||I could say this.|
|b.||I could say this, but in another context.|
|c.||I could not say this, but I know people who could.|
|d.||Nobody would say this.|
|e.||I don’t know.|
Ordinal scales feature a number of properties that differentiate them from metric variables. First, the response categories on an ordinal scale may not be equidistant. Our design assumes that categories a and b are very close, and that category c will be in closer proximity to the former two than to category d. Secondly, it is likely that participants interpret the points on the scale differently. We attempted to counteract this as much as possible by “forcing” participants to use the full scale, prompted by the presentation of fully acceptable vs. clearly ungrammatical items. However, differences between participants remain likely, and this can only partly be captured by statistical models (see below).
Treating ordinal data as metric (e.g., by averaging the scores) in statistical modeling leads to over-estimating the information encapsulated by those data (Bürkner & Vuorre 2019). We therefore carried out an ordinal regression analysis, using a Cumulative model. The assumption of such a model is that “the categories have an ordering, but it is not known what the psychological distance between them is or whether the distances between categories are the same across participants” (Bürkner & Vuorre 2019: 2).
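To make the logic of a cumulative model concrete, the following sketch computes category probabilities from a set of thresholds and a linear predictor. It assumes a logit link, which the text does not specify. The threshold values and the names `category_probabilities` and `eta` are invented for illustration, and no random effects are included.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(thresholds, eta):
    """P(Y = k) for ordered categories under a cumulative logit model.

    The model states P(Y <= k) = logistic(threshold_k - eta), so each
    category probability is the difference of two cumulative values.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Invented, unequally spaced thresholds: the categories need not be equidistant.
thresholds = [-1.0, 0.0, 2.0]
baseline = category_probabilities(thresholds, eta=0.0)
shifted = category_probabilities(thresholds, eta=2.0)  # larger linear predictor
```

Raising `eta` moves probability mass towards the higher end of the ordering without assuming that the distances between categories are equal, which is the property the quotation above describes.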
Below, we first present the analysis of the baseline items (to confirm the validity of our design), before moving on to the analysis of the critical items.
A mixed-effect ordinal regression model (with Cumulative Link function) was fitted to the baseline data, with Participant and Item as random effects and Structure Type as predictor. Structure Type comprised 4 levels: one for items predicted to be fully grammatical, and three for different types of violations, as listed in (25). We did not have specific predictions regarding the relative acceptability of each of the violations, except that they should each be demarcated clearly from the fully acceptable items.
|(25)||a.||Relative clauses containing a resumptive pronoun: (X15), (X16), (X17), (X19)|
|b.||Island violations: (X5), (X6), (X7), (X8)|
|c.||Lexical violations: (X18), (X20)|
The effect of Structure Type as per that regression model is plotted in Figure 4. The curved lines show the distribution of each level of the response variable (i.e. each type of rating). The levels of the predictor variables are plotted as vertical lines. The point where a vertical line crosses a curve indicates the mean probability of that response for the relevant type of item. For instance, the probability that items in the “Ok” condition receive the highest acceptability rating is higher than 90%. There was a small (less than 5%) chance that these items would be rated as marked (i.e. “fine in another context” or “fine for other people”). Such items were also never rated as unacceptable or impossible to rate.
Figure 4 demonstrates that the evaluation of the baseline items was as expected by our design. The items that did not feature any violation (“Ok”) are almost categorically rated as Fine, whereas the three types of items featuring a violation were rated mostly as totally unacceptable (“No one”) or, to some extent, marked (“Fine for others”). The items featuring a violation were never rated as something the participant would have said themselves (“Ok” or “Fine in another context”). Importantly for our design, the near-categorical nature of the judgments of the baseline items confirms that participants did use the full rating scale offered to them, and had confidence in their judgments (as the “Unsure” option was practically never chosen).
A mixed-effect ordinal regression model (with Cumulative Link function) was then fitted to the critical items. The model was fitted using a bottom-up procedure: starting with a random effect structure including Participant and Item, we fitted the simplest model with the most likely predictor (as identified in the Random Forest analysis), i.e. Intervenor Position. After fitting a second model with an additional predictor (Intervenor Nature), we compared the goodness of fit of the two models using likelihood ratio tests. A more complex model was retained only if it improved the fit significantly. Using this method, we tested for the impact of the following predictors, both as main effects and in interaction with each other: Intervenor Nature, Intervenor Position, Intervenor Semantic Role, wh-chain. The optimal fixed-effect structure included two-way interactions between Intervenor Nature and Intervenor Position, and between Intervenor Position and wh-chain. We also tested for random slopes within Participant and Item.
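The model comparison step can be illustrated with a small likelihood ratio test written with the standard library only. The log-likelihood values below are invented, the 0.05 threshold is an assumption (the text does not state the criterion used), and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z).
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Return the test statistic and p-value for two nested models."""
    statistic = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    return statistic, chi2_sf(statistic, extra_params)

def retain_complex(loglik_simple, loglik_complex, extra_params, alpha=0.05):
    """Keep the richer model only if it improves the fit significantly."""
    _, p_value = likelihood_ratio_test(loglik_simple, loglik_complex, extra_params)
    return p_value < alpha

# Invented log-likelihoods: the richer model adds two parameters.
print(retain_complex(-1000.0, -995.0, extra_params=2))  # True
print(retain_complex(-1000.0, -999.5, extra_params=1))  # False
```

Each step of a bottom-up procedure would repeat this comparison, with the currently retained model playing the role of the simpler one.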
Semantic Role did not improve the model fit, either as a main effect (p = 0.15) or in interaction with Intervenor Position (p = 0.23) or Intervenor Nature (p = 0). The interaction observed in the conditional inference tree analysis was therefore not a robust one (as suspected from the relatively low importance of that variable in the Random Forest analysis).
The summary of the optimal model is shown in Table 5. To interpret the model, the coefficients are plotted in Figure 5. The legend is a short-hand for the original judgement categories. Equivalences are given in (26).
|(26)||a.||Fine: I could say this.|
|b.||Fine.ctxt: I could say this, but in another context.|
|c.||Fine.others: I could not say this, but I know people who could.|
|d.||No.one: Nobody would say this.|
|e.||Unsure: I don’t know.|
|Term||Estimate||Std. error||z value||p value|
|Intervenor position: Heavy subject||–1.9107||1.0671||–1.7906||0.0734|
|Intervenor position: Adjunct||–0.2430||1.0323||–0.2354||0.8139|
|Intervenor position: Dislocated object||1.9512||0.6109||3.1941||0.0014|
|Intervenor nature: PP/AdvP||–1.1354||1.1293||–1.0054||0.3147|
|Intervenor nature: pronoun||–3.5423||0.7189||–4.9273||0.0000|
|Dislocated object; pronoun||–5.0306||1.0753||–4.6785||0.0000|
|Adjunct; subject trace||–5.1799||1.2152||–4.2628||0.0000|
|Heavy subject; v.complement trace||–2.7666||1.1377||–2.4318||0.0150|
|Adjunct; v.complement trace||–0.8044||1.6169||–0.4975||0.6188|
Figure 5 plots the fixed effects of the optimal model. The vertical lines correspond to each term of the three-way interaction between Intervenor Position, Intervenor Nature, and wh-trace. The combinations not instantiated in the data were automatically dropped during model fitting. We refer the reader to the previous section (in relation to Figure 4) for an explanation of how to interpret the curves and the position of the vertical lines.
A clear pattern emerges, with (i) items with very high probability of full acceptance (the first 6 or 7 vertical lines), (ii) items with high probability of marginal acceptance (the middle 5 lines) and (iii) items with high probability of rejection (the right-most line). Participants had clear intuitions about the data they were asked to judge (as shown by the very low probability of “Unsure” judgements), and did not make much use of the “Fine.others” judgement.
The Random Forest analysis and the ordinal regression analysis reveal a consistent picture, which we summarise in (27).
|(27)||a.||The strongest predictor of Intervenor effect is the position of the intervenor.|
|b.||Dislocated objects are significantly more disruptive of wh-chains than dislocated subjects, adjuncts or heavy subjects.|
|c.||Pronouns are not disruptive of wh-chains (even when the intervenor is a dislocated object).|
|d.||Intervenor effects are stronger in argument chains than adjunct chains.|
|e.||The semantic role of the intervenor does not play a significant role.|
|f.||The type of wh-structure does not play a significant role.|
In light of (27a) and (27b), our original prediction is confirmed: embedding a subject CLLDed Topic within a wh-movement domain yields better results than embedding an object CLLDed Topic. There is however one exception (27c): pronominal intervenors were almost categorically accepted, including when they were object CLLDed Topics.
We start by proposing an explanation for the impact of Intervenor Position (i.e. the subject/object asymmetry), before moving on to the additional findings.
Our results show clearly that dislocated objects ((28b), (29b)) are more disruptive of wh-chains than dislocated subjects ((28a), (29a)). This holds both in relative clauses (28) and in embedded wh-clauses (29).
One might be tempted to explain away this subject/object asymmetry by claiming that subject clitics are agreement markers, in which case the fronted DP is a subject rather than a CLLDed Topic (Culbertson 2010). However, our results also show that heavy subjects are less disruptive of wh-chains than CLLDed subjects, a fact that would remain unexplained under an agreement marker analysis of subject clitics. Such an analysis has also been shown to be untenable for a number of independent reasons (De Cat 2005). In particular, the presence of a subject clitic forces a Topic interpretation of the DP associated with it. As non-referential quantified phrases cannot be Topics, this explains their incompatibility with a coindexed subject clitic:
The literature suggests that, in fact, a further asymmetry is observed among wh-chains with an intervening CLLDed object. Rizzi (1997) mentions the following contrasts:
All of these examples contain a CLLDed object; what differs is the function of the wh-phrase, namely whether it is a subject or an indirect object. Rizzi (1997) proposes that this asymmetry between subject vs. object chains is due to the ECP. The ECP requires that traces must be properly head-governed. A trace in complement position is properly head-governed (by the verb), but a trace in subject position normally is not, unless C is turned into a governor by agreeing with the wh-subject. This is why the agreeing form of C qui must be used in cases of subject extraction in French. Under the assumption that Topics are introduced by a Topic head, the representation of an example such as (31b) is as follows (Rizzi 1997: 307):
|(33)||Je ne sais pas [ qui C [ ton livre Top … [ t pourrait … ]]]|
According to Rizzi (1997), even if C is turned into a governor via agreement, it is too far away to license the subject trace due to the intervening Topic, a standard case of relativised minimality effect. Therefore, for Rizzi (1997), the structure is ruled out as an ECP violation. (31a), by contrast, is a mere subjacency violation.
Coming back to our test sentences, this account may succeed in capturing the asymmetry between cases of wh-complement extraction (28b) as opposed to wh-subject extraction (28a), but fails in the case of wh-adjunct extraction (29). Adjunct extraction is subject to the ECP. Accordingly, neither of the examples in (29) is predicted to be acceptable on Rizzi’s account. In both cases, the Top head intervenes between the trace of the wh-adjunct and its antecedent. However, this prediction is not borne out: an intervening object Topic yields a more severe violation than an intervening subject Topic even in a wh-adjunct chain. In addition, our data show that object Topic intervenor effects are stronger in argument chains than in adjunct chains (27d), which is the opposite pattern to that predicted under Rizzi’s account.
In light of the contrast between subject and object Topics, we propose that additional information needs to be taken into account. First, let’s consider (28). (28a) differs from (28b) in the way the scopes of the wh-item and the Topic interact. Schematically, (28a) corresponds to the structure in (34a), and (28b), to the structure in (34b).
|(34)||a.||que_i [les athlètes_k] ils_k sont fiers d’avoir remportées t_i|
|b.||qui_i [les médailles d’or]_k t_i cl_k ont remportées t_k|
While (34a) involves nested dependencies, (34b) involves crossing dependencies. Likewise, Rizzi’s examples (31) and (32) display similar patterns. The wh-subject extraction case involves crossing and is judged unacceptable. The wh-indirect object case involves nesting and is judged comparatively better:
|(35)||a.||?un homme à qui_j, ton livre_i, je pourrais le_i donner t_i t_j|
|b.||*?un homme qui_j, ton livre_i, t_j pourrait l’acheter t_i|
The contrast between (29a) and (29b) (repeated below as (36a), (36b)) can be analysed in a similar fashion. If we assume that the trace of the temporal adjunct occupies a position that is lower than the subject, but higher than object (Laenzlinger 1998; Cinque 1999), then (36a) involves nesting, while (36b) involves crossing:
|(36)||a.||quand_i [ ton patron_k [TP il_k … t_i … ]]|
|b.||quand_i [ le voleur_k [TP … cl_k … t_i … t_k ]]|
Within multiple A-bar dependencies, nested dependencies have been argued to be favored over crossing dependencies (Kaplan 1973; Kuno 1973; Bach 1977; Baker 1977; Fodor 1978). The following typical examples, where wh-movement and Tough-movement are both applied, are illustrative:
|(37)||a.||Which violin is this sonata easy to play t on t?|
|b.||*Which sonata is this violin easy to play t on t?|
Baker (1977: 63) stated this constraint in terms of processing:
|(38)||a.||As a sentence is processed from left to right, a prospective tenant [=filler] y is more current than a prospective tenant x if y occurs to the right of x.|
|b.||A prospective filler is assigned to the first unoccupied address [=gap] for which it is the most current of the eligible prospective tenants.|
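Read procedurally, (38) resembles a last-in, first-out assignment: each gap is filled by the most recent filler still waiting for a gap. The sketch below implements that simplified reading. It ignores the notion of eligibility altogether, and the token encoding and the function name are assumptions made for illustration.

```python
def assign_fillers(tokens):
    """Pair each gap with the most current unassigned filler (LIFO).

    tokens is a left-to-right list of ("filler", name) or ("gap", label).
    Simplification: every pending filler is treated as eligible for every gap.
    """
    pending = []  # fillers still waiting for a gap, oldest first
    pairs = []
    for kind, name in tokens:
        if kind == "filler":
            pending.append(name)
        elif kind == "gap":
            if not pending:
                raise ValueError("gap without a pending filler")
            pairs.append((pending.pop(), name))
    return pairs

# Schematic rendering of (37a): two fillers followed by two gaps.
example_37a = [
    ("filler", "which violin"),
    ("filler", "this sonata"),
    ("gap", "play _"),
    ("gap", "on _"),
]
print(assign_fillers(example_37a))
# [('this sonata', 'play _'), ('which violin', 'on _')]
```

The output is the nested reading of (37a). Applied to the string in (37b), the same procedure pairs "this violin" with the object of "play", so the intended crossing reading is never produced.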
Fodor (1978: 448) formulated this constraint as an anti-ambiguity parsing strategy:
|(39)||Nested Dependency Constraint (NDC) (Fodor 1978)|
|If there are two or more filler-gap dependencies in the same sentence, their scopes may not intersect if either disjoint or nested dependencies are compatible with the well-formedness conditions of the language.|
Pesetsky (1982: 309) reaffirmed this condition in the form of a syntactic constraint on A’-movement (his Path Containment Condition).
|(40)||Path Containment Condition (PCC)|
|If two paths overlap, one must contain the other.|
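The distinction between nested and crossing dependencies can be made explicit with a small classifier. Note the simplification: the PCC is stated over paths in a syntactic tree, whereas this sketch approximates each dependency by the linear span between filler and gap. The position indices for (37) are schematic and assumed.

```python
def relation(dep_a, dep_b):
    """Classify two dependencies as 'disjoint', 'nested' or 'crossing'.

    Each dependency is a (filler_position, gap_position) pair of distinct
    linear positions. Linear spans stand in for tree paths here.
    """
    first, second = sorted([tuple(sorted(dep_a)), tuple(sorted(dep_b))])
    if first[1] < second[0]:
        return "disjoint"   # the first span ends before the second starts
    if second[1] < first[1]:
        return "nested"     # the second span lies inside the first
    return "crossing"       # the spans overlap without containment

# (37a): which violin (0) ... this sonata (1) ... play t (2) on t (3)
print(relation((0, 3), (1, 2)))  # nested
# (37b): which sonata (0) ... this violin (1) ... play t (2) on t (3)
print(relation((0, 2), (1, 3)))  # crossing
```

Under this approximation, the overlap-with-containment requirement in (40) corresponds to allowing only the "disjoint" and "nested" outcomes.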
Now recall from the Introduction that, in EP constructions containing CLLDed datives, low dative Topics are more disruptive of wh-chains than high dative Experiencers:
A closer look at the relevant sentences reveals that the dative Experiencer cases differ from the dative Goal cases in the way the scopes of the wh-item and the Topic interact. On the assumption that, with verbs of the convir ‘be convenient’ type, the Experiencer argument is the highest argument, these examples are comparable to the subject/object asymmetries found in French. Schematically, (41a) corresponds to the structure in (42a), and (41b) to the structure in (42b).
|(42)||a.||que_i à Maria_k … cl_k convém t_k [ comprar t_i ]|
|b.||que_i à Maria_k cl_k ofereceu t_i t_k|
Thus, in all of the multiple dependency constructions examined, the illicit examples involve crossing dependencies while the others involve nesting dependencies. We therefore conclude that a restriction against crossing dependencies is operative in these constructions, with one important qualification. The restriction against crossing only applies to configurations in which the Topic appears inside the scope of the wh-phrase (i.e. not in cases like (43), in which the Topic precedes the wh-constituent).
The sentences in (43) involve crossing dependencies and yet they are acceptable. Moreover, the constraint also doesn’t apply to dependencies established between different CLLDed elements, where the relative ordering of the Topics is free (De Cat 2007):
The constraint in question should thus be restricted to apply to chains created by wh-movement. The following descriptive generalisation adequately captures the patterns observed:
|(45)||A wh-movement dependency may contain a CLLDed Topic in its scope iff the full Topic-cl-gap dependency is contained within the domain of the wh filler-gap dependency.|
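The generalisation in (45) can be stated as a check over schematic linear positions. The sketch below is a descriptive restatement rather than an analysis: the indices are an assumed linearisation of the schemas in (34) and (42), and the function name is hypothetical.

```python
def topic_licensed(wh, topic):
    """Check (45) for one wh-dependency and one Topic dependency.

    wh    = (position of the wh-phrase, position of its gap)
    topic = (position of the Topic, position of its last clitic or gap)
    (45) only restricts Topics that sit between the wh-phrase and its gap.
    """
    wh_start, wh_end = wh
    topic_start, topic_end = topic
    if not (wh_start < topic_start < wh_end):
        return True  # Topic outside the wh-domain: (45) does not apply
    return topic_end < wh_end  # the whole Topic dependency must fit inside

# (34a): que(0) Topic(1) ils(2) t_i(3)
print(topic_licensed(wh=(0, 3), topic=(1, 2)))  # True
# (34b): qui(0) Topic(1) t_i(2) cl(3) t_k(4)
print(topic_licensed(wh=(0, 2), topic=(1, 4)))  # False
# (42a): que(0) Topic(1) cl(2) t_k(3) t_i(4)
print(topic_licensed(wh=(0, 4), topic=(1, 3)))  # True
```

A Topic that precedes the wh-phrase passes the check even when the two dependencies cross, which reflects the asymmetry noted for (43).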
It is not clear how to derive (45) from syntactic constraints. On the one hand, the fact that CLLD as such is not subject to the no-crossing constraint is consistent with non-movement analyses of the construction (Demirdache 1992; Anagnostopoulou 1997; Raposo 1998; De Cat 2007). But if CLLD doesn’t involve movement, one cannot appeal to a constraint on movement (e.g. Pesetsky’s Path Containment Condition or any of its current instantiations, such as the Minimal Link Condition) to rule out the cases in which the trace of a Topic to the right of a wh-phrase is c-commanded by the trace of the wh-phrase (i.e., the cases that do not fall under (45)). Conversely, if we do assume that CLLD involves movement and that (45) is derived from a ban on intersecting movement operations, then (43) is predicted to be ruled out, contrary to fact. In other words, we have no explanation for why the ban on intersecting dependencies applies whenever the wh-constituent precedes the Topic and not when the reverse order obtains. For these reasons, we will explore an account of (45) that doesn’t rely on principles of narrow syntax.
We know from the sentence processing literature that processing complexity may affect acceptability judgments (Yngve 1960; Miller & Chomsky 1963; Kimball 1973; Frazier 1985; Gibson 1991; 2000; Vasishth et al. 2010). Multiply center-embedded sentences are one example of structures that are judged unacceptable by virtue of a processing overload effect. Sentences with one level of embedding are considered grammatical (46a), while additional levels of embedding yield a degraded judgment (46b).
|(46)||a.||The reporter [who the senator attacked] disliked the editor.|
|b.||#The reporter [who the senator [who John met] attacked] disliked the editor. (Gibson 2000: 96)|
The contrast above shows that processing overload may affect intuitive judgments of acceptability. The types of filler-gap dependencies investigated in this paper have been shown to be demanding in terms of processing resources (King & Just 1991; Gibson 1998; 2000; Fiebach et al. 2002). It is now a well-established fact that, once a filler is encountered, comprehenders anticipate the location of potential gap sites and attempt to construct dependencies in advance of information about the gap position, a phenomenon that came to be known as active dependency formation (Crain & Fodor 1985; Frazier & Clifton 1989).
In addition, a number of electrophysiological studies have provided evidence of the memory cost of keeping a dependency open during language processing. In a study of Event-Related Potentials (ERPs) registered during the processing of subject and object questions, Kluender & Kutas (1993) found that object questions elicited a larger left anterior negativity (LAN) at the filler and gap positions. Similar findings were obtained by Fiebach et al. (2002). In the latter study, ERPs were recorded while participants processed case-unambiguous German subject and object wh-questions with either a long or short distance between the wh-filler and its gap. A sustained LAN was observed for object questions with long filler-gap distance but not for short object questions. The authors interpreted the sustained negativity as reflecting working memory processes required for maintaining the displaced object in memory. They also observed a broadly distributed positivity at the second NP for both short and long object-wh-questions. Such an effect was not found in subject wh-questions. Given that parietal positivity (P600) has been observed in response to increased integration difficulty (Kaan et al. 2000), Fiebach et al. (2002: 268) interpreted this effect as a reflection of the difficulty of local integration processes associated with the gap position in the sentence.
Building on these results, Felser et al. (2003) recorded ERPs during the processing of unambiguous German sentences containing different types of filler-gap dependency: topicalisation constructions and wh-questions. Both topicalisation constructions and wh-questions were found to elicit a LAN prior to the processing of the subcategorizing verb. At the subcategorising verb, sentences containing a wh-dependency produced a parietal positivity (P600). Topicalisation structures did not produce this effect.
These results constitute further evidence for separable parsing processes, with memory-based processes being manifested in terms of LAN, and the relative difficulty of integrating the filler with its subcategoriser manifested as P600. The fact that the size of the observed LAN is not influenced by the type of filler suggests that the working memory cost induced by processing filler-gap dependencies is independent of the type of syntactic dependency involved. Integration cost is influenced by the type of filler-gap dependency: displaced wh-phrases are more costly than topicalised elements. According to Felser et al. (2003), integration cost is higher in wh-movement because in addition to semantically integrating the filler with its subcategoriser, an operator-variable dependency must be upheld at the same time for the sentence to be assigned the correct interpretation. These findings are relevant for our purposes, because they constitute evidence that there is a difference in integration cost between Topic fillers and wh-fillers. If (45) is related to processing constraints and if the integration costs of Topic dependencies are lower than those incurred by displaced wh-phrases, it is no longer surprising that (45) should apply to wh-dependencies and not to Topic-(cl)-gap dependencies.
The difference observed between pronominal and phrasal intervenors also favors a processing approach. Indeed, a similar asymmetry between pronouns and full DPs has been observed in the processing of center-embedded sentences: their intelligibility is increased if the subject of the most embedded relative clause is a pronoun (as in (47)) rather than a full DP (as in (48), repeated from (46b)) — see e.g. Bever (1970); Gibson (1998, 2000); Warren & Gibson (2002).
|(47)||The reporter [who the senator [who I met] attacked ] disliked the editor. (Gibson 2000: 100)|
|(48)||#The reporter [who the senator [who John met] attacked ] disliked the editor. (Gibson 2000: 96)|
Warren & Gibson (2002) carried out a judgment task evaluating how sentences like (48) compared to their counterparts with first and second person pronouns in the most deeply embedded subject position. They found that the latter were rated significantly higher in acceptability than the former. Different accounts of this contrast can be found in the literature on processing complexity (see below), but all of them converge on the idea that pronoun intervenors are less disruptive of the structural integration of a non-local dependent than full DPs. Therefore, we take the difference between pronominal and phrasal intervenors as additional evidence in favor of the idea that structural integration cost is the key factor affecting wh-chains containing intervening Topic fillers.
Viewed from a processing perspective, (45) amounts to the claim that maintaining an active filler while processing a wh filler-gap dependency is costly. By contrast, if the Topic filler-gap dependency is complete by the time the wh-gap is encountered, performing the integration of the wh-filler is easier. In order to see this, let us compare the following examples:
First, let us consider (49). At the point que is processed there are two incomplete dependencies in storage: que is dependent on a following verb and on an empty position to be associated with it. The Topic les autres, in turn, is dependent on a verb and so is the subject clitic. The inflected auxiliary and the main verb jointly satisfy the predictions of the Topic and the clitic. The main verb introduces the prediction of an object. Thus, by the time the verb is processed there are only two incomplete dependencies in storage, namely the prediction of an object selected by V and the prediction of a gap associated with que. An empty category can be connected in the representation satisfying both predictions. Retrieval of the wh-filler is not problematic.
Turning to (50), at the point the Topic les médailles is being processed, there are four incomplete dependencies: qui is dependent on a following verb and on an empty position to be associated with it; the DP les médailles is dependent on a verb and on an associated gap. In the case of CLLDed objects, we are assuming that, even though the Topic is reactivated by the clitic in preverbal position (Pablos 2006), it is stored and maintained as an incomplete dependent until its subcategoriser (or associated gap) is encountered. Hence, at the point of integration of qui, there are four incomplete dependencies in storage. Assuming that integration of a wh-operator is by itself a costly operation, as argued by Felser et al. (2003), it is not surprising that this configuration should raise processing difficulty above a threshold that results in perception of unacceptability.
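The bookkeeping behind this comparison can be made explicit with a small sketch that tracks which predictions are open at each step. This is only an illustration under our own assumptions: the step lists encode the steps named in the prose for (49) and (50), not the full sentences, and the labels (e.g. `que:gap`) and helper names are hypothetical.

```python
from typing import List, Set, Tuple

# Each step: (label, predictions opened, predictions closed).
Step = Tuple[str, Set[str], Set[str]]


def trace_store(steps: List[Step]) -> List[Tuple[str, int, int]]:
    """Return (label, load on arrival, load afterwards) for every step."""
    store: Set[str] = set()
    trace = []
    for label, opened, closed in steps:
        on_arrival = len(store)
        store -= closed  # satisfied predictions leave the store
        store |= opened  # new predictions enter the store
        trace.append((label, on_arrival, len(store)))
    return trace


def load_on_arrival(trace: List[Tuple[str, int, int]], label: str) -> int:
    """Number of incomplete dependencies held when `label` is reached."""
    return next(before for name, before, _ in trace if name == label)


# (49): the Topic and the subject clitic are discharged at the verb.
example_49: List[Step] = [
    ("que", {"que:verb", "que:gap"}, set()),
    ("les autres", {"topic:verb"}, set()),
    ("subject clitic", {"clitic:verb"}, set()),
    ("aux + main verb", {"verb:object"},
     {"que:verb", "topic:verb", "clitic:verb"}),
    ("wh-gap", set(), {"que:gap", "verb:object"}),
]

# (50): the object Topic is still open when the wh-filler is integrated.
example_50: List[Step] = [
    ("qui", {"qui:verb", "qui:gap"}, set()),
    ("les médailles", {"topic:verb", "topic:gap"}, set()),
    ("wh-gap", set(), {"qui:gap"}),
]

print(load_on_arrival(trace_store(example_49), "wh-gap"))  # 2
print(load_on_arrival(trace_store(example_50), "wh-gap"))  # 4
```

On this encoding, the wh-filler is integrated against a store of two incomplete dependencies in (49) and four in (50), which is the difference the account relies on.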
We can now return to the case of CLLDed pronouns (Finding (27c)), which we have found to be accepted as intervenors even when the clitic they are associated with is not a subject. Thus, examples such as (51) received a high acceptability score in our study.
Importantly, in our data, we only have examples with 1st and 2nd person pronouns. Different explanations can be found in the literature as to why pronouns differ from full DPs in contexts of processing complexity. Under a decay-based framework, Gibson (1998; 2000) and Warren & Gibson (2002) propose that retrieval difficulty depends on the number of new discourse referents (nouns and verbs) intervening between the two elements of a long-distance dependency. Since 1st and 2nd person pronouns do not introduce new discourse referents (in the sense that reference to speaker and hearer is always implied), they would not count as intervening elements.
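To illustrate the intuition behind this type of metric, the sketch below counts the new discourse referents that intervene between a filler and its integration site in the two sentences from Gibson (2000) given in (47) and (48). It is a deliberate simplification, not Gibson's full cost function: the function name, the word-index encoding and the hand-listed set of referent-introducing words are assumptions made for the example.

```python
from typing import List, Tuple

# (word, introduces a new discourse referent?)
Token = Tuple[str, bool]

# Nouns and verbs count as new referents; function words and
# 1st/2nd person pronouns do not (hand-coded for this example).
NEW_REFERENTS = {"reporter", "senator", "John", "met",
                 "attacked", "disliked", "editor"}


def tag(sentence: str) -> List[Token]:
    """Pair every word with a flag marking new discourse referents."""
    return [(word, word in NEW_REFERENTS) for word in sentence.split()]


def intervening_new_referents(tokens: List[Token], start: int, end: int) -> int:
    """Count new referents strictly between positions `start` and `end`."""
    return sum(1 for _, is_new in tokens[start + 1:end] if is_new)


with_name = tag("The reporter who the senator who John met attacked "
                "disliked the editor")
with_pronoun = tag("The reporter who the senator who I met attacked "
                   "disliked the editor")

# Outer dependency: first "who" (index 2) ... "attacked" (index 8).
print(intervening_new_referents(with_name, 2, 8))     # 3
print(intervening_new_referents(with_pronoun, 2, 8))  # 2

# Inner dependency: second "who" (index 5) ... "met" (index 7).
print(intervening_new_referents(with_name, 5, 7))     # 1
print(intervening_new_referents(with_pronoun, 5, 7))  # 0
```

Replacing the proper name with a first person pronoun lowers the count for both dependencies, which is the direction of the effect reported by Warren & Gibson (2002).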
A cost metric based on the notion of new discourse referent, however, has been shown not to be fully adequate. Fedorenko et al. (2012) observed a robust extraction effect even in cases in which all the DPs are old information. Per se, this doesn’t preclude the possibility that 1st and 2nd person pronouns are more easily processed than given DPs. In fact, Gibson et al. (2013) suggest that the facilitating effect of pronouns could be due to lexical factors and/or their frequencies in the relevant syntactic contexts.
We now turn to the cases of wh-adjunct extraction (Finding (27d)).
The clustering of vertical lines in Figure 5 reveals the following patterns:
|(52)||a.||Items predominantly rated as fully acceptable (the 7 left-most lines in the plot)|
|b.||Items predominantly rated as “I could not say this, but I know people who could” (the 4 lines in the center of the plot)|
|c.||Items predominantly rated as unacceptable (the right-most line in the plot)|
Among the items with the highest acceptability ratings (52a), six “conditions” involve argument chains, and one involves adjunct chains (featuring an intervening heavy subject DP). As this is a single item (X63), no firm conclusion can be drawn from its relatively high acceptance.
We interpret the ratings of the items in (52b) as “marked” for two reasons. First, the most common response suggests a perception of language variation: in most cases, participants neither fully accepted nor rejected the items, but judged that other speakers would accept them. Second, the items in (52b) elicited the most variation in ratings. The first two conditions (i.e. the two lines on the left) elicited 25% full acceptance and 5% rejection, the next condition (i.e. the third line) elicited an equal amount of acceptance and rejection (15% each), and the last condition (i.e. the fourth line) elicited approximately 30% rejection and less than 5% acceptance.
Finally, the most unacceptable items (52c) are those involving a dislocated DP object in an argument wh-chain — which we discussed above.
Two factors seem to drive the effects observed in (52): the position associated with the intervenor (with dislocated objects inducing the strongest intervention effect) and the position of the wh-trace (with weaker intervention effects in adjunct chains compared with argument chains).
There is cross-linguistic online experimental evidence that wh-adjuncts elicit a kind of memory storage cost similar to that shown for wh-arguments (Stepanov & Stateva 2015). What is less clear, however, is the location of the base position of wh-adjuncts and, consequently, their integration point in a filler-gap dependency.
Our test sentences contain low wh-adjuncts, namely quand ‘when’ and où ‘where’. Therefore, storage costs are predicted to obtain. A close examination of the patterns displayed by these two adjuncts shows a difference in numerical rating9 between the two in sentences containing a CLLDed object: the mean of the où-sentences (-0.5) is lower than that of the quand-sentences (1.2). The mean of the scores for sentences with an intervening CLLDed subject is comparable to that of the sentences with an intervening canonical subject (around 8). We leave it to further research to replicate these findings and investigate the patterns of storage cost effects elicited for the two types of adjunct.
Finally, the theta-role of the intervenor and the type of wh-structure (i.e. relative clause vs. embedded interrogative) did not have a robust impact on the intervention effect. Further research will be needed to ascertain this was not due to relative lack of power or to a confound in our design.
This study has explored the interaction between different types of filler-gap dependencies: wh-movement and CLLD. We have presented experimental evidence from French in favor of an asymmetry between dislocated subjects and dislocated objects when embedded under the domain of wh-movement. Dislocated objects are significantly more disruptive of wh-chains than dislocated subjects or adjuncts. We have discussed the problems faced by treatments of this asymmetry in terms of minimality (or shortest move) and we have attributed this asymmetry to the following restriction:
|(53)||A wh-movement dependency may contain a CLLDed DP Topic in its scope iff the full Topic-cl-gap dependency is contained within the scope of the wh filler-gap dependency.|
We note that: (i) the restriction in (53) only applies to wh-movement dependencies (and not to Topic dependencies) and (ii) the restriction doesn’t apply to configurations in which the CLLDed constituent is a pronoun.
Starting from the observation that, in online processing, the integration cost of wh-constituents is higher than that of displaced Topics (Felser et al. 2003), we explored the possibility of deriving (53) from the processing demands of wh-movement dependencies. Since pronouns are known for contributing to a decrease in processing complexity, we took observation (ii) above as an argument in favor of a processing approach. Viewed from this perspective, (53) amounts to the claim that maintaining an active filler in the course of processing a wh-filler-gap dependency is costly for the human processor. Thus, this paper contributes further evidence regarding the role of cognitive constraints on perceptions of acceptability (Miller & Chomsky 1963; Kluender & Kutas 1993; Hofmeister & Sag 2010).
Our predictions will need to be tested using real-time processing data. Further research will also be required to determine whether (53) derives from general limitations on working memory capacity, as we have suggested, or whether it should be attributed to the difficulty of shunting information between focal attention and passive memory, in line with more recent models of memory architecture (McElree 2006).
2The Conditional Inference Tree and Random Forest analyses reported in this section were performed in R (version 3.6.0), using the package party (version 1.3.3).
3The Ordinal Regression analysis reported in this section was performed in R (version 3.6.0), using the package ordinal (version 2019.4.25).
4The fully acceptable baseline items are listed in the appendix as: (X1), (X2), (X3), (X4), (X9), (X10), (X11), (X12), (X13), (X14), (X68).
5In the discussion, we indicate the acceptability of the examples by reporting the average, numerically-transformed rating in parentheses.
6An alternative explanation of the subject/object asymmetry observed above could be proposed along the lines of Cardinaletti (2004), who argues that there is a SubjectP projection located below Topics and above TP. One could assume that subject DPs doubled by a clitic in French occupy SubjectP and reach that position by A-movement, in which case no minimality effects are expected to occur. The ungrammaticality of (30a), however, is very intriguing in this light. The restriction against non-referential quantifiers is a well-known property of CLLD, one that applies to any CLLDed element irrespective of its function. It is not a characteristic property of subjects. Thus, the ungrammaticality of (30a) is expected under a CLLD analysis and is unexpected under an A-movement analysis.
7The judgments in (35) are those reported by Rizzi.
8In fact, Topicalisation in English, which arguably involves movement, does obey the crossing constraint when combined with wh-movement (Pesetsky 1982: 269):
|(i)||a.||This problemi, Mary knows whoj [PRO to consult tj about ti]|
|b.||*This specialisti, Mary knows [what problem]j [PRO to consult ti about tj]|
In (i) the Topic precedes the wh-phrase and the structure is subject to the crossing constraint.
9The conversion of ratings to a numerical scale is explained in Table 4.
CL = clitic, CLLD = Clitic Left Dislocation, DAT = dative, IMP = imperfect, IND = indefinite, INF = infinitive, NEG = negation, PART = partitive, REFL = reflexive.
Thanks to Joe Rodd for his help with the implementation of the online acceptability judgment test based on audio stimuli, and to the participants for engaging patiently with the acceptability judgment task.
The authors have no competing interests to declare.
Anagnostopoulou, Elena. 1997. Clitic left dislocation and contrastive left dislocation. In Elena Anagnostopoulou, Henk van Riemsdijk & Frans Zwarts (eds.), Materials on left dislocation, 151–192. Amsterdam: John Benjamins Publishing Company. DOI: https://doi.org/10.1075/la.14.11ana
Baayen, R. Harald, Laura Janda, Tore Nesset, Anna Endresen & Anastasia Makarova. 2013. Making choices in Russian: Pros and cons of statistical methods for rival forms. Russian Linguistics 37. 253–291. DOI: https://doi.org/10.1007/s11185-013-9118-6
Barbosa, Pilar. 2006. Minimalidade e predicação. In Fátima Oliveira & Joaquim Barbosa (eds.), Textos selecionados do XXI Encontro da Associação portuguesa de Linguística, 183–201. APL/Edições Colibri.
Barnes, Betsy Kerr. 1986. An empirical study of the syntax and pragmatics of left dislocations in spoken French. In Osvaldo Jaeggli & Carmen Silva-Corvalan (eds.), Studies in romance linguistics, 207–223. Dordrecht: Foris.
Belletti, Adriana & Luigi Rizzi. 1988. Psych-verbs and theta-theory. Natural Language and Linguistic Theory 6. 291–352. DOI: https://doi.org/10.1007/BF00133902
Breiman, Leo. 2001. Random forests. Machine Learning 45. 5–32. DOI: https://doi.org/10.1023/A:1010933404324
Bürkner, Paul-Christian & Matti Vuorre. 2019. Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science 2(1). 77–101. DOI: https://doi.org/10.1177/2515245918823199
Cardinaletti, Anna. 2004. Towards a cartography of subject positions. In Luigi Rizzi (ed.), The structure of CP and IP. The cartography of syntactic structures 2. 115–165. Oxford: Oxford University Press.
Crain, Stephen & Janet Dean Fodor. 1985. How can grammars help parsers? In David R. Dowty, Lauri Karttunen & Arnold M. Zwicky (eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (Studies in Natural Language Processing), 94–128. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511597855.004
Culbertson, Jennifer. 2010. Convergent evidence for categorial change in French: From subject clitic to agreement. Language 86(1). 85–132. DOI: https://doi.org/10.1353/lan.0.0183
De Cat, Cécile. 2005. French subject clitics are not agreement markers. Lingua 108. 1195–1219. DOI: https://doi.org/10.1016/j.lingua.2004.02.002
Fedorenko, Evelina, Steve Piantadosi & Edward Gibson. 2012. Processing relative clauses in supportive contexts. Cognitive Science 36(3). 471–497. DOI: https://doi.org/10.1111/j.1551-6709.2011.01217.x
Felser, Claudia, Harald Clahsen & Thomas F. Münte. 2003. Storage and integration in the processing of filler-gap dependencies: An ERP study of topicalization and wh-movement in German. Brain and Language 87(3). 345–354. DOI: https://doi.org/10.1016/S0093-934X(03)00135-4
Fiebach, Christian, Matthias Schlesewsky & Angela Friederici. 2002. Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German WH-questions. Journal of Memory and Language 47. 250–272. DOI: https://doi.org/10.1016/S0749-596X(02)00004-9
Frazier, Lyn & Charles Clifton Jr. 1989. Successive cyclicity in the grammar and the parser. Language and Cognitive Processes 4(2). 93–126. DOI: https://doi.org/10.1080/01690968908406359
Gibson, Edward. 1998. Linguistic complexity: locality of syntactic dependencies. Cognition 68. 1–76. DOI: https://doi.org/10.1016/S0010-0277(98)00034-1
Gibson, Edward. 2000. The dependency locality theory: a distance-based theory of linguistic complexity. In Y. Miyashita, A. Marantz & W. O’Neil (eds.), Image, language, brain, 95–126. Cambridge, MA: MIT Press.
Gibson, Edward, Harry Tily & Evelina Fedorenko. 2013. The processing complexity of English relative clauses. In M. K. Tanenhaus, M. Sanz & I. Laka (eds.), Language down the garden-path: the cognitive and biological basis for linguistic structures, 149–173. Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199677139.003.0006
Hofmeister, Philip & Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86(2). 366. DOI: https://doi.org/10.1353/lan.0.0223
Kaan, Edith, Anthony Harris, Edward Gibson & Phillip Holcomb. 2000. The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes 15(2). 159–201. DOI: https://doi.org/10.1080/016909600386084
Kimball, John. 1973. Seven principles of surface structure parsing in natural language. Cognition 2(1). 15–47. DOI: https://doi.org/10.1016/0010-0277(72)90028-5
King, Jonathan & Marcel Adam Just. 1991. Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language 30(5). 580–602. DOI: https://doi.org/10.1016/0749-596X(91)90027-H
Kluender, Robert & Marta Kutas. 1993. Subjacency as a processing phenomenon. Language and Cognitive Processes 8(4). 573–633. DOI: https://doi.org/10.1080/01690969308407588
Laenzlinger, Christophe. 1998. Comparative studies in word order variation. Amsterdam/Philadelphia: John Benjamins. DOI: https://doi.org/10.1075/la.20
Matsuki, Kazunaga, Victor Kuperman & Julie A. Van Dyke. 2016. The random forests statistical technique: An examination of its value for the study of reading. Scientific Studies of Reading 20(1). 20–33. DOI: https://doi.org/10.1080/10888438.2015.1107073
McElree, Brian. 2006. Accessing recent events. Psychology of Learning and Motivation 46. 155–200. DOI: https://doi.org/10.1016/S0079-7421(06)46005-9
Raposo, Eduardo. 1998. Definite/zero alternations in Portuguese: Towards a unification of topic constructions. In B. Tranel, A. Schwegler & M. Uribe-Etxebarria (eds.), Romance linguistics: Theoretical perspectives, 197–212. Amsterdam: John Benjamins. DOI: https://doi.org/10.1075/cilt.160.16rap
Rizzi, Luigi. 1997. The fine structure of the left periphery. In Liliane Haegeman (ed.), Elements of grammar: A handbook of generative syntax, 282–337. Dordrecht: Kluwer Academic Publishers. DOI: https://doi.org/10.1007/978-94-011-5420-8_7
Stepanov, Artur & Penka Stateva. 2015. Cross-linguistic evidence for memory storage costs in filler-gap dependencies with wh-adjuncts. Frontiers in Psychology 6(1301). DOI: https://doi.org/10.3389/fpsyg.2015.01301
Strobl, Carolin, James Malley & Gerhard Tutz. 2009. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods 14(4). 323–348. DOI: https://doi.org/10.1037/a0016973
Vasishth, Shravan, Katja Suckow, Richard L. Lewis & Sabine Kern. 2010. Short-term forgetting in sentence comprehension: crosslinguistic evidence from verb-final structures. Language and Cognitive Processes 4. DOI: https://doi.org/10.1080/01690960903310587
Warren, Tessa & Edward Gibson. 2002. The influence of referential processing on sentence complexity. Cognition 85(1). 79–112. DOI: https://doi.org/10.1016/S0010-0277(02)00087-2