1 Introduction

According to Copy Theory (Chomsky 1993), the displacement property of human language is an epiphenomenon of how the interfaces deal with collections of non-distinct constituents in a phrase marker. Thus, the movement dependency underlying a passive sentence such as (1a) involves (at least) two occurrences of the constituent Cosmo (1b). The intuition is that both occurrences of Cosmo form some kind of unit, i.e., a chain CH = {Cosmo, Cosmo}, in virtue of being “the same” element. Chains are assumed to have distinctive properties at the interfaces (e.g., only one chain member is spelled out at PF, the elements of a chain share the same θ-roles at LF).

(1) a. Cosmo was arrested.
  b. [TP Cosmo [T’ was [VP arrested Cosmo]]]

Since Non-Distinctiveness is the key property linking together the elements in a chain, it should be regarded as one of the core concepts associated with movement dependencies under Copy Theory. However, offering a principled definition of Non-Distinctiveness that distinguishes real copies (i.e., transformationally related elements) from unrelated tokens of a single lexical item has proven to be exceptionally difficult. Most of the literature on Copy Theory (e.g., Chomsky 1995; Nunes 1995; 2004; 2011; Bošković & Nunes 2007; among others) adopts an approach to Non-Distinctiveness that is analogous to the “coindex via movement” mechanism assumed to be part of the Move-α operation in the GB framework (e.g., Lasnik & Uriagereka 1988).1 It follows a proposal made by Chomsky (1995) according to which every syntactic object in a Numeration should be marked as distinct by the operation Select. Thus, in a representation like (2a), every element in the derivation carries an index distinguishing it from the remaining lexical items. When a copy of a constituent is generated, as in (2b), the Copy operation replicates all its properties, including its index.2 Once this new copy is merged with the main structure, as in (2c), the representation contains two constituents carrying the same index.

(2) a. K = [TP wasi [VP arrestedj Cosmok]]
  b. K = [TP wasi [VP arrestedj Cosmok]]
    L = Cosmok
  c. K = [TP Cosmok [T’ wasi [VP arrestedj Cosmok]]]

As the index allows recognizing both occurrences of Cosmok as non-distinct, they are trivially predicted to form the chain CH = {Cosmok, Cosmok}. For ease of presentation, I will call this type of marking device Indexical Sameness, or Indexical-S for short. The mechanism can be informally defined as follows:

(3) Indexical-S
  Two constituents α and β are non-distinct if and only if they are assigned the same index/marking through an application of the Copy operation (or any other derivational procedure).
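The marking procedure behind (3) can be made concrete with a small sketch. The Python fragment below is only an illustration under simplifying assumptions: the function names `select`, `copy_item` and `chains_by_index`, and the use of integers as indexes, are hypothetical and are not part of any of the cited proposals.

```python
from collections import defaultdict
from itertools import count

_fresh_index = count()  # hypothetical source of unique indexes


def select(form):
    """Select: introduce a lexical item and mark it with a fresh index."""
    return {"form": form, "index": next(_fresh_index)}


def copy_item(item):
    """Copy: replicate all properties of an item, including its index."""
    return dict(item)


def chains_by_index(items):
    """Indexical-S: items bearing the same index form a chain."""
    groups = defaultdict(list)
    for item in items:
        groups[item["index"]].append(item["form"])
    return list(groups.values())


# (2): 'was', 'arrested' and 'Cosmo' are selected; 'Cosmo' is then copied.
was, arrested, cosmo = select("was"), select("arrested"), select("Cosmo")
moved = copy_item(cosmo)
print(chains_by_index([moved, was, arrested, cosmo]))
# [['Cosmo', 'Cosmo'], ['was'], ['arrested']]

# A second, independently selected token of 'Cosmo' gets a different index.
other = select("Cosmo")
print(other["index"] == cosmo["index"])  # False
```

In this sketch the chains are entirely determined by the derivational history (which items were produced by `copy_item`), which is the property that the discussion below takes issue with.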

There are two main theoretical problems with this approach to Non-Distinctiveness. First, it violates the Inclusiveness Condition.

(4) Inclusiveness Condition
  Any structure formed by the computation is constituted of elements already present in the lexical items. No new objects are added in the course of computation apart from rearrangements of lexical properties.

Since indexes (or any kind of markings) are not inherent properties of any lexical item, the condition in (4) bans them.

While many useful proposals in the literature seem to depart from Inclusiveness, failing to obey (4) is particularly significant for the case at hand. Satisfaction of Inclusiveness is supposed to be one of the key advantages of Copy Theory over Trace Theory.3 If Copy Theory requires introducing indexes to deal with Non-Distinctiveness, one of the conceptual benefits of adopting it vanishes. As Neeleman & van de Koot (2010: 332) put it, “the copy theory by itself does not resolve the tension between Inclusiveness and the displacement property of natural language”, at least if Indexical-S is assumed.

Second, Indexical-S involves no real theory of Non-Distinctiveness; it is just a marking mechanism, an inductive device to get the right chains without further complications. In contrast, a true theory of Non-Distinctiveness should be able to explain on independent grounds (i) what kind of elements count as non-distinct for grammar and (ii) what kind of criteria are taken into consideration in such a calculus.

To put it in different terms, once Copy Theory is assumed, Non-Distinctiveness should be regarded as a theoretical problem that is similar to defining the identity conditions on ellipsis, a classic topic in linguistic theory since, at least, Ross (1969). The resemblance between these two types of phenomena is evident. Both require some kind of parallelism between a phonetically realized antecedent and a silent gap; and in fact, it has been proposed that ellipsis and copy deletion are indeed two varieties of the same general phenomenon (e.g., Chomsky 1995; Donati 2003; Saab 2008). If this parallelism requirement is an active domain of inquiry for the theory of ellipsis, then it should also be so for Copy Theory. Therefore, there would be no reason to aprioristically accept and maintain an unprincipled definition of Non-Distinctiveness such as Indexical-S if an empirically adequate alternative is offered. The present paper attempts to offer such an alternative.

The structure of the paper is as follows. In section 2, I introduce the premises that allow defining Non-Distinctiveness as an inclusion relation between the morphosyntactic values of two constituents in a local configuration. Additionally, I argue that chain formation is based on a representational computation that takes place at the interfaces. In section 3, I compare Indexical-S in (3) and the inclusion-based approach regarding three empirical domains that have received accounts within Copy Theory: wh-copying, non-identical wh-doubling, and anti-reconstruction effects. I show that many properties of these phenomena follow as straightforward consequences of the system advanced in section 2. Section 4 summarizes the conclusions.

2 Towards a principled definition of Non-Distinctiveness

I pursue a theory of Non-Distinctiveness that relies on certain assumptions about grammatical features. The premise in (5) serves as a starting point for discussing them.

(5) Syntactic objects are abstract sets of features without any phonological content.

This is the (generalized) Late Insertion hypothesis advanced by proponents of Distributed Morphology (Halle & Marantz 1993). According to it, syntactic terminals consist only of grammatical features without any phonological information. Phonological matrices are introduced into syntactic representations during the mapping to the phonological component by an operation called Vocabulary Insertion.

Notice that the assumption in (5) treats features as the most basic unit of syntactic computation, so some definitions are in order. I follow Gazdar et al. (1985) and Adger & Svenonius (2011) in taking grammatical features to be ordered pairs formed by an attribute and a corresponding value.

(6) Valued feature (Adger & Svenonius 2011: 38)
  a. A valued feature is an ordered pair <Att,Val> where
  b. Att is drawn from the set of attributes, {A, B, C, D, E, …}
  c. and Val is drawn from the set of values, {a, b, …}

The set of attributes contains classes of features (e.g., Number or Gender), while the set of values contains morphosyntactic properties pertaining to these classes (e.g., singular, plural; or feminine, neuter). If a given lexical item expresses any particular property in a language (e.g., plural), it means it inherently has the corresponding attribute (e.g., Number). Therefore, there are no privative syntactic features under these assumptions; every distinctive behavior between two tokens of a constituent is due to opposing values in certain features.

Following Adger (2010), a feature is taken to be unvalued if it has the empty set ∅ instead of a value.

(7) Unvalued feature
  a. An unvalued feature is an ordered pair <Att,∅> where
  b. Att is drawn from the set of attributes, {A, B, C, D, E, …}
  c. and ∅ needs to be replaced with an element from the set of values, {a, b, …}

For simplicity, syntactic features that do not participate in processes based on valuation will be either replaced with ellipses (…) or represented as values {Val}. For instance, a categorial V-feature may be represented as {V}, and not as an ordered pair <Cat,V>.
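The definitions in (6) and (7) lend themselves to a simple data-structure rendering. The following Python sketch is offered only as an illustration: the names `Feature`, `is_valued` and `values_of` are hypothetical, `None` is used as a stand-in for the empty set ∅, and the attribute labels in the example are assumptions chosen for exposition.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class Feature(NamedTuple):
    """An ordered pair <Att,Val>; val=None encodes the unvalued pair <Att,∅>."""
    att: str
    val: Optional[str]


def is_valued(feature: Feature) -> bool:
    """A feature is valued if its second member is not ∅ (here: not None)."""
    return feature.val is not None


def values_of(features: Iterable[Feature]) -> FrozenSet[str]:
    """Collect the morphosyntactic values of a constituent, skipping ∅."""
    return frozenset(f.val for f in features if is_valued(f))


# A valued Number feature: the value 'plural' comes with its attribute.
number = Feature("Number", "plural")
# An unvalued feature (the attribute label 'Case' is only an example).
case = Feature("Case", None)

print(is_valued(number), is_valued(case))  # True False
print(values_of([number, case]))           # frozenset({'plural'})
```

Representing every feature as a pair, with no bare attribute-less markers, mirrors the claim that there are no privative features under these assumptions.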

As stated in (7c), an unvalued feature <Att,∅> requires replacing the empty set ∅ with a value. I follow Chomsky (2000; 2001) in adopting the Agree system for feature valuation. This implies accepting the Activity Condition in (8).

(8) Activity Condition (Chomsky 2001)
  A Goal G is accessible for Agree if G has at least one unvalued feature.

The usual instance of activity/inactivity involves φ-agreement and Case assignment. A DP carrying an unvalued Case feature <Case,∅> is active for φ-agreement, so it may be a Goal for a Probe requiring φ-features. As a consequence of agreement, the Probe values the Case feature of the DP, turning it inactive for further φ-related operations. I assume that these mechanisms also hold for left-peripheral features.

(9) The Activity Condition applies to both A and A’-dependencies.

From now on, I will use the Greek letters κ and ω to designate activity-features for A and A’-dependencies, respectively. For concreteness, κ is simply an abbreviation for classic abstract Case, while ω is an attribute that allows a constituent carrying a left-peripheral value (e.g., Wh) to be targeted by a Probe in the C-domain.

According to (9), a wh-pronoun like who in (10) requires entering into the derivation with two unvalued features, a Case feature <κ,∅>, allowing it to enter into an Agree relation with matrix T, and an ω-feature <ω,∅>, allowing it to move to the specifier position of the interrogative complementizer.

(10) Who seems to be happy?

The derivation of (10) involves four occurrences of who valuing features in different positions in the structure according to (11): who4 is externally merged in a small clause SC carrying both unvalued features <κ,∅> and <ω,∅>; who3 is a copy of who4 generated through successive cyclic movement; who2 is a copy of who3 that A-moved to matrix Spec,T and received nominative Case; and finally, who1 is a copy of who2 that A’-moved to Spec,C valuing its ω-feature.4

(11) a. [CP Who1 [TP who2 seems [TP who3 to be [SC who4 happy]]]]?
  b. Who1{<κ,NOM>,<ω,Q>, …} … who2{<κ,NOM>,<ω,∅>, …} … who3{<κ,∅>,<ω,∅>, …} … who4{<κ,∅>,<ω,∅>, …}
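The derivation in (11) can be traced step by step. The Python sketch below is a rough illustration under stated assumptions: features are stored as an attribute-to-value mapping with `None` for ∅, the helper names `is_active`, `copy_item` and `value` are hypothetical, and the pair <Cat,D> is an arbitrary stand-in for the features abbreviated as “…” in (11b).

```python
from typing import Dict, Optional

Features = Dict[str, Optional[str]]  # attribute -> value; None encodes ∅


def is_active(item: Features) -> bool:
    """Activity Condition (8): accessible for Agree if some feature is unvalued."""
    return any(val is None for val in item.values())


def copy_item(item: Features) -> Features:
    """Copy: replicate every feature of the source, valued or not."""
    return dict(item)


def value(item: Features, att: str, val: str) -> Features:
    """Return a version of item in which the unvalued attribute att receives val."""
    if item[att] is not None:
        raise ValueError(f"<{att},{item[att]}> is already valued")
    valued = dict(item)
    valued[att] = val
    return valued


# who4 is externally merged with both activity features unvalued.
who4 = {"Cat": "D", "κ": None, "ω": None}
who3 = copy_item(who4)                      # successive cyclic movement
who2 = value(copy_item(who3), "κ", "NOM")   # A-movement to matrix Spec,T
who1 = value(copy_item(who2), "ω", "Q")     # A'-movement to Spec,C

for name, item in [("who1", who1), ("who2", who2), ("who3", who3), ("who4", who4)]:
    print(name, item, "active" if is_active(item) else "inactive")
```

Because `copy_item` preserves every feature and `value` only replaces ∅ with a value, each copy carries at least the values of the copy it was generated from.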

A principled definition of Non-Distinctiveness must determine the type of linking principle that binds together the copies of who in order to form the chain CH = {who1, who2, who3, who4}. In doing so, it should not introduce markings or indexes into the representation; the relevant relation must be based on the properties of the elements forming the chain. Therefore, Non-Distinctiveness must follow from some kind of association between the features of the copies of who. An inspection of the representation in (11b) reveals that the attributes of the features of the four copies remain constant, while there are some differences regarding their values. Suppose that Vα is the set of morphosyntactic values of a constituent α, and that the sets V1, V2, V3 and V4 correspond to the values of who1, who2, who3 and who4, respectively. According to (11b): the sets V4 and V3 are identical (cf. (12a)); V3 is a proper subset of V2 (cf. (12b)); and V2 is a proper subset of V1 (cf. (12c)).

(12) a. V4 = V3 (i.e., {…} = {…})
  b. V3 ⊂ V2 (i.e., {…} ⊂ {NOM, …})
  c. V2 ⊂ V1 (i.e., {NOM, …} ⊂ {NOM, Q, … })

The relations of identity between sets in (12a), and proper inclusion in (12b) and (12c) may be unified as a single type of relation: (improper) inclusion. That is because (i) for every set A identical to a set B, A is a subset of B (i.e., if A = B, then A ⊆ B), and (ii) for every set A that is a proper subset of a set B, A is a subset of B (i.e., if A ⊂ B, then A ⊆ B). In other words, the values of the copies of who in (11) are related through inclusion, i.e., the values of who3 contain the values of who4 (cf. (13a)); the values of who2 contain the values of who3 (cf. (13b)); and the values of who1 contain the values of who2 (cf. (13c)).

(13) a. V4 ⊆ V3 (i.e., {…} ⊆ {…})
  b. V3 ⊆ V2 (i.e., {…} ⊆ {NOM, …})
  c. V2 ⊆ V1 (i.e., {NOM, …} ⊆ {NOM, Q, …})

Given my assumptions, such an inclusion relation will arise systematically for every new copy of a constituent. Hence, it may be exploited to define Non-Distinctiveness. Call this definition Inclusion-based Sameness, or Inclusion-S, for short.

(14) Inclusion-S
  A constituent β is non-distinct from a constituent α if for every value of β there is an identical value in α.

The definition in (14) is a more formal version of a very intuitive idea: if the features of α contain the morphosyntactic information encoded in the features of β, then β is indistinguishable from (part of) α. Therefore, Inclusion-S involves an asymmetric comparison between two constituents, in which one of them may be underspecified with respect to the other.
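The definition in (14) amounts to a subset test over sets of values. A minimal Python sketch follows, assuming that the value sets of (11b) can be written out explicitly; the function name `non_distinct` is hypothetical, and "D" is an arbitrary stand-in for the values abbreviated as “…”.

```python
from typing import AbstractSet


def non_distinct(beta: AbstractSet[str], alpha: AbstractSet[str]) -> bool:
    """Inclusion-S (14): every value of beta has an identical value in alpha."""
    return all(value in alpha for value in beta)


# Value sets of the four copies of 'who' in (11b).
V1 = {"NOM", "Q", "D"}
V2 = {"NOM", "D"}
V3 = {"D"}
V4 = {"D"}

# The inclusion relations in (13a-c).
print(non_distinct(V4, V3), non_distinct(V3, V2), non_distinct(V2, V1))
# True True True

# The comparison is asymmetric: who1 is not non-distinct from who2.
print(non_distinct(V1, V2))  # False
```

The argument order matters: the first argument is the potentially underspecified constituent β, the second the constituent α whose values must contain those of β.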

Notice that (14) does not introduce any specification of the structural conditions that two constituents must comply with to be evaluated as non-distinct. This is undesirable on both empirical and conceptual grounds. In principle, there are two requirements that seem almost unavoidable for any two elements forming a movement dependency: (i) they must be in a c-command relation, and (ii) they must be local. By adopting a locality constraint based on Relativized Minimality (Rizzi 1990), the conditions in (15) follow.5

(15) Two constituents α and β are part of the same chain if
  a. α c-commands β,
  b. β is non-distinct from α (by Inclusion-S),
  c. there is no δ between α and β such that (i) β is non-distinct from δ, or (ii) δ is non-distinct from α.

These conditions define chain links, i.e., they allow recognizing two consecutive members of a movement dependency. Chains of more than two members are obtained by transitivity. That is, if the constituents α and β comply with the conditions in (15), and β and γ also comply with (15), then α, β and γ should be part of the same chain.6
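The conditions in (15), together with the appeal to transitivity, can be read as a procedure for recognizing chains. The Python sketch below is one possible rendering under simplifying assumptions that are not part of the proposal itself: the relevant constituents are listed from top to bottom along a single spine, so that an earlier item c-commands every later one, and only the occurrences under discussion are listed. The names `is_link` and `chains` are hypothetical, and "D" stands in for the values abbreviated as “…”.

```python
from typing import FrozenSet, List, Tuple

Item = Tuple[str, FrozenSet[str]]  # (label, set of morphosyntactic values)


def non_distinct(beta: Item, alpha: Item) -> bool:
    """Inclusion-S (14): every value of beta is also a value of alpha."""
    return beta[1] <= alpha[1]


def is_link(spine: List[Item], i: int, j: int) -> bool:
    """Conditions (15) for alpha = spine[i] and beta = spine[j]."""
    if not i < j:                          # (15a): alpha c-commands beta
        return False
    alpha, beta = spine[i], spine[j]
    if not non_distinct(beta, alpha):      # (15b): Inclusion-S
        return False
    return not any(                        # (15c): no intervening delta
        non_distinct(beta, delta) or non_distinct(delta, alpha)
        for delta in spine[i + 1:j]
    )


def chains(spine: List[Item]) -> List[List[str]]:
    """Group items into chains by extending chain links transitively."""
    groups: List[List[int]] = []
    for j in range(len(spine)):
        for group in groups:
            if any(is_link(spine, i, j) for i in group):
                group.append(j)
                break
        else:
            groups.append([j])  # no antecedent: start a new (trivial) chain
    return [[spine[k][0] for k in group] for group in groups]


# The four copies of 'who' in (11b), listed from highest to lowest.
ex11 = [
    ("who1", frozenset({"NOM", "Q", "D"})),
    ("who2", frozenset({"NOM", "D"})),
    ("who3", frozenset({"D"})),
    ("who4", frozenset({"D"})),
]
print(chains(ex11))  # [['who1', 'who2', 'who3', 'who4']]
```

In this sketch an item can link to at most one higher item, since any closer candidate antecedent would count as an intervener by (15c-i); this is why attaching each item to the first group that contains a link is sufficient.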

A demonstration of the functioning of (15) is in order. Consider once again the passive sentence in (1), repeated for convenience in (16). In this example, both copies of Cosmo must form a single chain. As shown in (16c), Cosmo1 has a valued Case feature <κ,NOM> while Cosmo2 carries its unvalued counterpart <κ,∅>.

(16) a. Cosmo was arrested.
  b. [TP Cosmo1 [T’ was [VP arrested Cosmo2]]].
  c. [TP Cosmo1{<κ,NOM>, …} … [VP … Cosmo2{<κ,∅>, …}]]

First, Cosmo1 c-commands Cosmo2, so they comply with (15a). Second, (15b) requires these constituents to be non-distinct according to Inclusion-S. The set of values of Cosmo2, i.e., {…}, is a subset of the set of values of Cosmo1, i.e., {NOM, …}, so this requirement is also satisfied. Finally, there is no intervener δ between Cosmo1 and Cosmo2 in the sense of (15c). Given that the three conditions in (15) are satisfied, the copies of Cosmo in (16) form the chain CH = {Cosmo1, Cosmo2}, which is the desired result.

The next example is the active sentence in (17). It has three occurrences of the constituent Cosmo: Cosmo3 receives accusative Case in its base position,7 Cosmo2 occupies a thematic position and lacks Case, while Cosmo1 is a copy of Cosmo2 generated through A-movement that receives nominative Case in Spec,T.

(17) a. Cosmo arrested Cosmo.
  b. [TP Cosmo1 [T’ T [VP Cosmo2 [V’ arrested Cosmo3]]]]
  c. [TP Cosmo1{<κ,NOM>, …} … [VP Cosmo2{<κ,∅>, …} … [V’ … Cosmo3{<κ,ACC>, …}]]]

Here, (i) Cosmo1 c-commands Cosmo2, (ii) the set of values of Cosmo2 is a subset of the set of values of Cosmo1 (i.e., {…} ⊆ {NOM, …}), and (iii) there are no interveners between them, thus Cosmo1 and Cosmo2 form the chain CH1 = {Cosmo1, Cosmo2}. Cosmo3 cannot be considered non-distinct from the elements in CH1 since it carries a value ACC that is absent in both Cosmo1 and Cosmo2 (e.g., {ACC, …} ⊄ {…}). Therefore, Cosmo3 forms a trivial chain of its own, CH2 = {Cosmo3}. Again, this is the desired result.

A more complex case is posed by the sentence in (18a). It contains four occurrences of the constituent Cosmo (cf. (18b)). In the embedded clause, Cosmo4 is externally merged in the internal argument position, and Cosmo3 is generated from it through A-movement; in the matrix clause, Cosmo2 occupies the external argument position, while Cosmo1 is generated from Cosmo2 through A-movement.

(18) a. Cosmo said that Cosmo was arrested.
  b. [TP Cosmo1 [T’ T [VP Cosmo2 [V’ said that [TP Cosmo3 [T’ was [VP arrested Cosmo4]]]]]]].
  c. [TP Cosmo1{<κ,NOM>, …} … [VP Cosmo2{<κ,∅>, …} … [TP Cosmo3{<κ,NOM>, …} … [VP … Cosmo4{<κ,∅>, …}]]]]

This sentence contains the chains CH1 = {Cosmo1, Cosmo2} and CH2 = {Cosmo3, Cosmo4}, both formed in a similar way to the one in (16). Notice that Inclusion-S would make some erroneous predictions in this case if not combined with a locality constraint like (15c). For example, the set of values of Cosmo3 is a subset of the set of values of Cosmo1 (i.e., {NOM, …} ⊆ {NOM, …}), so the chain *CH = {Cosmo1, Cosmo3} would be, in principle, incorrectly predicted. This unwanted result is avoided by assuming that the calculus of Inclusion-S obeys Relativized Minimality. In other words, Cosmo3 cannot be considered non-distinct from Cosmo1 because there is an element being non-distinct from Cosmo1 in the way, i.e., Cosmo2. The same kind of intervention prevents forming the chain *CH = {Cosmo2, Cosmo4}: even though the values of Cosmo2 contain the values of Cosmo4 (i.e., {…} ⊆ {…}), Cosmo4 cannot be considered non-distinct from Cosmo2 since Cosmo3 acts as an intervener.
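The role of the locality condition in (18) can be checked mechanically. The following Python sketch compares the pairs licensed by Inclusion-S alone with the pairs licensed once (15c) is added. It rests on the same simplifying assumptions as before (constituents listed top-down along one spine, so that an earlier item c-commands a later one); the function name `links`, its `locality` switch and the stand-in value "D" for “…” are hypothetical.

```python
from itertools import combinations


def non_distinct(beta, alpha):
    """Inclusion-S (14): the values of beta are included in those of alpha."""
    return beta <= alpha


def links(spine, locality=True):
    """Pairs (alpha, beta) satisfying (15a-b), plus (15c) when locality is on."""
    names = list(spine)  # insertion order: highest constituent first
    found = []
    for i, j in combinations(range(len(names)), 2):
        alpha, beta = spine[names[i]], spine[names[j]]
        if not non_distinct(beta, alpha):
            continue
        # (15c): is there a delta between alpha and beta that intervenes?
        blocked = any(
            non_distinct(beta, spine[d]) or non_distinct(spine[d], alpha)
            for d in names[i + 1:j]
        )
        if not (locality and blocked):
            found.append((names[i], names[j]))
    return found


# The four occurrences of 'Cosmo' in (18c).
ex18 = {
    "Cosmo1": {"NOM", "D"},
    "Cosmo2": {"D"},
    "Cosmo3": {"NOM", "D"},
    "Cosmo4": {"D"},
}

print(links(ex18, locality=False))
# [('Cosmo1', 'Cosmo2'), ('Cosmo1', 'Cosmo3'), ('Cosmo1', 'Cosmo4'),
#  ('Cosmo2', 'Cosmo4'), ('Cosmo3', 'Cosmo4')]
print(links(ex18))
# [('Cosmo1', 'Cosmo2'), ('Cosmo3', 'Cosmo4')]
```

Without (15c), the sketch returns the unwanted pairs discussed in the text; with it, only the two links that yield CH1 and CH2 survive.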

The last example has two A’-dependencies, one in the main clause and the other in the embedded clause, which involve six occurrences of the wh-pronoun who: who3 and who6 carry both unvalued Case <κ,∅> and left-peripheral <ω,∅> features; who2 and who5 are copies generated through A-movement; who1 and who4 are copies generated through A’-movement.

(19) a. Who said who was arrested?
  b. [CP Who1 [C’ C [TP who2 [T’ T [VP who3 [VP said [CP who4 [C’ C [TP who5 [T’ was [VP arrested who6]]]]]]]]]]]
  c. [CP Who1{<κ,NOM>,<ω,Q>, …} … [TP who2{<κ,NOM>,<ω,∅>, …} … [VP who3{<κ,∅>,<ω,∅>, …} … [CP who4{<κ,NOM>,<ω,Q>, …} … [TP who5{<κ,NOM>,<ω,∅>, …} … [VP … who6{<κ,∅>,<ω,∅>, …}]]]]]]

This sentence contains two chains CH1 = {who1, who2, who3} and CH2 = {who4, who5, who6}. As already pointed out, the conditions in (15) calculate chain links of two elements, so chains of more than two members must be formed by transitivity. In the case at hand, who1 cannot form a chain directly with who3 since who2 intervenes between them. However, given that who1 and who2 comply with the conditions in (15), and who2 and who3 do as well, the chain CH1 is formed. The same applies to the Non-Distinctiveness relation between who4 and who6: it is mediated by who5.

There is still one important aspect of this system that should be discussed before exploring further empirical consequences of adopting it. Inclusion-S (cf. (14)) and its associated set of conditions (cf. (15)) rely on a representational characterization of chains.

(20) Representational characterization of chains (Rizzi 1986: 66)
  Chains are read off from S-structures (and/or other syntactic levels), hence chain formation is a mechanism independent from “move α”, and in principle chains do not necessarily reflect derivational properties.

Adopting this characterization has three main consequences in a minimalist and copy-based theoretical setting. First, given that (20) states that narrow syntactic operations (e.g., Copy, Merge) and chain formation apply independently at distinct computational cycles, it becomes necessary to advance an algorithm of chain recognition that makes no use of narrow syntactic devices but exploits representational properties of phrase markers (e.g., features, geometrical relations between nodes). I take it that Inclusion-S in (14) together with the conditions in (15) offer such an algorithm.

Second, given that chains are supposed to be computed over a representation, and there are no levels of representation other than the interface levels (Chomsky 1993), it follows that chains must be computed at the interfaces. Furthermore, since there is no direct link between PF and LF, chains must be calculated independently and in parallel at both interfaces. According to this, notions such as Non-Distinctiveness or chain should be regarded as exclusive to the grammatical processes that take place at PF and LF.8 This result allows capturing Chomsky’s (2001) observation that no narrow syntactic mechanism seems to employ chains.

Third, if chains are read off from syntactic representations, then there is no need to define them as linguistic objects existing separately from a phrase marker. That is, chains are nothing more than an abstract relation holding between some nodes in a syntactic structure, a relation that ultimately denotes a set CH. Therefore, the conditions in (15) must be understood as an intensional definition for this set.

3 Indexical-S and Inclusion-S are not equivalent

As already discussed, the Inclusion-S system computes chains over both interface representations independently and in parallel; such a calculus is based only on information encoded in the phrase marker. Since the Copy operation determines only indirectly the form of chains, there may be “mismatches” on how narrow syntax, PF and LF process movement dependencies. For example, it could be the case that narrow syntax generates a set of copies that is not recognized as a chain in one of the interfaces. Conversely, it could also happen that two non-transformationally related constituents comply with Inclusion-S at the interfaces and, therefore, form a chain.

Scenarios like these are not expected under Indexical-S. This marking mechanism establishes a univocal connection between the Copy operation and chain formation during the derivational procedure itself by assigning indexes to the copies. In other words, transformational procedures and chains are inexorably isomorphic under Indexical-S.

This section argues that the “mismatches” predicted by the Inclusion-S system do occur. For conciseness, the discussion focuses on three phenomena that have been accounted for within Copy Theory: (i) Nunes’ (2004) treatment of wh-copying, (ii) Barbiers et al.’s (2010) analysis of non-identical wh-doubling in Dutch, and (iii) Takahashi & Hulsey’s (2009) extension of Lebeaux’s (1988) ideas on anti-reconstruction. These three proposals adopt (implicitly or explicitly) Indexical-S. It is shown that each proposal can be either empirically or conceptually improved by adopting the Inclusion-S system.

3.1 Uniqueness and its apparent exceptions

As mentioned in the introduction, Copy Theory explains the displacement property of language by assuming that only one member of a chain receives pronunciation. Call this general property of chains Uniqueness.9

(21) Uniqueness
  Given a chain CH, only one member of CH is pronounced.

In a Late Insertion model such as the one adopted here, Uniqueness may be regarded as the result of a natural tension between (i) economy-related considerations on the application of Vocabulary Insertion and (ii) the general conditions governing the recoverability of information. That is, the most economical way of pronouncing a chain is applying Vocabulary Insertion to only one of its members.10

Even though Uniqueness states a crucial property of movement dependencies under Copy Theory, it is usually regarded as a false generalization. This is because some constructions exhibit more than one overt copy. One of these cases is wh-copying, a phenomenon that has been attested in German, Hindi, Romani, and other languages.11 Sentences involving wh-copying such as (22) contain more than one overt occurrence of the same wh-pronoun, despite the fact that they seem to have the same meaning as a regular long distance wh-question (cf. (23)).

(22) German (McDaniel 1986: 183)
  Wen  glaubt  Hans  wen  Jakob  gesehen  hat?
  who  thinks  Hans  who  Jakob  seen     has
  ‘Who does Hans think Jakob saw?’

(23) German
  Wen  glaubt  Hans  dass  Jakob  gesehen  hat?
  who  thinks  Hans  that  Jakob  seen     has
  ‘Who does Hans think Jakob saw?’

Given the semantic similarity between these sentences, it has become standard to assume that they have the same syntactic structure. In particular, the wh-copying pattern is typically analyzed as involving the overt realization of a copy of the wh-pronoun that has been generated through successive cyclic movement and occupies the specifier position of an embedded complementizer. The relevant representation for the sentence in (22) is sketched in (24).

(24) [CP Weni CINT glaubt Hans [CP weni CDECL Jakob [VP weni gesehen] hat]]

Under Indexical-S (cf. (3)), the three copies of the wh-pronoun wen must necessarily form a single chain CH = {weni, weni, weni}. Given that two members of this chain receive pronunciation, wh-copying must be analyzed as an exception to the Uniqueness property.

Nunes (2004) offers a general account of cases involving pronunciation of more than one copy per chain. His system incorporates the following three main assumptions.

(25) Nunes’ (2004) assumptions
  a. Chain Reduction (i.e., the operation deleting chain members at PF) is costly.
  b. Chain Reduction applies until the structure is linearizable according to the Linear Correspondence Axiom (LCA) of Kayne (1994). A structure is non-linearizable if the LCA computes two or more non-distinct constituents.
  c. The LCA “cannot see” inside words.

According to Nunes, whenever there is a case of multiple copy pronunciation, it is because one of the copies has been morphologically reanalyzed as part of a bigger word through an application of Fusion (Halle & Marantz 1993),12 an operation that combines two terminal nodes into one. For convenience, I adopt Embick’s (2010) definition of this operation, which explicitly states that the features of two syntactic terminals merge into a single set.

(26) Fusion (Embick 2010: 78)
  [x α] [y β] ➔ [x/y α,β]
  where α and β are features of X and Y.

Regarding a structure like (24), Nunes proposes that the intermediate copy of wen and the embedded complementizer CDECL undergo Fusion and form a single terminal node [wen+CDECL]. Given that Chain Reduction applies to comply with the LCA (cf. (25b)), and the LCA “does not see” the internal structure of words (cf. (25c)), Chain Reduction is not required to delete the copy of wen inside [wen+CDECL]. Therefore, only the lowest copy of wen undergoes Chain Reduction, and the doubling pattern is obtained.

The reanalysis-based account of wh-copying allows deriving two defining properties of the construction. First, given that (by assumption) this morphological reanalysis always affects an embedded complementizer, it follows that a pronoun in its base position cannot be spelled out in wh-copying constructions. This prediction is borne out. Consider the sentences in (27) and (28). The unacceptability of (28) is due to the presence of an overt occurrence of wen within the VP of the embedded clause.

(27) German (Fanselow & Mahajan 2000: 219)
  Wen  denkst  Du   wen  sie  meint     wen  Harald  liebt?
  who  think   you  who  she  believes  who  Harald  loves
  ‘Who do you think that she believes that Harald loves?’

(28) German (Nunes 2004: 39)
  *Wen  glaubt  Hans  wen  Jakob  wen  gesehen  hat?
   who  thinks  Hans  who  Jakob  who  seen     has
  ‘Who does Hans think Jakob saw?’

Second, given that the morphological reanalysis is based on an application of Fusion, a PF operation targeting terminal nodes, it follows that there cannot be cases of multiple copy pronunciation involving full wh-phrases. This prediction also seems to be true. In (29), for example, the wh-phrase welchen Mann ‘which man’ cannot be repeated.

(29) German (Fanselow & Mahajan 2000: 220)
  *Welchen  Mann  glaubst  Du   welchen  Mann  sie  liebt?
   which    man   believe  you  which    man   she  loves
  ‘Which man do you believe that she loves?’

The account of wh-copying based on morphological reanalysis can be straightforwardly implemented under the Inclusion-S system with two conceptual advantages: (i) there is no need to treat the phenomenon as an exception to the Uniqueness principle in (21), and (ii) the explanation does not rely on any specific theory of linearization (e.g., the LCA). Consider once again the representation in (24), repeated for convenience in (30a), this time including the featural content of the occurrences of wen (cf. (30b)). Basically, wen3 receives accusative Case in-situ, wen2 is a copy of wen3 generated through successive cyclic movement, and wen1 values its ω-feature.

(30) a. [CP Wen1 CINT glaubt Hans [CP wen2 CDECL Jakob [VP wen3 gesehen] hat]]
  b. [CP Wen1{<κ,ACC>,<ω,Q>, …} … [CP wen2{<κ,ACC>,<ω,∅>, …} … [VP wen3{<κ,ACC>,<ω,∅>, …}…]]]

Narrow syntax generates this phrase marker and delivers it to the interfaces for interpretation. At LF, Inclusion-S generates the chain CHLF = {wen1, wen2, wen3} since (i) wen1 c-commands wen2, and wen2 c-commands wen3, (ii) the set of values of wen1 contains the values of wen2 (i.e., {ACC, …} ⊆ {ACC, Q, …}), and the values of wen2 contain the values of wen3 (i.e., {ACC, …} ⊆ {ACC, …}), and (iii) there are no potential interveners. The chain CHLF determines that this movement dependency is semantically interpreted as regular long distance wh-movement.

Nevertheless, something different happens at PF. There, wen2 and the embedded complementizer are reanalyzed as a single terminal node through an application of Fusion. According to (26), Fusion combines the features of wen2 and its neighbor complementizer CDECL into a single syntactic terminal [wen2+CDECL]. Suppose that CDECL has, at least, a categorial value {C}. Therefore, the new node carries the Case and left-peripheral features corresponding to wen2 and the value corresponding to CDECL.13

(31) PF representation after Fusion
  [CP Wen1{<κ,ACC>,<ω,Q>, …} … [CP [wen2+CDECL]{<κ,ACC>,<ω,∅>, C, …} … [VP wen3{<κ,ACC>,<ω,∅>, …}…]]]

According to Inclusion-S, the copies of wen in this representation should form two distinct chains at PF. Since the values of [wen2+CDECL] contain the values of wen3 (i.e., {ACC, …} ⊆ {ACC, C, …}), these elements form the chain CHPF2 = {[wen2+CDECL], wen3}. However, the set of values of wen1 does not contain the values of [wen2+CDECL] (i.e., {ACC, C, …} ⊄ {ACC, Q, …}), therefore wen1 forms a chain of its own CHPF1 = {wen1}. Given that wen1 and [wen2+CDECL] are heads of distinct chains at PF, both elements receive pronunciation independently, in accordance with Uniqueness in (21).
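The divergence between the LF and PF chains in (30) and (31) can be reproduced with a short computation. The Python sketch below is an illustration only, under the simplifying assumptions that the occurrences of wen are listed top-down along one spine (an earlier item c-commands a later one) and that other constituents such as Hans and Jakob can be set aside; the names `chains` and `fuse` are hypothetical, and "D" stands in for the values abbreviated as “…”.

```python
def non_distinct(beta, alpha):
    """Inclusion-S (14) over (label, values) pairs."""
    return beta[1] <= alpha[1]


def is_link(spine, i, j):
    """Conditions (15): c-command (i < j), inclusion, and no intervener."""
    alpha, beta = spine[i], spine[j]
    return (
        i < j
        and non_distinct(beta, alpha)
        and not any(
            non_distinct(beta, delta) or non_distinct(delta, alpha)
            for delta in spine[i + 1:j]
        )
    )


def chains(spine):
    """Group items into chains by extending chain links transitively."""
    groups = []
    for j in range(len(spine)):
        for group in groups:
            if any(is_link(spine, i, j) for i in group):
                group.append(j)
                break
        else:
            groups.append([j])
    return [[spine[k][0] for k in group] for group in groups]


def fuse(x, y):
    """Fusion (26): one terminal carrying the features of both inputs."""
    return (f"[{x[0]}+{y[0]}]", x[1] | y[1])


wen1 = ("wen1", frozenset({"ACC", "Q", "D"}))
wen2 = ("wen2", frozenset({"ACC", "D"}))
wen3 = ("wen3", frozenset({"ACC", "D"}))
c_decl = ("CDECL", frozenset({"C"}))

lf = [wen1, wen2, wen3]                 # (30b): no Fusion at LF
pf = [wen1, fuse(wen2, c_decl), wen3]   # (31): Fusion has applied at PF

print(chains(lf))  # [['wen1', 'wen2', 'wen3']]
print(chains(pf))  # [['wen1'], ['[wen2+CDECL]', 'wen3']]

# Uniqueness (21): the head of each PF chain is the pronounced member.
print([chain[0] for chain in chains(pf)])  # ['wen1', '[wen2+CDECL]']
```

The same procedure is applied to both representations; the only difference is the value {C} contributed by Fusion, which is enough to break the inclusion relation between the fused node and wen1.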

The account of wh-copying based on Inclusion-S offers an additional empirical advantage over Nunes’ (2004) assumptions in (25).14 Consider a case where a head Y carrying a feature β moves to a head X carrying a feature α. The resulting structure contains two copies of Y, i.e., Y2 in the original base position and Y1 adjoined to X.

(32) [XP [ Y1{β} X{α}] [YP Y2{β} … ]]

In a standard case of head movement, Y2 should not be pronounced. This follows in both proposals from forming the chain CH = {Y1, Y2} and pronouncing only the head of the chain, as usual.15

Suppose that Fusion applies to Y1 and X after head movement. This would form a node [Y1+X] that carries both features α and β.

(33) [XP [Y1+X]{α, β} [YP Y2{β} … ]]

According to Nunes’ system, this sequence of operations should yield the pronunciation of both members of the chain CH = {Y1, Y2}. That is, if Y1 undergoes Fusion, it becomes inaccessible to the LCA. Therefore, the LCA only “sees” the other member of the chain, i.e., Y2. At this point, there is no need to apply Chain Reduction, as the structure is already linearizable.

Under Inclusion-S, Y2 should remain silent. This is due to the fact that [Y1+X] and Y2 form the chain CHPF = {[Y1+X], Y2} since the set of values of Y2 is a subset of the set of values of [Y1+X] (i.e., {β} ⊆ {α, β}), and, as usual, only the head of the chain receives pronunciation.
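The difference between the two Fusion scenarios can be sketched as follows. The sketch models Fusion as the union of the values of the two terminals, which is an assumption made here for illustration; the helper names `fuse` and `same_chain` are hypothetical, and only the values named in the text are represented.

```python
def fuse(a: frozenset, b: frozenset) -> frozenset:
    """Model Fusion as the union of the values of two adjacent terminals.

    This is an illustrative assumption, not a definition from the text.
    """
    return a | b


def same_chain(higher: frozenset, lower: frozenset) -> bool:
    """Value-inclusion condition: the lower node's values are a subset
    of the higher node's values (c-command and interveners are ignored)."""
    return lower <= higher


# Head movement feeding Fusion, as in (33): the fused node is the higher one.
y1, x, y2 = frozenset({"β"}), frozenset({"α"}), frozenset({"β"})
head_movement_link = same_chain(fuse(y1, x), y2)

# wh-copying, as in (31): the fused node is the intermediate copy.
wen1 = frozenset({"ACC", "Q"})
wen2 = frozenset({"ACC"})
c_decl = frozenset({"C"})
wen3 = frozenset({"ACC"})
fused = fuse(wen2, c_decl)
upper_link = same_chain(wen1, fused)  # link between wen1 and the fused node
lower_link = same_chain(fused, wen3)  # link between the fused node and wen3

print(head_movement_link, upper_link, lower_link)
```

On this sketch, Fusion only splits a chain when the values it adds are missing from the next higher copy. In (33) the fused node is the highest element, so the inclusion relation with Y2 is preserved.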

This second pattern is the one attested in the literature for every scenario in which head movement feeds Fusion. For instance, Julien (2002) sketches an analysis along these lines for fused markers of polarity and tense in Bambara.

(34) Bambara (Kastenholz 1989; as cited in Julien 2002: 307)
  a. dúnan  AFF.PAST  dᴐlᴐ         mìn
     guest            millet.beer  drink
     ‘the guest drank millet beer.’
  b. ń   mùso  NEG.PAST  jɛgɛ  fèere
     my  wife            fish  buy
     ‘my wife did not buy fish’

If tense and polarity are distinct heads, a transformational derivation must have caused them to end up in a single syntactic terminal. This derivation is the one already sketched in (32) and (33), i.e., the Polarity head moves to Tense, and Fusion combines them. As (34) shows, such a derivation is supposed to proceed in exactly the same way as predicted by Inclusion-S, i.e., the lowest occurrence of the Polarity head must remain silent. As already discussed, this pattern does not follow from Nunes’ system since it predicts multiple copy pronunciation every time a moved element undergoes Fusion.

To sum up, it has been shown that an account of wh-copying based on morphological reanalysis does not require positing any additional assumptions under Inclusion-S. That is, applying Fusion on an intermediate copy entails the formation of more than one chain at PF. Moreover, unlike Nunes’ (2004) system, Inclusion-S predicts the correct pattern of chain pronunciation in cases of head movement feeding Fusion.

3.2 Partial Copying and chain formation

While there is a certain consensus that the proper analysis of wh-copying constructions involves a movement dependency in narrow syntax, there is much more controversy around the pattern exemplified in (35). The obvious difference between this pattern and the one discussed in the previous section is that in this case the two wh-pronouns are not identical.

(35) German (Fanselow & Mahajan 2000: 196)
  Was   denkst  Du   wen  sie  gesehen  hat?
  what  think   you  who  she  seen     has
  ‘Who do you think that she has seen?’

This phenomenon is known as wh-scope marking or what-construction. However, I adopt Barbiers et al.’s (2010) terminology and refer to it as non-identical wh-doubling.

There are two main types of analysis for this phenomenon. The first one postulates that there is a direct dependency between both wh-elements. According to McDaniel (1989), the wh-element was ‘what’ is an expletive associated with the wh-pronoun wen ‘who’. At LF, the wh-pronoun replaces the expletive, so the resulting semantic representation is identical to one involving regular long distance wh-movement.

According to the second type of analysis, the wh-element in the matrix clause is related to the whole embedded CP, so there is only an indirect dependency between both wh-elements. Dayal (1994; 2000) proposes that was is a wh-pronoun functioning as the object of the matrix verb, while the embedded CP is an adjunct. Since was is a clausal pronoun, it can refer to the whole embedded CP, which allows explaining the semantics of the construction.

Consider now the patterns of non-identical wh-doubling in Dutch varieties reported by Barbiers et al. (2010). These sentences show three different pronouns participating in the construction: (i) the neuter pronoun wat ‘what’, (ii) the non-neuter pronoun wie ‘who’, and (iii) the relative pronoun die.

(36) Dutch, dialect from Overijssel (Barbiers et al. 2010: 2)
  Wat   denk   je   wie  ik  gezien  heb?
  what  think  you  who  I   seen    have
  ‘Who do you think I saw?’
(37) Dutch, dialect from North-Holland (Barbiers et al. 2010: 2)
  Wie  denk   je   die       ik  gezien  heb?
  who  think  you  REL.PRON  I   seen    have
  ‘Who do you think I saw?’
(38) Dutch, dialect from Overijssel (Barbiers et al. 2010: 2)
  Wat   denk   je   die       ik  gezien  heb?
  what  think  you  REL.PRON  I   seen    have
  ‘Who do you think I saw?’

The sentences in (36), (37) and (38) display the orders in which the pronouns can appear in these constructions, i.e., wat must precede wie or die, and wie must precede die. As shown in (39), any other order is ruled out.

(39) a. *Wie  denk   je   wat   ik  gezien  heb?
         who  think  you  what  I   seen    have
  b. *Die       denk   je   wie  ik  gezien  heb?
      REL.PRON  think  you  who  I   seen    have
  c. *Die       denk   je   wat   ik  gezien  heb?
      REL.PRON  think  you  what  I   seen    have

What is particularly interesting about these data is that offering a unified account for them in terms of any of the theoretical alternatives in the literature seems quite difficult. In principle, positing that the left-peripheral wh-pronouns in (36), (37), and (38) are expletives, in line with McDaniel (1989), does not seem to constitute a satisfactory analysis. If both wat ‘what’ and wie ‘who’ are expletives, the restriction on their distribution remains unexplained, i.e., why would wat co-appear with wie (cf. (36)) and die (cf. (38)) while wie can co-appear only with die (cf. (37))? On the other hand, it does not seem possible either to posit that a wh-pronoun like wie ‘who’ in (37) refers to a clause, in line with Dayal (1994; 2000).

Barbiers et al. (2010) advance an account of these puzzling patterns based on two main ingredients. First, they provide an analysis of the pronouns wat, wie and die according to which they are layers of a nominal structure. Since wat can appear in a number of syntactic contexts, the authors analyze it as a very impoverished pronominal form corresponding to the most embedded layer in the structure. It is basically an indefinite numeral that carries a quantificational Q-feature. The pronoun wie is assumed to contain the properties of wat (i.e., the Q-feature) plus φ-features. Finally, die contains the properties of wie (i.e., the Q-feature and φ-features) plus a definiteness D-feature. The structure is sketched in (40).

(40) [dieD [wie φ [wat Q]]]

The second ingredient of the analysis is an operation that allows creating movement dependencies in which only a subpart of a constituent moves. Barbiers et al. (2010) call this operation Partial Copying. Consider as an example the derivation in (41). In (41a), the structure K contains an occurrence of the pronoun die; in (41b), Partial Copying targets the intermediate layer of this pronoun and creates a new copy of it, which corresponds to the pronoun wie; finally, wie merges into the structure K in (41c).

(41) a. K = [XP X … [YP … [dieD [wie φ [wat Q]]] … ]]
  b. K = [XP X … [YP … [dieD [wie φ [wat Q]]] … ]]
    L = [wie φ [wat Q]]
  c. K = [XP [wie φ [wat Q]] [X’ X … [YP … [dieD [wie φ [wat Q]]] … ]]]

For convenience, I reformulate the analysis in (40) and the Partial Copying operation in (41) in more traditional terms. That is, I propose that the pronouns wat, wie and die can be analyzed as sets of features as in (42). Activity features such as <κ,∅> or <ω,∅> are omitted for simplicity.

(42) a. wat = {Q, …}
  b. wie = {Q, φ, …}
  c. die = {Q, φ, D, …}

Accordingly, Partial Copying should be understood as an instance of the Copy operation targeting a proper subset of the features of a constituent, i.e., feature movement in the sense of Hiemstra (1986), Cheng (2000) and Sabel (2000).16 In these terms, a derivation like (41) involves (i) a structure K containing a node with the set of features {Q, φ, D, …} corresponding to die as in (43a), (ii) copying from this node the subset of features {Q, φ, …} as in (43b), and (iii) merging this newly generated set into the main structure K as in (43c).

(43) a. K = [XP X … [YP … {Q, φ, D, …} … ]]
  b. K = [XP X … [YP … {Q, φ, D, …} … ]]
    L = {Q, φ, …}
  c. [XP {Q, φ, …} [X’ X … [YP … {Q, φ, D, …} … ]]]

Given that the sets of features {Q, φ, …} and {Q, φ, D, …} satisfy the Vocabulary Insertion rules of wie and die, respectively, the relevant nodes are spelled-out as sketched in (44).

(44) [XP wie [X’ X … [YP … die … ]]]

Consider now the analysis of (36). The relevant movement dependency involves two steps. First, the wh-pronoun wie occupying the complement position of the embedded verb moves through successive cyclic movement to embedded Spec,C. Then, Partial Copying applies to the highest occurrence of wie, moving the set of features corresponding to the wh-pronoun wat to the left periphery of the sentence.

(45) Analysis of (36) in terms of Partial Copying
  [CP Wat1{Q, …} CINT denk je [CP wie2{Q, φ, …} CDECL ik [VP wie3{Q, φ, …} gezien] heb]]

The same kind of derivation applies to the sentences in (37) and (38), as represented in (46) and (47), respectively. In (46), the wh-pronoun wie is generated by applying Partial Copying to the copy of die in the periphery of the embedded clause. In (47), Partial Copying generates the pronoun wat from the features of die. The only difference between these two constructions is that φ-features are not copied in the latter case.

(46) Analysis of (37) in terms of Partial Copying
  [CP Wie1{Q, φ, …} CINT denk je [CP die2{Q, φ, D, …} CDECL ik [VP die3{Q, φ, D, …} gezien] heb]]
(47) Analysis of (38) in terms of Partial Copying
  [CP Wat1{Q, …} CINT denk je [CP die2{Q, φ, D, …} [C’ CDECL ik [VP die3{Q, φ, D, …} gezien] heb]]]

This type of analysis allows deriving the restrictions on the distribution of wh-pronouns shown in (39). The unacceptable patterns involve a richer pronoun in a higher position that could not have been generated by copying features from the previous one. Since these representations cannot be derived by applying copy operations, they are predicted to be ungrammatical.

(48) a. *[CP wie{Q, φ, …} … [CP wat{Q, …} …]] (cf. (39a))
  b. *[CP die{Q, φ, D, …} … [CP wie{Q, φ, …} …]] (cf. (39b))
  c. *[CP die{Q, φ, D, …} … [CP wat{Q, …} …]] (cf. (39c))
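The restriction in (48) can be restated as a subset check over the feature sets in (42). The following is a minimal sketch, assuming that a higher pronoun can be generated by Partial Copying only if its features form a proper subset of the features of the lower pronoun; the function name `derivable_by_partial_copying` is hypothetical, and the elided "…" features are not modeled.

```python
# Feature sets as given in (42); elided features ("…") are not modeled.
FEATURES = {
    "wat": frozenset({"Q"}),
    "wie": frozenset({"Q", "φ"}),
    "die": frozenset({"Q", "φ", "D"}),
}


def derivable_by_partial_copying(higher: str, lower: str) -> bool:
    """True if the higher pronoun's features are a proper subset of the
    lower pronoun's features, i.e. if it could have been copied from it."""
    return FEATURES[higher] < FEATURES[lower]


# (higher, lower) pairs corresponding to (36), (37), (38).
attested = [("wat", "wie"), ("wie", "die"), ("wat", "die")]
# (higher, lower) pairs corresponding to (39a), (39b), (39c).
unattested = [("wie", "wat"), ("die", "wie"), ("die", "wat")]

for higher, lower in attested + unattested:
    print(higher, lower, derivable_by_partial_copying(higher, lower))
```

The proper-subset operator is used because the sketch covers non-identical doubling only; identical doubling is treated separately in the text by means of an additional feature {F}.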

While Barbiers et al. (2010) offer an elegant account of the distribution of wh-pronouns in non-identical wh-doubling constructions, their proposal has some flaws that result from their implicit adoption of Indexical-S. According to Indexical-S, partial copies should form chains with their original counterparts. Therefore, despite being distinct lexical items because of the features they carry, the wh-pronouns in (36), (37) and (38) should be regarded as “wh-elements belonging to a single chain that is established in overt syntax” (Barbiers et al. 2010: 25). The chains corresponding to these three sentences are sketched in (49), (50) and (51), respectively.

(49) a. [CP Wati … [CP wiei … [VP wiei …]]] (cf. (36))
  b. CH = {wati, wiei, wiei}
(50) a. [CP Wiei … [CP diei … [VP diei …]]] (cf. (37))
  b. CH = {wiei, diei, diei}
(51) a. [CP Wati … [CP diei … [VP diei …]]] (cf. (38))
  b. CH = {wati, diei, diei}

Under standard assumptions, these chains should behave exactly in the same way as any chain CH = {XPi, XPi, XPi} consisting of three occurrences of the same constituent. This prediction is not borne out. As discussed, chains are supposed to comply with the Uniqueness property in (21) while, on the contrary, non-identical wh-doubling involves pronouncing two wh-elements. To derive this pattern from a single chain, Barbiers et al. adopt Nunes’ (2004) account of multiple copy pronunciation (cf. (25)). That is, they propose that the patterns of chain pronunciation in (36), (37) and (38) follow from an intermediate wh-pronoun being reanalyzed as part of an embedded complementizer.

In principle, this solution seems attractive as Dutch also displays wh-copying patterns.

(52) Dutch, dialect from Drenthe (Barbiers et al. 2010: 2)
  Wie  denk   je   wie  ik  gezien  heb?
  who  think  you  who  I   seen    have
  ‘Who do you think I have seen?’

However, there is an additional property distinguishing non-identical wh-doubling from a regular wh-movement dependency. Moving a wh-pronoun across negation is perfectly possible in Dutch (cf. (53)), while non-identical wh-doubling in the same context is unacceptable (cf. (54)). This asymmetry is unexpected if both sentences involve a non-trivial chain connecting a thematic position in the embedded clause with the specifier position of the matrix clause.

(53) Dutch (Barbiers et al. 2010: 40)
  Wie  denk   je   niet  dat   zij  uitgenodigd  heeft?
  who  think  you  not   that  she  invited      has
  ‘Who don’t you think she has invited?’
(54) Dutch (Barbiers et al. 2010: 40)
  *Wat   denk   je   niet  wie  zij  uitgenodigd  heeft?
   what  think  you  not   who  she  invited      has
  ‘Who don’t you think she has invited?’

Barbiers et al. do not offer an explanation for the contrast between (53) and (54). Instead, they point out that negation also creates an intervention effect in wh-copying constructions in both Dutch (cf. (55)) and German (cf. (56)). As discussed in the previous section, the standard assumption is that wh-copying is a phonological variant of regular long distance wh-movement. Therefore, negation is not supposed to produce any effect in these constructions.

(55) Dutch (Barbiers et al. 2010: 40)
  *Wie  denk   je   niet  wie  zij  uitgenodigd  heeft?
   who  think  you  not   who  she  invited      has
  ‘Who don’t you think she has invited?’
(56) German (Rett 2006: 359)
  *Wen  glaubst  du   nicht  wen  sie  liebt?
   who  think    you  not    who  she  loves
  ‘Who don’t you think she loves?’

Since there seems to be no unified analysis that explains negative intervention effects in all these cases, the authors conclude that the contrast between (53) and (54) is not sufficient evidence to reject an account of non-identical wh-doubling according to which both overt wh-pronouns are members of the same chain.

It should be noted, however, that many German speakers readily accept sentences like (57), where wh-copying across negation is attested.

(57) German (Pankau 2014: 17)
  Wen  glaubst  du   nicht  wen  sie  gesehen  hat?
  who  think    you  not    who  she  seen     has
  ‘Who don’t you think she has seen?’

The only way to account for the otherwise contradictory contrast between (56) and (57) is to assume that there are two alternative derivations that generate wh-copying patterns, one that is sensitive to negative intervention and one that is not.

I propose that the derivation that is not sensitive to negative intervention is the one discussed in the previous section, i.e., these are regular wh-movement dependencies in which an intermediate copy is morphologically reanalyzed as part of an embedded complementizer at PF.

On the other hand, I argue that non-identical wh-doubling constructions and cases of wh-copying that are sensitive to negative intervention can be analyzed in a unified way by combining (i) a derivation based on Partial Copying and (ii) the Inclusion-S system. As already discussed, Partial Copying allows explaining in an elegant way the distribution of wh-pronouns in non-identical wh-doubling constructions. However, under Indexical-S, the operation does not derive straightforwardly (i) the fact that two members of the same chain receive pronunciation and (ii) the negative intervention effect. I contend that these properties find a principled explanation under Inclusion-S.

Consider again the structure in (45), which corresponds to the sentence in (36). This representation is generated in narrow syntax and delivered to the interfaces, where chain formation is calculated according to Inclusion-S in (14) and its associated conditions in (15). At PF, wie2 and wie3 form a chain CHPF2 = {wie2, wie3} as they carry the same features (i.e., {Q, φ, …} ⊆ {Q, φ, …}); however, since the features of wie2 are not a subset of the features of wat1 (i.e., {Q, φ, …} ⊄ {Q, …}), wat1 forms a trivial chain of its own CHPF1 = {wat1}. At LF, chain formation works in exactly the same way, i.e., wie2 and wie3 form a chain CHLF2 = {wie2, wie3} while wat1 forms the trivial chain CHLF1 = {wat1}. These results are summarized in (58).

(58) Chain formation at the interfaces according to Inclusion-S (cf. (36))
  a. [CP wat1{Q, …} … [CP wie2{Q, φ, …} … [VP wie3{Q, φ, …} …]]]
  b. CHPF1 = {wat1}; CHPF2 = {wie2, wie3}
  c. CHLF1 = {wat1}; CHLF2 = {wie2, wie3}

In more explicit terms, Inclusion-S predicts that constituents that are transformationally related through Partial Copying must form distinct chains. That is, this system states that two constituents α and β are non-distinct if (i) α c-commands β and (ii) α contains the information encoded in β. However, Partial Copying systematically creates configurations where the features of the c-commanding element are properly contained in those of the lower constituent. Therefore, partial copies are always computed as distinct elements at the interfaces. This is attested once again in the derivations in (46) and (47), which correspond to the sentences in (37) and (38), respectively. In the former, the features of die2 are not a subset of the features of wie1 (i.e., {Q, φ, D, …} ⊄ {Q, φ, …}), so they form two separate chains as shown in (59). In the latter, the features of die2 are not a subset of the features of wat1 (i.e., {Q, φ, D, …} ⊄ {Q, …}), so they form two chains at each interface as shown in (60).

(59) Chain formation at the interfaces according to Inclusion-S (cf. (37))
  a. [CP wie1{Q, φ, …} … [CP die2{Q, φ, D, …} … [VP die3{Q, φ, D, …} …]]]
  b. CHPF1 = {wie1}; CHPF2 = {die2, die3}
  c. CHLF1 = {wie1}; CHLF2 = {die2, die3}
(60) Chain formation at the interfaces according to Inclusion-S (cf. (38))
  a. [CP wat1{Q, …} … [CP die2{Q, φ, D, …} … [VP die3{Q, φ, D, …} …]]]
  b. CHPF1 = {wat1}; CHPF2 = {die2, die3}
  c. CHLF1 = {wat1}; CHLF2 = {die2, die3}

Extending Partial Copying to capture wh-copying patterns is conceptually simple.17 Assume a feature {F} that is common to wat, wie and die. Assume also that the Vocabulary Insertion rules for these pronouns do not refer to {F}, i.e., their corresponding syntactic terminals receive the same phonological exponent whether or not {F} is present. Now, consider the derivation in (61). First, the whole set of features {Q, φ, F, …} corresponding to the wh-pronoun wie moves from the complement position of the embedded verb to embedded Spec,C through successive cyclic movement. Then, Partial Copying applies to this set neglecting its {F} feature, so the set {Q, φ, …} is generated and merged into matrix Spec,C. Since this newly formed set of features also corresponds to the phonological exponent wie, a third occurrence of this pronoun is generated in the structure.

(61) Analysis of (52) in terms of Partial Copying
  [CP Wie1{Q, φ, …} CINT denk je [CP wie2{Q, φ, F, …} CDECL ik [VP wie3{Q, φ, F, …} gezien] heb]]

Once again, applying Partial Copying entails forming more than one chain at the interfaces, i.e., the features of wie2 are not a subset of the features of wie1 (i.e., {Q, φ, F, …} ⊄ {Q, φ, …}).

(62) Chain formation at the interfaces according to Inclusion-S (cf. (52))
  a. [CP wie1{Q, φ, …} … [CP wie2{Q, φ, F, …} … [VP wie3{Q, φ, F, …} …]]]
  b. CHPF1 = {wie1}; CHPF2 = {wie2, wie3}
  c. CHLF1 = {wie1}; CHLF2 = {wie2, wie3}

Given that CHPF1 and CHPF2 in (58), (59), (60) and (62) are separate chains at PF, both of them should receive pronunciation independently according to Uniqueness (cf. (21)). That is, there is no need to assume that a morphological reanalysis operation applies in these cases. Instead, Partial Copying entails doubling under Inclusion-S.
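The way doubling follows from chain formation in (58), (59), (60) and (62) can be sketched as below. The sketch rests on several assumptions that go beyond the text: nodes are listed from highest to lowest with each c-commanding the next, interveners are absent, the elided "…" features are ignored, and Vocabulary Insertion is modeled as an exact lookup that disregards {F}. The names `EXPONENTS`, `spell_out`, `pf_chains` and `pronounced` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Node = Tuple[str, FrozenSet[str]]
fs = frozenset

# Hypothetical Vocabulary Insertion table based on the sets in (42).
EXPONENTS: Dict[FrozenSet[str], str] = {
    fs({"Q"}): "wat",
    fs({"Q", "φ"}): "wie",
    fs({"Q", "φ", "D"}): "die",
}


def spell_out(features: FrozenSet[str]) -> str:
    """Return the exponent for a feature set, ignoring {F} as assumed."""
    return EXPONENTS[features - {"F"}]


def pf_chains(nodes: List[Node]) -> List[List[Node]]:
    """A node joins the chain of the node right above it only if its
    features are a subset of that node's features."""
    chains: List[List[Node]] = []
    for node in nodes:
        if chains and node[1] <= chains[-1][-1][1]:
            chains[-1].append(node)
        else:
            chains.append([node])
    return chains


def pronounced(nodes: List[Node]) -> List[str]:
    """Spell out only the head of each PF chain."""
    return [spell_out(chain[0][1]) for chain in pf_chains(nodes)]


STRUCTURES: Dict[str, List[Node]] = {
    "(58)": [("wat1", fs({"Q"})), ("wie2", fs({"Q", "φ"})),
             ("wie3", fs({"Q", "φ"}))],
    "(59)": [("wie1", fs({"Q", "φ"})), ("die2", fs({"Q", "φ", "D"})),
             ("die3", fs({"Q", "φ", "D"}))],
    "(60)": [("wat1", fs({"Q"})), ("die2", fs({"Q", "φ", "D"})),
             ("die3", fs({"Q", "φ", "D"}))],
    "(62)": [("wie1", fs({"Q", "φ"})), ("wie2", fs({"Q", "φ", "F"})),
             ("wie3", fs({"Q", "φ", "F"}))],
}

results = {name: pronounced(nodes) for name, nodes in STRUCTURES.items()}
print(results)
```

In each structure the sketch returns two exponents, one per PF chain, without appealing to any reanalysis step.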

Consider now the LF chains CHLF1 and CHLF2 in (58), (59), (60) and (62). In each of these cases, the trivial chain CHLF1 lacks a thematic interpretation, but its only member satisfies the formal requirements of the interrogative complementizer, i.e., it functions as an expletive. On the other hand, the chain CHLF2 does receive a θ-role as one of its members occupies a thematic position; however, none of the wh-elements pertaining to CHLF2 is in a spec-head configuration with CINT. Since there is no overt syntactic relation between CHLF2 and the interrogative C-domain, this chain must be interpreted by appealing to the mechanisms that allow licensing wh-in-situ.

As usually assumed (e.g., Lasnik & Saito 1992; Beck 1996; 2006; Pesetsky 2000; Kratzer & Shimoyama 2002; i.a.), in-situ wh-phrases in, for example, multiple wh-questions must be licensed by establishing a covert dependency with the interrogative C-domain.18 The German sentence in (63), for instance, is supposed to involve an abstract relation linking the wh-pronoun wo ‘where’ and the interrogative complementizer CINT that is represented in (64).

(63) German (Beck 1996: 4)
  Wen   hat  Luise  wo     gesehen?
  whom  has  Luise  where  seen
  ‘Where did Luise see whom?’
(64) Wen CINT hat Luise wo gesehen?

In a similar fashion, the head of each of the wh-chains CHLF2 in (58), (59), (60) and (62) must be licensed by establishing a covert dependency with the interrogative complementizer in the matrix clause. In the representation in (65), for instance, this relation holds between CINT and wie2.

(65) [CP Wat1 CINT denk je [CP wie2 ik wie3 gezien heb]]? (cf. (36))

As Beck (1996) points out, this type of dependency may be disrupted by an intervening negative element. As shown in (66), the presence of niet ‘not’ in between the interrogative complementizer CINT and wie2 prevents the licensing of the wh-chain CHLF2 = {wie2, wie3}.

(66) *[CP Wat1 CINT denk je niet [CP wie2 zij wie3 uitgenodigd heeft]] (cf. (54))

The effect in (66) belongs to a natural class of phenomena together with instances of intervention triggered by negation and some other quantificational elements at LF. For instance, the multiple wh-question in (63) is acceptable as long as there is no negative element such as niemand ‘nobody’ between wo ‘where’ and the left periphery of the sentence (cf. (67a)). Negation, however, does not disrupt overt movement, so wo can move over niemand as in (67b), yielding an acceptable result.

(67) German (Beck 1996; 2006)
  a. *Wen   hat  niemand  wo     gesehen?
      whom  has  nobody   where  seen
     ‘Where did nobody see whom?’
  b.  Wen   hat  wo     niemand  gesehen?
      whom  has  where  nobody   seen
     ‘Where did nobody see whom?’

Similar intervention patterns are attested in many languages. For instance, French allows moving a wh-pronoun to the left periphery (cf. (68a)) or interpreting it in-situ (cf. (68b)).

(68) French (Bošković 2000)
  a. Qui   as-tu     vu?
     whom  have-you  seen
  b. Tu   as    vu    qui?
     you  have  seen  whom
     ‘Whom have you seen?’

However, applying overt wh-movement seems to be the only available option if a negative element appears between the wh-pronoun and the left periphery.

(69) French (Bošković 2000)
  a.  Qu’est-ce  que   Jean  ne   mange  pas?
      what       that  Jean  NEG  eats   not
  b. *Jean  ne   mange  pas  quoi?
      Jean  NEG  eats   not  what
     ‘What doesn’t John eat?’

Discussing potential accounts of LF intervention effects goes beyond the aims of this paper. My purpose is simply to show that adopting Inclusion-S together with Partial Copying yields configurations in which one of the resulting chains must be licensed through covert dependencies, and that independent phenomena for which these covert dependencies are originally postulated display the same type of negative intervention.

In sum, the distribution of wh-pronouns in non-identical wh-doubling constructions in Dutch is elegantly explained by appealing to Partial Copying, as proposed by Barbiers et al. (2010). However, if Indexical-S is adopted (i) additional assumptions are required to explain the doubling pattern, and (ii) the asymmetry between non-identical wh-doubling and regular wh-movement regarding negative intervention remains mysterious. Under Inclusion-S, on the contrary, both traits follow straightforwardly from partial copies forming separate chains at the interfaces.

3.3 There is no need for Late Merger

Reconstruction has been an important source of evidence for Copy Theory. Assuming that movement involves two (or more) occurrences of the same constituent allows explaining the unacceptability of sentences like (70) as violations of Condition C.

(70) a. *Which argument that Cosmo1 is a genius did he1 believe?
  b. *[DP Which argument [CP that Cosmo1 is a genius]] did he1 believe
      [DP which argument [CP that Cosmo1 is a genius]]?

There are, however, some cases that do not follow straightforwardly from Copy Theory. For instance, if the complement CP that Cosmo is a genius in (70) were replaced by an adjunct CP also containing the R-expression Cosmo, the sentence would become acceptable.

(71) [DP Which argument [ADJ that Cosmo1 made]] did he1 believe?

According to Lebeaux (1988), cases such as (71) do not involve true violations of Condition C. I will refer to his approach as the Lebeauxian Approach to Anti-Reconstruction (LATAR).

(72) LATAR
  Apparent violations of Condition C follow from the absence of the constituent containing the relevant R-expression in some members of the movement chain.

LATAR involves assuming that the adjunct containing the violating R-expression appears only in the overt member of the chain. Since the pronoun does not c-command the R-expression with which it is coindexed, no violation of Condition C may arise.

(73) [DP Which argument [ADJ that Cosmo1 made]] did he1 believe [DP which argument]?

More recently, LATAR has been extended to capture phenomena involving A-movement (Takahashi & Hulsey 2009). For instance, the sentence in (74) would be expected to be unacceptable due to a Condition C violation under standard assumptions.

(74) [DP The claim that Cosmo1 was asleep] seems to him1 to be correct.

Adapting Lebeaux’s proposal, Takahashi & Hulsey (2009) argue that the complement NP of the determiner the, i.e., the constituent containing the R-expression Cosmo, appears only in the head of the chain.19

(75) [DP The [NP claim that Cosmo1 was asleep]] seems to him1 to be [DP the] correct.

The same kind of analysis may be advanced for sentences such as (76). Here, the grammatical subject is interpreted as the logical subject of the predicate intrusion (i.e., the picture is an intrusion), so this DP must have been generated very low in the structure of the sentence. Notice that (i) the DP the president inside the subject can be coreferential with the pronoun him, so him should not be able to c-command the president, but (ii) the quantifier every man can bind the pronoun his inside the subject, so the quantifier must c-command this pronoun.

(76) [DP His1 picture of the president2] seemed to every man1 to be seen by him2 to be an intrusion.

This apparent contradiction may be accounted for under LATAR. What is generated near the predicate intrusion is a bare possessive determiner that lacks an NP. This element moves through successive cyclic A-movement and reaches a position where it can be bound by every man. Finally, its complement NP appears only in the head of the movement chain.20

(77) [DP His1 picture of the president2] seemed to [every man]1 [DP his1] to be seen by him2 [DP his1] to be [DP his1] an intrusion.

If this approach to anti-reconstruction is on the right track, then these movement dependencies involve chains like the ones depicted in (78), in which chain members are not isomorphic constituents.

(78) a. CH = {[DP which argument that Cosmo made], [DP which argument]} (cf. (71))
  b. CH = {[DP the claim that Cosmo was asleep], [DP the]} (cf. (74))
  c. CH = {[DP his picture of the president], [DP his], [DP his], [DP his]} (cf. (76))

Compare the way Indexical-S and Inclusion-S generate these chains. Under Indexical-S (cf. (3)), two (or more) constituents form a chain only if they receive the same index through the Copy operation. Therefore, to explain the differences between members of the same chain in (78), it would be necessary (i) to generate two (or more) strictly identical copies with the same index, and then (ii) to apply an additional operation on the higher copy to introduce the constituent containing the relevant R-expression. Consider the following sample derivation. A constituent αP is generated in a position where it is c-commanded by a pronoun; since αP does not contain an R-expression, Condition C is respected (cf. (79a)). Later in the derivation, αP moves to a position where it c-commands the pronoun; both copies of αP share the same index (cf. (79b)). As a third step, a βP containing an R-expression is inserted into αP as in (79c). At this point, the pronoun and the R-expression can be coreferential, and since both occurrences of αP share the same index, they form a chain.

(79) a. [XP X … [YP Pronoun1 … [ZP … [αP α]i ]]]
  b. [XP [αP α]i [X’ X … [YP Pronoun1 … [ZP … [αP α]i ]]]]
  c. [XP [αP α [βP R-expression1 ]]i [X’ X … [YP Pronoun1 … [ZP … [αP α]i ]]]]

The derivational step in (79c) corresponds to the operations that are called Late Merger (Lebeaux 1988) and Wholesale Late Merger (Takahashi & Hulsey 2009). The difference between them is that Late Merger is supposed to be restricted to adjuncts (i.e., βP in (79c) must be an adjunct), while Wholesale Late Merger extends the empirical domain of the former to complements (i.e., βP in (79c) may be either an adjunct or a complement). In other words, both operations involve merging a constituent countercyclically inside a derived specifier. Thus, it may be concluded that implementing LATAR under Indexical-S implies abandoning strict cyclicity.

As is well known, cyclicity has been a theoretical desideratum in generative syntax since at least Chomsky (1965), and it is encoded in several derivational restrictions, such as the Extension Condition.

(80) Extension Condition (Chomsky 1993)
  Syntactic operations must extend the tree at the root.

Therefore, if an extensionally equivalent and cyclicity-respecting implementation of LATAR is offered, it should be preferred on conceptual grounds.
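
The contrast between the cyclic step in (79b) and the countercyclic step in (79c) can be illustrated with a minimal sketch. It assumes, purely for illustration, that trees are nested tuples of labels (with the X’ level left out) and that an operation satisfies (80) when the previous root is an immediate daughter of the new root. The function name `extends_at_root` is hypothetical.

```python
def extends_at_root(old_root, new_root):
    """Return True if old_root is an immediate daughter of new_root,
    i.e. the step merged new material with the whole previous tree."""
    return isinstance(new_root, tuple) and any(d == old_root for d in new_root)

# Toy encoding of (79): trees are nested tuples of labels (an assumption).
alpha_p = ("α",)
rest = ("X", ("Pronoun", alpha_p))   # (79a): αP in its base position
moved = (alpha_p, rest)              # (79b): a copy of αP merged at the root
late = (("α", "βP"), rest)           # (79c): βP merged inside the derived specifier

print(extends_at_root(rest, moved))  # True: movement extends the tree
print(extends_at_root(moved, late))  # False: the root of (79b) is not preserved
```

On this encoding, the step from (79b) to (79c) alters a subpart of the existing tree instead of extending it, which is what makes it countercyclic.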

The definition of Non-Distinctiveness based on Inclusion-S has two important traits that make a cyclic implementation of LATAR possible. First, Inclusion-S does not require structural isomorphism between chain members; it only states a condition on their morphosyntactic values. Second, Inclusion-S does not require chain members to be related through the Copy operation.

Consider the cases involving anti-reconstruction in A-movement in (74) and (76). The sentence in (74) is repeated for convenience in (81) with a description of the features of the relevant constituents. A bare determiner Dmin/max is base-generated low down in the structure inside a small clause. This constituent does not carry a full set of valued φ-features as some of these are only inherently valued in the NP domain (e.g., Number, Gender). After T is merged, the Spec,T position is filled with a base-generated full-DP with a complete set of valued φ-features. This DP agrees with T and receives nominative Case.21

(81) [DP The [NP claim that Cosmo1 was asleep]]{<κ,NOM>, <Num,SG>, …} seems to him1 to be correct [DP the]{<κ,∅>, <Num,∅>, …}.

The base-generated full-DP and the bare determiner comply with the conditions to form a chain according to Inclusion-S. That is, (i) they are in a c-command relation, (ii) the values of the full-DP contain the values of Dmin/max (i.e., {…} ⊆ {NOM, SG, …}), and (iii) there are no potential interveners between them. Therefore, these two elements form the chain in (78b) despite the fact that they are not transformationally related.
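
The three conditions just listed can be sketched as a simple check. This is an illustrative encoding, not a formal statement of Inclusion-S: feature bundles are assumed to be dictionaries, `None` stands for an unvalued feature (∅), and c-command and intervention are not computed but supplied by the caller. The names `valued` and `inclusion_s` are hypothetical.

```python
def valued(features):
    """Return the set of (attribute, value) pairs that carry a value."""
    return {(attr, val) for attr, val in features.items() if val is not None}

def inclusion_s(higher, lower, c_commands=True, intervener=False):
    """Toy chain test: (i) c-command holds, (ii) the valued features of the
    lower element are a subset of those of the higher one, and (iii) no
    potential intervener is present. (i) and (iii) are taken as given."""
    return c_commands and not intervener and valued(lower) <= valued(higher)

# Feature bundles from (81); the material abbreviated as "…" is left out.
full_dp = {"κ": "NOM", "Num": "SG"}
bare_d = {"κ": None, "Num": None}

print(inclusion_s(full_dp, bare_d))                   # True
print(inclusion_s(full_dp, bare_d, intervener=True))  # False
```

Since the bare determiner carries no valued features in this encoding, its set of values is empty and is trivially contained in the values of the full-DP.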

Consider now the sentence in (76), repeated for convenience in (82). Here, a bare possessive determiner merged low down in the structure undergoes successive cyclic A-movement to a position just below the quantifier every man. There, it gets bound by the quantifier. Later in the derivation, a full-DP is externally merged in the Spec,T position, receiving nominative Case.22

(82) [DP His1 picture of the president2]{<κ,NOM>, <Num,SG>, …} seemed to every man1 [DP his1]{<κ,∅>, <Num,∅>, …} to be seen by him2 [DP his1]{<κ,∅>, <Num,∅>, …} to be [DP his1]{<κ,∅>, <Num,∅>, …} an intrusion.

According to Inclusion-S, the full-DP and the copies of the possessive determiner form the chain in (78c).

The sentence in (71), repeated for convenience in (83), involves an additional derivational step. Here, the DP which argument is externally merged in the complement position of the verb believe, where it receives accusative Case. Almost at the end of the derivation, a new DP which argument that Cosmo made is base-generated in the specifier position of the interrogative complementizer. Since this new DP carries an active ω-feature, it enters into an Agree relation with CINT. Notice, however, that the DP still lacks a value for its κ-feature.

(83) [CP [DP Which argument [ADJ that Cosmo1 made]]{<κ,∅>, <ω,Q>, …} [C’ CINT did he believe [DP which argument]{<κ,ACC>, <ω,∅>, …}]]

To value its κ-feature, the higher DP probes the structure for a matching Goal. By hypothesis, it looks for an active element matching both κ and ω-features, so the DP which argument is the closest and only available candidate. Both DPs agree, and the κ-feature in the higher DP is valued, delivering the representation in (84) to the interfaces.

(84) [CP [DP Which argument [ADJ that Cosmo1 made]]{<κ,ACC>, <ω,Q>, …} [C’ CINT did he believe [DP which argument]{<κ,ACC>, <ω,Q>, …}]]

According to Inclusion-S, these two non-transformationally related DPs form the chain in (78a).
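
The step from (83) to (84) can be sketched in the same spirit. The sketch assumes that feature bundles are dictionaries with `None` for ∅, and that agreement between the two DPs is mutual, so that each unvalued feature takes the value carried by the other element; this is a simplification adopted here to reproduce the values shown in (84). The function names are hypothetical.

```python
def valued(features):
    """Return the set of (attribute, value) pairs that carry a value."""
    return {(attr, val) for attr, val in features.items() if val is not None}

def agree(probe, goal):
    """Assumed mutual valuation: each unvalued feature receives the value
    that the other element carries for the same attribute."""
    new_probe = {a: (v if v is not None else goal.get(a)) for a, v in probe.items()}
    new_goal = {a: (v if v is not None else probe.get(a)) for a, v in goal.items()}
    return new_probe, new_goal

# Feature bundles from (83); the material abbreviated as "…" is left out.
higher = {"κ": None, "ω": "Q"}     # which argument that Cosmo made
lower = {"κ": "ACC", "ω": None}    # which argument

included_before = valued(lower) <= valued(higher)
higher_84, lower_84 = agree(higher, lower)
included_after = valued(lower_84) <= valued(higher_84)

print(included_before, included_after)  # False True
```

On this toy encoding, the values of the lower DP are contained in those of the higher DP only once the κ-feature of the higher DP has been valued, as in (84).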

As seen, Inclusion-S allows non-isomorphic chains to be generated without assuming any countercyclical operation. Moreover, the principles restricting this implementation of LATAR are no different from the ones assumed by Takahashi & Hulsey (2009) for Wholesale Late Merger. These authors identify two types of constraint: (i) Agreement/Case, and (ii) semantic interpretability. Regarding the former, it has already been observed that a bare Dmin/max does not carry a complete set of φ-features. Therefore, this type of element is unable to value the φ-features of a Probe and receive Case. It follows, then, that a full-DP must be introduced in the derivation no later than in the specifier position of the relevant Case assigner. From this, the prediction in (85) follows:

(85) Only caseless chain members (i.e., “traces” of A-movement) may be Dmin/max.

Regarding interpretability, Fox (2002) proposes that Trace Conversion applies at LF to the tail of an A’-movement dependency to obtain a valid operator-variable relation under Copy Theory.

(86) Trace Conversion (Fox 2002: 67)
  a. Variable Insertion: (Det) Pred ➔ (Det) [Pred λy(y=x)]
  b. Determiner Replacement: (Det) [Pred λy(y=x)] ➔ the [Pred λy(y=x)]

This rule transforms a wh-phrase into a definite description with anaphoric value.23 The subpart in (86a) introduces a predicate <e,t> that functions as a variable and is interpreted compositionally with a complete nominal predicate, i.e., another <e,t> expression, through Predicate Modification (Heim & Kratzer 1998). Incomplete nominal predicates (i.e., nouns lacking some argument) are not <e,t> expressions, so they are not proper inputs for Trace Conversion. Therefore, the tail of an A’-movement dependency must always contain a noun with all its arguments. Consequently, the statement in (87) follows.

(87) Anti-reconstruction effects in A’-movement of DPs are restricted to non-arguments of nominal predicates.
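
The two steps of Trace Conversion in (86) can be illustrated with a small symbolic sketch. It treats the determiner and the nominal predicate as plain strings and only reproduces the shape of the rewriting; it does not model semantic types or Predicate Modification. All function names are hypothetical.

```python
def variable_insertion(det, pred, var="x"):
    """(86a): (Det) Pred -> (Det) [Pred λy(y=x)]"""
    return (det, f"[{pred} λy(y={var})]")

def determiner_replacement(det_pred):
    """(86b): (Det) [Pred λy(y=x)] -> the [Pred λy(y=x)]"""
    _, pred = det_pred
    return ("the", pred)

def trace_conversion(det, pred, var="x"):
    """Apply (86a) and then (86b) to the tail of an A'-dependency."""
    return determiner_replacement(variable_insertion(det, pred, var))

print(" ".join(trace_conversion("which", "argument")))
# the [argument λy(y=x)]
```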

Adopting Trace Conversion also allows ruling out an unwanted consequence of base-generating chain members under Inclusion-S. Consider the structure in (88). If a DP such as which girl is externally merged in a θ-position, and a different DP which woman is base-generated in Spec,C, they would be expected to form a chain.

(88) [CP [DP Which woman]{<κ,ACC>,<ω,Q>, …} [C’ CINT did Elaine meet [DP which girl]{<κ,ACC>,<ω,Q>, …}]]

This unwanted result is ruled out because applying Trace Conversion would yield the uninterpretable operator-variable dependency in (89). As the pair in (90) shows, lexical identity on the NP is a requisite for a definite expression to receive an anaphoric interpretation.24

(89) *Which woman λx. Elaine met the girl x

(90) a.   The neighbor of [every comedian]1 always takes advantage of [the comedian]1.
  b. *The neighbor of [every comedian]1 always takes advantage of [the postman]1.

The unacceptability of (89) shows that there are semantic mechanisms imposing identity conditions on chain members. Importantly, these mechanisms do not seem to depend on any narrow syntactic device (e.g., the Copy operation). Presumably, some other independently motivated principles also introduce constraints on the properties of unpronounced chain members.25

To sum up, Inclusion-S offers a straightforward way of capturing anti-reconstruction effects under LATAR. Moreover, it makes it possible to dispense with countercyclical operations such as Late Merger and Wholesale Late Merger, a very welcome result from a conceptual point of view.

4 Concluding remarks

Copy Theory is based on the idea that elements forming a chain are non-distinct. In this paper, I offered a definition of the Non-Distinctiveness relation based on the featural content of constituents in a phrase marker: Inclusion-S. According to it, two constituents are non-distinct for the purposes of chain formation if the morphosyntactic properties of one of them constitute a subset of the morphosyntactic properties of the other. This condition is part of a representational algorithm of chain recognition that applies independently and in parallel at both interface levels.

Apart from offering a principled definition of Non-Distinctiveness, Inclusion-S introduces a number of empirical and conceptual advantages over a mere indexing mechanism. As discussed, it makes it possible to understand wh-copying as a phenomenon in which a morphological reanalysis operation affects how chains are computed at PF. That is, LF takes a set of occurrences of a wh-pronoun to form a single chain, while the same elements form two (or more) chains at PF, which derives the doubling pattern.

Something similar has been argued to happen in non-identical wh-doubling constructions in Dutch. In this case, however, both interfaces form two chains from a set of wh-elements. The distribution of the pronouns wat, wie and die indicates that the doubling pattern is attested in those cases in which the overt pronouns cannot form a chain at PF according to Inclusion-S. Moreover, these constructions display an intervention effect triggered by negation that shows that the pronouns do not form a chain at LF either.

Finally, anti-reconstruction phenomena have been used to show that non-isomorphic constituents may be part of the same chain in certain contexts. This follows from Inclusion-S, as it predicts that two elements may form a chain even if they are not derivationally related through the Copy operation.