Deconstructing categories syncretic with the nominal complementizer

This paper investigates the internal structure of categories syncretic with the complementizer from a nanosyntactic perspective (cf. Starke 2009; 2014; Caha 2009). The (emotive factive) that-complementizer in Germanic, Romance, Hellenic, Slavic and Finno-Ugric languages has the same morphophonological form as other nominal categories, such as demonstrative, interrogative and relative pronouns and indeterminate nouns. We claim that this homophony is not accidental. We also argue that these elements are internally complex, composed of syntactico-semantic features that are hierarchically ordered according to a functional sequence (fseq). More specifically, the internal structure can be considered essentially trimorphemic, being composed of (i) a lexical core or base, which in our data is nominal (the nominal core, called simply n), (ii) an inflectional ending (which we label Infl or Φ), and (iii) a functional morpheme which resembles an article of sorts and often (but not always) appears as a prefix (which we label simply F). The n and Infl components in the structures studied here are invariant and can be shown to be quite small, while F crucially varies in size, depending on the function of the morpheme involved (demonstrative Dem, complementizer Comp, relative Rel, interrogative Wh, or indeterminate Indet). Importantly, languages may lexicalize each of these components (n, Infl, and F) in different ways. Evidence for the fseq we are advocating comes from crosslinguistic patterns of syncretism and morphological containment.

The prevalence of this phenomenon suggests that we should treat it not as accidental homophony but in terms of a common underlying structure, that is, as syncretism, defined as "a surface conflation of two distinct morphosyntactic structures" (Caha 2009: 6). Cross-categorial syncretism thus arises when two or more distinct grammatical categories, each with its own underlying structure, are spelled out by a single element. In this paper we will show (i) that syncretism involving the nominal complementizer is highly constrained, (ii) that the elements participating in these syncretism patterns can be further decomposed into a tripartite morphological structure, and (iii) that each of the morphological components in this tripartition has certain basic properties which are stable across languages.

Syncretism with the emotive factive complementizer
That-complementizers vary crosslinguistically as to what information they lexicalize (Roussou 2010; Baunaz 2015; 2016; in press; Baunaz & Lander in press; 2017a). Whereas some languages lexicalize only a single form of the complementizer, others show two or even three morphophonological forms. For instance, English has only one nominal Comp (that), whereas Modern Greek (MG) has two (oti and pu). In languages with more than one complementizer, the distinction is related to interpretation. In MG, for instance, oti is a non-factive complementizer and pu is a factive complementizer.
In Baunaz & Lander (2017a), we show that the declarative complementizer (Comp) participates in crosslinguistic syncretism patterns involving the demonstrative pronoun (Dem), the restrictive relative marker (Rel; in most languages considered here actually an indeclinable relativizer, labeled Rvz), and the interrogative pronoun (Wh). Furthermore, we observe that in many cases there is syncretism between these categories and a bound morpheme encoding a bleached meaning like 'thing'. In many of the languages studied here, this bound morpheme is part of the internal structure of certain quantifier words. We call this element Indet, which stands for indeterminate noun.

The data considered in Baunaz & Lander (2017a) came from a sample of 13 Indo-European and Finno-Ugric languages; our data set in this paper has been expanded to 22 languages. We also make an important distinction in this work which has not been made before, namely that it is the emotive factive complementizer (i.e. the complementizer used under predicates like 'regret', 'be surprised', 'be happy', etc.) which is the relevant function that overlaps with the functions represented by Dem, Rel/Rvz, and Wh (e.g. Greek pu is syncretic with the Rvz function, while oti is not). In languages with no overt distinction between the factive and non-factive complementizer, we assume that the two are syncretic. For English, for instance, we list that as the factive complementizer (even though it also happens to be the non-factive complementizer).

The data
Table 1 shows the main data we consider in this paper. Languages are grouped into North Germanic (NGmc), West Germanic (WGmc), Romance (Rom), Finno-Ugric, Hellenic, East Slavonic (ESlav), West Slavonic (WSlav), and South Slavonic (SSlav). Syncretism is indicated by gray shading, and pronouns are given in their neuter/inanimate singular forms throughout. As the reader can see, the columns can be arranged in a single order that accounts for all the crosslinguistically attested patterns in terms of strictly adjacent cells in the paradigm. Exactly this behavior is typical of syncretism (Caha 2009). Again, more often than not the relative marker at stake is an indeclinable relativizer (Rvz). Note also that some languages with multiple complementizers allow one of them to appear under either factive or non-factive predicates, while the other is possible under factive predicates only (e.g. SC factive/non-factive da vs. strictly factive što). In such cases we give the complementizer which is unambiguously factive (e.g. SC što), as this is the one that participates in syncretism (see fn. 8). At this point, we briefly summarize the data in Table 1 branch by branch.

Footnote 6: Non-standard English also has relative use of what (e.g. the thing what I can't stand), giving a Rel/Wh syncretism.

Footnote 7: We do not discuss Spanish in this paper, as Spanish quantifiers happen not to show overt realization of an indeterminate noun -que (cf. cada 'each, every', cada uno 'each one', alguno 'some/someone, somebody', alguien 'someone, somebody'), noting in passing that cual-que 'some' was in use in Old Spanish. See also Section 4.

Germanic
We see that in North Germanic there are no (obvious) syncretisms with the complementizer. In West Germanic, complementizers are often syncretic with the relativizer and the distal 3sg neuter Dem, but they are not syncretic with Wh. Wh and Indet are syncretic in Icelandic, Dutch, German, and Yiddish (cf. Ice. eitt-hvað 'something', Du. iets 'something' but also wat 'some(thing)', Ger. et-was 'something', Yid. et-vos 'something').
Where German has et-was, Swiss German has öp-is 'something' (vs. öp-er 'someone'), suggesting that -is (and -er) are indeterminate nouns (see also Leu 2016). Crucially, Swiss German -is is not syncretic with the item in the next cell (Wh was). The fact that Wh and Indet are very frequently syncretic might suggest that Indet is not really distinct from Wh, and that what we have labeled Indet (e.g. in Fr. quel-que) is just Wh. To us, the Swiss German (and probably Bulgarian) facts show that Indet and Wh really are separate entities.

Romance
In Romance (minus Romanian), Comp, Rel, Wh and Indet are all syncretic with each other, but not with Dem. Romanian has one declarative complementizer, că, which is the default complementizer, appearing almost everywhere except under predicates selecting the subjunctive mood (see fn. 8 and also Baunaz & Lander 2017a for more details). Că is not syncretic with Rel, Wh, Dem or Indet. Note also that ce is used as a relativizer, and that this item is syncretic with the Wh item meaning 'what' (Grosu 1994; Benţea 2010, among others), as well as with the Indet word meaning 'thing' (cf. Ro. ce-va 'something', ori-ce 'anything').

Footnote 8: We are acutely aware that there are various complications involving indicative vs. subjunctive (or realis vs. irrealis) in the complementizer systems of many of the languages under discussion here, such as Greek, the Balkan languages, and the dialects of Southern Italy. For instance, in addition to oti and pu, Modern Greek also has na under desiderative ('wish'-type) verbs. The status of na is debated (Roussou 2010 considers it a complementizer, while Giannakidou 2009, among others, views it as a mood particle), but it is clear that its distribution is different in nature from that of oti or pu. The same has been observed for Griko (Italiot Greek), with its declarative complementizer ka vs. modal complementizer (or particle) na (Baldissera 2013 and references cited there). Parallel to Greek, both Bulgarian and Serbo-Croatian also display a subjunctive mood particle (da) under certain verbs (see Krapova 1998

Finno-Ugric
The Hungarian complementizer hogy (related to Rel a-hogy and to the Wh manner adverbial hogy(an) 'how, in which manner/way') is not syncretic with Dem a-z, Rvz a-mi, Wh mi or Indet -mi, though there is syncretism between a-mi and mi (the a- in the Rel marker is likely a D-marker, much like th-/d- in West Germanic). For Indet, consider vala-mi 'something, anything' and bár-mi 'anything, whatever'. The Finnish complementizer että is not syncretic with Rel, Wh or Indet. That Finnish että is nominal can be argued on the basis of its historical derivation from the demonstrative *e- (cf. Hungarian ez 'this'). The -ttä component is taken to be a modal ending, with the original meaning 'in this way, so' (Keevallik 2008: 141). We note that the Finnish proximal demonstrative tä- derives from a different root than -ttä, but that their phonological similarity may perhaps be synchronically analyzable in terms of a nascent Dem/Comp syncretism. As for mi-, it is syncretic across Rel, Wh, and Indet (mikä hyvänsä 'anything', ei mi-kään 'nothing'), as in Hungarian. In the Wh paradigm, mi- is the inanimate stem (vs. the animate stem ke-).

Hellenic
Modern Greek has two different complementizers (though see fn. 8 above): pu introduces factive complements, and oti introduces non-factive complements. Oti may also introduce epistemic factive complements (but not emotive factive complements). MG pu is thus the factive complementizer which we single out in our data. Complementizer pu is syncretic with the relativizer pu, but not with Dem, Wh, or Indet. We note that the locative Rel pronoun ó-pu is bimorphemic, combining interrogative pu with the definite article o- (cf. Hungarian a-hogy, a-mi above). Wh and Indet, however, are syncretic (consider Indet in ká-ti 'something', tí-pota 'anything').

Slavic
The default complementizer in Russian is čto. Čto is syncretic with Rvz, Wh and Indet (cf. čto-to 'something', ne-čto 'something specific'), though not with Dem to.
Polish że is not syncretic with anything in the standard language, but it is syncretic with a relativizer available in South-Eastern Polish and in some non-standard varieties. We note that the relativizer and Wh-word co are also syncretic with Indet -co (see Po. co-ś 'something').
The default complementizer že in Czech has similar properties to its Polish cognate, though it does not seem to serve as a relativizer in any Czech variety we know of. As in Polish, Czech co shows Rvz/Wh/Indet syncretism (see Cz. ně-co 'something' for Indet).
Like Modern Greek, Serbo-Croatian and Bulgarian lexicalize two complementizers: da and što in SC, and če and deto in Bulgarian. In Serbo-Croatian, da is the default complementizer; it is not syncretic with Rel, Wh, Indet, or Dem. The use of SC što is quite limited: it only appears under emotive factive verbs. It is syncretic with Rvz, Wh and Indet (for Indet, consider SC ni-šta 'nothing', ne-što 'something'). In addition, just as in Russian, SC što is partially syncretic with the 3sg demonstrative to. In Bulgarian, the complementizer če appears in most environments, the notable exception being emotive factive verbs, under which deto is used. Comp deto is syncretic with Rvz deto, but not with Wh kakvo 'what' or with Indet -shto (see Bg. ni-shto 'nothing, anything', ne-shto 'something'). See also fn. 8 above.
Finally, the emotive factive complementizer in Macedonian is što, which is also a relativizer, a Wh-pronoun and an Indet (for Indet, consider Mac. ni-što 'nothing', ne-što 'something'), instantiating a Comp/Rvz/Wh/Indet syncretism. Though deka is the default complementizer (i.e. not the emotive factive complementizer), we include it here to show that it is syncretic with Rvz deka in this language.
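The adjacency pattern running through this branch-by-branch summary can be checked procedurally. The Python sketch below is our own illustration, not part of the analysis: it tests whether each form in a row occupies one unbroken run of cells in the column order Dem, Comp, Rel, Wh, Indet. The rows are simplified from the prose above (orthographic forms, German Comp dass treated as homophonous with das, only the syncretic morpheme listed for Indet), and the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence

# Column order of Table 1, as argued for in the text.
ORDER = ("Dem", "Comp", "Rel", "Wh", "Indet")

def is_contiguous(row: Sequence[str]) -> bool:
    """True if each distinct form fills a single unbroken run of cells.

    A row like A-B-A (same form in two cells, a different form in between)
    returns False: this is the *ABA configuration that syncretism avoids.
    """
    finished = set()      # forms whose run has already ended
    previous = None
    for form in row:
        if form == previous:
            continue      # still inside the current run
        if form in finished:
            return False  # the form reappears after an interruption
        if previous is not None:
            finished.add(previous)
        previous = form
    return True

# Simplified rows (Dem, Comp, Rel, Wh, Indet); illustrative only.
SAMPLE: Dict[str, List[str]] = {
    "German":    ["das", "das", "das", "was", "was"],  # Comp dass ~ das
    "English":   ["that", "that", "that", "what", "thing"],
    "Icelandic": ["það", "að", "sem", "hvað", "hvað"],
    "French":    ["ce", "que", "que", "que", "que"],
    "Romanian":  ["cel", "că", "ce", "ce", "ce"],
    "Russian":   ["to", "čto", "čto", "čto", "čto"],
    "Polish":    ["to", "że", "co", "co", "co"],       # Rel co is non-standard
    "Hungarian": ["az", "hogy", "ami", "mi", "mi"],    # a-mi kept distinct from mi
    "Yiddish":   ["jenc", "vos", "vos", "vos", "vos"],
}

for language, row in SAMPLE.items():
    print(f"{language:<10} contiguous: {is_contiguous(row)}")

# Hypothetical Dem/Wh pairing that skips Comp and Rel (four cells only).
print(is_contiguous(["kwel", "ke", "ke", "kwel"]))
```

Every sample row passes, while the hypothetical Dem/Wh pairing fails; this is the kind of configuration that the North-West Italian data discussed below appear, at first sight, to instantiate.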

Additional evidence
We note that although most of the evidence in Table 1 comes from Indo-European, the patterns in Hungarian and Finnish (Finno-Ugric) also conform to the sequence.Some further evidence will serve to bolster our generalizations.
First, consider our claim that it is the factive complementizer which is at stake in the paradigm above. Gungbe (Aboh 2005) is instructive here: factive clauses (1) and relative clauses (2) both make use of the same element, Rvz ɖĕ.

Similarly, in Turkish (Turkic), the factive nominalizer -DIK (as opposed to non-factive -mA/-mAK; Bağrıaçık & Göksel 2016: 64) is also used for relative clauses (more precisely non-subject relatives, which according to Kornfilt 2008 instantiate the unmarked way to nominalize relative clauses in Turkish). This suggests a factive Comp/Rel syncretism in Turkish as well.
Second, consider the fact that the Dem/Comp syncretism appears to be somewhat rare in our data, since only Germanic shows this pattern (see Roberts & Roussou 2003: §3.4; in particular see Longobardi 1991 and Ferraresi 1997; 2005 for Gothic, whose complementizers were case-inflected or relativized demonstratives). However, Heine & Kuteva (2002: 107, citing Lehmann 1982: 64) point out that a similar development (from Dem to Comp, as for Old English þæt) has taken place in Welsh a, Akkadian ša (< šu), and Nahuatl in. Furthermore, as mentioned above, Finnish demonstrative tä- 'this' and complementizer että may be approaching syncretism, giving us another potential Dem/Comp syncretism. In Japanese, finally, the (roughly) factive complementizer ko-to (original meaning 'thing'; Heine & Kuteva 2002: 295) is apparently made up of ko- plus the non-factive complementizer to (cf. Kuno 1973, among others). Interestingly, the ko- component seems to be the same as the Dem root ko- 'this'.

Third, a reviewer raises the possibility that various North-West Italian dialects systematically defy our generalization, since demonstrative kwel 'that' can also be interrogative 'what' (see Munaro 2001) while Comp and Rel are both ke. This would be a systematic violation of the adjacency effect seen in Table 1. However, there are various reasons to be careful here. First of all, although Munaro (2001: 282) proposes that Wh kwe is historically a reduced form of Dem kwe(lo/lu) in these dialects, the synchronic situation according to the Atlante Italo Svizzero (AIS, 1919-1926) was that the two were nevertheless distinct and not syncretic at all (Ligurian: Dem kwelo/kwelu/kölu vs. Wh cos(a)/cose/cusi; Southern Piedmontese: Dem lo/lu vs. Wh cosa; Central Piedmontese: Dem lon/lun vs. Wh kwe/kwa; Northern Piedmontese: Dem kul(lu) vs. Wh kwe; Valdotian: Dem (t)sò/sèn vs. Wh kye; Munaro 2001: 282, his (1)). It remains true that the demonstrative could be used to mean 'what', both in the older AIS data and in modern North-West dialects, but, importantly, the complementizer (che, chi, cu, etc.) must follow the demonstrative for the interrogative reading to emerge (Munaro 2001: 283-284 and elsewhere). This syntactic dimension (and other complications we cannot discuss here for reasons of space) seems to preclude analyzing these forms in terms of straightforward syncretism, at least in the sense intended here.

Nanosyntactic approach to syncretism
The nanosyntactic approach to syncretism (Caha 2009) is crucially based on the idea that morphosyntactic heads are cumulative (each structure contains the previous one plus one additional head), as schematized in (3). Each of these structures can also be associated with a phonological exponent. When two (or more) structures are spelled out by the same phonological exponent, we speak of syncretism. Syncretism has been shown to be restricted to adjacent cells/layers in a paradigm, a restriction known as the *ABA generalization (see Bobaljik 2007; 2012; Caha 2009). Capitalizing on this adjacency restriction, we can deduce the underlying linear order of the functional heads at stake by looking at attested syncretisms across languages.
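The deduction described in the last sentence can be made concrete with a brute-force search. The sketch below is our own illustration, not the authors' procedure: it enumerates every linear order of the five categories and keeps those in which each attested syncretism forms a contiguous block. The syncretism sets are taken from the data summarized above (German Dem/Comp/Rel and Wh/Indet, French Comp/Rel/Wh/Indet, Romanian Rel/Wh/Indet, Greek Comp/Rel, non-standard English Rel/Wh); the function names are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Tuple

CATEGORIES = ("Dem", "Comp", "Rel", "Wh", "Indet")

# Syncretisms attested in the data (each set shares one exponent somewhere).
ATTESTED: List[FrozenSet[str]] = [
    frozenset({"Dem", "Comp", "Rel"}),          # German das/dass
    frozenset({"Wh", "Indet"}),                 # German was, et-was
    frozenset({"Comp", "Rel", "Wh", "Indet"}),  # French que
    frozenset({"Rel", "Wh", "Indet"}),          # Romanian ce
    frozenset({"Comp", "Rel"}),                 # Modern Greek pu
    frozenset({"Rel", "Wh"}),                   # non-standard English what
]

def is_block(order: Sequence[str], group: FrozenSet[str]) -> bool:
    """True if the members of `group` occupy adjacent positions in `order`."""
    positions = sorted(order.index(c) for c in group)
    return positions[-1] - positions[0] == len(positions) - 1

def compatible_orders(categories, syncretisms) -> List[Tuple[str, ...]]:
    """All linear orders in which every attested syncretism is contiguous."""
    return [order for order in permutations(categories)
            if all(is_block(order, g) for g in syncretisms)]

for order in compatible_orders(CATEGORIES, ATTESTED):
    print(" - ".join(order))
# Dem - Comp - Rel - Wh - Indet
# Indet - Wh - Rel - Comp - Dem
```

The search returns the order in (4) together with its mirror image: adjacency fixes the linear order, but not its hierarchical direction.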
The syncretisms in Table 2, for example, necessitate the linear order in (4), where Dem is next to Comp, which is next to Rel, which is next to Wh, which is next to Indet:

(4) Dem - Comp - Rel - Wh - Indet

What syncretism patterns and the *ABA generalization cannot tell us, however, is which hierarchical order is correct, that is, whether (5a) or (5b) is correct:

(5) a. Dem > Comp > Rel > Wh > Indet
    b. Indet > Wh > Rel > Comp > Dem

In Baunaz & Lander (in press; 2017a) we present reasons for believing that (5a) is the correct fseq; for the details of the argument we refer to these papers. For the purposes of this paper, it is not crucial which fseq in (5) is accurate, but we will assume it to be (5a). This means that the underlying structures for Dem, Comp, Rel, Wh and Indet are the ones given in (6).
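Under (5a), each category is built by adding one head on top of the next smaller one, so every structure contains all the smaller ones. The sketch below illustrates why such cumulative structures yield syncretism only across adjacent cells. It assumes the Superset Principle and the Elsewhere condition as standardly formulated in nanosyntax (an entry can spell out any structure it contains; the smallest qualifying entry wins). The English "lexicon" is a deliberately simplified, hypothetical one that treats whole forms (that, what, thing) as single entries, ignoring the decomposition developed below.

```python
from typing import Dict, Tuple

# Functional sequence (5a), listed from the bottom of the tree upwards:
# Indet is the smallest structure, Dem the largest.
FSEQ: Tuple[str, ...] = ("Indet", "Wh", "Rel", "Comp", "Dem")

def structure(category: str) -> Tuple[str, ...]:
    """Cumulative structure of a category: all heads up to and including it."""
    return FSEQ[: FSEQ.index(category) + 1]

def contains(stored: str, target: str) -> bool:
    """Superset check: does the stored structure contain the target one?"""
    s, t = structure(stored), structure(target)
    return s[: len(t)] == t

def spell_out(target: str, lexicon: Dict[str, str]) -> str:
    """Superset Principle + Elsewhere: among entries whose stored structure
    contains the target, choose the one with the smallest structure."""
    candidates = [(len(structure(stored)), form)
                  for form, stored in lexicon.items() if contains(stored, target)]
    if not candidates:
        raise ValueError(f"no entry can spell out {target}")
    return min(candidates)[1]

# Hypothetical, simplified English entries: form -> largest structure stored.
ENGLISH = {"that": "Dem", "what": "Wh", "thing": "Indet"}

paradigm = [spell_out(c, ENGLISH) for c in reversed(FSEQ)]
print(paradigm)  # ['that', 'that', 'that', 'what', 'thing']
```

Because an entry can only shrink to the layers it contains, it can never cover two cells without also covering every cell in between, which is the *ABA effect.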

A tripartite internal structure
Note that the forms provided in Table 1 have not been overtly decomposed in any way (though our explicitly nanosyntactic approach to syncretism implies a complex internal morphosyntactic structure). In this section we take this next step, performing a radical decomposition on these forms. The ultimate result will be an underlying tripartite structure. The decomposition, furthermore, forces us to change the way we think about the syncretism patterns tracked in Table 1. Once a decompositional approach is taken, there are a number of logical possibilities regarding the nature of this syncretism. If there are three morphemes per form, as we will argue below, then we must ask which morpheme participates in the cumulative, superset-subset structure-building at the center of the formal analysis of syncretism. It could be that only one morpheme represents the fseq Dem > Comp > Rel > Wh > Indet (e.g. (7)), or that two (e.g. (8)) or even all three (e.g. (9)) of them do, growing and shrinking in sync with each other depending on which structure is being built (Dem, Comp, Rel, Wh, Indet).
Indeed, the complexity of the situation increases greatly once multiple morphemes, each with its own potential internal structure and behavior, are taken into account. Moreover, (7)-(9) do not exhaust the logical possibilities. For example, the different sequences in (8), A and C, need not depend on each other (such that if A builds up to A₃ then C must also build up to C₃) but could be partially or totally independent of one another. In any case, the point is that allowing for multiple morphemes complicates the clean picture of syncretism presented in Table 1.
Fortunately, we will argue that the simplest of these scenarios, the one sketched in (7), with a single fseq and two invariant bits of structure, is the one at stake for the data in Table 1.

Germanic
The elements in Table 1 can be decomposed further, strongly suggesting that they have a complex internal structure. Taking a look at English, it is clear that the Dem items can be segmented into two parts, as seen in (10). (For now we provide forms in their standard orthography; more phonologically precise representations are given later in the paper.)
In other words, we can speak of a basic bimorphemic structure as in Table 3, where the first part displays a morphological alternation and the second part remains stable. In Table 3 we label these two components F (for 'functional') and Base. The decomposition in Table 3, however, is only an approximation: Base actually contains (at least) two elements, a vowel and a consonant. Taking Dutch and German, the consonant (Du. -t, Ger. -s) can be identified with frozen inflectional/agreement (Φ) morphology (cf. the German strong adjective ending for neuter nominative/accusative, i.e. -es/-s; see also Leu 2015). This leaves the vowel (Du. -ɑ-, Ger. -a-) as the "true" realization of what we have called Base.
Since the non-decomposed Base in Table 3 was invariant, each of the individual components resulting from the decomposition in Table 4 inherits this property of invariance. That is, neither Du. -ɑ-/Ger. -a- nor Du. -t/Ger. -s shows any kind of morphophonological alternation throughout the paradigm.
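Once each form is split into its three slots, the invariance claim can be checked mechanically. In the sketch below, the Dutch and German rows are our own rendering of the decomposition described above (F d-/w-, Base Du. -ɑ-/Ger. -a-, Inflection Du. -t/Ger. -s), with Dutch Indet represented by wat and German Comp dass treated as /das/. The table layout and function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

SLOTS = ("F", "Base", "Inflection")

# (F, Base, Inflection) for Dem, Comp, Rel, Wh, Indet.
DECOMPOSED: Dict[str, List[Tuple[str, str, str]]] = {
    "Dutch":  [("d", "ɑ", "t")] * 3 + [("w", "ɑ", "t")] * 2,
    "German": [("d", "a", "s")] * 3 + [("w", "a", "s")] * 2,  # Comp dass as /das/
}

def alternating_slots(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Names of the slots that take more than one value across the paradigm."""
    return [name for i, name in enumerate(SLOTS)
            if len({row[i] for row in rows}) > 1]

for language, rows in DECOMPOSED.items():
    print(language, alternating_slots(rows))
# Dutch ['F']
# German ['F']
```

Only F alternates; Base and Inflection take a single value in every cell.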
Even languages with (apparently) more complex systems can be seen to fall into a tripartite structure with invariant Base and Inflection components. The relevant Icelandic forms are given in Table 5, where we note that sem is very likely a portmanteau that is not overtly decomposable (though the fact that að can co-occur with it might put this part of the relativizer on a par with the other -a- cores identified in the table). Here and in the tables that follow, gray shading highlights portmanteau elements. First, -ð is uncontroversially the neuter singular inflectional marker in Icelandic, and, unsurprisingly, it remains stable in Table 5. As for the Base, setting aside sem (which we take to be a portmanteau), the only obstacle to Icelandic showing another instance of an invariant Base seems to be the long vowel in the Wh/Indet pronoun (i.e. Wh /kʰv-aː-ð/ vs. short /-a-/ elsewhere). However, this is a non-issue, since vowel length in Icelandic is non-contrastive and predictable (hence allophonic). Thus there is no phonological obstacle to equating this long -aː- with the short -a- seen elsewhere in the paradigm.

Table 5: Trimorphemic decomposition in Icelandic.

Yiddish too, with some additional investigation, can be seen to fall in line with the expectation of an invariant Base and Inflection. We begin with the Base. In Table 6, we see that -o- is a plausible candidate for the realization of an invariant Base in Yiddish. Once again, the apparently deviant form, this time the demonstrative pronoun jenc, can plausibly be analyzed as a portmanteau jent- (consisting of both F and Base). Note in particular that the ending /-ts/ in Dem jenc could be seen to deviate from the regular -s in the rest of the paradigm. Interestingly, Jacobs (2005: 112) writes that "[s]urface phonetic affricates are frequently found between consonants l, n, and a following sibilant", immediately adding, however, "though this might be seen as a historical process". In other words, depending on one's analysis of such affricates in Yiddish, one could defend the view that /-ts/ should be analyzed simply as /-s/ in this case, with the (predictable) insertion of the stop /-t-/ in this particular environment (i.e. /jen-s/ > jenc [jents]), bringing this ending in line with the rest of the paradigm as well.
Even if the process of t-insertion is historical and no longer part of the active phonology, one can still say that the -t- is part of the portmanteau. Thus jent- is not decomposable and cannot be expected to show the Base.
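The phonological alternative mentioned above, in which the stop is inserted predictably, can be stated as a simple rewrite rule. The sketch below is our own formalization of Jacobs's observation (a t appearing between l or n and a following sibilant), not a claim about the synchronic grammar of Yiddish. The function name and the restriction to the voiceless sibilants s and ʃ are assumptions.

```python
import re

# Hypothetical rule: insert /t/ after l or n when a voiceless sibilant follows.
# Zero-width lookbehind/lookahead assertions mark the insertion site only.
T_EPENTHESIS = re.compile(r"(?<=[ln])(?=[sʃ])")

def surface(underlying: str) -> str:
    """Apply t-epenthesis to a hyphen-free underlying string."""
    return T_EPENTHESIS.sub("t", underlying)

print(surface("jen" + "s"))  # jents (spelled jenc)
print(surface("vos"))        # vos (no trigger, so the ending stays -s)
```

Under this analysis the Dem ending is underlyingly the same -s as elsewhere in the paradigm; under the portmanteau analysis adopted in the text, the -t- belongs to jent- instead.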

Slavic
The discussion of Yiddish jen(t)- leads us quite smoothly into Slavic, where a packaging similar to jent- is on full display. To set the background, we start from SC što and Ru. čto (we leave Bulgarian and Macedonian for future work), where we observe the same trimorphemic template as in West Germanic: SC /ʃ-t-o/ and Ru. /ʂ-t-o/. This is shown in Table 7. Historically, /ʃ/ and /ʂ/ derive from palatalization of the wh-morpheme k- before a front vowel (i.e. Proto-Balto-Slavic *ki-to > Proto-Slavic *čь-to 'what'). The second consonant, t-, is the demonstrative root, and -o is the neuter singular inflection.
Polish and Czech, however, do not fit in as neatly. While Dem t-o can be decomposed just as in Serbo-Croatian and Russian, Comp że /ʐe/ (Polish), že /ʒe/ (Czech) and Rel/Wh/Indet co /t͡so/ (both Polish and Czech) are less straightforward: not only are the consonants different, but in Comp the vowel also diverges (i.e. Comp /-e/ vs. /-o/ in Dem, Rel, Wh, and Indet). We think a natural approach for Polish and Czech is to analyze the initial affricate /t͡s-/ in Rel/Wh/Indet co /t͡so/ as a portmanteau of F and Base (like Yiddish jent-), as shown in Table 8.
Furthermore, the alternation between -e and -o in Table 8 also falls out in a completely regular way, since -e is in fact an allophone of -o after "soft" consonants (e.g. Po. ż- /ʐ/ and Cz. ž- /ʒ/). This means that Polish Comp ż-e and Czech Comp ž-e have exactly the same basic structure as Serbo-Croatian and Russian Comp/Rel/Wh/Indet što/čto above, with the same neuter singular ending as well.
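The allophonic relation between -o and -e can likewise be sketched as a conditioned realization rule. The set of soft consonants below contains only the two the text cites (Po. ż-, Cz. ž-); it is not a complete inventory, and the function names are hypothetical.

```python
# Only the "soft" consonants cited in the text; a full inventory would be larger.
SOFT = {"ż", "ž"}

def neuter_ending(preceding: str) -> str:
    """Realize underlying neuter singular /-o/ as -e after a soft consonant."""
    return "e" if preceding in SOFT else "o"

def assemble(initial: str) -> str:
    """Attach the allophonically realized ending to an initial consonant."""
    return initial + neuter_ending(initial)

# Polish Dem t-o, Comp ż-e, Rel/Wh/Indet c-o; c- is not in SOFT here.
print([assemble(c) for c in ("t", "ż", "c")])  # ['to', 'że', 'co']
```

All three forms thus share one underlying ending; only its surface realization depends on the preceding consonant.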
In sum, in Polish and Czech both Comp ż-e/ž-e and Rel/Wh/Indet c-o are underlyingly tripartite structures, with the initial consonant analyzed as a portmanteau morpheme.

Romance
French, for its part, is amenable to an analysis like the one provided for Slavic above, where /k-/ in que /kǝ/ is a bigger chunk of structure, composed of both F and the Base component. The same can be posited for ce /sǝ/, as shown in Table 9. If we had taken /s-/ or /k-/ to be the Base, then the F column would require a null morpheme, which we take to be an undesirable analysis; we prefer to see these consonants as portmanteau morphemes encoding both the F ingredient and the Base. An unfortunate consequence of the portmanteaux in French, however, is that the Base cannot be overtly identified, since decomposition is not possible.
Turning now to Italian, the same analysis can, simply for the sake of parsimony, be given for It. che /k-e/. As for Dem quello /kwello/, on the other hand, there is reason to think that it should be segmented slightly differently, i.e. /kwe-llo/. Indeed, as seen in (11), Italian demonstratives overtly contain the definite article. Now, if we identify Def with the F slot, we can keep Italian on a par with much of Germanic by putting -(l)lo there as well. The leftover morpheme, namely /kwe-/, appears to be quite large. We can offer two possibilities for /kwe-/: either it can be segmented as /kw-/ plus /-e/, or it is a portmanteau made up of both Base and Inflection. The first option is more interesting: not only does it leave us with a regular /-e/ ending in the right-hand column throughout the paradigm, but, more importantly, it means that the nominal core emerges in the Dem form as kw- (recall that it remained hidden in French). We therefore adopt this first option in Table 10, in which the Base emerges in Dem in Italian.
Finally, consider the Romanian forms, where it seems possible to uncover a realization of the Base. First, we might assume that Comp că /kǝ/ has the same underlying structure as in French and Italian, namely that the initial stop is a portmanteau of F and Base. However, since the vowel in Comp is different (schwa rather than /e/), it is better to analyze the entire form as an idiomatic portmanteau of F, Base, and Inflection. This approach allows us to posit that the vowel /e/ is a completely regular morpheme in the rest of the paradigm. Furthermore, Dem contains the definite article in Romanian, as it does in Italian (see Tables 11-12, from Savu & Bican-Miclescu 2012). This means that -l in (a-)cel /tʃ-e-l/ needs to be put in the F column. Finally, we abstract away from a- in acel, since it is droppable in the vernacular. As seen in Table 13, this leaves us with a nominal core tʃ-.
As we will explain below, /tʃ-/ in Rel, Wh, and Indet is technically not an overt realization of the Base; rather, in these forms /tʃ-/ must be a portmanteau of both F and Base.Only in the Dem form is /tʃ-/ an overt realization of the Base in Romanian.See section 5.3 for further details.

Back to English
We now turn to English, with its even more complex (and seemingly problematic) paradigm. Table 14 gives the English forms; despite what seems to be a high degree of decompositionality, there is some variation in what is expected to be the invariant Base ingredient: -æ- or -ʌ-/-ɒ-?
One fact relevant for our purposes concerns the marking of distal/neutral vs. proximal. This distinction in English is actually accomplished by the contrast /-æt/ vs. /-ɪs/ (rather than, say, /-æ-/ vs. /-ɪ-/; see fn. 16). In other words, there is reason to keep /-æt/ in that as a unit rather than splitting it into two parts (i.e. /-æ-/ plus /-t/). For Comp and Rel, however, which do not encode spatial deixis, this reasoning does not apply.
Comp and Rel are also set apart from Dem by an important detail: they have reduced forms that Dem lacks (i.e. /ðət/, as in The fact /ðət/ he bought new clothes… or the man /ðət/ came to dinner, but not */ðət/ girl is my daughter). Interestingly, the first part of this reduced form is homophonous with the English definite article, i.e. the /ðə/. (Note that, at least for Rel, affixation of the definite article, or at least of something syncretic with it, is crosslinguistically common: Fr. le-quel, Hu. a-mi, MG ó-pu, and even Comp o-ti.) In other words, we have reason to group /ðə-/ as a unit in Comp and Rel (in the same way we grouped /-æt/ as a unit for Dem). Indet is not syncretic with anything: because it is not overtly decomposable and also quite large (i.e. about the same size as the lexical noun thing), we propose that it is a portmanteau of F, Base, and Inflection. This brings us to the revised table for English, in Table 15.

Footnote 16: However, the case could be made for overt tripartition in the plural, since /-z/ marks plural in both distal/neutral and proximal forms: /ð-oʊ-z/ (Am.), /ð-əʊ-z/ (UK) vs. /ð-iː-z/.

As seen in Table 15, the Base now reveals itself in the interrogative pronoun, namely the vowel -ʌ- in American varieties and -ɒ- in UK varieties.

Interim conclusion
We have argued for a tripartite underlying structure for Dem, Comp, Rel, and Wh items. Crucially, in various cases further investigation shows that languages which at first do not seem to fit the tripartite mold are not counterexamples to our hypothesis at all. These languages simply require a more careful approach, one that takes into account complicating factors such as phonological processes or the possibility of portmanteau morphemes (which are typically treated as "phonological idioms" in nanosyntax).
Since the Base is invariant, it will not grow and shrink in the typical subset-superset manner that nanosyntactic theory makes use of in its account of syncretism (Caha 2009); if we are correct about the Base, there is no syncretism to account for in it anyway. We take a similar stance towards the frozen set of Φ features (Inflection), namely that it is a constant, invariant piece of structure. This leaves F as the source of the syncretism patterns identified in a number of languages in Table 1: it is this part which involves cumulative structure-building in the nanosyntactic sense. In the languages considered here, then, we end up with an F that alternates according to the particular function (Dem, Comp, Rel, Wh, or Indet) and two invariant ingredients, Base and Infl.

The underlying structure
For the items discussed so far, we can propose the basic merge order in (12a), which entails the specific structures in (12b) (see fn. 17).

While Infl can be identified with a set of Φ features, we still have not said anything explicit about Base. As seen in (12), we suggest that Base is a (semi-)lexical category on top of which Φ and the functional features used for building Dem, Comp, Rel, Wh, and Indet items are merged. In other words, Base is a noun of sorts, which we will label the nominal core (n).

Footnote 17: A reviewer expresses some worries about morphological decomposition and the nature of the fine-grained grammatical features at stake here: what does it mean for a demonstrative pronoun to be composed of Comp, Rel, and Wh features? Though we do not have a full answer to this question, we have two relevant comments. (i) Our data seem to point to a view in which these elements are at their core much simpler, syntactico-semantically speaking, than usually thought, basically consisting of a noun with some functional architecture added. Additional syntactic functions and semantic readings probably result from these nominal elements entering into relations with other elements during the syntactic derivation (e.g. the relationship established between an antecedent and its relative pronoun, which can be analyzed in terms of movement and agreement). Fleshing this idea out further, it appears that we are actually tracking D features rather than operator features. Icelandic hv-að 'what' vs. eitt-hv-að 'something', for example, shows that hv- is not an operator in both, since in the latter form eitt- 'some-' takes on this role. This would mean that hv-words are more properly indefinites, as in Japanese, i.e. they get their quantificational meaning via an additional quantificational particle of some kind (which can perhaps be null in Icelandic). Thus the features responsible for building hv- seem to be different from the features responsible for building quantificational particles. The "Wh" feature we are looking at, then, does not necessarily trigger an interrogative meaning but is the necessary basis for an operator to enter into the configuration. (ii) On the other hand, the range of functions we are investigating should not be overestimated. In fact, we have isolated only one thin "slice" of the syntactico-semantic realm to which these elements belong. Dem, Comp, Rel, and Wh do not overlap in every direction, but happen to do so when these particular functions (again, distal or
We have argued that n is invariant.Because there is no syncretism to account for in n, then, there is no reason to think that its structure will grow and shrink in the subset-superset way typical of syncretism (Caha 2009).We propose to take a similar stance towards the Inflection part, that is, the frozen set of Φ features: it is a constant, invariant piece of structure.F, on the other hand, is apparently the source of the syncretism patterns we have identified in a number of languages above (see Table 1).It is this part which involves cumulative structure-building.

Germanic
These results can now be represented in Tables 16-19 below, where F is now represented with a larger cell in the Dem row, and then gradually smaller down to Indet.Reflecting the basic facts in Table 1, F is either syncretic with Dem, Comp, Rel (German, Dutch), or with Comp, Rel, Wh, Indet (Yiddish), pointing to the hierarchy Dem > Comp > Rel > Wh > Indet.
neutral Dem, emotive factive Comp, indeclinable restrictive Rel, and interrogative pronoun Wh, all with neuter/inanimate singular inflection) are considered.Thus there is a bigger question of how functional sequences intersect with one another to create such multidimensional paradigms, with syncretism patterns going in multiple directions (for relevant discussion see Vanden Wyngaerd to appear). 18If we want to keep the basic merge order of F-domain > n, then we must allow for the constituent of Φ features to be merged in different spots in the structure.In English, n and Φ can form a portmanteau, this implies a constituent made up of n and Φ to the exclusion of F, meaning Φ is merged between The nominal core in these five languages has also been revealed as a vowel: Ger.-a-, Du. -ɑ-, Icel.-a-, Yid.-o-and English -ʌ-(Am.)and -ɒ-(UK).Inflection is overtly realized as the 3 rd person neuter morpheme of these languages.
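How a cumulative F yields the contiguous syncretisms just described can be illustrated with a sketch of the standard nanosyntactic Superset Principle plus Elsewhere Condition (Starke 2009; Caha 2009). Assumptions, stated loudly: each function adds one F head (bottom-up Indet, Wh, Rel, Comp, Dem); each lexical entry stores the largest F it was listed with; `d-` and `w-` are simplified German F-prefixes (as in das/dass and was), and "A"/"B" are abstract placeholders for a Yiddish-type split, not actual Yiddish forms. All function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch of phrasal spellout for the F column.
# Assumptions: F is cumulative, with one head per function merged bottom-up
# (Indet, Wh, Rel, Comp, Dem), and each lexical entry stores the largest F
# it spells out. Forms and lexicons below are illustrative only.
FSEQ = ["Indet", "Wh", "Rel", "Comp", "Dem"]
FUNCTIONS = list(reversed(FSEQ))  # Dem, Comp, Rel, Wh, Indet (table row order)


def size(function: str) -> int:
    """Number of F heads in the structure for a function."""
    return FSEQ.index(function) + 1


def spell_out(function: str, lexicon: dict[str, str]) -> str | None:
    """Choose the form that realizes F for one function.

    Superset Principle: an entry qualifies if its stored F contains the target F.
    Elsewhere Condition: among qualifying entries, the smallest one wins.
    """
    target = size(function)
    candidates = [(size(stored), form) for form, stored in lexicon.items() if size(stored) >= target]
    return min(candidates)[1] if candidates else None


def paradigm(lexicon: dict[str, str]) -> dict[str, str | None]:
    """Form chosen for every function, from Dem down to Indet."""
    return {function: spell_out(function, lexicon) for function in FUNCTIONS}


def is_contiguous(lexicon: dict[str, str]) -> bool:
    """*ABA check: each form covers an unbroken stretch of the hierarchy."""
    forms = list(paradigm(lexicon).values())
    closed = set()  # forms whose stretch has already ended
    for previous, current in zip(forms, forms[1:]):
        if current != previous:
            if current in closed:
                return False
            closed.add(previous)
    return True


# Simplified German F-prefixes: d- stored with Dem-sized F, w- with Wh-sized F.
GERMAN = {"d-": "Dem", "w-": "Wh"}
# Abstract Yiddish-type lexicon: form "A" for Dem, form "B" shared from Comp down.
YIDDISH_TYPE = {"A": "Dem", "B": "Comp"}

if __name__ == "__main__":
    print(paradigm(GERMAN))
    # {'Dem': 'd-', 'Comp': 'd-', 'Rel': 'd-', 'Wh': 'w-', 'Indet': 'w-'}
    print(paradigm(YIDDISH_TYPE))
    # {'Dem': 'A', 'Comp': 'B', 'Rel': 'B', 'Wh': 'B', 'Indet': 'B'}
    assert is_contiguous(GERMAN) and is_contiguous(YIDDISH_TYPE)
```

Under these assumptions, any lexicon of cumulative entries produces only contiguous stretches, which is why the observed F syncretisms never skip a cell of the hierarchy.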

Slavic
As with German and Dutch, decomposing the relevant forms into three parts in Serbo-Croatian and Russian does indeed result in a plausible candidate for the "genuine" nominal core (SC/Ru. -t-). Syncretism in the F column is with Comp, Rel, Wh, and Indet (just like in Yiddish above). Following our analysis from section 3.1.2 above, we propose to represent SC and Russian as in Table 20.
Recall that in Polish and Czech, the initial affricate /t͡s-/ in Rel/Wh/Indet co /t͡so/ is a portmanteau of F and Base (like Yiddish jen-). The only instance where the Base is overtly realized is Dem t-o (see also fn. 15) in both languages. This is represented in Table 21.

French and Italian
In section 3.1.3 we argued for an analysis of French analogous to the one provided for Slavic, where /k-/ in que /kǝ/ is a bigger chunk of structure, composed of both F and n, i.e. /k-/ is a portmanteau. In French, thus, the nominal core cannot be overtly identified, since decomposition is not possible. The same can be posited for ce /sǝ/, as shown in Table 22.

Table 18: Identifying the nominal core in Yiddish.
Recall that we argued for a similar analysis for Italian, with Dem quello /kwello/ segmented slightly differently, i.e. /kwe-llo/. Morphologically we argued that -(l)lo occupies the F position, while /kwe-/, being quite large, should be segmented as /kw-/ plus /-e/: not only does this leave us with a regular /-e/ ending in the right-hand column throughout the paradigm, but more importantly it means that the nominal core emerges in the Dem form as kw- (recall that it remained hidden in French above). This is shown in Table 23.

Romanian
Romanian Comp că /kǝ/ has been analyzed as an idiomatic portmanteau of F, n, and Infl. The vowel /e/ in Dem acel and in Rel, Wh, and Indet ce is a completely regular morpheme in the rest of the paradigm. Also, Dem contains the definite article in Romanian, as it does in Italian, meaning that -l in (a-)cel /tʃ-e-l/ needs to be put in the F column. As seen in Table 24, this seems to leave us with a nominal core tʃ-.
Note here that the /tʃ-/ elements appearing in Rel, Wh, and Indet are technically not overt realizations of the nominal core, but portmanteau morphemes of both F and n. If they realized only n, then the F column would be left empty for no apparent reason in Rel, Wh, and Indet.

If che and che cosa instantiate (related but) distinct structures, as our approach requires, then there must also be an interpretive difference between the two. While more research is required, this prediction appears to be borne out. In (16) we provide two ways to ask 'What did you do?', one with che (16a) and one with che cosa (16b).
(16) a. Che hai fatto?
b. Che cosa hai fatto?
'What did you do?'

Interestingly, (16b) with che cosa is odd in an out-of-the-blue context, whereas (16a) with che is fine in this context. More specifically, the option with che cosa presupposes that something has necessarily been done (Ciro Greco, p.c.), along the lines of 'What is/are the thing(s) that you have done?'. This supports our general line of reasoning, according to which che and che cosa must be (at least slightly) different, structurally speaking. A nice consequence of being forced to posit a double structure for che cosa with a semi-lexical N, then, is that this may start to explain the presuppositional reading which is present in (16b) but absent in (16a).

Concluding remarks
In this paper we have endeavored to show that Dem, Comp, Rel, Wh, and Indet elements not only participate in syncretism patterns but also have a tripartite structure. By looking in more detail at the internal structure of these elements, we have shown that they are composed of an F domain that can grow and shrink in the typical nanosyntactic subset-superset manner, an invariant base (n, also called the nominal core), and an invariant inflection.
Based on Baunaz & Lander (2017a), we have claimed that languages may lexicalize some or all of the parts of this fseq in different ways: while the Base n is compulsory in all languages, in some languages the F domain can be missing, form a portmanteau with the Base n, or realize a morpheme on its own. In languages where Infl is realized, it is almost always frozen and suffixed to the compulsory Base n. Base n and Infl can also form a portmanteau constituent to the exclusion of F. Overall, the Base n appears to be the fundamental structural building block used in the construction of Dem, Comp, Rel, Wh, and Indet elements.
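Which portmanteaus are possible depends on where Φ is merged relative to F and n. Under the usual nanosyntactic assumption of phrasal spellout (a single morpheme realizes a constituent), this can be checked mechanically for the two merge orders [F [Φ [n]]] and [Φ [F [n]]]. The sketch below is illustrative only: it collapses the whole F domain into a single head labelled `F`, and the tree encoding and function names are hypothetical.

```python
from __future__ import annotations

from typing import Union

# A tree is either a terminal label or a pair (head, complement).
Tree = Union[str, tuple]


def constituents(tree: Tree) -> list[frozenset[str]]:
    """Sets of labels dominated by each node, listed bottom-up."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    head, complement = tree
    below = constituents(complement)
    # The new node dominates its own head plus everything in its complement.
    return below + [below[-1] | {head}]


def can_be_portmanteau(tree: Tree, heads: set[str]) -> bool:
    """Phrasal spellout: one morpheme may realize only a constituent."""
    return frozenset(heads) in constituents(tree)


# Hypothetical encodings; F stands for the whole F domain, treated as one head.
F_PHI_N = ("F", ("Phi", "n"))  # [F [Phi [n]]]: n and Phi form a constituent
PHI_F_N = ("Phi", ("F", "n"))  # [Phi [F [n]]]: F and n form a constituent

if __name__ == "__main__":
    # English-type n+Phi portmanteau is only possible with [F [Phi [n]]].
    print(can_be_portmanteau(F_PHI_N, {"n", "Phi"}))  # True
    print(can_be_portmanteau(PHI_F_N, {"n", "Phi"}))  # False
    # French que / Polish co type F+n portmanteau needs [Phi [F [n]]].
    print(can_be_portmanteau(PHI_F_N, {"F", "n"}))    # True
    print(can_be_portmanteau(F_PHI_N, {"F", "n"}))    # False
```

In both orders the full set {F, Φ, n} is a constituent, which is consistent with an item like Romanian că being analyzed as a portmanteau of all three parts.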
We have one final remark. Since Base and (as we have assumed; cf. for example our brief discussion of Yiddish /-s/ above) Inflection are invariant, it turns out that what we have been tracking in our syncretism data is in fact F, which often appears as a functional prefix in many of the languages we have considered here. This is an interesting result, especially since our hierarchy parallels findings from more traditional cartographic work on the clausal spine (e.g. D > C > Rel in Cinque 2008; Force > Int > Foc (i.e. C-domain) > Wh in Rizzi 2001). This parallelism suggests that the word-internal or morphological structure we are interested in is replicated at the higher clausal level. In fact, it would appear that the bigger the F-structure is, the higher the entire complex of [F + n + Φ] ends up […]

Italian Dem quel- vs. definite article:
M.SG   quel-lo   lo (+ word-initial sC- or z-)
M.SG   [quel]    il (+ other word-initial C-)
F.SG   quel-la   la
M.PL   que-gli   gli (PL of lo)
M.PL   que-i     i (PL of il)
F.PL   quel-le   le

[…] F and n: [F… [Φ [n]]]; in other languages F and n make up a portmanteau morpheme to the exclusion of Φ, meaning that Φ must be merged on top of F: [Φ [F… [n]]].

[…] Comp(SUBJ) ki - Rel ki, or Làconi: Comp(IND) ka - Comp(SUBJ) tʃi - Rel tʃi). Thanks to a reviewer for comments on this topic.

Table 2: Six crucial syncretism patterns from Table 1.
[…] we see that Bulgarian, Yiddish, and (some varieties of) Polish show that Comp and Rel must be adjacent. Standard Czech and Finnish show that Rel and Wh must be adjacent. (Non-standard) English shows that Dem and Comp must be adjacent. Finally, Yiddish, (some varieties of) Polish, Standard Czech, and Finnish show that Wh and Indet must be adjacent. Hence the linear ordering in (4) is the only one which can capture these facts without any *ABA violations.
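The uniqueness claim can be verified by brute force: take the four adjacency requirements just listed as given, enumerate all 120 orderings of the five categories, and keep those in which every required pair is adjacent. This is a sketch; the names are hypothetical. Note that it returns the ordering together with its mirror image, since reversing an order preserves all adjacencies and the two are therefore equivalent for *ABA.

```python
from __future__ import annotations

from itertools import permutations

CATEGORIES = ["Dem", "Comp", "Rel", "Wh", "Indet"]

# Adjacency requirements read off the syncretism patterns listed above.
MUST_BE_ADJACENT = [
    ("Comp", "Rel"),   # Bulgarian, Yiddish, (some varieties of) Polish
    ("Rel", "Wh"),     # Standard Czech, Finnish
    ("Dem", "Comp"),   # (non-standard) English
    ("Wh", "Indet"),   # Yiddish, (some varieties of) Polish, Standard Czech, Finnish
]


def respects_adjacency(order: tuple[str, ...]) -> bool:
    """True if every required pair occupies neighbouring positions."""
    position = {category: i for i, category in enumerate(order)}
    return all(abs(position[a] - position[b]) == 1 for a, b in MUST_BE_ADJACENT)


valid = [order for order in permutations(CATEGORIES) if respects_adjacency(order)]

if __name__ == "__main__":
    print(valid)
    # [('Dem', 'Comp', 'Rel', 'Wh', 'Indet'), ('Indet', 'Wh', 'Rel', 'Comp', 'Dem')]
```

Since the four pairs chain together as Dem-Comp-Rel-Wh-Indet, and a line of five positions has exactly four adjacent slots, no other arrangement can satisfy all of them.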

Table 4: Trimorphemic decomposition in Dutch and German.

Table 8: Trimorphemic decomposition in Polish and Czech.

Table 16: Identifying the nominal core in German and Dutch.

Table 17: Identifying the nominal core in Icelandic.

Table 19: Identifying the nominal core in English.

Table 20: Identifying the nominal core in Serbo-Croatian and Russian.

Table 21: Identifying the nominal core in Slavic.

[…] che cosa as parallel structures of the same size. For che we are already committed to the idea that ch- spells out F and n, and that -e spells out Infl; if we now have che cosa, we still need F, n, and Infl for che, leaving no available structure for cosa. We are forced, then, to give che and che cosa different structures, with one possibility (with cosa corresponding to a constituent made up of n and semi-lexical N) given in (15).