When missing NPs make double center-embedding sentences acceptable

A number of languages, such as English, exhibit a grammaticality illusion in ungrammatical double center-embedding sentences where a VP is missing. This article shows that the illusion generalizes to ungrammatical Mandarin Chinese double center-embedding sentences where the head NP of a relative clause is missing. The Mandarin illusion raises interesting questions for existing accounts of center-embedding illusions. Mandarin missing NP sentences consist of three transitive verbs and only three NPs; the clear shortage of NPs should affect the thematic relations built for such sentences, with potential consequences for acceptability. We explore these issues with acceptability judgment experiments. We show that these illusory sentences receive distinct thematic interpretations compared to their better-studied missing VP counterparts, in ways not predicted by structural forgetting or interference accounts. A computational simulation further shows that the Mandarin illusion is problematic for accounts that attribute cross-linguistic variation in the illusion to differences in language experience. To capture cross-linguistic variation, we build on existing interference accounts, in which the parser mis-attaches a verb or NP to the main clause instead of a relative clause. We supplement this approach with a repair process, in which the parser tracks thematic relations, repairing them where necessary so no verb or noun is thematically “orphaned.” We suggest that the illusion of grammaticality arises when the parser can establish thematic relations between all verbs and nouns. This interference-and-repair approach provides a unified analysis of the missing VP and missing NP illusions, while accounting for the observed difference in thematic relations.


Introduction
Native speakers are usually very sensitive to grammatical violations. They can quickly tell when a sentence is ill-formed, even though they might not be able to give a precise description of the problem. However, for a subset of ungrammatical sentences, native speakers do not show the same kind of sensitivity. Because these "illusions of grammaticality" occur systematically, they are of psycholinguistic interest, as they reflect some systematic error in the way linguistic representations are built or accessed.
A particularly acute type of grammaticality illusion can be found in sentences with double center-embedding (DCE). In English, these sentences are formed by embedding an object relative clause inside another object relative clause that modifies a subject (1), producing a sequence of three noun phrases (NPs) followed by three verb phrases (VPs). While syntactically well-formed, DCE sentences are generally hard to parse and are judged as strongly unacceptable (Chomsky & Miller 1963; Miller & Isard 1964; Cowper 1976; Gibson & Thomas 1999; etc.). Curiously, these sentences are perceived to improve in acceptability if a VP (specifically, the middle VP) is omitted (Frazier 1985, attributing the original observation to Janet Fodor; Gibson & Thomas 1999; also Pulman 1986).
(1) Double center embedding (DCE) (Gibson & Thomas 1999)
An important recent finding is that this effect, often called a "missing VP illusion," varies across languages: sentences that are string-equivalent to (2) elicit the illusion in English, French, and Spanish but not in German and Dutch (Gimenes et al. 2009; Vasishth et al. 2010; Frank et al. 2016; Frank & Ernst 2019; Pañeda & Lago 2020; see Häussler & Bader 2015 and Bader 2016 for discussion of exceptions in German). Several recent proposals attribute this variation to cross-linguistic differences in the frequency of verb-final clauses and verb clusters, suggesting that linguistic experience can influence the processing of DCE sentences and the size of the illusion (Vasishth et al. 2010; Frank et al. 2016; Futrell & Levy 2017; Frank & Ernst 2019; Futrell et al. 2020).
This article is interested in why ungrammatical DCE sentences in some languages can exceptionally elicit a sense of acceptability and how speakers interpret these sentences. We bring in novel data from Mandarin Chinese. With acceptability judgment experiments in Mandarin and English, we demonstrate a missing NP illusion for Mandarin that is analogous to the better-studied missing VP illusion.
The Mandarin data also let us evaluate experience-based accounts of cross-linguistic variation in the illusion. To do so, we adapt and extend an existing computational model (Futrell et al. 2020; also Futrell & Levy 2017).
Our results pose challenges to existing accounts. Mandarin missing NP sentences are interpreted differently from their English missing VP counterparts, in ways not predicted by these accounts.
Our modeling results suggest that experience-based accounts incorrectly predict that Mandarin should lack the illusion, like German and Dutch. We discuss how an interference account, such as that of Bader (2016), might be modified to capture cross-linguistic differences.
This article is structured as follows. In Section 2, we review the missing VP illusion and various existing accounts of this illusion. We then review center-embedding in Mandarin in Section 3 and present the experiments in Sections 4 through 7. We discuss implications of our results in Section 8, before concluding in Section 9.

The missing VP illusion
In this section, we consider why missing VP sentences in languages like English exhibit a grammaticality illusion. For scope reasons, we will only review recent memory-based accounts, which have been adapted to explain cross-linguistic variation. We distinguish between three classes of accounts: structural forgetting, language experience, and interference accounts.
Existing accounts are mostly focused on why such an illusion is found in some, but not all, languages. In our review of these accounts in Section 2.1, we break down this question into two smaller questions: (i) What representation(s) is a VP omitted from, and over which representation(s) is acceptability computed? (ii) How does a VP get omitted in that representation in some (but not all) languages? The first question is often addressed obliquely, so where necessary, we spell out what the underlying assumptions might be.
Section 2.2 focuses on the perception of acceptability: What aspect of the representation of a missing VP sentence makes it relatively acceptable, despite its ungrammaticality?

Structural forgetting
A classic structural forgetting account is proposed by Gibson & Thomas (1999). While they do not explicitly state what representations are built, Gibson & Thomas appear to assume that a sentence can be encoded, at a minimum, as a set of NPs, VPs, and so on. They also appear to assume a process that combines constituents to produce argument-predicate relations, although they say little about how the combinatorial process works for relativization. The parser predicts upcoming constituents based on the constituents observed so far, but can forget certain predictions under high memory load.
For illustration, we describe how the parser might process a DCE sentence like (1). The first NP, the novel, triggers a prediction for a verb, required to form a complete sentence. The second NP, the horror author, signals to the parser the presence of an object relative clause, triggering a prediction for a VP (specifically, a verb and an empty category). The third NP, the publishing company, also triggers the prediction for a VP, consisting of another verb and empty category. These predictions are costly to maintain, and the parser forgets the prediction with the highest memory cost in order to reduce memory load.
In Gibson & Thomas's theory of memory load, prediction costs depend on structural position and the number of discourse referents processed since a prediction was made. Consider the prediction costs at the point at which all three NPs have just been processed. NP1's prediction is stipulated to be costless, as it is associated with the main clause. Between NP2 and NP3, NP2's prediction is more costly, because after it was made, a further discourse referent (NP3) was introduced. No equivalent discourse referent exists for NP3. As a result, NP2's prediction gets forgotten.
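The cost comparison above can be made concrete with a small sketch. The scoring function here is a deliberately simplified stand-in, not Gibson & Thomas's actual cost metric: it assumes (hypothetically) that a main-clause prediction costs 0 and that an embedded prediction costs 1 plus the number of discourse referents introduced after it was made. The names `Prediction` and `toy_cost` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    source_np: str        # NP that triggered the verb prediction
    in_main_clause: bool  # main-clause predictions are stipulated to be costless
    position: int         # order in which the triggering NP was read (1-based)


def toy_cost(pred: Prediction, referents_seen: int) -> int:
    """Toy cost (an assumption, not the published metric): 0 for the main
    clause, otherwise 1 plus the referents introduced after the prediction."""
    if pred.in_main_clause:
        return 0
    return 1 + (referents_seen - pred.position)


predictions = [
    Prediction("NP1 the novel", in_main_clause=True, position=1),
    Prediction("NP2 the horror author", in_main_clause=False, position=2),
    Prediction("NP3 the publishing company", in_main_clause=False, position=3),
]

# State of the parser right after all three NPs have been read.
referents_seen = 3
costs = {p.source_np: toy_cost(p, referents_seen) for p in predictions}

# The prediction with the highest cost is the one that gets forgotten.
forgotten = max(costs, key=costs.get)
print(costs)
print("forgotten:", forgotten)
```

Under these assumed costs, NP2's prediction is the most expensive and is the one dropped, matching the outcome described in the text; any scoring scheme that preserves the same ordering would give the same result.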
Gibson & Thomas also provide experimental evidence for this claim. They systematically omit each of the three VPs from a DCE sentence (3a) to produce the missing VP conditions in (3b-d). The logic of the manipulation goes as follows. If the forgotten VP prediction is the one associated with NP2 the horror author, then the first VP should be paired with NP3 as its subject, and the next VP with NP1 as its subject. These pairings are plausible in (3c), but not in (3b) and (3d): in (3b), for instance, the VP had typed quickly gets paired with NP3 the publishing company. Consequently, (3c) should be more acceptable than (3b) and (3d), reflecting the differences in plausibility. This prediction was verified experimentally.
To be clear, the degree to which a missing VP sentence like (3c) is more acceptable than its grammatical counterpart (3a) varies somewhat: Gibson & Thomas did not find a significant difference, but Frank & Ernst (2019) did. Regardless, the fact remains that both studies show ungrammatical sentences like (3c) to be no less acceptable, a hallmark of a grammaticality illusion.

Language experience
Vasishth, Frank, and colleagues suggest that these cross-linguistic differences reflect differences in language experience. They point out that German and Dutch consistently have verb-final embedded clauses, unlike English. As a result, in German and Dutch, verbs are more likely to appear relatively far from their subjects. Sequences of verbs are also more common. Consequently, speakers of these languages are better at parsing grammatical DCE sentences and detecting the ill-formedness of missing VP sentences. Both types of sentences feature sequences of two or three verbs, such that some of the verbs appear some distance away from their subjects.
This hypothesis has been fleshed out in two ways. Vasishth et al. (2010) integrate it with the structural forgetting hypothesis. They suggest that German and Dutch speakers' experience with processing verb-final structures can condition their working memory, so predicted verbs have "more robust memory representations" (p. 558) in these languages than in English. Consequently, German and Dutch speakers are less likely to forget predictions, even under high memory load. Similar intuitions are implemented in the computational models of Engelmann & Vasishth (2009), Futrell & Levy (2017), and Futrell et al. (2020).
The second option is a shallower theory that avoids making similar commitments about memory representations, mentioned by Frank et al. (2016) and Frank & Ernst (2019). This theory assumes implicitly that sentences can be represented linearly as sequences of nouns and verbs. Because sequences of three consecutive verbs are more common in German and Dutch than in English, they are read more easily and rated as more acceptable, even in DCE contexts. However, as Frank & Ernst and Futrell et al. note, this is a weaker theory of the illusion: it predicts that an English DCE sentence should become more acceptable as long as any of the VPs is omitted, a prediction inconsistent with Gibson & Thomas's results.

Interference
Häussler & Bader (2015) and Bader (2016) present proposals that appeal to interference and primacy and recency effects, all well-established concepts in the memory retrieval and sentence processing literature (see also Gibson & Thomas 1999: 242-244). In both accounts, the parser attaches words into a syntactic representation as they appear. Both accounts also assume that recency effects cause the parser to correctly attach the first verb to the VP in the lower relative clause: the parser has just processed the third NP as the subject of the lower relative clause, so the relative clause (or its VP) is still highly activated.
At that point, the representation contains two empty positions for verbs, one in the main clause and one in the higher relative clause. Both positions compete for the attachment of the next verb. Häussler & Bader (2015) observe that the main clause was the first clause created by the parser. It therefore benefits from a primacy effect, making it more likely that the second verb is incorrectly attached there, leaving the relative clause without a verb.
Bader (2016) presents a structural account of why the strength of the illusion differs between English and German. German has verb-second word order in main clauses, so the two attachment sites differ syntactically: the relative clause site is inside a verb-final VP, while the main clause site is a verb-second position in CP. The German parser can therefore discriminate between the two sites. In contrast, English does not have verb-second word order. Both attachment sites are inside VPs, so they are more easily confused, making wrong attachment to the main clause site more likely. Although Bader does not go into details, this proposal may be compatible with a cue-based retrieval system, in which structural features like "VP" and "CP" are part of the set of retrieval cues associated with a verb.
To explain why German speakers can successfully go on to attach the third verb to the main clause (instead of overlooking or forgetting the need for a third verb by the time it appears), Bader appears to suggest that the distinctiveness of the German main clause attachment site makes it easier to detect when it is missing a verb. He also suggests that processing load might play a role. In these sentences, the subject is structurally high; he argues that this position is associated with a lower processing load, freeing up resources that improve parsing accuracy. We also note, following Häussler & Bader (2015), that primacy effects might provide yet another explanation: the main clause attachment site was created early, and so is less easily overlooked.

What makes English missing VP sentences acceptable?
In contrast to the debate on why a VP gets omitted, there is less discussion about why ungrammatical missing VP sentences can be perceived as more acceptable. Implicit in existing accounts is the idea that the parser comes to develop certain expectations about the VPs in these sentences. These expectations are then borne out, producing a sense of completeness.
We can distinguish between two variants of this idea. The first variant is found in accounts like Gibson & Thomas's and computational models of the illusion (also see Christiansen & Chater 1999;Christiansen & MacDonald 2009). Memory limitations cause the parser to expect only two VPs. Since the parser observes only two VPs, these predictions are borne out.
In contrast, interference accounts implicitly assume that the parser expects to fill all three VP nodes, even though the parser ends up leaving the intermediate VP node empty and fails to notice it. Regardless of why exactly the empty node goes unnoticed, the result is that the parser mistakenly concludes that all three nodes have been filled and the sentence is complete.
Here, we offer an alternative non-structural perspective, based on the premise that a core parsing objective is to establish thematic relations between the arguments and predicates in a sentence: who did what to whom, so to speak. Setting aside what processes cause VPs to be omitted, we suggest that missing VP sentences are acceptable not because of the parser's structural expectations, but because thematic relations can be successfully built between all remaining arguments and predicates. Because no argument or predicate is thematically "orphaned," speakers perceive a sense of completeness, which translates into an illusion of grammaticality.
For illustration, consider the missing VP sentence in (3c): The novel that the horror author who the publishing company had recently fired was banned by the local library. In this sentence, all three NPs (more precisely, the arguments they represent) are related to some VP predicate: NP1 the novel is related to VP2 was banned by the local library, while NP2 the horror author and NP3 the publishing company are related to VP1 had recently fired. Conversely, the two predicates in VP1 and VP2 collectively require three arguments: two for had recently fired, and one for was banned by the local library. These requirements are satisfied by the three NPs. It is easy to see from Figure 1 that every argument and predicate is connected to something else. (For comparison purposes, we depict thematic relations for a grammatical DCE sentence in Figure 2).
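The "no orphans" criterion can be illustrated with a short sketch. It treats the thematic relations listed above for (3c) as undirected links between NPs and VPs and reports any NP or VP that has no link at all. The function name `orphans` and the flat link representation are assumptions made for illustration; the account itself remains agnostic about how such relations are encoded.

```python
def orphans(arguments, predicates, links):
    """Return the arguments and predicates that take part in no thematic link."""
    linked = {node for link in links for node in link}
    return sorted((set(arguments) | set(predicates)) - linked)


nps = ["NP1 the novel", "NP2 the horror author", "NP3 the publishing company"]
vps = ["VP1 had recently fired", "VP2 was banned by the local library"]

# Thematic links for the missing VP sentence (3c), as described in the text.
links_3c = [
    ("NP3 the publishing company", "VP1 had recently fired"),
    ("NP2 the horror author", "VP1 had recently fired"),
    ("NP1 the novel", "VP2 was banned by the local library"),
]
print(orphans(nps, vps, links_3c))

# Hypothetical contrast: if NP2 could not be linked to VP1, it would be orphaned.
links_without_np2 = [link for link in links_3c if "NP2 the horror author" not in link]
print(orphans(nps, vps, links_without_np2))
```

The first call finds no orphaned node, which is the configuration the account associates with a sense of completeness; the second shows what an orphaned argument would look like under the same representation.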
An important question here is what kind of representation is implicated in this thematic relations account. It is difficult to say definitively that the relevant representation must be syntactic and that thematic relations are read off from that structure. In a grammatical DCE sentence, NP1 and NP2 should each be thematically and syntactically related to two verbs: the verb of the relative clause modifying them and the verb that they are the subjects of. For this reason, we do not rule out the possibility that in addition to a syntactic representation, there is a shallower representation that lets the parser track what thematic relations exist in a sentence. We remain agnostic about the specifics of this shallower representation, for example, whether relations are encoded directly between constituents (NPs, verbs) or the arguments and predicates they represent.
Of course, our suggestion that shallow representations are involved seems inconsistent with a body of psycholinguistic research. There is substantial evidence that the parser incrementally builds a rich, connected syntactic representation, as revealed by the finding that the real-time building of various dependencies is sensitive to syntactic constraints (for filler-gap dependencies, see Stowe 1986; Traxler & Pickering 1996; Phillips 2006; Wagers & Phillips 2009; etc.; for anaphoric dependencies, see Sturt 2003; Kazanina et al. 2007; Aoshima et al. 2009; etc.).
This inconsistency can be resolved if we consider the exceptional syntactic complexity and memory load associated with DCE sentences (Miller & Isard 1964; Resnik 1992; Gibson & Thomas 1999; etc.). Under such circumstances, the parser might find it difficult to build and maintain a connected structure for interpretation. Consequently, it might rely on a shallower interpretive strategy, directly tracking how arguments and predicates are linked, so that it can produce a passable interpretation.
As presented, the thematic relations account, like the structural forgetting and interference accounts, predicts missing VP sentences to be highly acceptable, contrary to experimental findings (Gibson & Thomas 1999 and Experiment 3 here; Frank & Ernst 2019). The low ratings might be attributable to the length and structural complexity of these sentences. Alternatively, the proposed parsing mechanisms might be less deterministic than depicted, so missing VP sentences do not consistently produce an illusion of grammaticality.
Overall, the thematic relations hypothesis makes predictions about missing VP sentences that are very similar to those of existing accounts. However, a key difference is that it does not require the parser to overlook the fact that the second NP in these sentences is missing a VP. Rather, the absence of a VP is essentially offset by the fact that this NP can still be linked to another predicate. One way to test this hypothesis, then, would be to consider a double center-embedding scenario where this link is unavailable, so the second NP (or its equivalent) runs the risk of being "orphaned." Mandarin Chinese provides just such a test case.

The present study
As discussed below, the word order properties of Mandarin conspire to produce DCE sentences in which a sequence of three transitive verbs is followed by a sequence of NPs. Mandarin exhibits a missing NP illusion, in which the second transitive verb seems to be thematically orphaned. We highlight challenges in adapting existing hypotheses to account for the missing NP illusion.
We present experimental evidence to substantiate these points. Experiments 1 and 2 demonstrate the missing NP illusion and clarify how the transitive verbs and the following NPs are thematically related. Unexpectedly, we found that the final NP in missing NP sentences can be related to all three verbs, so no verb is truly orphaned. This outcome diverges sharply from English, where the final VP (the analog of the Mandarin final NP) is interpreted only with the first NP. We run a third experiment on the English missing VP illusion (replicating Gibson & Thomas 1999) to ensure that this comparison is valid.
Our claim about the Mandarin illusion rests on the premise that missing NP sentences are truly ungrammatical. Where appropriate, we point out how these sentences are ill-formed. We show in Section 3.2 how they are different from Mandarin constructions where NPs are omitted. We designed materials for Experiment 1 to prevent certain alternative grammatical parses, while Experiment 2 provides additional evidence that rules out a grammatical single center-embedding parse. We also run post-hoc analyses on acceptability ratings, showing that Mandarin missing NP sentences have the same acceptability profile as English missing VP sentences, which are unambiguously ungrammatical (Section 6.6 and Appendix 1).
Finally, we argue that the language experience account, intended as an account of cross-linguistic variation of these illusions, does not predict the missing NP illusion. We do so with a computational simulation (Experiment 4), adapting Futrell and colleagues' model of the language experience account (Futrell et al. 2020; also Futrell & Levy 2017).

Center-embedding in Mandarin
Mandarin has head-final relative clauses that precede a particle de, which in turn precedes the head noun. Suppose that a noun NP3 is modified by a subject relative clause with an object NP2. This complex noun phrase has [V-NP2-de]-NP3 word order. Since Mandarin is a Subject-Verb-Object language, a sentence where this complex NP is the object has Subject-V1-[V2-NP2-de]-NP3 word order. Inserting another subject relative clause to modify NP2 yields a DCE sentence, as in (4).
(4) 'The prime minister is meeting the minister who previously rebuked several times the judge who just heard the corruption case not long ago.'
A missing NP sentence can be constructed by omitting one of the des and NPs (5), by analogy with missing VP sentences.
(5) 'The prime minister is meeting ---- previously rebuked the judge who just heard the corruption case not long ago.'
As the English "translation" of (5) shows, a missing NP sentence is ungrammatical: the three transitive verbs each require an object, but there are only two NPs in object positions. However, impressionistically, such a sentence is as acceptable as its grammatical counterpart (4), if not more so. It is this relative acceptability that we refer to as the missing NP illusion.

Ruling out null object or headless relative clause analyses
Mandarin has null object and "headless" relative clause constructions in which NPs are omitted. One might wonder whether missing NP sentences might be instances of these constructions and therefore grammatical. This section argues against this possibility. First, null object (6a) and headless relative clause (6b) constructions elicit a specific percept of ill-formedness when presented out of context, when no antecedent is available (e.g. Huang 1984; Li 2014). In contrast, missing NP sentences are relatively acceptable even out of context.
Second, missing NP sentences are syntactically distinct from both constructions. In null object sentences, verbs usually appear at the end of a clause (embedded or otherwise), unless adverbial material happens to be present (6a). In missing NP sentences, none of the verbs appear in such a position. Both V1 and V2 precede another verb that appears to start an embedded clause, while V3 immediately precedes a direct object NP. As for headless relative clause sentences like (6b), only the head NP is omitted. This contrasts with missing NP sentences, where an NP and a de are missing.

Challenges for existing accounts
Abstractly, missing NP sentences resemble missing VP sentences: missing NP sentences feature a sequence of three verbs and two NPs, while missing VP sentences feature a sequence of three NPs and two VPs. However, it turns out to be challenging to adapt hypotheses proposed for the missing VP illusion to account for the missing NP illusion. Under the structural forgetting and interference accounts, there is a distinct possibility that the middle transitive verb would be "orphaned" thematically, without agent and theme arguments, in turn raising questions about why missing NP sentences should be relatively acceptable. The language experience account, on the other hand, predicts that Mandarin should not exhibit such an illusion.
We first elaborate on the challenge for the structural forgetting and interference accounts. In Mandarin, as in English, the parser forgets or overlooks the absence of a constituent in a DCE sentence. However, unlike in English, the absence of this constituent, an NP, should cause a more severe disruption in terms of thematic relations, which should reduce acceptability. The three verbs in missing NP sentences are transitive, each requiring two arguments. However, including the main clause's subject, there are only three NPs. Intuitively, there are not enough NPs.
More specifically, let us consider missing NP sentences from an incremental parsing perspective, following these accounts. (We have simplified the following description, for exposition purposes. We refer interested readers to Jäger et al. 2015 for an overview of the local ambiguities that a Mandarin parser must navigate in parsing a relative clause.) Suppose that a missing NP sentence like (5) is parsed in the following manner. First, the parser observes an NP, prime minister, and a transitive V1 meet. It interprets the NP as V1's subject and agent argument. Observing V1 triggers a prediction for an NP object, its theme argument.
Next, when a transitive V2 rebuked is seen rather than V1's object, V2 is recognized as the main verb of a subordinate clause modifying V1's object. Being transitive, V2 triggers predictions for an NP object and a de. To the extent that the parser analyzes this subordinate clause as a subject relative clause, it should also expect the NP appearing after de to be interpreted as V2's agent.
Lastly, when a transitive V3 hear is seen rather than V2's object, V3 is recognized as the main verb of a subordinate clause modifying V2's object. The parser again should predict an NP object (V3's theme) and de. Again, to the extent that the parser analyzes this subordinate clause as a subject relative clause, it should expect the NP appearing after de to be interpreted as V3's agent.
Both structural forgetting and interference hypotheses claim that the parser forgets or overlooks the predictions associated with V2. Consequently, the parser should pair the first post-verb NP and de with V3 and the second NP with V1. To the extent that the parser analyzes V3's clause as a subject relative clause, the second NP should also get interpreted as V3's agent. As a result, only two verbs have arguments: V1 and V3; V2 is "orphaned." If missing VP sentences are acceptable because the parser can build thematic relations between predicates and arguments, one might expect missing NP sentences to be less acceptable than what they are reported to be. (For comparison, Figure 4 depicts thematic relations for a grammatical DCE sentence.)
Alternatively, one might wonder if the "orphaning" of V2 in Figure 3/(5) is only superficial. Since Mandarin allows null arguments, perhaps the parser can supply arguments for V2 from the discourse context or world knowledge, so V2 can satisfy its argument structure needs without being linked to other NPs in (5). If so, it should be easy to find a grammatical version of (5) that has overt arguments for V2 and is consistent with the thematic relations in Figure 3. But there is no acceptable way to add two more overt NPs for V2 to a missing NP sentence like (5).
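The parse that the structural forgetting and interference accounts predict for (5) can be sketched as follows. This is a minimal illustration of the pairing described above, using English glosses in place of the Mandarin words; the function name `parse_with_forgotten_v2` and the triple representation of thematic relations are assumptions made for exposition, and the sketch encodes the accounts' prediction rather than what speakers were found to do.

```python
def parse_with_forgotten_v2(subject, verbs, post_verbal_nps):
    """Thematic relations predicted when V2's predictions are forgotten or overlooked.

    Returns (verb, role, argument) triples, following the pairing in the text:
    the first post-verbal NP goes to V3, the second to V1 and (as agent) to V3.
    """
    v1, _v2, v3 = verbs
    first_np, second_np = post_verbal_nps
    return [
        (v1, "agent", subject),    # main clause subject
        (v3, "theme", first_np),   # first post-verbal NP (followed by de) pairs with V3
        (v3, "agent", second_np),  # head of V3's subject relative clause
        (v1, "theme", second_np),  # second NP is taken as V1's object
    ]


verbs = ("V1 meet", "V2 rebuked", "V3 hear")
relations = parse_with_forgotten_v2(
    subject="prime minister",
    verbs=verbs,
    post_verbal_nps=("corruption case", "judge"),
)

# A verb is orphaned if it appears in no thematic relation at all.
orphaned = [v for v in verbs if all(rel[0] != v for rel in relations)]
print(orphaned)
```

Under this predicted parse, V1 and V3 each receive an agent and a theme while V2 receives neither, which is the configuration that the thematic relations account expects to reduce acceptability.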
We next turn to the challenge for the language experience hypothesis. It has been suggested that this hypothesis, when paired with an account like structural forgetting, can explain cross-linguistic variation in the missing VP illusion (Vasishth et al. 2010; Engelmann & Vasishth 2009; Frank et al. 2016; Futrell et al. 2020). We argue that this approach does not improve matters here. In fact, adopting it might yield the prediction that Mandarin should lack such an illusion.
The root of this problem lies in the syntax of Mandarin noun phrases: they are uniformly head-final; all complements and modifiers are marked overtly with de. Consequently, sequences of nouns, whether interleaved with de or not, are common in Mandarin: they appear in compound (7a) and possessive structures (7b). They also appear when a noun has a subordinate clause that contains an object, regardless of whether the clause is a complement (7c) or relative clause (7d). In fact, subject relative clauses, which correlate with noun sequences (7d), are reported to occur more frequently than object relative clauses (Hsiao & Gibson 2003; Vasishth et al. 2013).

(7) Mandarin noun sequences (nouns in sequences bolded)
a. dàxué xiàozhǎng bàngōngshì
university president office
'office of the university president'
b. kēxuéjiā
Mandarin speakers should therefore be familiar with processing sequences of nouns, just as German and Dutch speakers are familiar with processing sequences of verbs. In addition, given Mandarin's VO word order and head-final NPs, it is not unusual for a verb to be separated from the head of its object NP by a modifier, such as a possessor or a relative clause. The net effect is that Mandarin speakers should find grammatical DCE sentences, which feature long verb-object dependencies and sequences of nouns, easier to process than missing NP sentences.
Of course, the arguments above assume that the missing NP illusion exists and Mandarin speakers interpret missing NP sentences where V2 is thematically orphaned. The next section reports an experiment designed to validate these assumptions. As a preview, we find that missing NP sentences are relatively acceptable, but no verb is orphaned. Instead, speakers can and do interpret V2 as taking NP2 as an argument.

Materials
24 DCE sentences were constructed, each with six versions corresponding to different experimental conditions. Two of these conditions were grammatical DCE sentences and "plausible" missing NP sentences, designed to test for the missing NP illusion. The remaining conditions were "implausible" missing NP sentences, designed to determine which verbs are thematically connected with which NPs in missing NP illusions. We distributed these 24 sentences into six lists using a Latin Square design, so that each list contained four sentences per condition, and no two sentences in each list were variants of each other.
In addition, 48 fillers were constructed. Fillers were similar in length and syntactically complex, featuring subordinate clauses, conjoined clauses, and non-canonical word order. 36 fillers were acceptable sentences and 12 were unacceptable ones. This distribution was chosen so that participants would read an equal number of acceptable and unacceptable sentences, assuming that participants find all target sentences less acceptable than the acceptable fillers. All sentences were presented in simplified Chinese characters.
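The list construction and filler balance described above can be checked with a short sketch. The rotation scheme below is one standard way to build a Latin Square and is an assumption here; the text does not specify which assignment was actually used. Items and conditions are represented by integer indices for illustration.

```python
from collections import Counter

N_ITEMS, N_CONDITIONS = 24, 6


def latin_square_lists(n_items, n_conditions):
    """Rotate conditions across items so that each list sees every item once."""
    return [
        [(item, (item + list_idx) % n_conditions) for item in range(n_items)]
        for list_idx in range(n_conditions)
    ]


lists = latin_square_lists(N_ITEMS, N_CONDITIONS)

# Each list should contain four sentences per condition.
per_condition = [Counter(cond for _, cond in lst) for lst in lists]
print(per_condition[0])

# No list contains two variants of the same item.
no_repeats = all(len({item for item, _ in lst}) == N_ITEMS for lst in lists)

# Across lists, every item appears once in every condition.
conditions_per_item = {
    item: {cond for lst in lists for it, cond in lst if it == item}
    for item in range(N_ITEMS)
}

# Filler balance, assuming (as the text does) that all 24 target sentences
# are judged less acceptable than the acceptable fillers.
acceptable_fillers, unacceptable_fillers = 36, 12
presumed_unacceptable = N_ITEMS + unacceptable_fillers
print(acceptable_fillers, presumed_unacceptable)
```

With 24 items and six conditions, the rotation yields four items per condition in every list, and the 24 targets plus 12 unacceptable fillers match the 36 acceptable fillers.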
In all target sentences, a duration or frequency adverb appeared between the NP objects and the clause-final particle de. In (4), repeated below as (8), the duration adverb bùjiǔ "not long ago" appears between NP1 tānwū-àn "corruption case" and de, and the frequency adverb hǎojǐcì "several times" between NP2 fǎguān "judge" and de. These adverbs served to disambiguate the reading of de. Without an adverb, de in this position (between two nouns like tānwū-àn "corruption case" and fǎguān "judge") is in principle ambiguous. De can mark the end of a relative clause, in which case tānwū-àn is inside the relative clause and fǎguān is the head of the relative clause. Alternatively, de can be understood as a possessive marker connecting two NPs, so "corruption case" and "judge" form a single complex NP (9). The adverb blocks the possession reading.
(9) tānwū-àn de fǎguān corruption-case de judge '(the) corruption case's judge' DCE sentences like (8) were contrasted with "plausible" missing NP variants like (10), which end prematurely at NP2. These variants were plausible in the respect that all likely verb-argument relations were semantically plausible. For instance, (10) is plausible in that NP2, fǎguān "judge" is a plausible theme argument for either V1, jiējiàn "meet," or V2, zébèi-guo "rebuked," and a plausible agent of V3 shěnlǐ "hear." Of course, they were not plausible in the sense that they have a fully coherent meaning, since the sentences were incomplete. We chose verbs, aspect markers, and adverbs so as to maximize the possibility that participants parse a missing NP sentence as doubly center-embedded. First, we tried to pick V1 and V2 verbs with a clear NP complement bias, avoiding verbs that can also readily appear with only a clause-like complement, such as verbs describing speech, cognition, or desire, like xǐhuān "like." The goal was to prevent participants from assigning an unintended, alternative grammatical parse with only a single level of embedding (11b); we expand on this point in Experiment 2. To identify these verbs, we relied on the intuitions of the first author, a Mandarin native speaker. We also verified these intuitions by checking what argument structure frames are associated with each verb in the "frame files" in the Chinese Proposition Bank (Xue & Palmer 2008 Second, verbs within a sentence also had distinct lexical semantics, and where possible, distinct grammatical aspect morphology (such as experiential -guo or perfective -le, translated using the English past tense). As a generalization, two verbs can be conjoined in Mandarin without an overt marker when the verbs have similar semantics and are in the same grammatical aspect. 
By controlling for these factors, we discouraged participants from coercing a reading of a missing NP sentence where two of the three verbs, such as "criticize" and "rebuke" in (12), are conjoined. A conjunction analysis would produce a grammatical sentence, as illustrated by the English translation in (12b). Third, we added past-oriented temporal adverbs like céng(jīng) "previously," yǐ(jīng) "already," gāng "just recently," zhīqián "in the past" after the first verb and after the second verb, as suggested by an anonymous reviewer. These temporal adverbs seem to encourage these verbs to be parsed as taking a complex NP object, rather than a complement clause.
Four other conditions were created by modifying the plausible missing NP condition, so that the NPs following the verbs would be perceived as inappropriate arguments for plausibility and/or animacy reasons, depending on which of the three verbs the NP was interpreted with. These manipulations parallel Gibson & Thomas's manipulations for their English missing VP experiment (3), which were also intended to detect which nouns and verbs are interpreted together.
Three of these conditions manipulated verbs to alter the plausibility of NP2 as an argument. In the first condition, V1 (and associated adverbs, aspect, and modals) was chosen so that NP2 would be an unlikely or inappropriate theme (13a). While we do not translate sentences like (13a) because doing so would presuppose a parse, the intuition for this condition is as follows: if speakers interpret NP2 fǎguān "judge" as the theme argument of V1 chóngzǔ "reorganize," they should assign a lower acceptability rating to the sentence, since one reorganizes institutions, not judges. Likewise, in the second condition, V2 was chosen so that NP2 would be an inappropriate theme argument (13b), if NP2 were so interpreted. In the third condition, V3, its object (NP1), and adverbs were chosen so that NP2 would be an unlikely or inappropriate agent (13c): judges do not lead national basketball teams. For ease of reference, we call these the "implausible V1 object," "implausible V2 object," and "implausible V3 subject" conditions. The fourth and last condition altered NP1 so that it would be an implausible theme object of V3, the immediately preceding verb (14). We label this the "implausible V3 object" condition. Obviously, there are other implausible conditions that we could have created. We chose the above four conditions, however, because there is prior reason to think that speakers might interpret the NPs and verbs as standing in these thematic relations. It is plausible that NP2 gets interpreted as V1's object, by analogy to English missing VP sentences, where the first NP is interpreted as the second VP's subject. One might also expect NP2 to get interpreted as V2's object and as V3's subject, as these are the interpretations that NP2 would receive in grammatical DCE sentences. Similarly, one would expect NP1 to be interpreted as V3's object, by analogy to grammatical DCE sentences and because NP1 appears in a canonical object position.

Participants
Participants were 60 self-identified Mandarin Chinese native speakers born in mainland China. All participants were above the age of 18. They were recruited over Prolific and compensated US$3.17 for their time, based on the estimated time needed to complete the experiment and Prolific's recommended hourly rate of US$9.50.
We recruited participants by first asking them to complete a ten-question multiple choice screening test on Prolific, for which they were compensated US$0.81 for their time (again based on Prolific's recommended rate). In addition to ensuring that participants could read simplified Chinese proficiently, the test also checked for familiarity with Mandarin syntax and vocabulary, including idiomatic expressions that non-native or heritage speakers might find difficult. We selected only participants who answered at least eight questions correctly.

Procedure
Sentences were presented using Ibex (created by Alex Drummond). Participants were instructed to rate the acceptability of each sentence based on their intuitions, on a 7-point Likert scale, where "1" was "very incoherent (bù tōngshùn), totally unacceptable" and "7" was "very coherent, totally acceptable." For the first four practice sentences, participants saw brief comments about each sentence. For example, they were instructed that a sentence that was incoherent (with word order violations) or described an implausible scenario should receive a low rating. All ungrammatical practice sentences had word order or plausibility issues, except for the following ungrammatical sentence (the fourth practice sentence), a non-DCE sentence missing an NP (15). This sentence was modeled after one of Gibson & Thomas's English practice sentences, a non-DCE sentence missing a VP. Following Gibson & Thomas, participants were told that this sentence should receive a low rating because it was not coherent, without further explanation. Participants were not exposed to missing NP or DCE sentences nor told how they should rate them.
(15) Practice sentence missing an NP
Zhè míng hùshì fēicháng guānxīn zhè ge wèikǒu bù hǎo.
this cl nurse very.much concern this cl appetite not good
Intended parse (if there is one): 'This nurse is very much concerned about this --- whose appetite is poor.'
(To make the sentence complete, one might add a de and bìngrén "patient" after wèikǒu bù hǎo.)
Each sentence appeared with a rating scale. Participants rated sentences using the keyboard or by clicking on the rating scale. Even though there was no time limit, participants were also instructed to respond as quickly as possible, to discourage them from re-reading sentences and noticing any structural anomalies.

Data analysis
The target sentences in this experiment were fairly long. To ensure that participants had taken the time to read a target sentence before rating it, we excluded responses that were provided in less than 3 seconds after the presentation of the sentence. This eliminated 2.8% of responses.
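The exclusion step can be sketched as a simple filter. The record layout (dictionaries with an `rt` field in seconds) and the sample values are hypothetical; only the 3-second threshold comes from the text.

```python
MIN_RT_SECONDS = 3.0  # threshold stated in the text

def exclude_fast_responses(responses, min_rt=MIN_RT_SECONDS):
    """Drop responses given less than `min_rt` seconds after sentence onset.

    Returns the retained responses and the share of responses excluded.
    """
    kept = [r for r in responses if r["rt"] >= min_rt]
    excluded_share = 1 - len(kept) / len(responses) if responses else 0.0
    return kept, excluded_share

# Hypothetical responses, for illustration only.
sample = [
    {"rt": 1.2, "rating": 5},
    {"rt": 3.0, "rating": 2},
    {"rt": 4.5, "rating": 6},
    {"rt": 2.9, "rating": 1},
]
kept, share = exclude_fast_responses(sample)
```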
Acceptability ratings for all conditions were analyzed with a single cumulative link mixed effects model in R version 3.3.2 (R Development Core Team 2019) with the ordinal package (Christensen 2019). The model had condition as a fixed effect and random intercepts and slopes for both participants and items.
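For readers unfamiliar with cumulative link models, the following sketch shows how such a model maps a linear predictor onto probabilities for a 7-point scale, using the logit parametrization P(Y ≤ j) = logistic(θ_j − η). It illustrates the link function only and does not fit a model. The threshold and effect values are hypothetical, and the participant and item random effects are assumed to be folded into η.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def rating_probabilities(thresholds, eta):
    """Probabilities of ratings 1..K when P(Y <= j) = logistic(theta_j - eta)."""
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

def expected_rating(probs):
    return sum((j + 1) * p for j, p in enumerate(probs))

# Hypothetical values, chosen only for illustration (not estimates from the paper).
thresholds = [-2.0, -1.0, -0.3, 0.3, 1.0, 2.0]  # six cut-points for a 7-point scale
baseline = rating_probabilities(thresholds, eta=0.0)
lower = rating_probabilities(thresholds, eta=-1.0)  # a condition with a negative effect
```

A negative coefficient for a condition shifts probability mass toward the lower end of the scale relative to the baseline condition.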

Results
Averaged ratings for the conditions are presented in Table 1; no condition was particularly acceptable in absolute terms. Of the six conditions, the ungrammatical plausible missing NP condition received the highest ratings. Put differently, native speakers did not find grammatical DCE sentences more acceptable. Table 2 presents results of the statistical analysis, with the plausible missing NP condition as the baseline. While this condition received higher ratings than the grammatical DCE condition, the difference was not significant (p = 0.15). On the other hand, all four implausible missing NP conditions were less acceptable than the plausible missing NP condition (all p < 0.01).

Discussion
Experiment 1 confirmed informal impressions of a missing NP illusion. The plausible missing NP condition received numerically higher ratings than its grammatical counterparts. Even though the difference was not significant, calling it a grammaticality illusion is still justified. Unlike other kinds of ungrammatical sentences, these missing NP sentences are clearly rated as no worse than their grammatical counterparts. The illusion is analogous to the missing VP illusion, despite differences in word order and thematic relations.
This result has a parallel in Gibson & Thomas 1999, which also reported no significant contrast between missing VP sentences and their grammatical counterparts in English. It is possible, as Gibson & Thomas suggest, that the offline nature of the task let participants reread sentences and more easily notice that missing NP sentences are ungrammatical. For transparency, we also note that whether the numerical difference is significant depends on the statistical model. When we fitted the data with a model with only random intercepts, the difference turned out to be significant, as were the differences for the four implausible conditions (all p < 0.01).
More interesting are the contrasts for the other conditions. They suggest that speakers can relate the second post-verb NP (NP2) to all three verbs: as the theme argument (object) of V1 and V2 and as the subject (agent or experiencer) of V3. This finding is unexpected, having no analog in the missing VP illusion. In a missing VP sentence, the second VP is the equivalent of the Mandarin second NP, but there is no evidence that English speakers interpret this VP as taking all three NPs as its arguments, in particular, with both NP1 and NP2 as agent or experiencer arguments (subjects).
These findings do not follow easily from the structural forgetting and interference accounts. Under both accounts, the parser should have forgotten or overlooked V2's predictions for an NP object and a de. Since the object is interpreted as V2's theme argument and the NP that appears immediately after de is interpreted as V2's agent, the forgetting or overlooking of the object and de should cause V2 to become thematically orphaned, without a theme or agent (see Figure 3). This should in turn lower the acceptability of a missing NP sentence. The data, however, show that plausible missing NP sentences are relatively acceptable and V2 is not orphaned.
While we did not predict that missing NP sentences would feature such a set of thematic relations, these results turn out to be consistent with our thematic relations hypothesis, which attributes the center-embedding illusion to the parser establishing thematic relations between all arguments and predicates in the sentence. More specifically, in a missing NP sentence, all NPs (including the main clause subject) are thematically related to some verbal predicate. The subject of the sentence, appearing in a canonical subject position preceding V1, is interpreted as V1's experiencer or agent. The two post-verb NPs are interpreted as arguments of all three verbs, as ratings for the implausible conditions show. Conversely, every verbal predicate also gets at least one argument: V1 has an agent/experiencer in its subject and a theme argument in NP2; V3 has an agent in NP2 and a theme in NP1. While the experiment did not test whether V2 has an agent argument, our results indicate that it has a theme argument in NP2.
Finally, the illusion is not predicted by an account in which center-embedding illusions (or the lack thereof) are derived from language experience. Given the noun-final nature of Mandarin nominal expressions and the presence of noun sequences in the language, such an account predicts that Mandarin should lack the illusion, a point we return to in Section 7 (Experiment 4).

Experiment 2
Experiment 1 suggested that Mandarin has a missing NP illusion parallel to the missing VP illusion reported for English. However, as reviewers pointed out, this conclusion assumes that participants parsed missing NP sentences as being doubly center-embedded. Although we chose verbs and added adverbs to facilitate such a parse, one might still be concerned that some of these sentences were parsed as singly center-embedded. In that event, the absence of a third post-verbal NP would pose no structural problem, leading to higher acceptability.
As an illustration, in such a scenario, the missing NP sentence in (10) could receive the parse indicated in (16). Here, there is only one relative clause, with V2 zébèi-guo "rebuked" as the verb of the clause. V2 takes a VP-like complement headed by V3, shěnlǐ "hear." NP2 fǎguān "judge" is interpreted as V2's relativized subject. To the extent these parses are available, it implies that V2 verbs like "rebuked" can take VP-like complements in general, so structurally simpler sentences like (17) should also be acceptable. This seems unlikely. We had chosen verbs that we judged as incompatible with VP complements, especially when combined with a past-oriented time adverb. In our judgment, (17) is unacceptable. Nevertheless, to avoid prejudging verb complementation facts, we ran a follow-up acceptability judgment study to determine how acceptable sentences like (17) are. For thoroughness, we looked at all V2 and V1 verbs used in Experiment 1, just in case there were also V1 verbs that can take VP-like complements.
Of course, it is difficult to know how to interpret acceptability ratings without baselines: how do we decide whether a verb like "rebuke" can take a VP-like complement? To address this question, we identified three verbs (chéngrèn "admit (to)," fǒurèn "deny," and biǎoshì "say") that can take VP-like complements containing past-oriented temporal adverbs (18). The goal was to aggregate the three verbs to form a baseline. Given our choice of V1 and V2 verbs in Experiment 1, we expected them to be less acceptable on average than the baseline verbs. We also expected the V1 and V2 verbs to vary in acceptability, although we had no predictions about which verbs would be more acceptable.

Materials
We constructed six Subject-Main verb-Adverb-Embedded verb-NP sentences for each of the 42 unique V1 and V2 verbs used in Experiment 1. (In principle, there should have been 48 verbs, but six verbs occur in both V1 and V2 positions.) Within this set of six sentences, one (or two, for the six overlapping verbs) was constructed by reusing constituents from Experiment 1, so that the resulting sentence resembled the original grammatical DCE sentence. An example of this can be seen in (17), where the main verb is zébèi-guo "rebuked": we used NP3 bùzhǎng "minister" as the subject, the adverb gāng "recently," V3 shěnlǐ "hear" as the embedded verb, and NP1 tānwū-àn "corruption case" as the NP. We added modal auxiliaries, aspect markers, or adverbs to the verbs of interest where necessary for felicity.
Huang and Phillips Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1292
For the other sentences in each set, we replaced all lexical items other than the main verb and adverb. We made sure the replacement verbs and NPs were as plausible as possible given the choice of main verb. For instance, another sentence was constructed with xuézhě "scholar" as the subject, tàntǎo "investigate" as the embedded verb, and shèhuì wèntí "social issue" as the object (19). These are arguably as compatible with zébèi-guo "rebuked": a scholar might rebuke someone's investigation of social issues to the same extent as a government minister might rebuke someone's hearing of a corruption case. If these sentences are still unacceptable, we can then reasonably attribute that to structural factors. We also created six sentences for each baseline verb, using the same frame.
We wanted to keep the experiment short to help participants focus on the task. To that end, we randomly sorted the 42 verbs into two sets of 21 verbs. We then added the three baseline verbs to each set, forming two sets of 24 verbs. Within each set of 24 verbs, the six sentences for each verb were distributed to produce six lists, such that each verb appears once in each list. Put differently, there were a total of 12 lists (2 sets of 24 verbs × 6 lists), each containing 24 target sentences (one sentence for each of the V1, V2, and baseline verbs).
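The list construction can be sketched as follows. The placeholder verb names, the fixed random seed, and the rotation used to pick a sentence version are all assumptions made for illustration; the text specifies only the random split into two sets of 21 verbs, the three baseline verbs added to each set, and six lists per set with one sentence per verb.

```python
import random

def build_lists(verbs, baseline_verbs, n_sentences=6, seed=0):
    """Return lists of (verb, sentence_version) pairs: six lists per verb set."""
    rng = random.Random(seed)
    shuffled = verbs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Two verb sets, each extended with the same baseline verbs.
    verb_sets = [shuffled[:half] + baseline_verbs, shuffled[half:] + baseline_verbs]
    all_lists = []
    for verb_set in verb_sets:
        for list_idx in range(n_sentences):
            # The sentence version for each verb rotates with the list index,
            # so each verb contributes exactly one sentence to each list.
            all_lists.append(
                [(verb, (pos + list_idx) % n_sentences)
                 for pos, verb in enumerate(verb_set)]
            )
    return all_lists

# Hypothetical placeholder names for the 42 V1/V2 verbs.
verbs = [f"verb_{i:02d}" for i in range(42)]
baseline_verbs = ["chengren", "fouren", "biaoshi"]
exp2_lists = build_lists(verbs, baseline_verbs)
```

With this layout there are 12 lists of 24 target sentences each, and every sentence version of a verb appears in exactly one of the six lists for its set.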
Finally, we added 24 filler items to each list, so there were as many filler items as target and baseline sentences. Filler sentences were similar in length and structure, consisting of a subject, an adverb, an auxiliary or a second adverb, and a verb and an NP object. 21 of these sentences were grammatical, while the remaining three were not. On the assumption that the non-baseline target items were unacceptable, this distribution ensured that participants saw an equal number of acceptable and unacceptable sentences.

Participants
Participants were 36 native speakers of Mandarin on Prolific, recruited from the 60 participants who completed Experiment 1. The recruitment process was blind, in the sense that participants' demographic information and their responses from Experiment 1 were not taken into consideration. Each participant received US$1.60, based on our estimate of the time needed to complete the experiment and Prolific's recommended hourly rate.

Procedure
Recall that the V1 and V2 verbs were sorted into two sets and six lists of sentences were constructed for each set. We randomly assigned three participants to each list, so that a total of 18 participants rated sentences for each set of verbs. In other words, through this arrangement, we collected 18 responses for each V1 and V2 verb.
As in Experiment 1, participants rated the sentences on Ibex, with the same 7-point Likert scale. Before starting the experiment, they saw seven practice sentences. For the first four practice sentences, participants saw brief comments about each sentence and what kind of ratings to assign. The fourth practice sentence here was a sentence involving an apparent subcategorization violation (an intransitive verb occurring with a post-verb NP).

Data analysis
After data collection, we noticed errors in three of the target sentences (one featuring the verb dào "visit, go to" and two featuring jiārù "join"); these sentences have been excluded from analysis. To ensure that participants had taken the time to read a target sentence before rating it, we also excluded responses provided in less than one second after the sentence was presented. Altogether, 1.2% of responses were excluded.
We analyzed the results with a cumulative link mixed effects model for each set of verbs. There were two conditions (baseline verbs vs. V1 and V2 verbs), by-subject and by-verb random intercepts, and a by-subject random slope in each model.
Table 3 presents a summary of acceptability ratings. The sentences for the three baseline verbs received uniformly very high ratings, while the sentences for the V1 and V2 verbs received much lower ratings. Statistical analysis confirmed that the difference is significant (p < 0.001, Table 4). There was also some variation within the class of V1 and V2 verbs, but no verb had ratings comparable to the baseline verbs.

Discussion
The results provide clear evidence that the V1 and V2 verbs in Experiment 1 do not allow VP complements containing a past-oriented time adverb. It is therefore unlikely that the missing NP sentences in Experiment 1 were relatively acceptable because some of those sentences had a grammatical single center-embedding parse in which V1 or V2 verbs take such a VP complement.
To sum up: the results of Experiments 1 and 2 jointly indicate that Mandarin Chinese exhibits a center-embedding illusion analogous to the missing VP illusion reported for English. The implausibility manipulations in Experiment 1 further reveal an interesting difference: in a missing NP sentence, the two NPs that appear after the verbs get interpreted as objects and subjects of the verbs in a way that lacks a parallel in English missing VP sentences. These interpretations are also not predicted by existing accounts of the center-embedding illusion.

Experiment 3
However, these remarks presume that our description of how missing VP sentences are interpreted is accurate. So that we have a stronger basis for comparing Mandarin with English, we ran a replication of Gibson & Thomas 1999. Our goal here was not to test their claims about structural forgetting. Rather, we were interested in using their manipulations to run an experiment parallel to Experiment 1: verifying whether English exhibits the missing VP illusion, and if so, which NPs serve as the arguments of which verb in a missing VP sentence.

Materials
Experimental materials were based on the 12 items listed on pp. 247-248 of Gibson & Thomas 1999. These were used to generate sentences for the four conditions in (3).
Table 3 Acceptability ratings for Experiment 2, aggregated across both sets (1 = completely unacceptable, 7 = completely acceptable).
Table 4 Mixed effects regression results (baseline: baseline verbs chéngrèn "admit," fǒurèn "deny," and biǎoshì "say").
As noted previously, the missing VP conditions feature plausibility manipulations to confirm claims that speakers treat the third NP as the first VP's subject, while treating the first NP (but not the second NP) as the subject of the second VP. If so, one expects a plausibility violation in the missing VP1 condition (20b): the first VP had typed quickly would take the third NP the publishing company as an agent. A similar violation would occur in the missing VP3 condition (20d), where the second VP, had typed quickly, takes the first NP the novel as its agent.
The 12 test sentences were distributed across four lists in a Latin Square design, so that each list contained three sentences per condition, and no two sentences in a list were variants of each other. Following Gibson & Thomas, test sentences were mixed with 48 filler sentences of comparable length and complexity, featuring various kinds of adjunct, relative, and complement clauses. 30 of these sentences were well-formed and acceptable and 18 were ill-formed and unacceptable. This ratio was chosen so that participants would see as many acceptable sentences as unacceptable sentences, on the assumption that all four experimental conditions are relatively unacceptable.
We also prepared 7 practice items. Three of these items were grammatical and comprehensible while the other four items were not. One of the four ungrammatical practice sentences was the following non-DCE sentence missing a VP, used by Gibson & Thomas as a practice sentence: The form was stamped by the bureaucrat who worked at the ministry where everyone who had walked strangely.

Participants
Participants were 32 US-based workers on Amazon Mechanical Turk. All 32 participants were self-identified native speakers of American English and had passed a native speaker proficiency test, which tested for knowledge of relatively subtle grammatical rules and constraints of American English. Participants received US$2.50.

Procedure
The procedure was largely the same as that of Experiment 1. We departed from Gibson & Thomas in our choice of scale. Gibson & Thomas used a 5-point scale, where "5" indicated "hard to understand" and "1" "easy to understand." To be consistent with Experiment 1, we used a 7-point scale and reversed it, so "1" was used for unacceptable sentences and "7" for acceptable ones.
For familiarization purposes, participants first judged the practice sentences. For the first three practice sentences, participants saw brief comments about each sentence and what kind of ratings to assign, e.g. a sentence should have received a low rating because it described an implausible scenario. The fourth practice sentence was Gibson & Thomas's non-DCE sentence that lacked a VP. Following Gibson & Thomas, the comment for that sentence indicated that they should give similar sentences a low rating, although it did not explain why.

Data analysis
To ensure that participants took the time to read a target sentence before judging it for acceptability, we excluded responses that were provided within 3 seconds of the presentation of the sentence; doing so eliminated about 6% of responses.
As was the case for Experiment 1, ratings were analyzed with a cumulative link mixed effects model, with condition as a fixed effect and random intercepts and slopes for both participants and items. Ratings for the missing VP2 condition were used as a baseline.
Table 5 shows that in absolute terms, all four conditions received low to medium acceptability ratings. Of the three missing VP conditions, the missing VP2 condition was rated the most acceptable; this outcome was also observed by Gibson & Thomas and Frank & Ernst (2019).

Statistically, the missing VP2 condition was significantly more acceptable than the missing VP1 condition (p = 0.001) and the missing VP3 condition (p < 0.001) (Table 6). Although the missing VP2 condition also received numerically higher ratings than the grammatical DCE condition, the difference is not significant (p = 0.357).

Discussion
Our results replicated Gibson & Thomas's key findings. The fact that ungrammatical missing VP2 sentences were not less acceptable than grammatical DCE sentences indicates a grammaticality illusion. We also confirmed their observations about missing VP sentences: the first VP has a subject in the third NP, while the second VP has a subject in the first NP but not the second NP. In other words, the second VP is thematically related to the first NP, but not the second. This result contrasts with the finding for Mandarin missing NP sentences in Experiment 1, where the second post-verb NP (the equivalent of the second VP in English) is thematically related to both the first and second verbs (the equivalent of the first and second NPs in English).
Experiment 3 also provides another baseline to verify if a grammatical parse is at all possible for Mandarin missing NP sentences. Specifically, if the plausible missing NP sentences in Experiment 1 are ungrammatical (but illusory), they should have the same acceptability profile as missing VP2 sentences, whose ungrammaticality is not in doubt.
Post-hoc analyses of these experiments, discussed in more detail in Appendix 1, bear out this prediction. For example, plausible missing NP sentences, like missing VP2 sentences, generally receive lower ratings compared to unambiguously grammatical filler items, which are structurally similar (Figures 5 and 6).
Ratings for plausible missing NP sentences also varied substantially within and across participants as well as items (Figures 7 and 8), which is expected if these sentences are ungrammatical but elicit the illusion probabilistically. The same variability was found in missing VP2 sentences (Figures 9 and 10). These parallels provide strong evidence that missing NP sentences are ungrammatical.

To sum up, Experiments 1 and 3 show that ungrammatical DCE sentences with missing NPs and VPs in Mandarin and English exhibit a grammaticality illusion, even though these ungrammatical sentences are assigned very different thematic relations.
These findings have implications for cross-linguistic accounts of center-embedding illusions. Ideally, an account must be fine-grained enough to explain the differences in thematic relations between Mandarin and English. However, it should also capture a higher-level generalization, namely, that Mandarin and English speakers experience a grammaticality illusion, while German and Dutch speakers are less likely to do so. In the next section, we address the second issue on the cross-linguistic distribution of the illusion. In particular, we evaluate an approach that attributes the distribution to cross-linguistic differences in language experience.

Experiment 4
The language experience hypothesis was proposed to explain why not all languages exhibit a center-embedding illusion to the same degree. Under this hypothesis, the verb-final syntax of German and Dutch means that long dependencies between subjects and verbs and verb sequences occur relatively frequently. Exposure to these linguistic structures is argued to help German and Dutch speakers process DCE sentences more easily.
We argued that Mandarin presents a problem for this hypothesis. Because its noun phrases are consistently head-final, long dependencies between verbs and the head noun of objects are relatively frequent, as are sequences of nouns. This hypothesis therefore predicts that Mandarin should pattern like German and Dutch (see footnote 3). In this section, we make this argument more explicitly, using Futrell et al.'s (2020) noisy surprisal model (also Futrell & Levy 2017), an implementation of the language experience hypothesis. We adapt the model for Mandarin and show that it incorrectly predicts that Mandarin should pattern like German and Dutch.

The model
We chose Futrell et al.'s model over simple recurrent neural network (RNN) models of the illusion used by Christiansen, Vasishth, Frank, and colleagues for three reasons. First, as mentioned, Futrell et al.'s model explicitly derives the presence and absence of the illusion in English and German as a consequence of the fact that German relative clauses are always verb-final, while English relative clauses are not. Second, the code is publicly available, which lets us reproduce their analysis faithfully. Third, as they argue, RNN models can be thought of as a "special case" of a lossy-context surprisal model (Futrell et al. 2020: 17-18). Although the model was designed to capture reading time data (Vasishth et al. 2010), it is reasonable to use it to model acceptability judgments. Frank & Ernst (2019) note that Dutch and German have similar verb-final syntax and DCE sentences in both languages have the same reading time profiles. They further point out that reading time differences between grammatical DCE and missing VP sentences in Dutch and English correlate with an acceptability difference.
Futrell et al.'s model uses a simple probabilistic grammar, simulating English and German as a set of intransitive sentences. The focus on intransitive sentences is presumably driven by the goal to model the parsing of a prototypical DCE sentence, in which the subject's head noun is separated from the verb by a relative clause containing another relative clause.

Footnote 3: A reviewer asked if this prediction might simply follow from the type of relative clauses in DCE sentences, rather than from language experience: English DCE sentences feature object relatives; German and Dutch DCE sentences feature either object or subject relatives. Not unlike German and Dutch, Mandarin DCE sentences feature subject relatives. If subject relatives have a processing advantage, one might expect German, Dutch, and Mandarin DCE sentences to be easier to process and more acceptable. (Note that there is a debate on whether subject relatives have a processing advantage, especially for Chinese. See Hsiao & Gibson 2003; Lin & Bever 2006, 2011; Vasishth et al. 2013; Jäger et al. 2015.) We are less certain about this possibility. Even if there is a genuine subject relative processing advantage, it is unclear whether this processing advantage would persist under double center-embedding, especially given the memory demands of double center-embedding.
In this grammar (Table 7), a noun is modified by a relative clause 25% of the time. For English, 20% of relative clauses are verb-final, reflecting the fact that verb-final object relative clauses are infrequent in naturalistic contexts. For German, all relative clauses are verb-final. With these parameters, the grammar generates sequences of syntactic categories (corresponding to sentences) and their probabilities. Noise is then applied to these sequences, so that each syntactic category is deleted with a probability of 20%, producing sequences of varying grammaticality. It is this noisier set of sequences that the model is exposed to.
We extend the model with a simple probabilistic grammar of Mandarin. Since the prototypical Mandarin DCE sentence features a transitive verb separated from its object's head noun by a doubly embedded relative clause, we have this grammar generate transitive sentences instead. Following Futrell et al., we assume that 20% of relative clauses are object relative clauses.
One might wonder whether these parameter values are realistic. With these values, the grammars for all three languages assign low, but non-zero, probabilities to grammatical DCE sentences before noise is applied, even though DCE sentences are probably effectively non-existent in naturalistic contexts. However, this model is arguably not intended to show how actual differences in language statistics can derive differences in how DCE sentences are parsed. Rather, it is a proof of concept, demonstrating how the parsing of DCE sentences can be conditioned by general word order differences, such as whether relative clauses are uniformly verb-final. These parameter values serve as simplifying assumptions necessary for the demonstration of this point.

Model results
We first ran the model for English and German/Dutch to replicate Futrell et al.'s results. Doing so also allows us to confirm that their choice of parameter values would be appropriate for Mandarin, for the sake of fairness. Having done that, we ran the model for Mandarin.
The English and German/Dutch models were presented with the DCE sequence NCNCNVV (N = noun, V = verb, C = relative pronoun). The model then calculated the probability that this sequence is followed by a V or an end-of-sentence symbol. To the extent that the English model generates the illusion, the surprisal of observing the end-of-sentence symbol should be lower than the surprisal for a third verb, even though the grammar generates DCE sentences. We predict the opposite contrast for German/Dutch. These predictions are borne out (Table 8).
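The comparison behind these predictions can be made concrete with a small sketch. Surprisal is the negative log probability of a continuation; the illusion is predicted when the ungrammatical continuation (here, the end-of-sentence symbol after NCNCNVV) is less surprising than the grammatical one (a third V). The function names and the probabilities in the usage example are hypothetical placeholders, not values from Table 8.

```python
import math

def surprisal(p):
    """Surprisal in bits of a continuation with probability p (0 < p <= 1)."""
    return -math.log2(p)

def illusion_predicted(p_grammatical, p_ungrammatical):
    """True if the ungrammatical continuation is less surprising
    than the grammatical one, i.e. the model predicts the illusion."""
    return surprisal(p_ungrammatical) < surprisal(p_grammatical)

if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    p_third_verb, p_end = 0.2, 0.6
    diff = surprisal(p_third_verb) - surprisal(p_end)
    print(f"surprisal difference (grammatical - ungrammatical): {diff:.2f} bits")
    print("illusion predicted:", illusion_predicted(p_third_verb, p_end))
```

A positive difference corresponds to the English-like pattern, and a negative difference to the German/Dutch-like pattern.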
The Mandarin model was presented with the DCE sequence NVVVNCN (here, C = de), and it calculated the probability that this sequence would be followed by a C (de) or the end-of-sentence symbol. As Table 8 shows, the model predicts the absence of an illusion: the observation of C, the grammatical continuation, has a lower surprisal value.
We also ran a post-hoc analysis for Mandarin, varying the conditional probability of subject relative clauses. Doing so let us assess the alternative hypothesis that Mandarin exhibits the illusion because subject relative clauses are less common in the language; Hsiao & Gibson (2003) estimate the rate of subject relative clauses to be about 57.5% (although Vasishth et al. (2013) estimate it to be higher, around 73%). The model predicts that Mandarin only starts exhibiting the illusion when the probability of subject relative clauses falls below 20% (Table 9), which is unrealistically low. We therefore discount this alternative hypothesis.
An anonymous reviewer raised the possibility that the results might be an artifact of the deletion rate. To address this issue, we reran the model with different deletion rates. As Table 10 shows, when the deletion rate is low (e.g. below 0.3), the model predicts that Mandarin lacks the illusion, patterning like German/Dutch: grammatical continuations elicit a lower surprisal than ungrammatical continuations. When the deletion rate is higher, the model predicts that Mandarin starts exhibiting the illusion, like English.
One possible takeaway is that the deletion rate should have been higher. For instance, contrary to our current assumptions (and Futrell et al.'s), perhaps the deletion rate increases with processing load, so that the deletion rate is higher for DCE sentences than for simpler sentences. This is not an implausible scenario. However, it requires a richer theory of deletion and representations beyond what is now available. For instance, one would need to specify when in incremental processing and by how much the deletion rate starts to rise (cf. Gibson & Thomas 1999). More generally, there are also potential consequences regarding how missing VP/NP sentences are interpreted, since this operation ostensibly deletes words from a linguistic representation. As far as we can tell, though, these issues have not been spelled out, so it is unclear how we can explain existing findings. For these reasons, we will not adopt this alternative perspective here, but flag it for future research.

General discussion
To recap, Experiments 1 and 3 confirmed that Mandarin and English both exhibit a center-embedding illusion. However, both experiments also showed that the nouns and verbs of these sentences are thematically related in strikingly different ways. As shown in Figure 11, in English, the final VP is linked to only the first NP. In Mandarin, however, the final NP appears to be linked not only to the first verb, but also to the second and third verbs.
Table 9: Mandarin surprisal difference and probability of subject relative clauses.
Despite this difference, there is a more abstract way in which both types of sentences are the same: there are thematic relations connecting all arguments and predicates; no NP or verb is "orphaned" thematically. The Mandarin and English results are thus consistent with the thematic relations hypothesis, which posits that the presence of such relations produces the sense of completeness that characterizes the missing VP/NP illusion.
Experiment 4 further showed that the Mandarin illusion poses a challenge for accounts that explain cross-linguistic variation in the illusion by appealing to the relative frequency of verb-final (or noun-final) structures in a language (Vasishth et al. 2010; Frank et al. 2016; Futrell et al. 2020). Here, we consider an alternative account, namely, the interference account of Bader (2016), exploring how it can be adapted to accommodate the Mandarin data.
To recap, Bader assumes that the parser builds different syntactic representations for string-equivalent German and English DCE sentences. The verb of a German (and Dutch) main clause appears in a verb-second position. In the formalism that Bader adopts, this verb (V3) is found in C, having moved from a VP position, as shown in (21a). The other two verbs, on the other hand, remain in their VPs. In contrast, English verbs all remain in their respective VPs (21b).

Bader argues that this syntactic difference affects how the parser processes the second verb. In English, both the V2 and V3 slots are inside VPs, so the parser is likely to confuse them and mis-attach the verb to the V3 slot, producing the classic missing VP effect. Mis-attachment to the V3 slot is even likelier if this slot enjoys a primacy advantage over the V2 slot, as Häussler & Bader (2015) suggest. In contrast, the risk of mis-attachment is lower in German, because the two slots are structurally distinct: the V2 slot is inside a verb-final VP, while the V3 slot is inside C.
The above description of (mis)attachment can be implemented in terms of memory retrieval and interference (Lewis & Vasishth 2005;McElree et al. 2003;McElree 2000, etc.), with the assumption that retrieval cues can refer to the structural features of the attachment site, such as the node that immediately dominates the site (i.e. VP or C). Upon observing the second verb, the parser needs to retrieve the correct verb attachment site from memory. By hypothesis, it sets one of the retrieval cues to be [VP]. In English, since there are two VPs with empty V slots, the parser might incorrectly retrieve the main clause VP instead. In contrast, in German, there is only one VP with an empty V slot, so the parser is less likely to make a mistake.
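The retrieval logic just described can be illustrated with a toy sketch. It is not an implementation of the Lewis & Vasishth (2005) architecture: the proportional scoring rule, the feature labels, and the function name are simplifying assumptions. Each candidate attachment site is scored by the number of retrieval cues it matches, and retrieval probability is taken to be proportional to that score.

```python
def retrieval_probs(slots, cues):
    """Toy cue-based retrieval (assumed scoring rule).

    slots: dict mapping a slot name to the set of features it carries.
    cues:  iterable of retrieval cues.
    Returns a dict of retrieval probabilities proportional to the
    number of cues each slot matches.
    """
    scores = {name: sum(1 for c in cues if c in feats)
              for name, feats in slots.items()}
    total = sum(scores.values())
    if total == 0:
        # No slot matches any cue: fall back to a uniform choice.
        return {name: 1 / len(slots) for name in slots}
    return {name: s / total for name, s in scores.items()}

# Empty verb slots available when the second verb is encountered.
# Feature labels are hypothetical stand-ins for the dominating node.
english = {"V2": {"VP"}, "V3": {"VP"}}   # both slots inside VPs
german = {"V2": {"VP"}, "V3": {"C"}}     # main-clause verb slot is in C

if __name__ == "__main__":
    print("English:", retrieval_probs(english, ["VP"]))
    print("German: ", retrieval_probs(german, ["VP"]))
```

Under these assumptions, the [VP] cue cannot distinguish the two English slots, so the main-clause slot is retrieved half the time, whereas in German the cue singles out the embedded slot.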

The relevant syntactic facts for Mandarin resemble English more strongly than German: Mandarin does not syntactically distinguish between objects in main clauses and subordinate clauses (22). We suggest that this might explain why Mandarin and English exhibit the illusion. Ideally, when processing a Mandarin DCE sentence, the parser should attach the second postverb noun to the NP2 slot and not the NP3 slot in (22). However, both slots are structurally similar, being NP slots dominated by another NP node. The Mandarin parser is thus likely to confuse them and mis-attach the noun into the NP3 slot instead. However, as discussed in Section 3.3, this analysis only predicts that V1 and V3 have arguments; V2 might still be "orphaned" thematically. This is inconsistent with the results of Experiment 1, which showed that the second noun also gets interpreted as a theme argument of V2. A purely syntactic approach cannot capture these thematic relations. There is no well-formed double center-embedding syntactic representation where the second noun receives such an interpretation.
We suggest supplementing the analysis with the assumption that the parser tracks thematic relations throughout sentence processing and repairs them when necessary. In English, repair is likely to be unnecessary, since mis-attachment of the second verb does not result in any of the VPs or NPs becoming thematically orphaned. In contrast, in Mandarin, mis-attachment causes V2 to be thematically orphaned. To repair this problem, the parser proceeds to link V2 to NP2 (Figure 12). Successful repair therefore ensures that there are thematic relations between all arguments and predicates, producing the same illusion of completeness.
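The repair step can be sketched as a simple bookkeeping procedure over thematic links. This is an illustrative toy under stated assumptions: thematic relations are represented as (verb, noun) pairs, the labels NP1 and NP2 stand for the two postverbal nouns, the sentence-initial subject is left out for brevity, and the repair target is supplied by the caller rather than computed. None of these choices is a claim about how the parser actually selects a repair.

```python
def orphans(items, links):
    """Return the items that take part in no thematic link."""
    linked = {x for pair in links for x in pair}
    return [item for item in items if item not in linked]

def repair(verbs, links, target):
    """Link every thematically orphaned verb to the target noun.

    The choice of target is an assumption made by the caller.
    """
    repaired = set(links)
    for verb in orphans(verbs, links):
        repaired.add((verb, target))
    return repaired

if __name__ == "__main__":
    verbs = ["V1", "V2", "V3"]
    nouns = ["NP1", "NP2"]
    # After mis-attachment of the second postverbal noun:
    # V3 takes NP1 and V1 takes NP2, leaving V2 without an argument.
    links = {("V3", "NP1"), ("V1", "NP2")}
    print("orphaned before repair:", orphans(verbs + nouns, links))
    fixed = repair(verbs, links, target="NP2")
    print("orphaned after repair: ", orphans(verbs + nouns, fixed))
```

After repair, NP2 is linked to both V1 and V2, and no verb or noun is left without a thematic relation, which is the condition the account associates with the sense of completeness.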
Since there is no way to derive these thematic relations syntactically, it is unlikely that the parser is repairing a syntactic representation like the one in (22). These thematic relations therefore implicate the presence of a second representation, presumably a non-syntactic one. This is consistent with a suggestion of ours in Section 2.2. There, we argued that although the parser might build a fully articulated syntactic representation of a DCE sentence, for memory load reasons, it might instead use a shallower representation to directly track thematic relations in the sentence. Such a representation could also make repairs easier to implement.
In principle, this repair scenario predicts differences in how quickly speakers detect implausibility in the various missing NP conditions in Experiment 1, where verbs were chosen so that NP1 and NP2 would be inappropriate arguments. More specifically, the repair scenario predicts that speakers might take longer to notice that NP2 is an inappropriate argument for V2, since the relation between NP2 and V2 is built late. This prediction, being a claim about timing, is difficult to verify with the offline acceptability judgment ratings collected in Experiment 1. We leave it to future work to test this prediction using methods that yield better temporal resolution, such as a self-paced reading paradigm.

Conclusion
We demonstrated that Mandarin speakers experience a missing NP illusion when processing sentences with double center-embedding, analogous to the better-studied missing VP illusion in languages like English (Gibson & Thomas 1999). We noted that the interpretations of missing NP sentences do not follow easily from structural forgetting accounts like Gibson & Thomas's and interference accounts like Häussler and Bader's. Mandarin also presents a challenge for language experience hypotheses, which are intended to account for cross-linguistic variation in this illusion (Vasishth et al. 2010;Frank et al. 2016;Futrell & Levy 2017;Frank & Ernst 2019;Futrell et al. 2020). These hypotheses predict that Mandarin should lack the illusion.
Our experimental results suggested that the exceptional acceptability of missing VP and NP sentences likely arises because speakers were able to build thematic relations between all arguments and predicates denoted by the NPs and VPs in these sentences. In light of our modeling results showing that the language experience hypothesis predicts the absence of the missing NP illusion, we discussed adapting an alternative interference hypothesis of Bader (2016). We suggested that the interference approach, supplemented with a repair mechanism, can better explain why the illusion surfaces in languages like English and Mandarin, but not in languages like German and Dutch.

Abbreviations
cl: classifier; de: the particle de; exp: experiential aspect (-guo); pfv: perfective aspect (-le); prog: progressive aspect

Ethics and consent
Experiments were approved by the University of Maryland Institutional Review Board. Informed consent was obtained from all experiment participants.