1 The emergent–generative debate
Linguists are currently debating the nature of our grammatical knowledge. Many have argued that such knowledge is innate and that we use this knowledge to generate utterances; others have argued that grammatical knowledge emerges from linguistic experience, and that such knowledge forms exemplars for modeling novel utterances. Generative grammar advocates emphasize the way that utterances are generated by combining individual words as needed (Hauser et al. 2002; Newmeyer 2010, and many others). In contrast, emergent grammar advocates emphasize that prefabricated sequences of abstract and concrete linguistic forms are recycled (Thompson 2002; Ellis 2008; Beckner et al. 2009; Bybee 2010; Dąbrowska 2014, and many others; see MacWhinney & O’Grady 2015 and references therein).
Consider Dąbrowska’s (2010) example (9a), repeated here as (1a). This sentence illustrates the long-distance dependency between the wh word in the main clause and the trace in the object position for the verb do in the subordinate clause. Dąbrowska points out that word choice in sentences with long-distance dependencies in natural conversation is constrained; that is, the main clause auxiliary in such sentences is do over 90% of the time, the subject is you about 90% of the time, and the verb is think or say about 90% of the time. Dąbrowska gave several more examples that illustrated these tendencies, and we have repeated two of them as (1b) and (1c). She concludes that utterances with long-distance dependencies followed specific templates most of the time, and emerged frequently because “they are psychologically more basic than non-prototypical ones… prototypical questions are produced more fluently, judged to be more acceptable, remembered better and acquired earlier by children” (Dąbrowska 2010: 63).
|a.||What do you think you are doing?|
|b.||Who do you think you are?|
|c.||What do you think that means?|
Such skewed distributions occur in other patterns. For example, Hopper (2015: 320) reported that in pseudo-cleft constructions such as what I have to do is…, the verb was either do or happen a majority of the time. Based on such observations, a new model of grammar has emerged that emphasizes the reuse of prefabricated speech fragments (Thompson 2002; Ellis 2008; Bybee 2010; Dąbrowska 2014; 2015; Ellis et al. 2015). The main idea of this approach is summarized concisely as “grammar may best be understood as combinations of reusable fragments” (Thompson 2002: 141).
But others disagree. Newmeyer (2010: 15, emphasis in original) stated, “I have no problem with the idea that many of the more commonly-used phrases are stored in memory, but the idea that a grammar might be a stock of fragments strikes me as utterly implausible.”
Yet the evidence for emergent grammar continues to accumulate. After reviewing universal grammar examples in language acquisition, Ibbotson & Tomasello (2016: 74) concluded that “the notion of universal grammar is plain wrong”; rather, they argued that “[i]n its place, research on usage-based linguistics can provide a path forward for empirical studies of learning, use, and historical development” (Ibbotson & Tomasello 2016: 75).
At the same time, the evidence for generative grammar continues to accumulate. A recent neurophysiological brain activity study (Nelson et al. 2017) claims to have demonstrated the neurophysiological reality of phrase structure. In the authors’ opinion, their results “strongly motivate the view of language as being hierarchically structured and make it increasingly implausible to maintain the opposite nonhierarchical view” (Nelson et al. 2017: E3676).
And the debate continues.
Thompson, Newmeyer, and many others agree that commonly used fragments of speech are stored in memory, and that those fragments are reused during language production. What they do not agree on is the importance of such phrases. The proponents of the emergent grammar approach argue that such phrases, along with more abstract templates derived from those phrases (collectively called exemplars), are the end result of the acquisition of grammatical knowledge. Through cumulative language experience, we track and store the relative importance of each exemplar, perhaps as knowledge reflected by the exemplar’s distinctiveness (Adelman et al. 2006) or relative weight (McClelland 2015). Exemplar knowledge, together with knowledge of how the exemplars are distributed, forms our grammatical knowledge.
In contrast, proponents of the generative grammar approach argue that grammatical knowledge is derived from Universal Grammar and an associated set of parameters. They emphasize the role of syntactic structure—particularly its hierarchical nature—during language processing.
Exemplar grammar models explain many linguistic usage patterns; for example, more predictable words, phrases, and patterns are shorter (Bell et al. 2009; Seyfarth 2014), articulated less clearly (Bell et al. 2003), and show increased omission of optional functional words such as the complementizer that (Jaeger 2010). On the other hand, exemplar models struggle to explain grammatical knowledge that is clearly not based on previous language experience, such as grammatical innovations by children during language acquisition which do not have target language exemplars (Singleton & Newport 2004; Culbertson & Newport 2015).
Still others take a third approach that attempts to strike a balance between a completely generative approach and a completely usage-based approach. A characteristic example is the processing determinism approach by O’Grady (2008; 2015). This approach emphasizes the role of processing pressures during language formation, and “assigns a significantly smaller role to the input than is common in usage-based theories and no role at all to Universal Grammar” (O’Grady 2015: 8). O’Grady argues that there are two sources that shape language acquisition. The first source is external factors that arise from the language environment, such as frequency of occurrence. The second source is “internal pressures that stem from the burden that particular computational operations place on working memory” (O’Grady 2015: 9). Note that the computational operations are not necessarily innate by design, and O’Grady argues against the need for faculty-specific principles that cannot be learned from experience. What differentiates this approach from an emergent grammar approach is the role of experience-based knowledge: in the words of O’Grady (2008: 623), such knowledge “does in fact have a very important role to play in understanding how language works—but only if we acknowledge that its effects are modulated by the efficiency-driven processor that is at the heart of the language faculty.” It is this style of approach that we support with this work.
In summary, evidence supports both sides of the emergent–generative grammar debate, with many researchers arguing in favor of one approach. Still others argue for an approach that combines elements of the emergent approach with elements of the generative approach. Our goal is to contribute to this debate by supporting the third approach. We demonstrate that both emergent knowledge and generative knowledge play a role in speech processing, and more importantly, that these two types of knowledge compete. For that purpose, we present a corpus study of the variable omission of object case markers in spoken Japanese.
Standard Japanese obligatorily marks the relationship between the noun and the verb with a case marker immediately following the noun phrase, as in examples (2a) and (3a). However, in casual spoken Japanese, the case marker is variably omitted without any change in meaning, as in examples (2b) and (3b).
- ‘(I) ate lunch.’
- ‘(I) went to school.’
Previous work on variable case marker omission in Japanese has focused on either the accusative case or the nominative case (Fujii & Ono 2000; Matsuda 2000; Fry 2003; Yoshizumi 2016, inter alia). Because our focus is on object–verb pairs, we do not examine the nominative case. We do, however, expand on previous work by extending the scope of our study to the dative case, as seen in the examples in (3).
We proceed as follows. Because this study was motivated by Ullman’s (2004; 2008; 2016) declarative/procedural model of language, we begin with a review of that model. We then look more closely at Japanese case marker omission and present specific hypotheses. We subsequently present our corpus study on Japanese case marker omission, after which we discuss the implications of our results for the emergent–generative grammar debate.
2 The dual route model of language processing
Michael Ullman, a neurobiologist who researches language and memory, argues that language processing depends primarily on the two memory systems in the brain: declarative memory and procedural memory (2004; 2008; 2016). Briefly, declarative memory underlies the rapid learning and linking of different bits of knowledge across multiple contexts and modalities, such as the flavor of strawberries with the word strawberry. This knowledge is time- and context-specific. Declarative knowledge is learned rapidly, after as little as a single exposure, but strengthens with additional exposures. Declarative memory plays a crucial role in the acquisition of words, including their phonological forms, meanings, and subcategorization patterns, as well as relationships between words. Declarative memory also stores our knowledge of multi-word phrases, idioms, collocations, and irregular inflectional patterns, such as their frequency and patterns of use. Furthermore, declarative memory is capable of acquiring and abstracting such patterns into schemas such as those posited by construction grammar (Hoffman & Trousdale 2013).
On the other hand, procedural memory underlies the learning and processing of implicit sequences and rules. These rules are most likely probabilistic in nature, rather than deterministic. Procedural memory plays a key role in hierarchical and probabilistic sequencing across all linguistic subdomains, from syntax to phonology. Procedural memory also facilitates structural processing with a mechanism for predicting.
These two systems both overlap and compete. To some extent, both memory systems acquire the same or analogous knowledge. Thus, declarative memory may contain the past tense form walked while procedural memory contains the rule-based computation to derive the corresponding form walk + ed. During processing, both forms are accessed in parallel, with the faster form winning out: “Access to a stored representation which has similar mappings to one which could be composed compositionally… would block completion of the latter computation” (Ullman 2004: 247).
The concept of blocking may turn out to be exaggerated. Nevertheless, the concept of parallel processing routes working at different speeds has been supported by recent work in morphology (Schmidtke et al. 2017; Lõo et al. 2018), and it is this aspect of Ullman’s model that we test in this paper.
Schmidtke et al. (2017) conducted a series of lexical decision and eye movement experiments on English and Dutch morphologically derived (teach + er), pseudo derived (corn + er), and simple words. They measured a variety of frequency, morphological, orthographic, and semantic variables, and then used a statistical technique called survival analysis to determine the earliest point in time at which a given variable exerts an influence over their response variables (judgment reaction times and eye fixation durations). They found that in general the first variables to begin to show effects were frequency, followed by semantic, then finally the morphological and orthographic. Their results show that speakers make use of frequency information long before they start making use of morphological information. They point out that their results support “dual- or multi-route theories of morphological processing, which argue for a parallel and interactive use of properties associated with whole words and their morphemes” (Schmidtke et al. 2017: 18).
Lõo et al. (2018) conducted a large-scale word naming study of Estonian nouns. They measured the delay between stimulus presentation and the beginning of articulation (production latency). They placed particular emphasis on three factors: whole-word frequency, inflectional paradigm size, and morphological family size. They broke up the production latency data into quantiles, so that the fastest 10% of the responses formed the first quantile, the next fastest 10% the second quantile, and so on. A quantile regression analysis then showed at what point each of these three factors had the strongest effect. They found that the effect of whole-word frequency peaked in strength at the fourth quantile, followed by morphological family size in the sixth quantile, and finally inflectional paradigm size in the eighth quantile. Their results show that whole-word frequency has its strongest effect when words are processed relatively rapidly, whereas inflectional paradigm size has its strongest effect on words that are processed relatively slowly.
The results from both of these studies mesh neatly with Ullman’s claim for parallel processing routes. Furthermore, it seems that declarative knowledge (frequency and semantic information) is fast whereas procedural knowledge (inflectional paradigm patterns) is slow. If declarative and procedural knowledge compete, then we expect to see an interaction during language production. Specifically, we expect that the more a linguistic form relies on declarative knowledge, the less of a role procedural knowledge plays. Since declarative knowledge is fast, if it suffices then it will do so quickly. If it does not prove to be enough, then and only then will the slower procedural knowledge fully engage. In the rest of this article, we test this expectation by looking for such an interaction in spoken Japanese.
Many others similarly argue for the engagement of both types of knowledge, from a range of research areas such as phonetics (Pierrehumbert 2016), phonology (Guy 2014), derivational and inflectional morphology (Baayen et al. 1997; Vannest et al. 2005; Kuperman et al. 2009; Bakker et al. 2013), theoretical syntax (Reuland 2010), first language acquisition (O’Grady 2015), language evolution (Reuland 2010; Miyagawa 2017), and formulaic language (van Lancker Sidtis 2012). Some of these researchers propose models that are quite similar to the declarative/procedural model. For example, in spite of the notably different terminology, O’Grady’s external factors and internal factors resemble declarative knowledge and procedural knowledge, respectively.
2.1 Dispersion as a proxy of declarative knowledge
One of the most consistent and robust effects found in psycholinguistics studies has been the frequency effect: words and phrases that occur more frequently are processed faster, as shown by word judgment reaction times (Forster & Chambers 1973; Ellis 2002; Brysbaert et al. 2018). Each time a linguistic form is heard, it is imprinted in declarative memory, and the more often the imprinting occurs, the easier it is to recall that word or phrase.
However, several researchers have argued that the correlation between lexical frequency and the processing speed of a word is confounded by dispersion (McDonald & Shillcock 2001; Adelman et al. 2006; Perea et al. 2013). Dispersion refers to the number of different contexts in which a word is seen. In practice, a context tends to be defined as a single block of text or speech, such as a novel, textbook, or conference presentation, in which the target word appears one or more times. Research on memory has shown that the memory benefit from repeated exposure to a stimulus diminishes unless the context of the stimulus is changed (e.g., Verkoeijen et al. 2004). Work on dispersion has repeatedly shown that it is more predictive of word judgment reaction times than word frequency. This effect remained even after controlling for covariates such as ambiguity, word length, and orthographic neighborhood size.
Since more widely dispersed words are processed faster, we take dispersion as a proxy measure of the cost of processing in declarative memory. Declarative memory has the capacity to store and process multiword phrases (Bannard & Matthews 2008; Tremblay et al. 2011). Each time such a phrase is processed, the cost of processing it in declarative memory decreases. If such a phrase is processed frequently, then the cost of processing in declarative memory becomes lower than the cost of processing in procedural memory. In extreme cases, multipart expressions phonetically and cognitively merge together to form single units (Bybee & Scheibman 1999; see Heffernan & Sato 2017 for Japanese examples). We expect the most dispersed object–verb pairs to show the least evidence of processing in procedural memory.
2.2 Grammatical complexity as a proxy of procedural knowledge
We indirectly observe processing by procedural memory by building on the observation that “sentences that have more complex syntactic structures are more difficult and time consuming to understand” (Caplan & Waters 1999: 79; see also Scontras et al. 2015). Several theories have been posited to account for the increased processing costs of complex sentences. We espouse the Dependency Locality Theory (Gibson 2000). This theory equates differences in sentence comprehension performances with differences in the cost associated with integrating the syntactic structure of an incoming word into the current structure held in memory. For our purposes, it is sufficient to note that the cost is related to the size of the structure held in memory. Gibson measures integration costs in arbitrary energy units, with one unit equal to one discourse referent. As he was primarily concerned with the different costs incurred when comprehending subject-gap relative clauses and object-gap relative clauses, his theory does not assign costs to non-discourse referents such as determiners and adjectives. Yet determiners and adjectives must also be held in memory before integration. Our preliminary analysis of our data (see Table 1) by modifying constituents showed a clear difference between bare nouns, nouns modified by determiners, and nouns modified by more complex structures such as relative clauses. For the sake of simplicity, in this paper we equate the complexity of a noun phrase with the number of constituents modifying the projection of the head. Thus, in the phrase the flower, one constituent (the DP) modifies the head, giving the phrase a complexity value of one. In comparison, the bare noun flower has a complexity value of zero, this flower has a complexity value of one, and freshly picked flowers has a complexity of two. We give specific Japanese examples in Section 4.
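The complexity measure just described amounts to counting modifiers, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NounPhrase` class and its fields are hypothetical names introduced here, and the modifying constituents are listed by hand rather than derived from a parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    """Hypothetical container for a noun phrase; not part of the corpus tooling."""
    head: str
    # Constituents modifying the projection of the head, listed by hand.
    modifiers: tuple = ()


def complexity(np: NounPhrase) -> int:
    """Complexity value = number of constituents modifying the head."""
    return len(np.modifiers)


# The four English examples from the text.
examples = [
    NounPhrase("flower"),                          # bare noun
    NounPhrase("flower", ("the",)),                # the flower
    NounPhrase("flower", ("this",)),               # this flower
    NounPhrase("flowers", ("freshly", "picked")),  # freshly picked flowers
]
values = [complexity(np) for np in examples]
```

With these hand-coded inputs, `values` holds the complexity values given in the text for the four phrases, in the same order.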
|Modifier type||Token counts||Omission rate||Reclassification scheme|
Our primary hypothesis is that declarative knowledge and procedural knowledge compete during language processing. We are taking dispersion as a proxy of declarative knowledge. Similarly, we take the number of modifying constituents as a proxy of procedural memory. In other words, we hypothesize that dispersion and number of modifying constituents interact. We elaborate how once we have introduced previous work on case particle omission in Japanese.
3 Japanese case marker omission
In this section, we first review four recent studies of variable Japanese case marker omission. We then present our specific case marker omission hypotheses.
3.1 Previous work on case marker omissions in Japanese
Fujii & Ono (2000) examined accusative case marker omissions in 40 minutes of speech made up of seven short conversations. They found that speakers tended to omit accusative case markers in the following situations:
- In idiomatic expressions, such as me no iro kaeru ‘get serious; lit., change eye color’
- With direct objects that were demonstratives (kore ‘this’), indefinite pronouns (nanka ‘something’), and interrogative pronouns (dare ‘who’)
- With direct objects that occurred adjacent to the verb
- With direct objects with specific referents in the conversation.
From these observations, they concluded that first, speakers used the accusative case marker as a rhetorical device to draw attention to important information, such as the discourse topic, something newsworthy, or part of a contrast, and second, speakers used the accusative case marker to “facilitate the processing of information that may require some cognitive effort on the part of the listener” (Fujii & Ono 2000: 28). We build on this second conclusion in this study.
Matsuda (2000) investigated variable accusative case marking in 37 conversations between Tokyo Japanese speakers using variable rule analysis. Similar to Fujii & Ono, Matsuda found the strongest effect for verb adjacency. He also found a significant effect for object type, with objects showing greater case marker omission in the following order: nouns modified by relative clauses < non-interrogative pronouns < other nouns < interrogative pronouns. Matsuda also investigated the effect of an object having a previous referent in the conversation, but did not find a significant effect.
Fry (2003) investigated ellipses and case marker omissions in short excerpts from 120 telephone conversations between Japanese residing in the United States and close acquaintances residing in Japan, and found that accusative case marker omissions significantly increased with interrogative pronouns, shorter utterance lengths, and objects adjacent to the verb; however, speaker gender, dialect, word length, and object animacy were not found to be significant.
Yoshizumi (2016) examined case marker omissions for subjects and direct objects in conversational interviews with 16 heritage Japanese speakers and 16 native Kansai Japanese speakers. Using variable rule analyses, she found that a focus particle and a direct object that occurred adjacent to the verb significantly correlated with omissions; however, gender, age, the presence of sentence-final particles, and whether the clause contained an object–verb pair in a subordinate or a main clause were not found to be significant.
These four studies all focused on accusative case markers. We build on these studies by expanding the scope of our investigation to also include dative case markers. As introduced in the example sentences in (3), speakers also variably omit the dative case marker in conversation.
3.2 Our case marker omission hypotheses
Our hypotheses are grounded in the claim that case marker omission in spoken Japanese correlates with processing effort. Previous work on case particle omission in Japanese supports this assumption: case markers tend to be omitted when the object is closer to the verb, when the object is not modified by a relative clause, and in shorter utterances. This assumption is also supported by Fedzechkina et al. (2017), a miniature artificial language study on case marker omission. They compared learners acquiring an artificial language with fixed order against learners acquiring one with variable order, and found that the first group omitted case markers significantly more frequently. Fedzechkina et al. argue that their results reflect the learners’ attempts to balance effort with information transmission. Interestingly, the learners consistently produced case markers more often in OSV sentences than in SOV sentences. The authors posited several possible explanations for this imbalance, all of which involved the role of case markers in reducing cognitive processing effort.
Our primary hypotheses are as follows. Our first hypothesis builds on the observation that phrases that occur more frequently (that is, more dispersed) are processed faster, which we take as an indication of less cognitive effort. Similarly, case particles tend to be omitted from utterances that require less cognitive effort. Therefore, we hypothesize that increased case marker omission correlates with increased dispersion.
Second, given that more syntactically complex constituents require more processing time, we hypothesize that increased case marker omission correlates with a decrease in the number of constituents modifying the object of the object–verb pair. Matsuda’s (2000) results support this claim.
Third, we hypothesize that the above two correlations interact. If, as argued by Ullman, rapid processing by declarative memory preempts processing by procedural memory, then we should find a stronger correlation with number of modifier constituents for object–verb pairs that have low dispersion values. First consider object–verb pairs with high dispersion values. A high dispersion value indicates that an expression is used widely in a variety of contexts. Such phrases are presumably processed rapidly by declarative memory. During processing, the faster declarative memory processing wins out, and the slower procedural processing is aborted. Since number of modifying constituents is our proxy measure of processing in procedural memory, and procedural processing does not complete in these cases, then the correlation between case marker omission and number of modifying constituents should be weak.
Now consider object–verb pairs with low dispersion values. In these cases, declarative memory processing is relatively slow, and therefore the faster procedural memory wins out. If procedural memory is sensitive to syntactic complexity, then we should see a strong correlation between the number of modifying constituents and case marker omission for these cases.
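The three hypotheses can be sketched as a logistic model with an interaction term. The notation is an assumption introduced here purely for illustration and is not a description of the statistical analysis itself: $D$ stands for the dispersion level of an object–verb pair (higher values meaning more widely dispersed), $C$ for the number of constituents modifying the object, and the $\beta$ terms for unknown coefficients.

```latex
\[
\mathrm{logit}\, P(\text{omission}) = \beta_0 + \beta_1 D + \beta_2 C + \beta_3 (D \times C)
\]
\[
\frac{\partial\, \mathrm{logit}\, P(\text{omission})}{\partial C} = \beta_2 + \beta_3 D
\]
```

Under this sketch, the first hypothesis corresponds to $\beta_1 > 0$ (more omission with wider dispersion) and the second to $\beta_2 < 0$ (less omission with more modifying constituents, the direction reported for Table 1). The third hypothesis corresponds to $\beta_3 > 0$: the slope for $C$, $\beta_2 + \beta_3 D$, moves toward zero as dispersion increases, so the effect of syntactic complexity is strongest for pairs with low dispersion.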
Besides dispersion and number of modifying constituents, we also consider the following factors: speaker gender, speaker age, and speech style. Previous work on accusative case marker omissions did not find gender to be a significant factor, and we also predict that gender does not correlate with case marker omissions. However, we still include this factor for thoroughness. Similarly, we predict that age does not correlate with case marker omissions.
Given that case marker omissions occur in casual speech (Tsutsui 1984), we predict that case marker omissions correlate with speech style; specifically, speakers who use a more vernacular style omit case more often.
As mentioned, our study includes not only transitive verbs, but also unaccusative and unergative verbs. Due to the lack of studies on variable case marker omission for verbs other than transitive verbs, this aspect of our study is exploratory in nature, and it is difficult to a priori hypothesize about case marker omission patterns in other verb types.
For our data, we used the Corpus of Kansai Vernacular Japanese, which consists of 150 sociolinguistic interviews conducted by university students attending a private university in the Kansai region of Japan. The interviewees were either family members or close acquaintances of the interviewers. Both the interviewers and the interviewees were Kansai Japanese native speakers. Interviewers were instructed to speak in a casual manner using the local Japanese vernacular. They talked about a wide range of freely chosen topics such as school life, dating, work, family, history, and tragic events, for approximately one hour. Each interview was transcribed, checked for accuracy, parsed at the morpheme level, and tagged with part of speech information using MeCab (Kyoto University Graduate School of Informatics & Nihon Telegraph and Telephone Corporation 2013). Collectively, the interviews comprise 1.71 million lines of data (morphemes plus punctuation), of which 1.06 million lines were produced by the interviewees. We only examine the interviewees’ speech (hereafter, speakers). Table 2 shows the age and gender distribution of the speakers.
|High school student||University student||Young adult (25 ~ 39 yrs)||Middle-aged adult (40 ~ 59 yrs)||Elderly adult (60 + years)|
Our initial investigation revealed a highly skewed distribution of the linguistic contexts. For example, bare nouns occurred over one hundred times more frequently than nouns modified by na-adjectives (see below for examples of these categories). To maximize the volume of rare linguistic categories such as nouns modified by na-adjectives, we extracted all applicable data using a semi-automated process. Every token that met the criteria described below was first extracted and classified with a Python script. We then hand-checked three hundred randomly selected tokens and modified the script to account for consistent errors. We repeated this process until the data were as error-free as possible.
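One round of the hand-checking step might look like the following sketch. The function names, the fixed seed, and the representation of a checked token as a `(token, is_correct)` pair are assumptions made for illustration; they are not taken from the actual extraction script.

```python
import random


def sample_for_checking(tokens, k=300, seed=0):
    """Draw up to k tokens at random, without replacement, for manual checking."""
    rng = random.Random(seed)  # fixed seed so a checking round can be reproduced
    return rng.sample(tokens, min(k, len(tokens)))


def error_rate(checked):
    """checked: list of (token, is_correct) pairs recorded by the human checker."""
    if not checked:
        return 0.0
    wrong = sum(1 for _, is_correct in checked if not is_correct)
    return wrong / len(checked)
```

In use, the sample would be inspected by hand, the extraction script revised for any systematic errors, and the two steps repeated until the error rate stops improving.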
4.1 Extraction criteria
For practical reasons, we limited the study to the most frequent verbs only. We classified the 500 most frequently used verbs by verb type (transitive, unaccusative, or unergative), excluding any verbs not of these types, such as verbs of existence. We limited the list to transitive verbs that naturally occur with direct objects marked with the accusative case marker o, and unaccusative verbs and unergative verbs that naturally occur with indirect objects marked with the dative case marker ni. For example, since the stative transitive verb wakaru ‘understand’ does not occur naturally with the accusative case marker o, we omitted it.
In line with previous research on case marker omission (Matsuda 2000; Yoshizumi 2016), we omitted Sino-Japanese verbal nouns such as benkyō ‘study’ when they occurred with the verb suru ‘do, play.’ We included other nouns with suru, such as tenisu ‘tennis.’ We controlled for the distance between the object and the verb by only extracting the objects immediately adjacent to the verb. Finally, we did not include objects modified by focus particles.
We extracted nouns that immediately preceded a verb on our list of frequent verbs. We excluded numerals and adverbial nouns such as toki ‘time, when’ and asa ‘morning.’ We also excluded pronouns and proper nouns, as these words tend to not be modified by relative clauses. We treated compound nouns such as borantia-kei ‘activities such as volunteering’ as single nouns.
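The core of the extraction criteria can be sketched as a scan for a noun, an optional case marker, and a listed verb in immediate succession. Everything in the sketch is hypothetical: the `Morpheme` class, the coarse tag names, and the romanized forms are invented for illustration and do not reflect MeCab's actual output format. The sketch also assumes that excluded noun types and focus particles carry their own tags, and it does not handle the exclusion of Sino-Japanese verbal nouns with suru.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str  # romanized surface form
    lemma: str    # dictionary form
    pos: str      # hypothetical coarse tag, not MeCab's real tag set


CASE_MARKERS = {"o", "ni"}  # accusative and dative markers


def extract_pairs(morphemes, frequent_verbs):
    """Return (noun, verb, marker) triples; marker is None when it is omitted."""
    pairs = []
    for i, verb in enumerate(morphemes):
        if verb.pos != "verb" or verb.lemma not in frequent_verbs:
            continue
        j = i - 1
        marker = None
        # Step over an overt case marker directly before the verb, if present.
        if (j >= 0 and morphemes[j].pos == "case_particle"
                and morphemes[j].surface in CASE_MARKERS):
            marker = morphemes[j].surface
            j -= 1
        # Keep only common nouns; numerals, adverbial nouns, pronouns, proper
        # nouns, and focus particles are assumed to have other tags.
        if j >= 0 and morphemes[j].pos == "noun":
            pairs.append((morphemes[j].lemma, verb.lemma, marker))
    return pairs
```

A triple with `None` as its third element represents a token in which the case marker was omitted.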
We coded each token by verb category: transitive, unergative, or unaccusative. Three highly frequent verbs accounted for almost 35% of the tokens: suru ‘do,’ iku ‘go,’ and naru ‘become.’ We coded each of these verbs separately. Altogether, we found 12,792 object–verb pairs formed from 4,799 nouns and 500 verb lemmas, or 7,721 unique object–verb pairs. The average case particle omission rate is 70.5%. Table 3 summarizes our data by verb category.
|Verb category||Token counts||Omission rates|
4.2 Token classification
For our measure of dispersion, we used the DP measure proposed by Stefan Th. Gries (2008). See Gries (2008) for its advantages over the alternatives and for details of the calculation. We briefly summarize the calculation procedure here. We expect each speaker to produce an equal proportion of the occurrences of a specific expression. Thus, if we have data from ten speakers, then we expect each speaker to produce one tenth of the tokens for any given expression. We then calculate the observed proportion of tokens produced by each speaker. If two out of ten speakers produced half of the tokens each, then the observed proportion for each of those two speakers is 0.5, and 0 for the rest of the speakers. We next sum the absolute values of the differences between the expected proportion and the observed proportions. Finally, we divide the sum by 2. This procedure produces a value that theoretically ranges from zero, representing perfect dispersion (i.e., every speaker produced exactly the expected proportion of tokens), to approximately one, representing minimal dispersion (i.e., a single speaker produced all of the tokens).
We calculated DP values for each of the 7,721 object–verb pairs based on the number of speakers that used that object–verb pair during their interview. Our values ranged from 0.560, for the most widely used pair, to 0.993, for object–verb pairs used by only one speaker in our corpus. Table 4 lists the fifteen most dispersed object–verb pairs in our data. The DP scores are extremely skewed: over half of our tokens (N = 7,179) have the highest value (i.e., they were only used by one speaker). In order to compensate for the extreme sparsity of data in the lower range, we grouped the data into four levels of dispersion, with one representing low dispersion and four representing high dispersion. We give the specific DP value ranges for each dispersion level along with token and type counts in Table 5. Table 5 also lists the average case marker omission rate by dispersion level. The case marker omission rate increases by roughly one to two percentage points with each increase in dispersion level. This result supports our first hypothesis, namely that increased dispersion correlates with increased case marker omission.
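The calculation summarized above can be sketched in Python as follows. This is a minimal sketch of the simplified procedure as described here, with every speaker assigned the same expected proportion; the function name and input format are assumptions, and Gries (2008) should be consulted for the general definition.

```python
def deviation_of_proportions(counts):
    """DP for one expression.

    counts: number of tokens of the expression produced by each speaker,
    with a zero for every speaker who never used it.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("the expression has no tokens")
    expected = 1 / len(counts)              # equal share for each speaker
    observed = [c / total for c in counts]  # each speaker's actual share
    return sum(abs(o - expected) for o in observed) / 2


# The example from the text: ten speakers, two of whom produce half of the
# tokens each. The absolute differences sum to 1.6, so DP is about 0.8.
example = deviation_of_proportions([5, 5, 0, 0, 0, 0, 0, 0, 0, 0])
```

When every speaker produces the same number of tokens the function returns zero, and when one speaker out of n produces all of them it returns 1 − 1/n, which is why the upper bound is only approximately one.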
|Expression||Number of speakers||DP value|
|toko-ni iku ‘go to a place (that is…)’||66||0.560|
|gakkō-ni iku ‘go to school’||42||0.720|
|gohan-o taberu ‘eat a meal’||37||0.753|
|ie-ni kaeru ‘return home’||36||0.760|
|daigaku-ni iku ‘go to university’||34||0.773|
|kanji-ni naru ‘have a feeling’||33||0.780|
|ryokō-ni iku ‘go on a trip’||33||0.780|
|hanashi-o kiku ‘listen to a story’||31||0.793|
|ki-ni naru ‘be concerned’||25||0.833|
|daigaku-ni hairu ‘enter university’||23||0.847|
|hanashi-ni naru ‘become that way’||23||0.847|
|baito-o suru ‘do part-time work’||22||0.853|
|ki-o tsukeru ‘take care’||21||0.860|
|terebi-o miru ‘watch TV’||20||0.867|
|Tōkyō-ni iku ‘go to Tokyo’||20||0.867|
|Dispersion level||Token counts||Type counts||Omission rate||DP range|
|2||2,289||856||70.9%||0.980 ~ 0.987|
|3||1,943||268||73.2%||0.920 ~ 0.973|
|4||1,381||46||74.1%||0.560 ~ 0.913|
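The grouping into dispersion levels can be illustrated with a short Python sketch. The cut-offs are the upper bounds of the DP ranges listed in Table 5. Treating every value above 0.987 as level 1 is an assumption, based on the statement that the highest value (0.993) belongs to pairs used by a single speaker. The function name `dispersion_level` is hypothetical.

```python
def dispersion_level(dp):
    """Map a DP value to a dispersion level (1 = low, 4 = high dispersion)."""
    if not 0.0 <= dp <= 1.0:
        raise ValueError("DP must lie between 0 and 1")
    if dp <= 0.913:   # Table 5, level 4: 0.560 ~ 0.913
        return 4
    if dp <= 0.973:   # Table 5, level 3: 0.920 ~ 0.973
        return 3
    if dp <= 0.987:   # Table 5, level 2: 0.980 ~ 0.987
        return 2
    return 1          # assumption: the remaining single-speaker pairs


# DP values taken from Table 4 and from the single-speaker case.
for dp in (0.560, 0.867, 0.993):
    print(dp, dispersion_level(dp))
```

Note that a lower DP value means a more evenly dispersed expression, which is why the lowest DP range maps to the highest dispersion level.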
4.2.2 Syntactic complexity
We initially coded the object noun by the constituent type it was modified by, which was either a determiner (kono ‘this’), an i-adjective1 (takai ‘expensive’), a na-adjective (kirei-na ‘pretty’), a noun phrase with a genitive case marker (sensei-no ‘teacher’s’), a relative clause, or a na-adjectival phrase (sensei-no-yō-na ‘like the teacher’).2 When the preceding constituent did not directly modify the object, then we coded the object as bare. The examples given in (4), taken from the corpus, illustrate each of these categories. The relevant noun is highlighted.
- ‘First, (you) have to dig a ditch.’
- that person.ACC: ‘The person who scouted her is amazing, don’t you think?’
- ‘(I) want to drink, like, hot coffee.’
- ‘My Japanese has become strange.’
- genitive noun phrase: senior students.GEN shashin Ø; ‘collecting pictures of the senior students’
- relative clause: bōru Ø, the case that.NEG; ‘It is not like baseball, where you hit a moving ball.’
- na-adjectival phrase: Hito Ø; ‘It has become so that (the company) no longer needs employees.’
Initial analysis of the case marker omission rates (Table 1) shows that we can simplify this classification scheme. If we consider the rates of case marker omission (the second column of Table 1), then we can reclassify the data into three groups corresponding to zero, one, or more than one modifying constituent. We therefore reclassified the tokens by the number of modifying constituents as either ZERO, ONE, or TWO or more (the third column of Table 1). From Table 1 we see that case particle omission rate decreases as the number of modifying constituents increases. This result supports our second hypothesis.
4.2.3 Other factors
We classified each token for the following social characteristics: age group (five groups, see Table 2), gender (male or female), and speech style index (as described in Heffernan & Hiratuka 2017). Briefly, the speech style index is a value that theoretically ranges from zero to one and which indicates the degree to which the speaker used Standard Japanese during the interview (with a higher value indicating a greater use of Standard Japanese). The value was derived by combining the following seven measures:
- Proportion of standard versus regional copula variants (for example, da vs. ya ‘be’);
- Proportion of standard versus regional verbal negative suffixes (for example, tabe-nai vs. tabe-hen ‘not eat’);
- Proportion of standard versus regional verbs of existence (iru vs. oru ‘be’);
- Proportion of nasalization of a verb final -ru (for example, taberu vs. taben ‘eat’);
- Proportion of non-regional versus regional sentence-final particles (for example, yo vs. de);
- Proportion of non-regional versus regional adverbial intensifiers (for example, erai vs. kekkō ‘very’);
- Proportion of standard ii ‘good’ versus regional variant ee ‘good.’
Each of these seven variables was scaled between zero and one, with each speaker’s average indicating the speech style index for that speaker.
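The averaging step can be sketched as follows. The exact scaling procedure is not spelled out here, so the sketch assumes simple min-max scaling across speakers, and it assumes that each measure is already oriented so that a higher value means more Standard Japanese. The function names and the three-speaker data set are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range 0..1 (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def speech_style_index(measures_by_speaker):
    """Average the scaled measures for each speaker.

    measures_by_speaker maps a speaker ID to a list of raw measures,
    one per variable, in the same order for every speaker.
    """
    speakers = list(measures_by_speaker)
    n_measures = len(measures_by_speaker[speakers[0]])
    scaled_columns = []
    for j in range(n_measures):
        column = [measures_by_speaker[s][j] for s in speakers]
        scaled_columns.append(min_max_scale(column))
    return {
        s: sum(col[i] for col in scaled_columns) / n_measures
        for i, s in enumerate(speakers)
    }


# Hypothetical proportions for three speakers on the seven measures.
toy_data = {
    "A": [0.9, 0.8, 1.0, 0.7, 0.9, 0.6, 1.0],
    "B": [0.1, 0.2, 0.0, 0.1, 0.3, 0.2, 0.0],
    "C": [0.5, 0.5, 0.5, 0.4, 0.6, 0.4, 0.5],
}
print(speech_style_index(toy_data))
```

In this toy data set speaker A has the most standard value on every measure and speaker B the least, so their indices fall at the two ends of the scale.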
We coded whether or not the object had been previously mentioned, as Fujii & Ono (2000) found that case marker omissions tended to occur with objects that had direct referents in the conversation.
Finally, if we observed a case marker directly following the noun then we coded the case marker as INCLUDED, otherwise as OMITTED.
5 Mixed-effects analysis
Our objective in this section is twofold. First, we want to confirm the correlations between case marker omission rate, dispersion, and number of modifying constituents, while taking into consideration other covariates such as speech style. Second, we want to test our third hypothesis, which was that declarative knowledge and procedural knowledge interact. For this purpose, we conducted a mixed-effects analysis with a logit link using the lme4 package in the R environment (R Core Team 2015). Mixed-effects modeling is well suited to linguistic data because such models take interspeaker variability into account when determining the factor weights (Johnson 2009; Gries 2015).
We conducted our analysis following the step-up strategy outlined in West et al. (2014: section 2.7.2). This approach begins with a model that has a fixed intercept as the only fixed parameter, with the intercepts for SPEAKER and WORD (object–verb pair) added as random effects. From there we added each of the other fixed effect covariates, beginning with those factors that had shown an obvious effect in our preliminary data analysis. After each addition beyond the initial model with just the random effects, we conducted a likelihood ratio test comparing the new model with the old model. If the two models were found to be significantly different, then we retained the added factor; otherwise, we removed it. We repeated this process until we had either added or rejected each of the fixed effect covariates.
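The logic of the step-up procedure can be sketched in Python. This is an illustration only, not the R code used for the analysis. It assumes that each candidate term adds a single parameter, so the likelihood ratio statistic is compared with a chi-square distribution with one degree of freedom (a multi-level factor such as verb category would need more degrees of freedom). The function names, the 0.05 threshold, and the log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(loglik_old, loglik_new):
    """p value of a likelihood ratio test for one added parameter (1 df)."""
    statistic = max(2 * (loglik_new - loglik_old), 0.0)
    # With 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


def step_up(base_loglik, candidates, fit, alpha=0.05):
    """Add candidate terms one at a time, keeping those that pass the test.

    fit(terms) must return the maximized log-likelihood of the model that
    contains the given fixed effects in addition to the random intercepts.
    """
    kept, current = [], base_loglik
    for term in candidates:
        new = fit(kept + [term])
        if lrt_p_value(current, new) < alpha:
            kept.append(term)      # significant improvement: retain the term
            current = new
    return kept


# Made-up log-likelihoods standing in for fitted models.
toy_loglik = {
    (): -7000.0,
    ("dispersion",): -6990.0,
    ("dispersion", "gender"): -6989.8,
    ("dispersion", "n_modifiers"): -6950.0,
}


def toy_fit(terms):
    return toy_loglik[tuple(terms)]


selected = step_up(toy_loglik[()], ["dispersion", "gender", "n_modifiers"], toy_fit)
print(selected)
```

In this toy run the hypothetical `gender` term improves the log-likelihood by only 0.2 and is dropped, while the other two terms are retained.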
Once we had our initial model with random and fixed effects, we then tested for interactions between the fixed effects. Again, we added each interaction term one at a time and compared the revised model with the previous model. We present the final model in Table 6. Note that GENDER, AGE, and all of the interactions except one did not reach significance and were therefore not retained.
|Factor||Estimate||Std. error||z value||p value|
|Num. modifying constituents||0.777||0.083||9.365||<2.00e-16|
|Verb = unergative||0.418||0.118||3.542||0.0004|
|Verb = transitive||–0.966||0.095||–10.141||<2.00e-16|
|Verb = iku ‘go’||–0.073||0.118||–0.625||0.5323|
|Verb = naru ‘become’||2.282||0.135||16.852||<2.00e-16|
|Verb = suru ‘do’||–1.935||0.156||–12.392||<2.00e-16|
|Previously mentioned = yes||–0.147||0.056||–2.646||0.0081|
|Dispersion × Num. modifying constituents||–0.119||0.041||–2.915||0.0036|
Table 6 offers two sources of information that allow for easy comparison between covariates. The first is the relative magnitude and sign of the parameter estimates. A positive sign indicates that the factor favors retention of the case marker, and a negative sign that it favors omission; the positive estimate for the number of modifying constituents, for example, means that more modifiers go with less omission. The second is the magnitude of the p value, with a smaller p value indicating stronger evidence of an association between case marker omission and the factor. Comparing the covariates in this way confirms the correlations seen in Tables 1 and 5: increased case particle omission correlates with fewer modifying constituents and increased dispersion. As expected, SPEECH STYLE shows a strong correlation with case particle omission, as does whether or not the object was previously mentioned. These results agree with the previous research on case particle omission in Japanese.
Crucially, the interaction between dispersion and the number of modifying constituents reached statistical significance, supporting our third hypothesis, that declarative and procedural memory compete with each other during language processing. In order to better understand this interaction, we plotted the interaction using the interplot package in R (Solt & Hu 2015) as Figure 1. The interplot package plots the changes in the coefficient of one variable in an interaction conditional on the value of the other variable. The plot also includes simulated 95% confidence intervals of these coefficients. Figure 1 shows the changes in the estimated coefficient for the number of modifying constituents for each level of dispersion. As the dispersion of the object–verb pair increases, the estimated coefficient decreases, indicating a gradual weakening of the impact of the number of constituents on case particle omission.
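The conditional coefficient described above can be approximated directly from the two relevant estimates in Table 6. The sketch below assumes that dispersion entered the model as the uncentered numeric levels 1 to 4; that coding is an assumption, and a different coding would shift the values. The confidence intervals are not reproduced, because they depend on the covariance of the estimates.

```python
MAIN_EFFECT = 0.777    # Num. modifying constituents (Table 6)
INTERACTION = -0.119   # Dispersion x Num. modifying constituents (Table 6)


def conditional_coefficient(dispersion):
    """Coefficient for the number of modifying constituents at a given
    dispersion value, under the assumed numeric coding of dispersion."""
    return MAIN_EFFECT + INTERACTION * dispersion


for level in (1, 2, 3, 4):
    print(level, round(conditional_coefficient(level), 3))
```

Under this coding the coefficient falls from about 0.66 at the lowest dispersion level to about 0.30 at the highest, which is the weakening pattern described for Figure 1.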
The 95% confidence interval also appears to decrease in size as dispersion decreases. However, this change simply reflects the lower number of tokens in the more dispersed subsets (confidence interval size is proportional to the inverse of the square root of the number of tokens). The weakening of the coefficient itself is consistent with Ullman’s claim that declarative memory and procedural memory compete, with declarative memory preempting procedural memory for the most dispersed expressions. This result is also consistent with recent studies in morphology (Schmidtke et al. 2017; Lõo et al. 2018) showing that frequency (i.e., declarative knowledge) impacts morphological processing sooner than morphological factors (i.e., procedural knowledge).
Before concluding the paper, we first discuss the other covariates. The correlation between speech style and case marker omission is perhaps not surprising. Half a century of work on stylistic variation, beginning in the 1960s with work by Labov (1966/2006), has shown that speakers subtly control socially meaningful linguistic variation in order to dynamically portray specific social identities (Schilling-Estes 2002).
How exactly the impetus to perform socially meaningful work interacts with cognitive load is a question that remains largely unexplored. However, we are aware of at least one study that addresses this question. Heffernan & Hiratuka (2017) examined stylistic variation in spoken Japanese negative verbal suffix choice. Japanese has both standard and vernacular variants for the verbal negative suffixes. Heffernan & Hiratuka observed a strong correlation between negative verbal suffix choice and other markers of vernacular speech, with the exception of the verbs shiru ‘know,’ wakaru ‘understand,’ and iru ‘need.’ They argue that since the negated forms of these verbs (i.e., don’t know, don’t understand, and don’t need) occur relatively more frequently in spoken Japanese than their uninflected forms, they are processed in declarative memory as a single cognitive unit. Generally speaking, verbs tend to occur most frequently in their non-past, non-negative form. This form presumably forms the base of an inflectional paradigm, and in most cases, the negative form of the verb is derived in procedural memory. However, in the case of the verbs shiru ‘know,’ wakaru ‘understand,’ and iru ‘need,’ the faster processing in declarative memory blocks the slower processing in procedural memory, which provides more evidence that declarative knowledge and procedural knowledge compete during processing.
Finally, consider the results of the VERB CATEGORY factor group. Ranking the verb categories by their estimated coefficients, from the category that most favors case marker omission to the one that least favors it, yields (5). As there has been little previous research on case marker omissions in the dative case, this part of our analysis covers largely new ground. Our results show that the accusative case marker is omitted more readily than the dative case marker.
|(5)||suru ‘do’ > transitive > iku ‘go,’ unaccusative > unergative > naru ‘become’|
We tentatively suggest that this difference could be captured by generative grammar theory. Imanishi (2017) proposes that the gradient accusative case marker omission pattern seen in spoken Japanese parallels the categorical pattern of pseudo noun incorporation in Niuean (Massam 2001). Massam argues that pseudo noun incorporation only applies to a constituent in the complement position of a VP head. Crucially, dative-marked elements in unaccusative and unergative sentences appear in an adjunct position and not in a complement position. Therefore, there is a noticeable parallel between the gradient case marker omission patterns in Japanese and pseudo noun incorporation in Niuean, further supporting Imanishi’s claim that the two phenomena are theoretically similar.
As pointed out by one of the anonymous reviewers, this conclusion bears some resemblance to the behavior of accusative case marking in Old Japanese, as discussed in Miyagawa (1989; 2012) and Yanagida (2007). Miyagawa (1989) provides an in-depth analysis of the accusative case marking for direct objects of eighth-century Old Japanese, which can be summarized as follows. The object of the attributive form of the verb must bear the accusative case marker -o, whereas the object of the conclusive form need not (=zero marking) so long as it is adjacent to the verb (Miyagawa 1989: 212). However, as Yanagida (2007) has discovered, there are exceptions to Miyagawa’s generalization that the object of an attributive form must bear morphological case. Yanagida found 55 examples of exceptions in the Man’yōshū, an anthology of Japanese poems composed during the seventh and eighth centuries. As she points out, 54 of these 55 counterexamples contain objects that can be analyzed as bare nouns. Miyagawa (2012) suggests that these zero-marked objects are licensed by head incorporation (Baker 1988; but see Yanagida 2007 for a different analysis). In this analysis, we see a categorical contrast between bare nouns and modified nouns, with only bare nouns undergoing noun incorporation (and hence case marker omission). Such an analysis closely resembles Imanishi’s treatment of spoken Japanese, differing only by a matter of degree (categorical versus gradient). But that is often the case when comparing records of written language such as the Man’yōshū with records of spoken language such as the Corpus of Kansai Vernacular Japanese.
Our study examined case marker omissions in object–verb pairs, with the objective of contributing to the emergent–generative grammar debate. We specifically looked for evidence of an interaction between emergent knowledge and generative knowledge, as defined in Ullman’s declarative/procedural language model (2004; 2008; 2016). We took the degree of dispersion of the object–verb pair within the corpus as a proxy of declarative knowledge, and the syntactic complexity of the object (as measured by the number of modifying constituents) as a proxy of procedural knowledge.
Our study is built on the assumption that increased case marker omission reflects reduced cognitive processing effort (Fedzechkina et al. 2017). Since more dispersed words require less cognitive effort (McDonald & Shillcock 2001), we hypothesized that increased dispersion correlates with increased case marker omission. Similarly, since more syntactically complex expressions require more cognitive effort (Caplan & Waters 1999), we hypothesized that increased syntactic complexity of the object correlates with reduced case particle omission. Furthermore, motivated by Ullman’s (2016) assertion that declarative knowledge and procedural knowledge compete during processing, we hypothesized that dispersion level and number of modifying constituents interact. Specifically, we expected the impact of the number of modifying constituents to decrease as dispersion level increased.
In order to test these hypotheses, we conducted a mixed-effects analysis of case marker omissions. After taking into consideration the effects of the speakers, the object–verb pairs (i.e., word effects), speech style, verb type, and whether or not the object had been previously mentioned, we found support for all three of our hypotheses.
Some recent work in variationist linguistics continues to debate the role of lexical frequency. For example, Walker (2012) examined t/d-deletion in conversational data from 47 speakers and concluded (Walker 2012: 410) that “once we take into account the contribution of a small set of highly frequent lexical items,” then frequency no longer correlates significantly with variation. Similarly, Bayley et al. (2013) examined a variety of factors such as gender, verbal tense and subject person/number, and the frequency of the verb form in subject pronoun omissions in conversational data from 29 Spanish speakers, and found that verb form frequency had a relatively small effect. They did find, however, that correlations with the factors are generally stronger for non-frequent verb forms. In a way, we are building on these results by examining similar questions, but replacing lexical frequency with dispersion. Furthermore, we also found a similar, albeit more constrained, result to that of Bayley et al.: the correlation with one of our factors was stronger for the less dispersed object–verb forms.
Our results do not support the usage-based theory of grammar. In this approach, the cognitive organization of language is based directly on experience with language, with constructions replacing the basic units of syntactic structure (Beckner et al. 2009). Such an approach could account for the correlation between case marker omission and our measure of syntactic complexity (the number of modifying constituents) by appealing to the different rates of exposure a speaker has to modifying constituents; a determiner modifying a noun occurs more frequently than a relative clause modifying a noun. However, such an approach does not explain the interaction between dispersion and modifying constituents, since in a usage-based grammar approach one is not dependent on the other.
Our results also do not support a model of grammar that ignores user experience, since we found a strong correlation between dispersion and case marker omissions. Contrary to Walker (2012) and Bayley et al. (2013), we conclude that morphosyntactic variationist studies need to continue to take into account some measure of community-level usage rates of specific expressions, such as dispersion.
Our study has limitations. We say nothing about the innateness of Universal Grammar. Ullman (2004; 2016) argues that procedural memory, which underlies the computational system which gives structure to language, also subserves other cognitive functions. O’Grady (2008) similarly argues for general cognitive functions underlying the language faculty. Our results are compatible with such an approach, or with an innateness approach.
We also have not considered the role of prediction in our study (Gahl & Garnsey 2004; Levy 2008; Kuperberg & Jaeger 2016). These researchers account for differences in cognitive processing with a model of prediction based on language experience. A listener dynamically processes linguistic input as each new word is heard by simultaneously predicting multiple candidate structural representations of the partial input. She then ranks each representation based on its probability of occurrence. The difficulty of cognitively processing a new word corresponds to how closely that new word matches the predicted representations, and the cost to rerank the candidate representations when predictions are not met.
Previous research has demonstrated that measures of prediction correlate with the acoustic duration of words (Bell et al. 2009; Seyfarth 2014), fixation times in eye tracking studies (Levy 2008), reaction times in lexical decision studies (Balling & Baayen 2012), and event-related potentials (Van Petten & Luka 2012). However, we have avoided a measure of expectation in this study. As O’Grady points out (2015: 9), any measure of rate of occurrence such as dispersion will confound any measure of parsimonious processing such as predictability. Thus, adding both dispersion and predictability risks multicollinearity problems; we opted for dispersion. But what about using a measure of prediction as a proxy of procedural knowledge? Ullman allocates predictive linguistic processing to procedural knowledge, suggesting that a measure of predictability might make a better proxy for procedural knowledge than our rather simple measure of syntactic complexity. We leave this development for future research.
What is the next step? We have empirically supported Ullman’s claim that declarative and procedural knowledge interact, but we still know very little about how. Reuland (2010) speculates that the declarative/procedural interface along with the limited capacity of working memory define key characteristics of syntax, such as the locality restrictions on wh movement in English known as island constraints. For example, English wh words cannot move out of an adverbial clause (=adjunct island). Consider the relevant example discussed by Reuland, reproduced here as (6a). Reuland points out that, in contrast, pronominal binding does not have such a restriction, as shown by (6b). We have no difficulty interpreting “his” in terms of “every boy” in the matrix clause. Reuland explains the asymmetry as follows. Incomplete linguistic input is temporarily stored in working memory while the declarative/procedural interface parses it into a larger chunk. That chunk is then shifted to a longer term memory, freeing up working memory for the next chunk. Reuland speculates that in the case of (6a), the wh word must stay in working memory until its dependency is resolved. The limited capacity of working memory eventually overloads, leading to the ungrammaticality of the sentence. The expression “every boy” does not jam up working memory because it does not necessarily have a dependency; although “every boy” can bind the pronominal “his,” the pronominal may also refer to some other entity in the larger discourse. Such dependencies are presumably resolved in declarative memory.
|a.||*What did John think Mary would be unhappy about after she ruined (what)?|
|b.||Every boy thought that Mary would be unhappy after she ruined his apartment.|
Reuland’s ideas about the declarative/procedural interface lead to more predictions. Again we expect to see gradient interactions. In a similar discussion about wh movement in Japanese, Saito & Fukui (1998) give examples that are ungrammatical as well as examples that are only slightly so. We make two predictions. First, we should be able to come up with a full range of grammatical violations, from mild to severe. Second, the severity of the violation should interact with some indicator of declarative knowledge, such as dispersion. That is, the more an expression is dispersed, the more speakers should show tolerance for grammatical violations due to wh movement. There should also be specific violation types that are clearly unacceptable regardless of dispersion level. Further exploring the interaction between declarative knowledge and procedural knowledge in this way should lead to a better understanding of intuitions about grammaticality. The methodology used by Lõo et al. (2018) seems ideally suited to such a task.
Gries (2012: 477) laments, “Linguistics is fundamentally a divided discipline, as far as theoretical foundations and empirical methodologies are concerned.” Certainly there will always be researchers narrowly focused on one or another aspect of language. But we hope that with this study, along with other advocates of dual route models, we are building a bridge over this theoretical gap.