Whereas most previous studies on (super-)gang effects examined cases where two weaker constraints jointly beat another stronger constraint (

In (1)-a and (1)-b, the candidate that violates ℂ_{2} or ℂ_{3} wins since violating ℂ_{1} is more fatal. In (1)-c, however, the candidate that violates both ℂ_{2} and ℂ_{3} loses because the summed weights of ℂ_{2} and ℂ_{3} outweigh the weight of ℂ_{1}.

(1)

Gang effect

a.

w(ℂ_{1}) > w(ℂ_{2})

b.

w(ℂ_{1}) > w(ℂ_{3})

c.

w(ℂ_{2}) + w(ℂ_{3}) > w(ℂ_{1})

It has been reported that the mere addition of constraint weights often cannot capture the cumulative effect correctly: forms with two marked structures occur even less frequently in natural languages than the grammar predicts by multiplying the probabilities of two separate forms, each with one marked structure (

Jäger & Rosenbach distinguish a second type of cumulativity, counting cumulativity, schematized in (2). In (2)-a, the candidate that violates ℂ_{1} loses because violating ℂ_{1} is more severe than violating the weaker constraint ℂ_{2}. However, in (2)-b, the candidate that incurs multiple violations of ℂ_{2} loses because a single violation of ℂ_{1} is outweighed by multiple violations of ℂ_{2}.

(2)

Counting cumulativity

a.

w(ℂ_{1}) > w(ℂ_{2})

b.

w(ℂ_{2}) × 2 > w(ℂ_{1})

This paper focuses on the super-additive version of this counting cumulativity, which I call super-additive counting cumulativity: the penalty for multiple violations of a constraint exceeds the simple sum of its single-violation penalties.

The remainder of the paper is organized as follows. Through an illustration of a toy example of a super-additive counting cumulativity in §2, I first show that these effects cannot be entirely captured in MaxEnt Harmonic Grammar (

I start by defining terms that are used in the paper in §2.1. In §2.2, I present a toy dataset with super-additive counting cumulativity and compare three different violation assessment methods to see how each captures the data. I first analyze the toy data with a conventional penalty assessment method where the product of the constraint weight and the number of violations is summed over all the constraints, and show that super-additivity cannot be entirely captured in a conventional MaxEnt grammar. In §2.3, I take an alternative approach, a constraint family model, and discuss how it fails to capture the monotonicity of such patterns, wherein more violations incur more severe penalties. In §2.4, I introduce the power function model, in which the degree of penalty is scaled up according to the number of violations through a power function. In §2.5, I introduce a MaxEnt implementation of the power function model, called the Power Function MaxEnt Learner, with a focus on how it differs from the conventional MaxEnt learner. I present two learning simulations to show the capacity of this learner. I first fit a dataset without super-additive counting cumulativity using the Power Function MaxEnt Learner; I show that the learner successfully lets all the exponents stay at 1, reducing to the conventional linear assessment. I then fit the super-additive toy dataset and show that the learner raises the exponent of only the super-additive constraint.

Various terms regarding the topic of constraint interaction have been used in the literature. In this section, I will introduce these terms and determine which of these will be used in the paper.

As mentioned in §1, this paper focuses on super-additive counting cumulativity, where the joint penalty of multiple violations of a single constraint exceeds the sum of their individual penalties.

Tableau (3) illustrates an example that exhibits super-additive counting cumulativity, motivated by a Korean dataset that will be introduced later in §3. The weights of the constraints and predicted probabilities are computed by the Maxent Grammar Tool. A violation of ℂ_{1} will always be incurred by candidate (ii) and will be constant over the inputs. Violations of ℂ_{2} will always be incurred by candidate (i), with a monotonic increase. The Korean dataset in §3 will not include inputs like (d) because the majority of Korean native nouns are monosyllabic or disyllabic and therefore not long enough to incur three violations of ℂ_{2}. Even if the word length allows it, a co-occurrence of three laryngeally marked consonants is highly disfavored, as reported in Park (

(3)

Conventional MaxEnt grammar: bad fit

Comparing (3)-a and (3)-b, a single violation of ℂ_{2} lowers the observed frequency of the winner only marginally and does not reverse the choice of winner. Multiple violations of ℂ_{2} can reverse the winner, however, as seen in (3)-c and (3)-d. This is a clear case of counting cumulativity, where multiple violations of a weaker constraint ℂ_{2} overpower a single violation of the more powerful constraint ℂ_{1}. The weighting condition for these two constraints is summarized in (4), where n stands for the number of violations of ℂ_{2}.

(4)

Weighting schema for the counting cumulativity of ℂ_{2}

w(ℂ_{2}) × n > w(ℂ_{1}) > w(ℂ_{2}), where n ≥ 2

Taking a closer look at the observed probability distributions of (3)-c and (3)-d, the losing candidates with multiple violations of ℂ_{2} barely occur at all. This shows that ℂ_{2} is cumulative in a super-additive manner: the effects of multiple violations of ℂ_{2} surpass a mere multiplication of the severity of a single violation.

The weights of the constraints, computed by the Maxent Grammar Tool, satisfy the weighting condition in (4): one violation of ℂ_{2} (0.6) weighs less than one violation of ℂ_{1} (0.7), which in turn weighs less than two or three violations of ℂ_{2} (1.2, 1.8). However, the probabilities of the candidates predicted by the weighted constraints do not successfully match the input distributions. To reproduce the observed pattern of super-additivity, a grammar must satisfy the two following conditions. First, one violation of ℂ_{2} should weigh low enough to correctly capture the marginal frequency difference between the winners of (3)-a and (3)-b. Second, multiple violations of ℂ_{2} should weigh high enough to capture the extreme gang-effect in (3)-c and (3)-d. However, this weighted grammar meets neither of these conditions because w(ℂ_{2}) is stuck in the middle: candidate (3)-b-i was penalized too much for violating ℂ_{2} once (52% predicted vs. 60% observed), while (3)-c-i was not penalized enough for incurring multiple violations of ℂ_{2} (38% predicted vs. 4% observed).

The simulation illustrates that there is no way for a traditional MaxEnt grammar to satisfy the two conditions: w(ℂ_{2})×1 being low enough while w(ℂ_{2})×2 and w(ℂ_{2})×3 are high enough. This is because the disparity between one and multiple violations is wide, but w(ℂ_{2}) is fixed and violations are assessed only linearly. Under this linear strategy, multiple violations are merely doubly or triply penalized, which is not enough to capture the super-additivity.
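The mismatch just described can be reproduced with a few lines of arithmetic. The sketch below (plain Python, not the Maxent Grammar Tool itself; function names are my own) converts linearly assessed harmonies into MaxEnt probabilities using the weights reported above, following the candidate layout of tableau (3):

```python
import math

def maxent_probs(harmonies):
    """Turn candidate harmonies (summed penalties) into MaxEnt probabilities."""
    exps = [math.exp(-h) for h in harmonies]
    z = sum(exps)
    return [e / z for e in exps]

w1, w2 = 0.7, 0.6  # fitted weights reported for the toy data

# (3)-b: candidate (i) violates C2 once; candidate (ii) violates C1 once.
p_b = maxent_probs([w2 * 1, w1])
# (3)-c: candidate (i) violates C2 twice; candidate (ii) violates C1 once.
p_c = maxent_probs([w2 * 2, w1])

print(round(p_b[0], 2))  # 0.52 predicted vs. 0.60 observed
print(round(p_c[0], 2))  # 0.38 predicted vs. 0.04 observed
```

The two printed values are exactly the over- and under-penalization noted in the text: no single value of w(ℂ_{2}) can push the first number up and the second number down at the same time.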

The observed pattern of super-additive counting cumulativity can be captured by a constraint family where a separate self-conjoined constraint is responsible for each number of violations. For the toy data, ℂ_{2} was replaced by a set of constraints in (5), whose weights were computed using the MaxEnt Grammar Tool (

(5)

(ℂ_{2})^{1}: Penalize exactly one violation of ℂ_{2}.

(ℂ_{2})^{2}: Penalize exactly two violations of ℂ_{2} (=ℂ_{2}&ℂ_{2}).

(ℂ_{2})^{3}: Penalize exactly three violations of ℂ_{2} (=ℂ_{2}&ℂ_{2}&ℂ_{2}).

(6)

Constraint family model: good fit

The constraint family provides precise frequency matching, which is expected; there is a separate constraint responsible for every number of violations, and the weight can be adjusted to cater to each input and its candidate distribution.

In this approach, however, constraints that stand for greater numbers of violations are not guaranteed to have higher weights. For example, a constraint family can describe a language in which violating (ℂ_{2})^{3} is less severe than violating (ℂ_{2})^{1} but more severe than violating (ℂ_{2})^{2}, which is highly unnaturalistic and unattested (e.g., w((ℂ_{2})^{1}) > w((ℂ_{2})^{3}) > w((ℂ_{2})^{2})). A de Lacian approach of stringency hierarchy fares no better: it can still describe a language in which the probability of the offending candidate stays high over the first few violations of ℂ_{2}, decreases to 60% at 4 violations, remains constant until 6 violations, and approaches 0% with more. Thus, this line of approaches disregards the nature of super-additive counting cumulativity, wherein more violations lead to a strictly lower probability of the offending candidate, as Zymet (
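The arbitrariness can be made concrete with a minimal numeric illustration. The family weights below are hypothetical values chosen for the example (not the fitted values in (6)); nothing in the model rules them out:

```python
# Hypothetical, freely chosen family weights: nothing prevents (C2)^3
# from being weighted below (C2)^2. A candidate with exactly n violations
# violates only (C2)^n, so its penalty is just that constraint's weight.
family_weight = {1: 2.0, 2: 3.0, 3: 2.5}

penalties = [family_weight[n] for n in (1, 2, 3)]
monotone = all(a < b for a, b in zip(penalties, penalties[1:]))
print(monotone)  # False: the penalty does not grow with the violation count
```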

In the same vein, a constraint family also fails to reflect learners’ ability to make predictions about larger numbers of violations that are not evidenced in the existing lexicon. For example, the Korean native lexicon has very few monomorphemic nouns with two marked consonants, as briefly mentioned in §2.2; nevertheless, speakers know that tensifying a compound formed with these nouns is extremely unlikely.

A power function is a function of the form f(x) = x^{b}. In the power function model, the penalty for ℂ_{2} is assessed by replacing the raw violation count n(ℂ_{2}) with its power n(ℂ_{2})^{b}, so that the total penalty is w(ℂ_{2}) × n(ℂ_{2})^{b} rather than w(ℂ_{2}) × n(ℂ_{2}).

With b > 1, the penalty grows super-linearly in the number of violations, capturing super-additivity; with b = 1, the model reduces to the conventional linear assessment, since n^{1} = n. A single violation is unaffected in either case, since 1^{b} = 1.

This section outlines the MaxEnt implementation that automatically learns parameters for the power function model introduced in §2.4. In the conventional MaxEnt grammar, the harmony of a certain phonological form x is computed by multiplying each constraint weight w_{i} by the number of violations f_{i}(x) and summing over all constraints, as in (7).

(7)

H(x) = Σ_{i} w_{i} · f_{i}(x)

The learner of the power function model is crucially different in harmony calculation, since the number of violations f_{i}(x) is raised to the power b_{i} before being multiplied by the weight w_{i}, as in (8).

(8)

H(x) = Σ_{i} w_{i} · f_{i}(x)^{b_{i}}
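The harmony computation in (8) can be sketched as follows; the function names and the tuple layout of violation vectors are my own, and the conventional grammar in (7) falls out as the special case where every exponent is 1:

```python
import math

def harmony(weights, exps, viols):
    """H(x) = sum_i w_i * f_i(x)**b_i : exponentiated violation assessment."""
    return sum(w * (n ** b) for w, b, n in zip(weights, exps, viols))

def probs(weights, exps, candidates):
    """MaxEnt probability of each candidate from its violation vector."""
    hs = [harmony(weights, exps, v) for v in candidates]
    z = sum(math.exp(-h) for h in hs)
    return [math.exp(-h) / z for h in hs]

# With all exponents at 1, this reduces to the conventional grammar in (7):
p_lin = probs([0.7, 0.6], [1, 1], [(0, 1), (1, 0)])
# With an exponent on C2, two violations are scaled up super-additively,
# and the doubly-violating candidate becomes very unlikely:
p_pow = probs([0.5, 0.2], [1, 4.5], [(0, 2), (1, 0)])
```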

The learner’s goal is to find the set of weights that minimizes the negative log likelihood of the training data. Therefore, I formalize learning as minimizing the standard loss function, the sum of the negative log likelihoods. For optimization, batch gradient descent is used. The weights for all constraints are initialized at 0 and the exponents at 1; at each iteration, every parameter is updated by the negative of the learning rate times its gradient.
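The procedure can be sketched as a toy re-implementation. This is not the actual learner: it uses finite-difference rather than analytic gradients, takes observed probabilities (not raw counts) as training targets, and assumes a floor of 0 on weights and of 1 on exponents, none of which is spelled out in the text.

```python
import math

def nll(params, data, n_con):
    """Negative log likelihood of the training data.
    params = constraint weights followed by exponents;
    data = list of (candidate violation vectors, observed probabilities)."""
    w, b = params[:n_con], params[n_con:]
    total = 0.0
    for cands, obs in data:
        hs = [sum(wi * (n ** bi) for wi, bi, n in zip(w, b, v)) for v in cands]
        z = sum(math.exp(-h) for h in hs)
        for h, p in zip(hs, obs):
            total -= p * (-h - math.log(z))
    return total

def fit(data, n_con, lr=0.1, steps=3000, eps=1e-5):
    """Batch gradient descent with finite-difference gradients (sketch only)."""
    params = [0.0] * n_con + [1.0] * n_con   # weights at 0, exponents at 1
    floors = [0.0] * n_con + [1.0] * n_con   # assumed lower bounds
    for _ in range(steps):
        base = nll(params, data, n_con)
        grads = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grads.append((nll(bumped, data, n_con) - base) / eps)
        params = [max(p - lr * g, f) for p, g, f in zip(params, grads, floors)]
    return params

# Tiny demo: one input, two candidates; candidate (i) violates C1 once and
# candidate (ii) violates C2 once; observed distribution 30% vs. 70%.
data = [([(1, 0), (0, 1)], [0.3, 0.7])]
params = fit(data, n_con=2)
pA = math.exp(-params[0]) / (math.exp(-params[0]) + math.exp(-params[1]))
```

In the demo, the fitted grammar matches the 30/70 distribution, and since neither candidate incurs multiple violations, both exponents stay at their initial value of 1.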

I first use the Power Function MaxEnt Learner to fit a dataset without super-additivity, in which a linear increase in the number of violations results in a fairly gradual decrease in observed probabilities. In (9), for each input, the observed probability of candidate (i) gradually decreases with larger numbers of violations of ℂ_{2}; the data portray cases where the counting cumulativity of ℂ_{2} is no more than additive. The observed probabilities in this tableau are adjusted from (3) for demonstration, but this type of gradual decrease/increase in probabilities that arises from violating a scalar constraint is well attested in linguistics (English genitive variation in Jäger & Rosenbach (

(9)

Counting cumulativity without super-additivity: good fit

The Power Function MaxEnt Learner converged after 815 iterations and was able to successfully detect that increasing the exponent for ℂ_{2} is not necessary; for both constraints, the exponent stayed at 1.

Subsequently, the dataset with super-additivity (3) was fitted using the Power Function MaxEnt Learner. The learner converged after 50,134 iterations. As seen in (10), only the exponent for ℂ_{2} increased to 4.5 whereas the exponent for ℂ_{1} stayed at 1, which shows that the learner is able to detect, given the input data, that only a super-additive treatment of ℂ_{2} would be beneficial in this case. The parameters in (10) show that the power function approach enables the grammar to reflect the crucial aspects of the super-additive counting cumulativity: a single violation of ℂ_{2} (0.2) is outweighed by a single violation of ℂ_{1} (0.5), which in turn is outweighed by multiple violations of ℂ_{2} (0.2 × 2^{4.5} ≈ 4.5).

(10)

Output of the Power Function MaxEnt Learner: good fit
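As a quick arithmetic check of the parameters reported above (w(ℂ_{1}) = 0.5, w(ℂ_{2}) = 0.2, exponent b = 4.5 on ℂ_{2}; variable names are my own):

```python
w_c1, w_c2, b_c2 = 0.5, 0.2, 4.5  # parameters reported for the toy fit

pen_one = w_c2 * 1 ** b_c2   # one violation of C2: 0.2, unaffected by b
pen_two = w_c2 * 2 ** b_c2   # two violations, super-additively scaled: ~4.53

print(pen_one < w_c1 < pen_two)  # True: the weighting condition in (4) holds
```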

The power function model gives precise frequency matching with the addition of only one parameter on top of a single baseline constraint, which is more parsimonious than a family of constraints as in §2.3. Moreover, by the mathematical nature of the power function, the penalty is guaranteed to increase monotonically and strictly as the number of violations increases (

I illustrated a toy example of super-additive counting cumulativity and evaluated three different models: linear assessment, constraint family, and power function. In §2.2, it was shown that the conventional way of assessing violations cannot properly capture super-additivity. In §2.3, a set of self-conjoined constraints provided a good fit to the toy data; but these constraints can be arbitrarily weighted, in which case the observed monotonicity is not guaranteed. In §2.4 and §2.5, a grammar with exponentiated penalties showed a good fit to the toy example. With a slight modification to the traditional MaxEnt model, the Power Function MaxEnt Learner fits the parameters that are necessary for the power function model. More importantly, the mathematical nature of the power function guarantees that more violations are penalized more severely when the exponent is greater than 1.

As the first real-life example that exhibits super-additive cumulativity, Korean compound tensification is investigated in this section. I introduce the phenomenon in §3.1. In §3.2, I analyze the observed pattern using a set of OCP constraints that are sensitive to the position of a word boundary and to whether the offending structure is derived or not. The observed super-additivity is attributed to the counting cumulativity of a specific OCP constraint. In §3.3, I report a learning simulation using the Power Function MaxEnt Learner and show that the suggested method correctly captures the observed super-additive pattern.

Korean features a distinction between plain consonants (/p/, /t/, /s/, /c/, /k/) and laryngeally marked consonants, which for obstruents include the tense (/p’/, /t’/, /s’/, /c’/, /k’/) and aspirated (/p^{h}/, /t^{h}/, /c^{h}/, /k^{h}/) series. In a compound composed of two nouns, W_{A} and W_{B}, if the initial onset of W_{B} is a plain obstruent, it often undergoes tensification. Tensification is required for some compounds (/san/ ‘mountain’ + /pul/ ‘fire’ → [sanp’ul] ‘wild fire’), is not allowed in others (/sil/ ‘thread’ + /panci/ ‘ring’ → [silpanci] ‘thread ring’), and some compounds are variably realized as either the tensified or the non-tensified form (/pul/ ‘fire’ + /kituŋ/ ‘pillar’ → [pulkituŋ] ~ [pulk’ituŋ] ‘fire pillar’). Tensification tends to be blocked when W_{B} already contains a laryngeally marked consonant: /k^{h}oŋ/ ‘bean’ + /ki.rɯm/ ‘oil’ → [k^{h}oŋ.ki.rɯm], *[k^{h}oŋ.k’i.rɯm] ‘soybean oil’ and /tol/ ‘stone’ + /to.k’i/ ‘ax’ → [tol.to.k’i], *[tol.t’o.k’i] ‘stone ax’. Considering that the compounding process would derive another laryngeally marked consonant in W_{B}, the blocking effect of a laryngeally marked consonant arises from laryngeal co-occurrence restrictions (

Examining phonological factors that contribute to the likelihood of tensification in Korean compounds, Kim investigated compounds whose W_{B}-initial onset is a lax obstruent, which were collected from two data sources: Korean Usage Frequency (

Kim reports tensification rates according to the position of laryngeally marked consonants (W_{A} or W_{B}) and their number, as summarized in the table below.

Tensification rate according to the type/number of consonant in W_{A} and W_{B}.

| Position | Consonant(s) | Tensification rate | Count |
| W_{B} | plain/sonorant | .60 | 3,447/5,754 |
| W_{B} | one tense | .31 | 112/357 |
| W_{B} | one aspirated | .31 | 85/273 |
| W_{A} | plain/sonorant | .58 | 2,809/4,830 |
| W_{A} | one tense | .62 | 521/840 |
| W_{A} | one aspirated | .56 | 308/546 |
| W_{A} | two marked Cs | .04 | 6/168 |

First, the presence of a single laryngeally marked consonant in W_{B} significantly lowers the tensification rate. Unlike the strong effect of a marked consonant in W_{B}, there need to be two marked consonants in W_{A} in order to exhibit significant OCP effects; the presence of a single laryngeally marked consonant in W_{A} is not significantly different from the complete absence of any marked consonant, whereas the presence of two marked consonants in W_{A} makes the tensification rate plunge. If a laryngeally marked consonant is considered a tensification blocker, the effect of two in W_{A} goes beyond doubling the effect of one in W_{A}; two laryngeally marked consonants in W_{A} gang up in a super-additive manner and block tensification.

Analyzing the significance of the OCP effects, I constructed a mixed-effects logistic regression model with two predictors: the number of marked consonants in W_{B} (backward difference coded: zero, one) and the number of marked consonants in W_{A} (backward difference coded: zero, one, two). The model also included random slopes and random intercepts for participants and compounds. The model is shown in the table below.

Regression model for the Korean survey results.

| Predictor | Estimate | SE | z | p |
| (Intercept) | –3.11 | 0.51 | –6.02 | <0.001 |
| 0 vs. 1 marked C in W_{B} | –3.75 | 0.62 | –5.99 | <0.001 |
| 0 vs. 1 marked C in W_{A} | 0.00 | 0.39 | 0.00 | 0.99 |
| 1 vs. 2 marked Cs in W_{A} | –5.32 | 1.17 | –4.53 | <0.001 |

In the model reported above, (i) the presence of one marked consonant in W_{B} significantly reduces the likelihood of tensification, (ii) the presence of one marked consonant in W_{A} has no effect on tensification and is not significantly different from the absence of a marked consonant, and (iii) the presence of two marked consonants in W_{A} significantly dampens tensification. Moreover, model comparison using a likelihood-ratio test shows that including the number of marked consonants in W_{A} significantly improves the fit (χ^{2}(2) = 24.3).

This section provides a formal analysis of Korean compound tensification. I first introduce constraints that are responsible for the occurrence of tensification. And then, I analyze the OCP pattern observed in §3.1: one laryngeally marked consonant in W_{B} suffices to block tensification while more than one in W_{A} is needed to clearly show the blocking effect. This is attributed to different markedness thresholds in different morphological domains: word and compound.

Traditionally, tensification in Korean compounds has been considered a way of helping a compound be perceived as consisting of two elements. The tensification of W_{B} has been formally described in the literature as an insertion of a stop segment such as /s/ or /t/ at the compound juncture, which would allow an automatic process of post-obstruent tensification, or as an insertion of a floating feature that would allow tensification of the following segment, such as [constricted glottis] or [tense]. Among many, I follow the recent study of Ito in assuming that a floating feature is inserted at the compound juncture. The feature cannot dock onto the W_{A}-final coda, because codas are unreleased and hence cannot be tensified in Korean; it therefore docks onto the W_{B}-initial onset segment. While the tensification-triggering constraint RM demands that the inserted feature be realized, faithfulness prevents W_{B}-initial onset segments from undergoing tensification, as demonstrated in (11). The weights in the tableau are computed by the MaxEnt Grammar Tool (

(11)

Compound tensification triggered by RM

An offending structure (a co-occurrence of two laryngeally marked consonants) is treated differently depending on whether a word boundary intervenes or not. First, considering that the site of tensification is the initial onset of W_{B}, the reason for the stronger OCP effect in W_{B} is that a co-occurrence of two laryngeally marked consonants is significantly less preferred within a single morpheme. This strong OCP effect in W_{B} can be captured by a trigram that penalizes an offending structure with a preceding word boundary, defined in (12). In the constraint definition, T represents a laryngeally marked consonant and + represents a boundary.

(12)

*+TT: Assign a violation mark for every subsequence of a word boundary followed by a laryngeally marked consonant followed by another laryngeally marked consonant.

On the other hand, a laryngeally marked consonant in W_{A} cannot block tensification as strongly because a co-occurrence of two marked consonants is better tolerated in a compound. This can be captured by a similar trigram (13), which penalizes an offending structure with an intervening word boundary.

(13)

*T+T: Assign a violation mark for every subsequence of a laryngeally marked consonant followed by a word boundary followed by another laryngeally marked consonant.
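Both constraints count subsequences rather than adjacent substrings, so a pair of marked consonants violates them at a distance. A sketch of the counting, in an abstract transcription of my own where uppercase letters stand for laryngeally marked consonants and "+" marks the compound boundary:

```python
MARKED = set("PTSCK")  # abstract stand-ins for tense/aspirated consonants

def count_T_plus_T(form):
    """*T+T: marked C ... boundary ... marked C, counted as subsequences.
    Every (marked C in W_A, marked C in W_B) pair is one violation."""
    wa, wb = form.split("+")
    before = sum(c in MARKED for c in wa)
    after = sum(c in MARKED for c in wb)
    return before * after

def count_plus_TT(form):
    """*+TT: boundary ... marked C ... marked C (both inside W_B).
    Every unordered pair of marked Cs in W_B is one violation."""
    wa, wb = form.split("+")
    n = sum(c in MARKED for c in wb)
    return n * (n - 1) // 2

# One marked C in W_A and two in W_B: two *T+T pairs across the boundary,
# one *+TT pair inside W_B.
print(count_T_plus_T("To+TuSu"), count_plus_TT("To+TuSu"))
```

This subsequence-based counting is what lets a single W_{B} with two marked consonants incur two violations of *T+T at once, as in tableau (17) below.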

The constraint *+TT should be weighted highly enough to capture the clear OCP effect in W_{B} while *T+T should be weighted lowly enough to allow two marked consonants to still co-occur over a word boundary. The OCP pattern of Korean compound tensification corroborates the observation that some phonological restrictions that hold within a morpheme can be loosened or even lifted at morpheme boundaries (

The tensification-blocking effect of two laryngeally marked consonants in W_{A} is attributed to the counting cumulativity of *T+T violations, as summarized in (14). A single violation of *T+T is outweighed by the tensification-triggering constraint (RM), whereas two violations of *T+T outweigh RM. The example in (15), /koc^{h}u/ ‘chili’ + /kirɯm/ ‘oil’, represents cases where a marked segment in W_{A} does not block tensification. The example in (16), /k’oc^{h}i/ ‘skewer’ + /kui/ ‘roast’, represents cases where two marked consonants in W_{A} block tensification. In these tableaux, the input frequencies are not accurately matched even though the weights satisfy the weighting condition laid out in (14), because two violations of *T+T are more severe than simply doubling the severity of one violation, indicating that *T+T cumulates in a super-additive manner.

(14)

Counting cumulativity of *T+T

w(*T+T) × 2 > w(RM) > w(*T+T)

(15)

A single violation of *T+T is weaker than RM

(16)

Two violations of *T+T are stronger than RM

The constraint *T+T is super-additive only if it is violated by a derived pair of laryngeally marked consonants. Consider tableau (17).

(17)

Multiple violations of *T+T not fatal if incurred by old pairs

Tableau (17), represented by the example /k^{h}oŋ/ ‘bean’ + /kuks’u/ ‘noodle’, illustrates cases where W_{A} and W_{B} both include a laryngeally marked consonant. The candidate with tensification, (17)-a, still occurs 20% of the time, in contrast to (16)-a, which almost never occurs, although it violates *T+T twice for having two pairs of laryngeally marked consonants over the word boundary: k^{h}+k’ and k^{h}+s’. This is because the underlying T+T subsequence, k^{h}+s’, is tolerated and does not contribute to the super-additivity of *T+T violations. In contrast, the two T+T pairs included in (16)-a, k’+k’ and c^{h}+k’, both contribute to the super-additivity, since both include a derived tense consonant k’. Since underlying T+T subsequences behave differently from derived ones, I replace *T+T with *T+T_{O} and *T+T_{N}, as seen in the re-evaluation of the same example from (17), presented in (18). In the tableau, RM represents the tensification-triggering constraint. While *T+T_{N} is violated by derived T+T subsequences, *T+T_{O} is violated by the subsequences that are already present in the fully faithful candidate, defined as the candidate that violates no faithfulness constraints (Comparative Markedness). While there is no evidence that *T+T_{O} is super-additive, *T+T_{N} is clearly super-additive.

(18)

*T+T_{N} violated by derived subsequences and *T+T_{O} by old subsequences

Making a similar distinction for *+TT is well motivated by the lexical statistics of Korean monomorphemic words. As mentioned earlier, *+TT needs to be weighted highly to capture the significant blocking effect of a marked consonant in W_{B}. However, as Ito reports, some combinations of laryngeally marked consonants are actually overrepresented in Korean monomorphemic lexical items. I therefore split *+TT into *+TT_{N} and *+TT_{O}, where *+TT_{N} is violated by derived TT pairs and *+TT_{O} by underlying TT pairs that are already present in the fully faithful candidate, which is essentially the input. In this paper, *+TT_{O} is not included in the analysis because the dataset used here has no compound with two underlying marked consonants in W_{B}; *+TT_{O} is vacuously satisfied by all the compounds.

In this section, I fit the Korean data using the Power Function MaxEnt Learner. I first summarize the necessary constraints. Then, I present the result of the learning simulation on the Korean data. I show that incorporating the power function into violation assessment allows accurate frequency matching of the input data. I also show that the Power Function MaxEnt Learner is able to raise the exponent of only the constraint that cumulates super-additively.

As mentioned above, I assume that RM drives the tensification of the initial onset of W_{B}. In contrast, faithfulness militates against tensification.

The laryngeal co-occurrence patterns are captured by a set of two OCP constraints, which penalize derived marked pairs with a preceding or intervening word boundary, as defined again in (19) and (20). The counterpart constraints *T+T_{O} and *+TT_{O} are excluded from the analysis as they are not decisive in predicting candidate distributions; *+TT_{O} is never violated in the entire dataset, as mentioned above, and *T+T_{O} is violated by either all candidates or no candidates for each input.

(19)

*+TT_{N}: Assign a violation mark for every subsequence of a word boundary followed by a laryngeally marked consonant followed by another laryngeally marked consonant that is not present in the input.

(20)

*T+T_{N}: Assign a violation mark for every subsequence of a laryngeally marked consonant followed by a word boundary followed by another laryngeally marked consonant that is not present in the input.

The Korean data was fitted using the Power Function MaxEnt Learner. The weights of the constraints were all initialized at 0. The update to a constraint’s weight was the negative of the learning rate (0.1) times the gradient of the loss function. The result is shown in (21), including an input whose W_{B} underlyingly has a single laryngeally marked segment.

In the tableau, only the exponent for the super-additive constraint *T+T_{N} increased to 4.5 whereas those for all the other constraints stayed at 1, which shows that the learner is able to detect the super-additive constraint given the input data and raise only its exponent.

(21)

Output of the Power Function MaxEnt Learner: good fit

The fact that one violation of *T+T_{N} has no effect of blocking tensification is reflected in the very low weight of *T+T_{N}, while the extreme blocking effect of multiple violations of *T+T_{N} is reflected in its exponent 4.5. With these parameters, the weighted grammar correctly captures the super-additive counting cumulativity of *T+T_{N}: one violation of *T+T_{N} is outweighed by the tensification-triggering constraint, which is in turn outweighed by two violations of *T+T_{N}, whose penalty is scaled by 2^{4.5} (4.5 in total).

In this section, I investigated the super-additivity of laryngeally marked consonants in Korean compound tensification, a process whereby the initial onset of W_{B} can be tensified when compounded. The tensification likelihood of a given compound can be partially predicted by a number of phonological factors, one of which is the presence of another marked consonant, motivated by laryngeal co-occurrence restrictions. Co-occurrences of marked consonants are tolerated differently in different morphological domains (word and compound); whereas a single laryngeally marked consonant in W_{B} blocks tensification, there must be two marked consonants in W_{A} to block the process. The observed OCP patterns were captured by multiple OCP constraints that are sensitive to the position of the word boundary and to the input-output mappings, one of which (*T+T_{N}) was super-additive and therefore responsible for the super-additive cumulativity of a marked consonant in W_{A}. The Power Function MaxEnt Learner was able to detect this constraint as super-additive and adjusted the parameters, successfully matching the frequencies of the input data.

As another case of super-additive counting cumulativity, nasals in Japanese Rendaku are investigated in this section. I provide a brief introduction of the process in §4.1. I describe the compound databases that I use and report some tendencies found in the data in §4.2. In §4.3, I give a phonological analysis of the observed pattern using OCP constraints that operate on different natural classes, one of which cumulates super-additively. In §4.4, I use the Power Function MaxEnt Learner to fit the Japanese data. I summarize the section in §4.5.

In a compound composed of two elements (W_{A} and W_{B}) in Japanese, if the second element begins with a voiceless obstruent (/t/, /s/, /k/, /h/), it often voices (/jama/ ‘mountain’ + /

One of the best known factors is the presence of a voiced obstruent in W_{B}; if W_{B} already contains a voiced obstruent (/b/, /d/, /z/, /ɡ/), Rendaku is almost always blocked (/jama/ ‘mountain’ + /

I employ two complementary compound databases for investigating various phonological restrictions on Rendaku in this paper. The main database that I use, Irwin et al., includes compounds whose W_{B} begins with a voiceless obstruent and does not include a voiced obstruent. The database provides the compounds’ pronunciations as retrieved from two dictionaries, and I work with the pronunciation data sourced from the Kenkyūsha dictionary in this paper. I included compounds whose pronunciations in Kenkyūsha are specified either as “+”, meaning Rendaku is exhibited, or as “–”, meaning Rendaku is not exhibited. I excluded compounds whose pronunciations are not available in Kenkyūsha or are annotated as “+–”, meaning that both Rendaku and non-Rendaku pronunciations are possible, in order to treat Rendaku application as a binary variable.

As I am investigating phonological restrictions on Rendaku, I excluded items that could potentially be affected by other factors that have been reported to dampen Rendaku. Based on the annotations in Irwin et al., I only included compounds whose W_{B} is from the native stratum, as it has been reported that a W_{B} from either the Sino-Japanese or the foreign stratum can resist Rendaku. I also required that W_{B} be written with exactly one Kanji character in order to eliminate the effect of the Right Branching Condition. For example, compounds with /kanamono/ (金物) ‘hardware’ or /tatemono/ (建物) ‘building’ as the second noun were excluded. This is not a perfect metric for determining the monomorphemicity of a word, because there are simplex words that can be written with two Kanji characters, as Tanaka notes, but it ensures that W_{B} is monomorphemic and that the initial segment of W_{B} belongs to the right branch of the compound.

I also excluded coordinate compounds, as they are known to systematically avoid Rendaku application, as well as compounds whose W_{B} is three moras or longer. The process above resulted in an ideal dataset for investigating phonological restrictions on Rendaku. The data include 2,130 items, of which 1,772 underwent Rendaku (83%).

Since compounds with a voiced obstruent in W_{B} are already excised from the main database Irwin et al., I supplement it with counts from Rosen, which include compounds with a voiced obstruent in W_{B} whose total moraic count does not exceed four. The Rendaku rates in the two sources are closely comparable ((i) in W_{B} overall: 78% vs. 75%, (ii) no nasal in W_{B}: 75% vs. 72%, (iii) one nasal in W_{B}: 85% vs. 87%). Therefore, I use the counts and Rendaku probability obtained from Rosen for compounds with a voiced obstruent in W_{B}, and include these items alongside the data extracted from Irwin et al.

The Rendaku probability depends on the type and number of context consonants, as summarized in the table below. A voiced obstruent in W_{B} can categorically block Rendaku. Unlike the strong effect of voiced obstruents, it has been described in the literature that the presence of a nasal in W_{B} does not block Rendaku. However, two nasals in W_{B} can heavily dampen Rendaku, as in /touzoku/ ‘thief’ + /kamome/ ‘seagull’ → [touzokukamome] ‘pomarine jaeger (a type of seabird)’ and /hito/ ‘one’ + /tsumami/ ‘knob’ → [hitotsumami] ‘easy victory’. If we regard a nasal consonant as a Rendaku blocker, the effect of two blockers goes beyond merely doubling the effect of one; two nasals in W_{B} cumulate in a super-additive manner and block the application of Rendaku.

Rendaku rate according to the type/number of consonant in W_{B}.

| Source | Voiced obstruents in W_{B} | Nasals in W_{B} | Rendaku rate | Count |
| Irwin et al. | 0 | total | .83 | 1,772/2,129 |
| | 0 | 0 | .85 | 1,339/1,580 |
| | 0 | 1 | .80 | 418/520 |
| | 0 | ≥2 | .52 | 15/29 |
| Rosen | 1 | — | .00 | 0/116 |

Analyzing the significance of the observed nasal effects, I constructed a logistic regression model in which the number of nasals in W_{B} (backward difference coded: zero, one, two or more) was the independent variable. The result is shown in the table below.

Regression model for the Rendaku database.

| Predictor | Estimate | SE | z | p |
| Intercept | 1.06 | 0.13 | 8.10 | <0.001 |
| 0 vs. 1 nasal | –0.30 | 0.13 | –2.32 | <0.05 |
| 1 vs. ≥2 nasals | –1.34 | 0.38 | –3.46 | <0.001 |

In the model, a negative coefficient means that the factor contributes to dampening Rendaku, and a larger absolute value means that the effect is stronger. As can be seen, both the one-nasal and the two-or-more-nasal comparisons significantly dampen Rendaku. To check whether the nasal effects are independent of the Right Branching Condition, I constructed another model that also includes compounds whose W_{B} is written with more than one Kanji character (2,442 items in total). The model is reported in the table below.

Regression model with Right Branching Condition as an extra predictor.

| Predictor | Estimate | SE | z | p |
| Intercept | 0.26 | 0.15 | 1.69 | <0.1 |
| 0 vs. 1 nasal | –0.32 | 0.11 | –2.79 | <0.01 |
| 1 vs. ≥2 nasals | –1.23 | 0.33 | –3.75 | <0.001 |
| Right Branching Condition | 0.82 | 0.13 | 6.07 | <0.001 |

As can be seen, the effect of the Right Branching Condition is clearly present, which is compatible with the traditional description of Rendaku: Rendaku is more likely with a monomorphemic W_{B}. However, this model also reveals that the nasal effects are still significant even when the Right Branching Condition is included in the model. Moreover, model comparison using a likelihood-ratio test shows that including the nasal predictors makes significant improvements to the fit to the data (by a χ^{2} comparison), confirming that the nasal effect in W_{B} is robust in the lexicon of Japanese.

Kumagai (

In this section, I first introduce constraints that are responsible for the occurrence of Rendaku. Then, I give a formal analysis of the patterns described in §4.2. I explain why a single nasal in W_{B} still tolerates the application of Rendaku whereas a single voiced obstruent and multiple nasals significantly lower the Rendaku probability, by making a connection to previously established similarity avoidance effects in Japanese: higher similarity between surface segments is more strongly avoided. I define OCP constraints that operate on different natural classes, one of which is responsible for the super-additive cumulativity of nasals.

Similar to the Korean compound tensification case, it is assumed that the compound juncture marker [+voice] is inserted between two nouns when they form a compound. The constraint RM demands that this floating [+voice] be realized on the initial obstruent of W_{B}, whereas faithfulness militates against realizing it.

The resistance to Rendaku increases if the Rendaku process results in a co-occurrence of more similar segments. Different consonants in W_{B} contribute differently to the overall similarity between consonants when Rendaku applies. If /tene/ undergoes Rendaku, it results in a co-occurrence of consonants that agree in one feature out of two. This degree of similarity (50%) is presumably tolerable since Rendaku still applies in this case. However, if /teɡe/ underwent Rendaku, it would generate two overlapping feature pairs, which amounts to total identity in this case (100%). Likewise, the application of Rendaku to /teneme/ would result in 4 overlapping feature pairs out of 6 possible pairs (67%; [+voice]×3, [+sonorant]×1). In the last two cases, the degree of similarity would be intolerably high if Rendaku applied, and therefore Rendaku does not take place. The number of overlapping features is not a perfect way of measuring similarity since some features are more perceptually salient than others (

Feature specification of hypothetical Rendaku-applied W_{B} forms.

consonant tier | d | n | d | ɡ | d | n | m |

voice | + | + | + | + | + | + | + |

sonorant | – | + | – | – | – | + | + |
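The feature-overlap arithmetic above can be sketched in a few lines. This is a simplification using only the two features in the table; real similarity metrics would also weight features by perceptual salience:

```python
from itertools import combinations

# Feature values (voice, sonorant) from the table above; ASCII "g" stands for IPA ɡ.
FEATURES = {"d": ("+", "-"), "g": ("+", "-"), "n": ("+", "+"), "m": ("+", "+")}

def similarity(consonants):
    """Share of agreeing feature pairs over all pairs of consonants."""
    agree = total = 0
    for c1, c2 in combinations(consonants, 2):
        for f1, f2 in zip(FEATURES[c1], FEATURES[c2]):
            total += 1
            agree += f1 == f2
    return agree / total

similarity("dn")   # 0.5   — [d…n] agree only in [+voice]
similarity("dg")   # 1.0   — total identity in these two features
similarity("dnm")  # ≈0.67 — 4 of 6 feature pairs agree
```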

The effect of similarity avoidance on Rendaku applicability has been frequently brought up in the literature. Kawahara & Sano (_{B} (W_{A}+/tadanu/ → [W_{A}tadanu], *[W_{A}dadanu]). Thus, the application of Rendaku is a way of either resolving or preventing identity at the moraic level. Moreover, there is a growing body of experimental studies showing that the similarity avoidance effect is gradient and that higher similarity tends to be more strongly avoided. Sano (…) showed that partial identity (C_{i}V_{z}.C_{j}V_{z}) is avoided but total identity (C_{i}V_{z}.C_{i}V_{z}) is even more strongly avoided in Japanese verbal inflection. Kumagai (

In order to capture the different Rendaku-blocking effects of nasals and voiced obstruents, I employ OCP constraints that are defined over two sets of segments: voiced consonants and voiced obstruents. In line with the gradient similarity-avoidance effects, the strength of these constraints depends on the homogeneity of the natural class the constraint is defined over: voiced obstruents ([bdzɡ]) form a more homogeneous class than voiced consonants ([bdɡzmnjwr]), and the OCP constraint on voiced obstruents is accordingly stronger.
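Using only the two features from the table above, the homogeneity contrast can be roughly quantified as mean pairwise feature agreement within each class. This is a deliberate simplification (a full feature system would further separate nasals from glides and liquids), but it captures the direction of the asymmetry:

```python
from itertools import combinations

# Two-feature specification (voice, sonorant); ASCII "g" stands for IPA ɡ.
FEAT = {c: ("+", "-") for c in "bdzg"}          # voiced obstruents
FEAT.update({c: ("+", "+") for c in "mnjwr"})   # voiced sonorants

def homogeneity(cls):
    """Mean share of agreeing features across all pairs in the class."""
    pairs = list(combinations(cls, 2))
    agree = sum(f1 == f2
                for c1, c2 in pairs
                for f1, f2 in zip(FEAT[c1], FEAT[c2]))
    return agree / (len(pairs) * 2)

homogeneity("bdzg")       # 1.0   — voiced obstruents: fully homogeneous
homogeneity("bdgzmnjwr")  # ≈0.72 — voiced consonants: mixed, less homogeneous
```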

First, the OCP constraint that bans a co-occurrence of two voiced consonants is defined in (22). In the constraint definition, C̬ is used to represent any consonant that is voiced ([+voice, +consonantal]). This constraint is responsible for blocking Rendaku application with a nasal in W_{B}. The constraint weights were computed by the MaxEnt Grammar Tool (…). To focus on the effects of W_{B} consonants on Rendaku, the first element of the compound is represented as an abstract form W_{A}. The second element is represented by three consecutive CV moraic units, or light syllables (e.g., CVCVCV). For simplicity, syllables with a voiced obstruent will be represented by D, syllables with a nasal will be represented by N, and syllables with neither will be represented by

(22)

*+C̬C̬: Assign a violation mark for every subsequence of a word boundary followed by a voiced consonant followed by another voiced consonant.

(23)

One violation of *+C̬C̬ allows Rendaku application most of the time

(24)

Multiple violations of *+C̬C̬ dampen Rendaku application

The Rendaku-applied candidate (23)-a violates *+C̬C̬ once for containing a subsequence of voiced consonants, [d…n], and occurs 80% of the time, close to the 85% Rendaku rate reported above for compounds with no nasal in the corpus data. The Rendaku-applied candidate (24)-a, in contrast, violates *+C̬C̬ three times, once for each of the subsequences [d…n_{1}], [d…n_{2}], and [n_{1}…n_{2}]. Compared to candidate (23)-a occurring 80% of the time, candidate (24)-a occurs only 52% of the time because three violations of *+C̬C̬ are more severe than simply tripling the severity of one violation, indicating that *+C̬C̬ is cumulative in a super-additive way.
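To see what super-additivity means numerically, consider a minimal two-candidate MaxEnt sketch in which a constraint with weight w and exponent a incurs a penalty of w·vᵃ for v violations. The weight and exponent values here are hypothetical, chosen only so that one violation of *+C̬C̬ yields roughly the observed 80% Rendaku rate and three violations roughly 52%; the "faithfulness" constraint stands in for whatever constraint demands Rendaku:

```python
import math

def maxent_probs(penalties):
    """MaxEnt: P(candidate) ∝ exp(-penalty)."""
    scores = [math.exp(-h) for h in penalties]
    z = sum(scores)
    return [s / z for s in scores]

W_OCP, EXP_OCP = 0.5, 1.17  # hypothetical weight and exponent for *+C̬C̬
W_FAITH = 1.886             # hypothetical weight of the Rendaku-demanding constraint

def p_rendaku(violations):
    """Probability of the Rendaku-applied candidate, given its *+C̬C̬ violations."""
    rendaku_penalty = W_OCP * violations ** EXP_OCP  # power-function scaling
    return maxent_probs([rendaku_penalty, W_FAITH])[0]
```

With a linear exponent (a = 1), the same weights would predict about 60% Rendaku for three violations; the exponent a > 1 is what pushes the predicted rate down to 52%.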

The weighting condition between *C̬C̬ and R

(25)

Counting cumulativity of *+C̬C̬

The categorical Rendaku-blocking effect of a voiced obstruent is captured by the constraint defined in (26). In the definition, D is used to represent any segment that is [+voice, –sonorant]. Thus, this constraint will penalize co-occurrences of [bdzɡ]…[bdzɡ] in W_{B}. A sample evaluation is provided in (27).

(26)

*+DD: Assign a violation mark for every subsequence of a word boundary followed by a voiced obstruent followed by another voiced obstruent.

(27)

No Rendaku applied when W_{B} has one voiced obstruent

Unlike the Korean case where the OCP constraint needed to be sensitive to whether the offending structure is derived, Japanese does not require *+C̬C̬ to be separated into *+C̬C̬_{N} and *+C̬C̬_{O} because the constraint *+C̬C̬ is consistently inert in both dynamic alternations and static phonotactic restrictions of Japanese; as we just observed, a single nasal in W_{B} does not inhibit Rendaku, which implies that *+C̬C̬ is not powerful enough to work against the morphophonological alternation. Moreover, it is violated fairly frequently in existing words, as there are stems where two nasals freely co-occur (/mono/ ‘object’).
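The subsequence-based violation counting in (22) and (26) can be sketched as follows. The string encoding (one lowercase ASCII character per segment, with '+' marking the compound boundary, "g" for IPA ɡ) and the example forms are my own simplifications:

```python
VOICED = set("bdzgmnjwr")      # voiced consonants (C̬); vowels are excluded
VOICED_OBSTRUENT = set("bdzg") # voiced obstruents (D)

def ocp_violations(word, cls):
    """Count subsequences of the boundary '+' followed by two members of cls.
    Every pair of class members after the boundary is one violation."""
    after = word.split("+", 1)[1]
    n = sum(seg in cls for seg in after)
    return n * (n - 1) // 2

# Hypothetical Rendaku-applied forms (W_A omitted before the boundary):
ocp_violations("+danu", VOICED)              # 1 — the single pair [d…n]
ocp_violations("+danune", VOICED)            # 3 — [d…n1], [d…n2], [n1…n2]
ocp_violations("+danune", VOICED_OBSTRUENT)  # 0 — only one voiced obstruent
```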

In this section, I present the result of a learning simulation on the Japanese data to show that incorporating the power function into violation assessment enables the grammar to accurately match the frequencies of the data with super-additive counting cumulativity. I also show that the Power Function MaxEnt Learner is able to detect the super-additive constraint and raise the

As mentioned above, I adopt R_{B} are captured by a set of OCP constraints: *C̬C̬ and *DD.

The Japanese database was fitted using the Power Function MaxEnt Learner. Weights and exponents of the constraints were initialized and updated in the same way as in the Korean simulation. The learner converged after 13,280 iterations. The resulting set of weights and exponents, as well as the fit of the grammar to the data, are reported in (28). As mentioned above, tableau (28) shows only the phonological contexts of W_{B}. To reiterate, D represents a light syllable with a voiced obstruent, N represents a light syllable with a nasal consonant, and _{B} underlyingly has a single nasal consonant.

(28)

Output of the Power Function MaxEnt Learner: good fit

The Power Function MaxEnt Learner successfully detected the super-additive constraint, *+C̬C̬, and raised the exponent for that constraint only. The adjusted exponents and the weighted constraints correctly capture the crucial aspects of the observed Rendaku patterns. First, the nearly inviolable restriction of Lyman’s Law is reflected by the high weight of *+DD. Second, R^{1.6} ≈ 2), which are incurred by a W_{B} with two nasals undergoing Rendaku, outweigh R

In this section, the effects of nasals on blocking Rendaku were examined. The presence of one voiced obstruent in the second element significantly blocks voicing due to Lyman’s Law (

This paper investigated a type of cumulativity observable in natural languages, called ^{b}

I reported two natural language examples that show super-additive counting cumulativity. In a compounding process of Korean and Japanese, the initial onset of the second element undergoes an alternation. The presence of marked consonants lowers the likelihood of this alternation, motivated by OCP constraints, in a super-additive manner; the presence of one has only negligible effects whereas the presence of two significantly dampens the alternation process. The negligible effect of one violation was analyzed in Korean as the morpheme-internal restriction being weakened over a morpheme boundary (e.g.,

This paper has also provided an implementation of the modified MaxEnt model, which learns both the exponents and the weights of the constraints. Learning simulations on the quantitative data obtained from a survey and a corpus show that the implemented MaxEnt learner successfully detected super-additive constraints and raised the exponents only for those constraints. With the adjusted parameters, the MaxEnt model was able to reproduce the probabilistic candidate distributions.

One of the reviewers asked if the common alternative, tier-based local constraints, as opposed to constraints against offending subsequences, could be an option here. The observed pattern can certainly be reproduced by such an alternative approach. For example, I can posit a tier on which only the actively interacting segments, laryngeally marked consonants and the morpheme boundary, are visible. If I attend to substrings on this tier, the difference between candidates (15)-a and (16)-a has to be made by introducing another constraint *TT+T, which penalizes (16)-a but not (15)-a. Candidate (16)-a would then incur one violation of *TT+T on top of one violation of *T+T, rather than incurring two violations of *T+T. While the weight of *T+T would capture the small OCP effect and the weight of *TT+T would capture the maximized OCP effect, successfully reproducing the observed input distributions, grammars with these constraints are not guaranteed to give *TT+T a greater weight than *T+T, which can lead to typological pathologies like the one described in §2.3. By contrast, if I attend to subsequences and assign multiple violations of one constraint to candidate (16)-a, the penalties are guaranteed to increase monotonically with more violations under my proposed power function approach.

Among other frameworks that deal with derived environments, I use this specific approach of Comparative Markedness for its simplicity and ease of presentation.

One of the reviewers asked if two liquids or two approximants can also contribute to blocking Rendaku. The current dataset does not allow a clear investigation of these effects due to the lack of relevant data, but it is likely that such effects exist. First, of 51 compounds with two /r/s in W_{B}, only 7 (14%) undergo voicing; but all of these examples can be explained away by either having a verb as W_{B} or having a tautomorphemic W_{B}, violating the Right Branching Condition. Second, while there were no compounds with two approximants in W_{B} in the current database, the recent nonceword study by Kawahara & Kumagai (

My special thanks go to Michael Becker for his wonderful advising. I also thank Gaja Jarosz and Max Nelson for their help with the development of my MaxEnt learner. I thank the associate editor as well as three anonymous reviewers for their insightful comments and helpful feedback. The Japanese portion of this project would have been impossible on my own; I deeply thank Mark Irwin and Eric Rosen for kindly letting me use their databases. I also received tremendous help from my Japanese colleagues in linguistics: Chikako Takahashi, Hironori Katsuda, Mari Kugemoto, and Yosho Miyata. Remaining errors are all my own.

The author has no competing interests to declare.