1 Introduction

In conversational English, word-final coronal stops in consonant clusters are variably omitted, a process known as coronal stop deletion (CSD). As with other variable processes, the probability of any such stop being deleted is sensitive to a range of contextual factors. In the case of CSD, the strongest conditioning factor is the segment that comes after the coronal stop: deletion rates are dramatically lower before vowels than before consonants. This empirical fact, which I refer to as the following segment effect (FSE), has been observed in quantitative CSD studies across many different English varieties, including general American English (Labov et al. 1968; Wolfram 1969; Guy 1980), African American English (Labov et al. 1968; Wolfram 1969; Fasold 1972; Labov 1972), Chicano English (Santa Ana 1991), Appalachian English (Wolfram & Christian 1976; Hazen 2011), British English (Tagliamonte & Temple 2005), Canadian English (Walker 2012), New Zealand English (Holmes & Bell 1994; Guy et al. 2008), Singapore English (Lim & Guy 2005; Gut 2007), Hong Kong English (Hansen Edwards 2016), Nigerian English (Gut 2007), and even the English-lexified Jamaican Creole (Patrick 1991).

In this paper I use corpus data on CSD to show that other contextual factors can influence the magnitude of the FSE. Wagner (2012) observes that cross-word following context effects in general, which are commonplace in the quantitative study of phonological variation, may vary in their strength depending on the juncture between the target segment and the following environment. He proposes that this observation can be understood with reference to online sentence production planning, an idea dubbed the Production Planning Hypothesis (PPH) by Tanner et al. (2017). In Wagner’s words, “phonology in principle applies blindly across word boundaries, but…the phonological environment in the next word might not be available at the time of phonological evaluation simply because production planning has not yet progressed that far” (2012: 2). I refer to these hypothesized cases where the phonological makeup of the next word is not yet available in the speaker’s speech plan as co-presence failures. Co-presence failures bleed following context effects, so environments in which co-presence failures are more likely should have weaker following context effects.
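The bleeding logic can be made concrete with a minimal Python sketch. It assumes a simple two-state mixture, which is an illustrative simplification and not a model proposed in this paper. On a proportion f of tokens, the following segment is unavailable (a co-presence failure), and deletion proceeds at some context-independent rate. On the remaining tokens, deletion proceeds at the context-specific rate. The function names and the rates in the usage note are hypothetical.

```python
def observed_rate(rate_given_context: float,
                  rate_without_context: float,
                  failure_prob: float) -> float:
    """Deletion rate in one following context when a share `failure_prob`
    of tokens is produced without access to the following segment."""
    return ((1 - failure_prob) * rate_given_context
            + failure_prob * rate_without_context)


def observed_fse(rate_pre_c: float, rate_pre_v: float,
                 rate_without_context: float, failure_prob: float) -> float:
    """Observed FSE: pre-consonantal minus pre-vocalic deletion rate.

    Algebraically this equals (1 - failure_prob) * (rate_pre_c - rate_pre_v),
    so co-presence failures shrink the FSE whatever the context-free rate is.
    """
    return (observed_rate(rate_pre_c, rate_without_context, failure_prob)
            - observed_rate(rate_pre_v, rate_without_context, failure_prob))
```

Under this simplification, the observed FSE is the underlying FSE scaled by (1 - f). For example, with hypothetical underlying rates of 0.6 before consonants and 0.2 before vowels, a failure rate of 0.25 shrinks an FSE of 0.4 to about 0.3.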

Exactly which environments increase the rate of co-presence failures is an open question that hinges on the mechanisms with which sentences and their articulatory implementation are planned. Planning itself is known to be flexible in scope depending on task and memory demands (Smith & Wheeldon 1999; Ferreira & Swets 2002; Wagner et al. 2010; Konopka 2012, inter alia). The entire process of planning out an utterance (from conceptual structure to word order to phonological content to gestural commands) is generally thought to implicate distinct planning subsystems for different levels of linguistic structure (Garrett 1975; Levelt 1989; Levelt et al. 1999, inter alia). Building on previous work on the PPH, I investigate a number of factors that I call planning proxies. Planning proxies are contextual factors that are expected to either affect or covary with co-presence failure rate (which we cannot observe directly) and thereby modulate the strength of the FSE (which we can observe directly). The planning proxies under consideration in this paper, which I discuss and motivate in Section 1.3, are post-boundary phonetic duration, lexical frequency, speech rate, and syntactic boundary strength. The first three of these planning proxies have recently been tested on corpus CSD data (Tanner et al. 2017) as well as on corpus data from other variable phonological processes (Kilbourn-Ceron 2017a, b; Kilbourn-Ceron et al. 2017; Kilbourn-Ceron & Sonderegger 2018). The last has been tested on other variable phonological processes using lab-based sentence reading tasks (Wagner 2011; Kilbourn-Ceron 2017b; Kilbourn-Ceron et al. 2017). By manually coding an existing CSD database for the syntactic boundaries intervening between target stop and following segment, I show that, as predicted, the FSE is weakened across strong syntactic boundaries. Syntactic boundaries thus have a similar effect in these corpus CSD data as they do in experimental data on other variables. This finding strengthens the connection between the experimental and corpus literatures on this topic and adds to the mounting evidence in favor of the PPH.

I further observe that the weakening of the FSE across strong syntactic boundaries is asymmetrical: it is primarily driven by increased stop deletion in the pre-vocalic context across strong boundaries. The converse effect appears minimal, since the rate of CSD before consonants is very similar across strong and weak boundaries. This asymmetry is not straightforwardly predicted by the PPH; rather, it reflects how the PPH interacts with whatever mechanism is responsible for the FSE. I argue that the asymmetry points toward an explanation in which the FSE is produced by the influence of a following vowel, rather than by the influence of a following consonant. If the FSE arises from a vowel-specific process, then the bleeding effect of co-presence failures should primarily affect pre-vocalic contexts. One such explanation is Guy’s (1991) intuitively appealing argument that syllabification as the onset of the following word can save a pre-vocalic stop from deletion. Another intuitively appealing mechanism has been proposed to explain CSD and its FSE, namely the masking of the coronal stop gesture by overlap from the following gesture (Browman & Goldstein 1989). In contrast to syllabification, this mechanism is consonant-driven and should therefore exhibit an asymmetry opposite to the one observed. I therefore suggest that the pattern of results in this analysis provides a novel source of evidence that weighs in favor of a syllabification-based account of the FSE.
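The asymmetry argument can be sketched with a simple mixture. The sketch rests on a simplifying assumption, introduced here only for illustration: a token produced without access to the following segment behaves as it would in the context that lacks the segment driving the FSE. Under a vowel-driven (syllabification) account, that is the pre-consonantal rate. Under a consonant-driven (gestural masking) account, it is the pre-vocalic rate. The function and argument names are hypothetical.

```python
def predicted_rates(rate_pre_c: float, rate_pre_v: float,
                    failure_prob: float, account: str) -> tuple[float, float]:
    """Return (pre-consonantal, pre-vocalic) deletion rates when a share
    `failure_prob` of tokens suffers a co-presence failure.

    Simplifying assumption: on a failure, a token behaves as it would in the
    context lacking the segment that drives the FSE under `account`.
    """
    if account == "vowel-driven":
        # No vowel in the plan: no following onset to rescue the stop.
        rate_on_failure = rate_pre_c
    elif account == "consonant-driven":
        # No consonant in the plan: no following gesture to mask the stop.
        rate_on_failure = rate_pre_v
    else:
        raise ValueError(f"unknown account: {account!r}")

    def mix(rate: float) -> float:
        return (1 - failure_prob) * rate + failure_prob * rate_on_failure

    return mix(rate_pre_c), mix(rate_pre_v)
```

Suppose the hypothetical rates are 0.6 before consonants and 0.2 before vowels, with a failure rate of 0.25. The vowel-driven account then leaves the pre-consonantal rate at 0.6 and raises the pre-vocalic rate to about 0.3. The consonant-driven account instead leaves the pre-vocalic rate at 0.2 and lowers the pre-consonantal rate to about 0.5. Only the first pattern matches the asymmetry described above.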

Section 1.1 briefly recaps how the FSE has been treated in previous studies of CSD. I go on to survey the empirical evidence in support of the PPH in Section 1.2, then lay out its predictions for the current study in Section 1.3. Section 2 explains the data extraction and coding methods; the results of statistical analysis of these data are presented in Section 3 and discussed in Section 4. Section 5 concludes.

1.1 The source of the following segment effect on CSD

Empirical work on English CSD has played a role in the development of a range of different theoretical frameworks, and accounts of the linguistic mechanisms that lead to deletion vary accordingly. While the empirical facts clearly establish the robustness and universality of the FSE, explanations for its source are equally diverse.

The earliest quantitative work on CSD is found in Labov et al. (1968), a foundational study in the Labovian variable rule tradition. In this framework, the rule leading to deletion of a coronal stop can be weighted in its probability of application according to the substance of other contextual elements, such as morpheme boundaries or following segments. The formulation of CSD as a Labovian variable rule thus stipulates (by writing into the rule) that the rule applies at higher rates before consonants than before vowels. Of course, it is widely understood that such an approach can achieve great descriptive precision but could equally well describe a counterfactual state of affairs in which CSD applies more often before vowels than consonants. Such flexibility may be appropriate in cases of social conditioning where the facts could be arbitrary, but the overwhelming evidence for the universality of the basic FSE on CSD leads us to seek a deeper explanation.

One such explanation that has been put forward is that the FSE is due to syllabification bleeding deletion (Guy 1991). When the next word begins with a vowel, the word-final coronal stop that would be targeted by deletion is “saved” by recruitment as an onset consonant for the following word. One piece of evidence that Guy puts forward to support this proposal is a discrepancy between deletion rates before /l/ and /r/, which in earlier work had been combined as a natural class. In Guy’s data from several varieties of American English, deletion is frequent before /l/, which is disallowed as the second element of an English onset cluster beginning with /t/ or /d/. In contrast, deletion is rare before /r/, which may form a well-formed onset cluster with /t/ or /d/ (as in train or drive). In Guy’s Lexical Phonology analysis, CSD is the result of an autosegmental delinking rule that applies at each level of the morphophonology and is followed by stray erasure of unlinked segments at the end of the derivation. One question that this analysis does not resolve is why there should be any pre-vocalic deletion at all. Guy suggests briefly that “syllabification may itself be a variable rule” (1991: 18) but does not pursue the implications of that possibility. Nonetheless, his syllabification-based explanation for the FSE has intuitive appeal and has been reworked for analyses of CSD in other phonological frameworks such as Optimality Theory (Kiparsky 1993; Reynolds 1994).

A quite different perspective on CSD comes from Articulatory Phonology (Browman & Goldstein 1992), in which the basic units of speech, gestures, are “primitive actions of the vocal tract articulators” (Browman & Goldstein 1989: 201). In Articulatory Phonology, the actions of different articulators (such as lips, tongue tip, or velum) are coordinated across different tiers in the production of continuous speech, with gestures on different tiers naturally exhibiting temporal overlap. The extent and perceptual consequences of such overlap are, in this framework, a major source of phonological variation, and can elegantly account for phenomena such as deletion. Browman and Goldstein observe that “with sufficient overlap, one gesture may completely obscure the other acoustically, rendering it inaudible” (Browman & Goldstein 1989: 215). They give the example of the phrase “perfect memory,” in which the /t/ is “hidden” by the labial gesture of the /m/ despite still being fully articulated by the tongue tip. The Articulatory Phonology account of CSD is supported by the observation that CSD exhibits phonetic gradience in both the acoustic signal (Temple 2009) and the articulatory gesture (Purse & Turk 2016). In Articulatory Phonology, the FSE has a natural interpretation: it is consonantal gestures, with their extreme or even complete constriction of the vocal tract, that produce the gestural masking that gives rise to the percept of deletion. Vocalic gestures do not hide consonant gestures in the same way; rather, “consonant articulations are superimposed on continuous vowel articulations” (Browman & Goldstein 1990: 10). The FSE, then, is not an adjustment of CSD rates in different phonological contexts, but in fact is one and the same phenomenon as CSD: masking of word-final coronal stops by the consonant that follows.

The observation of acoustic and articulatory gradience in CSD is also consistent with what Jurafsky et al. (2001) call the probabilistic reduction hypothesis, under which CSD may be treated as a gradient phonetic reduction process rather than a probabilistic phonological rule with binary outcomes or an epiphenomenon of gestural overlap. Under the probabilistic reduction hypothesis, words are more likely to undergo various reduction processes when they are more predictable in their context. While CSD has been a source of evidence in favor of the probabilistic reduction hypothesis and related usage-based accounts (Jurafsky et al. 2001; Gahl & Garnsey 2004; Ernestus 2014), this hypothesis does not in itself straightforwardly predict the FSE. In principle, perhaps, distributional asymmetries between consonants and vowels across contextual predictabilities could be used to construct a predictability-based account of the FSE. Probabilistic reduction is relevant to this paper, though, because contextual predictability is a factor that may influence production planning.

In Section 4.1, I will return to the nature of the FSE and argue that its interaction with syntactic boundaries provides new evidence in favor of an analysis that is at least broadly similar to Guy’s syllabification proposal.

1.2 Empirical evidence for the PPH

An early paper demonstrating that syntactic boundaries can interact with phonological rule application is Cooper et al. (1977), a study of the trochaic shortening rule in which “the duration of a stressed syllable that is immediately followed by an unstressed syllable (therefore forming a bisyllabic trochee) is normally shortened relative to its duration as a monosyllable” (Cooper et al. 1977: 1314). In a laboratory reading experiment, they compare the duration of the syllable /klɪn/ in “Clinton till,” where trochaic shortening applies within the word Clinton, with the duration of the same syllable in “Clint until,” where the domain for trochaic shortening extends across the word boundary. They also vary the structure of the sentences to change the syntactic relationship between Clint/Clinton and the following prepositional phrase. They observe that for all the boundary types they test, /klɪn/ is shorter in Clinton till than in Clint until, and suggest that the syntactic boundaries block application of trochaic shortening. While this result could profitably be connected to the expansive literature on syntactic locality or prosodic domain restrictions on phonological rules (Chomsky & Halle 1968; Selkirk 1986), Cooper et al. appeal specifically to production planning to explain the effect, writing, “For cases in which a syntactic boundary blocks a rule normally applicable across word boundaries, we can infer that the boundary acts as a juncture in the speaker’s processing, prohibiting any following information from influencing segments preceding the boundary” (Cooper et al. 1977: 1314). In a second experiment, they replicate the result using different pairs of test sentences and additionally observe that the duration difference does not obtain when the unstressed syllable that triggers shortening is in a noun phrase object (“Horace brought Clint an enormous turtle” versus “Horace brought Clinton enormous turtles”). 
In other words, the boundary preceding a direct object is not strong enough to block rule application. After considering various possibilities, they operationalize the relevant boundary strength differences in terms of branching depth. Finally, they briefly report similar results from a study of variable palatalization of /d/ across word boundaries before /j/, as in “had yet” being pronounced /hɑdʒjɛt/.

Wagner (2011) reports a laboratory production experiment about the English alternation between /ɪn/ and /ɪŋ/ for the <–ing> suffix. Use of the coronal form instead of the velar is expected to be more likely when the following segment is coronal. Wagner hypothesizes that this effect should be sensitive to the syntactic relationship between the target word and the following segment. In a sentence like, “While the man was reading, a book fell off the table,” reading and a book are in a syntactically non-local relationship. This is in contrast to a sentence where a book is a direct object and thus syntactically close to reading, such as, “While the man was reading a book, a glass fell off the table.” In Wagner’s experiment, participants read aloud sentences that vary both in the syntactic juncture following the target <–ing> suffix and in whether the article used is definite or indefinite. The latter manipulation evinces the expected FSE: when a book is replaced by the book, participants use /ɪn/ more often as a result of the coronal /ð/. In the sentences where a/the book is syntactically non-local to reading as opposed to being a direct object, though, that FSE is significantly weaker. Moreover, the size of the FSE has a gradient negative relationship with the duration of the following article. Wagner interprets the duration of the article as an acoustic proxy for the likelihood that the article had been phonologically encoded when the ING variable is produced. He proposes that cases where the following word is not yet encoded bleed the FSE on <–ing> alternant choice, and that such cases are more likely across stronger syntactic and/or prosodic boundaries.

Tanner et al. (2017) spell out a number of predictions of the PPH for CSD in particular, then test these predictions on British English CSD data taken from a conversational speech corpus. Unlike the current study, which focuses on cases with no intervening pause between the coronal stop and the following segment, they are particularly interested in the gradient influence of intervening pause duration. As predicted, they find that the FSE is weaker when an intervening pause is longer. This result is similar to the gradient influence of article duration on the FSE on <–ing> choice in Wagner (2011). It is consistent with the PPH under the assumption that longer pauses may reflect cases where upcoming material is not yet planned (O’Connell et al. 1969; Butterworth 1975; Ferreira 2007; Fraundorf & Watson 2014). They also find that both lexical frequency and speech rate, two factors also included in the current study, modulate the FSE. The direction of the frequency interaction is such that the FSE is weaker in higher frequency target words. This may reflect the greater likelihood that easily retrievable high frequency words have already been planned before the following word is encoded (Oldfield & Wingfield 1965; Jescheniak & Levelt 1994; Alario et al. 2002, inter alia). The interaction between speech rate and following segment is non-linear, in that the FSE is minimized in both particularly slow and particularly fast speech. In addition to pause duration, lexical frequency, and speech rate, Tanner et al. include in their models a measure of the conditional probability of the following word given the target word. An unpredictable upcoming word, just like a globally infrequent one, should prove more difficult to encode and therefore be associated with co-presence failure. 
Modeling the effect of following word conditional probability, though, proves difficult due to high interspeaker variability even in their quite large dataset. Because the current study is smaller, and in particular has much less data per speaker, I do not pursue replication of this predictor. Tanner et al. point out that the absence of syntactic annotation in their dataset prevents them from directly investigating the influence of syntactic locality. The current study is both an attempt to replicate Tanner et al.’s (2017) results on a quite different English variety and a novel implementation of Wagner’s (2011) syntactic locality predictions in corpus data. Tanner et al. benefit from a large sample size but lack syntactic information; I make the trade-off in the other direction and prioritize the manual extraction of syntactically detailed information at the partial expense of sample size.

Kilbourn-Ceron (2017b) reports three case studies aimed at testing the predictions of the PPH on quantitative data from different variable processes. The first study, further developed in Kilbourn-Ceron & Sonderegger (2018), uses data from the Corpus of Spontaneous Japanese (Maekawa et al. 2000) to look at Tokyo Japanese high vowel devoicing between voiceless consonants and before pauses. Previous work had shown that this devoicing process (which Kilbourn-Ceron and Sonderegger ultimately argue constitutes two separate processes) is less likely across morpheme and word boundaries than word-internally (Varden 1998; Imai 2004). Kilbourn-Ceron and Sonderegger find that devoicing is not only less likely across word boundaries but also drops in likelihood as the pause between words gets longer. Parallel to Tanner et al. (2017), they argue that this is consistent with the PPH. The devoicing rule depends on a following voiceless consonant, and that consonant is less likely to have been planned in time to trigger devoicing when it lies across a stronger prosodic boundary.

In Kilbourn-Ceron’s second case study (also reported in Kilbourn-Ceron et al. 2016 and Kilbourn-Ceron et al. 2017), she and her colleagues investigate contextual effects on the intervocalic flapping of coronal stops, which is variable across word boundaries in American English. Stimuli in a laboratory production experiment contain a flappable stop in a nonce word embedded in an English sentence, with syntactic boundary location manipulations comparable to those from Wagner (2011). They use pre-flap vowel duration to control for pre-boundary lengthening and conclude that stronger syntactic boundaries reduce flap likelihood above and beyond prosodic boundary strength. They also conduct a corpus study of the same phenomenon using data from the Buckeye Corpus (Pitt et al. 2007). The corpus data show a significant positive association between flapping probability and following word frequency. Flapping is a process whose planning requires intimate involvement with the first segment of the following word. For this reason, the predictability of the following word, as measured by lexical frequency, functions as a planning proxy.

The same logic underpins the final case study in Kilbourn-Ceron’s dissertation (also reported in Kilbourn-Ceron 2017a): variable French liaison, in which certain word-final consonants may be realized only before a following vowel. She shows that in the Phonologie du Français Contemporain corpus (Durand & Lyche 2003), liaison rates correlate positively with the global lexical frequency of both the first and second word, as well as with the predictability of the second word given the first. This is expected under the PPH because high predictability should facilitate access of phonological form (Oldfield & Wingfield 1965; Jescheniak & Levelt 1994) and so decrease the likelihood of co-presence failure. This is also the only previous study that has successfully linked PPH predictions for corpus data to questions of syntactic structure: the liaison rate is much higher across a weaker syntactic boundary (Adj–Noun as opposed to Noun–Adj).

Finally, a paper in which the predictions of the PPH find perhaps less support is MacKenzie’s (2016) study of variable English auxiliary contraction. The following context effect at play in this case is not segmental in nature; rather, it is the sensitivity of contraction to the grammatical category of the following constituent. This following category effect, robust though it is with respect to the probability of contraction, does not interact with duration of the following word. MacKenzie suggests a reason based on the timing of planning. Many sentence production models propose that syntactic structure and lexical content are planned earlier than phonological encoding takes place (Garrett 1975; Sternberg et al. 1978; Bock & Levelt 1994). If so, the following constituent’s syntactic identity may invariably be in place before the speaker chooses an auxiliary allomorph. Her paper, then, can be thought of not as a refutation of the PPH but rather as a refinement of it: only a planning proxy at the relevant planning level should be expected to modulate following context effects.

1.3 Planning proxies

The current study tests whether the FSE on CSD interacts with each of four other contextual factors, all of which may be considered potentially relevant planning proxies because previous work on the PPH has suggested they may modulate following context effects:

  1. The strength of the syntactic boundary between the CSD target and its following segment
  2. The relative duration of the segment following a CSD target
  3. The lexical frequency of the CSD target word
  4. The speech rate around the time of a CSD target

This study was primarily designed to investigate the role of syntactic boundaries in conversational CSD. The reason I focus on the syntactic question is that, while important evidence for the PPH comes from read speech tasks in which syntactic differences are the central manipulation of interest, previous studies investigating the PPH in conversational speech have not been able to incorporate any syntactic information. A goal of this paper is thus to look for evidence that syntactic structure modulates following context effects in conversational speech, while still taking into account other planning proxies that may be related or independently active. Fully disentangling the influence of syntax and prosody on phonological variation would require a larger dataset annotated with both syntactic and prosodic information. The focus on syntactic boundaries in this paper is not intended to imply that the only relevant structure is syntactic. Given that research on the PPH is relatively new and still developing, my position in this paper is that syntax and prosody both serve merely as proxies for the variable scope of production planning, which we cannot observe directly. The relationship between syntactic and prosodic structure is, of course, a broad area of inquiry and will continue to be of interest in the study of phonological variation.

The discussion in Section 1.2 above provided background on each of these predictors; I briefly recap the motivations for the inclusion of each, then expand on the role of duration in this study because it diverges somewhat from previous similar studies.

Stronger syntactic boundaries are predicted to reduce the FSE on the assumption that direct objects are more likely to have been planned in the same planning window as the verbs that select them. Less local adjuncts or separate clauses, by contrast, are less likely to have been planned simultaneously with the target word (see Wheeldon 2012 for an overview of evidence on the syntactic scope of planning). I discuss the syntactic coding and selection of strong and weak syntactic boundary contexts at some length in Section 2.2. High frequency words are expected to be easily accessible and therefore planned early (Oldfield & Wingfield 1965; Jescheniak & Levelt 1994). Co-presence failures, and the resulting weakening of the FSE, should therefore be associated with high frequency words. Speech rate might influence FSEs by dynamically adjusting the size of the planning window. Wagner et al. (2010) argue that the window for planning gets narrower at faster speech rates, which would suggest that faster speech might exhibit weaker cross-word effects. Speech rate can be thought of in absolute terms (whether someone is a slow or fast talker) or relative terms (whether a stretch of speech is especially slow or fast for a particular talker). Following Tanner et al. (2017), I calculate these separately as characteristic speech rate and speech rate deviation, and focus on the latter in terms of its interaction with the FSE.
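The split between characteristic speech rate and speech rate deviation can be sketched as a by-speaker centering step. This is a minimal sketch under stated assumptions. The local rate measure (for instance, syllables per second in a window around the target) and the dictionary keys `speaker` and `local_rate` are hypothetical, and the exact procedure in Tanner et al. (2017) may differ.

```python
from collections import defaultdict
from statistics import mean


def split_speech_rate(tokens: list[dict]) -> list[dict]:
    """Add 'char_rate' (the speaker's mean local rate) and 'rate_dev'
    (local rate minus that mean) to each token.

    Each token is a dict with hypothetical keys 'speaker' and 'local_rate'.
    """
    # Collect every local rate observed for each speaker.
    by_speaker = defaultdict(list)
    for tok in tokens:
        by_speaker[tok["speaker"]].append(tok["local_rate"])

    # Characteristic rate: the speaker's mean local rate.
    speaker_means = {spk: mean(rates) for spk, rates in by_speaker.items()}

    # Deviation: how fast this stretch is relative to the speaker's norm.
    out = []
    for tok in tokens:
        char_rate = speaker_means[tok["speaker"]]
        out.append({**tok,
                    "char_rate": char_rate,
                    "rate_dev": tok["local_rate"] - char_rate})
    return out
```

Centering within speaker keeps the two quantities separable: a fast talker's slower-than-usual stretch receives a high characteristic rate but a negative deviation.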

Finally, stronger prosodic boundaries are predicted to reduce the FSE on the premise that planning units may in fact be prosodic units (Keating & Shattuck-Hufnagel 2002), such that prosodic breaks might correlate with unplanned upcoming material. The precise proxy used for prosodic boundary strength has varied across previous studies of the PPH. Typically boundary-adjacent durational measures are used, taking advantage of the general phenomenon that strong prosodic boundaries tend to induce phonetic fortition of adjacent segments along a number of dimensions; see Keating (2006), Fletcher (2010), and Cho (2011) for relatively recent overviews of the sizeable literature on both domain-final and domain-initial strengthening.

However, choosing an appropriate duration measure is not a trivial task. Kilbourn-Ceron et al. (2017) use pre-boundary (domain-final), pre-flap vowel lengthening as a prosodic boundary proxy in their study of /t/-flapping, while Tanner et al. (2017) use pause duration at the boundary in their corpus study of CSD. Since CSD targets the final elements of consonant clusters in particular, I avoid taking preceding segment duration as a proxy for boundary strength, because I expect this durational measure to be sensitive to the presence or absence of the coronal stop (and thus in many cases, cluster versus singleton consonant status of the preceding segment). And, as discussed in Section 2.3, the existing dataset on which this study was built excludes information about following segments across pauses of any non-negligible length. Having ruled out pre-boundary and at-boundary durational measures, I turn to post-boundary (domain-initial) segment duration as a potential reflection of prosodic boundary strength.

The use of post-boundary segment duration as a prosodic proxy partly follows Wagner’s (2011) use of domain-initial function word reduction in his experimental work on PPH effects on <–ing>. The advantage of the experimental context in that case is that only two following words were in use, “a” and “the.” Doing speaker-specific word duration normalization across the many possible following words in the corpus data is not viable. By-speaker segment duration normalization is more achievable. It is further motivated by the general finding that domain-initial strengthening effects are most strongly apparent on the first segment after the boundary (Byrd et al. 2006; Byrd & Choi 2010, inter alia). But the use of following segment duration poses its own challenges. While there is strong and cross-linguistically diverse evidence for domain-initial consonant lengthening (Fougeron 2001), the facts about domain-initial vowel lengthening are somewhat murkier. It has been claimed that domain-initial vowels exhibit minimal if any lengthening, but many instances of this claim are based on vowels in a #CV context, which are not, strictly speaking, linearly domain-initial (e.g. Cho & Keating 2001). Fougeron (2001) finds that only one out of four test subjects exhibits domain-initial vowel lengthening in French. More recent papers on Korean from Lee (2007) and Cho et al. (2014), both comparing C#V and #CV contexts, diverge in their results. Lee finds that word-initial vowels are shorter when they follow stronger prosodic boundaries, whereas Cho et al. find domain-initial lengthening for both vowels and consonants. I return to the question of consonant–vowel lengthening differences, and their ramifications for this study, in Section 2.3.
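By-speaker normalization of following segment duration might look like the sketch below, which z-scores each duration within its speaker and segment identity. Grouping by segment identity as well as by speaker is an assumption made here to keep intrinsic duration differences (for instance, between vowels and consonants) out of the measure. The keys `speaker`, `next_seg`, and `next_dur` are hypothetical. Groups with a single token or no variance receive a score of 0.

```python
from collections import defaultdict
from statistics import mean, stdev


def normalize_following_durations(tokens: list[dict]) -> list[dict]:
    """Add 'next_dur_z': the following segment's duration z-scored within
    (speaker, segment identity). Keys are hypothetical."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[(tok["speaker"], tok["next_seg"])].append(tok["next_dur"])

    # Per-group mean and sample standard deviation (needs at least 2 tokens).
    stats = {}
    for key, durs in groups.items():
        sd = stdev(durs) if len(durs) > 1 else 0.0
        stats[key] = (mean(durs), sd)

    out = []
    for tok in tokens:
        m, sd = stats[(tok["speaker"], tok["next_seg"])]
        # Degenerate groups get 0 rather than a division by zero.
        z = (tok["next_dur"] - m) / sd if sd > 0 else 0.0
        out.append({**tok, "next_dur_z": z})
    return out
```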

2 Data and methods

The data for this study come from a 118-interview subset of the Philadelphia Neighborhood Corpus (PNC) (Labov & Rosenfelder 2011) for which the dependent variable of CSD had already been auditorily coded, as reported in Tamminga (2014). The recordings that constitute the PNC are of sociolinguistic interviews with English speakers native to the Philadelphia area. The particular corpus subset used in Tamminga (2014) contains partly or fully transcribed and forced-aligned interviews (30–60 minutes long) with white upper working class and lower middle class Philadelphians. Three types of possible CSD tokens were excluded prior to auditory coding. First, for white Philadelphian English speakers there is essentially no CSD in /rt/ and /rd/ clusters (Cofer 1972; Guy 1980), so those contexts were excluded. Second, the hyper-frequent lexical item and was excluded from the original round of data coding because it has been argued to have a distinct allomorph that has no underlying stop (Neu 1980; Guy 2007). Third, CSD tokens in neutralization contexts (that is, before coronal obstruents that mask the CSD outcome) were excluded from the original dataset. The master dataset of all the tokens that were coded for Tamminga (2014) contains 15,874 observations of CSD; these are all of the CSD tokens in the 118 interviews except those in the three exclusion contexts just listed. However, many of these observations are not ideal for inclusion in an analysis targeting the research questions laid out in Section 1.3. I therefore made two further rounds of data exclusions to achieve certain analytical aims. The first was a narrowing of the dataset on morphological and phonological grounds to control some of the wide range of factors known to influence CSD. In the second, after hand-annotating the relevant syntactic relationships for the entirety of that narrowed dataset, I extracted the subset of observations that exemplified the syntactic contexts to be compared. 
These rounds of data extraction and coding are described in more detail in the following two subsections.
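The three pre-coding exclusions can be pictured as a simple token filter. This is a minimal sketch only. It assumes a hypothetical `Token` record holding the word, its final cluster, and the first segment of the next word. The set `NEUTRALIZING` is an illustrative assumption, not the list of masking coronal obstruents used in the original coding.

```python
from dataclasses import dataclass

@dataclass
class Token:
    """Hypothetical record for one CSD opportunity (not the original data format)."""
    word: str          # orthographic word, e.g. "find"
    cluster: str       # final consonant cluster, e.g. "nd", "kt", "rt"
    next_segment: str  # first segment of the following word, e.g. "ah", "t"

# Assumed set of following coronal obstruents that mask the CSD outcome.
NEUTRALIZING = {"t", "d", "s", "z", "sh", "zh", "ch", "jh", "th", "dh"}

def survives_precoding_exclusions(tok: Token) -> bool:
    """Apply the three exclusions made before auditory coding."""
    if tok.cluster in {"rt", "rd"}:       # essentially no CSD for these speakers
        return False
    if tok.word.lower() == "and":         # argued to have a stopless allomorph
        return False
    if tok.next_segment in NEUTRALIZING:  # outcome masked by a coronal obstruent
        return False
    return True

tokens = [
    Token("part", "rt", "ah"),
    Token("and", "nd", "ay"),
    Token("walked", "kt", "t"),
    Token("find", "nd", "ah"),
]
kept = [t.word for t in tokens if survives_precoding_exclusions(t)]
```

Only “find” survives here. The other three tokens each fall into exactly one of the excluded contexts.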

2.1 Restrictions on included data

The predictors of CSD are many and varied. This poses a statistical modeling challenge for a study of yet more predictors and their interactions. A guiding principle of this study was to seek a reasonable compromise between two competing goals: 1) to reduce the number of predictors and interaction terms to avoid an overly complex model that is difficult to interpret; and 2) to retain enough data that real effects will be apparent. For several of the predictors that are known to influence CSD, I chose a strategy of including only tokens with one or two values of the predictor prior to manual coding of syntactic structure.

I limited my attention to the two largest morphological categories typically distinguished in CSD studies. These are monomorphemes, such as “act” or “blend,” and regular preterite forms like “walked” or “zoned.” Passive participles, although often combined with the preterites, are excluded in this study. So are all irregular past tense forms like “went,” “swept,” “lost,” or “cost.” All observations included in this dataset are monosyllabic, to improve consistency across observations in lexical stress patterns. They also all have a homovoiced final cluster, meaning a voiceless consonant plus /t/ or a voiced consonant plus /d/. Final clusters in preterite forms are all homovoiced through voicing assimilation of the suffix, so I limit the monomorphemes included to match.

The segment preceding the coronal stop is another well-established determinant of CSD rates, if not an especially strong one (Guy 1980). Observations of CSD are distributed fairly evenly across a larger number of preceding segments. Narrowing the data to a single category in this case would therefore leave an insufficient amount of data. Preceding segment is instead controlled through the statistical analysis reported in Section 3, with six types of preceding segment distinguished: stop, fricative, liquid, nasal, sibilant, affricate.
Of these, previous work exhibits minor disagreements but suggests that sibilants should favor deletion most highly and liquids the least (Labov 1989; Patrick 1991).
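The morphological and phonological restrictions amount to a second filter. The sketch below assumes hypothetical inputs: a morphological class label, a syllable count, and lowercase ARPAbet-style labels for the pre-stop consonant. `VOICELESS` is an assumed inventory for illustration. Preceding segment itself is not filtered here, since the text leaves it to the statistical model.

```python
# Assumed lowercase ARPAbet-style labels for voiceless pre-stop consonants.
VOICELESS = {"p", "k", "f", "th", "s", "sh", "ch"}

def is_homovoiced(pre: str, stop: str) -> bool:
    """True for voiceless C + /t/ or voiced C + /d/."""
    return (pre in VOICELESS) == (stop == "t")

def meets_restrictions(morph_class: str, n_syllables: int, pre: str, stop: str) -> bool:
    """Hypothetical check mirroring the restrictions described in Section 2.1."""
    return (
        morph_class in {"monomorpheme", "preterite"}  # no participles or irregular pasts
        and n_syllables == 1                          # monosyllables only
        and is_homovoiced(pre, stop)                  # homovoiced final cluster
    )

examples = {
    "walked": meets_restrictions("preterite", 1, "k", "t"),
    "blend": meets_restrictions("monomorpheme", 1, "n", "d"),
    "tent": meets_restrictions("monomorpheme", 1, "n", "t"),  # voiced /n/ + /t/
}
```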

After restricting the existing dataset in these ways, a total of 2,488 tokens remained to be coded for syntactic structure.

2.2 Coding syntactic boundaries

The CSD-coded tokens were matched back up to the original orthographic transcript using a Python script. The dependent variable outcome codes were then hidden until all syntactic coding had been completed. The goal of the syntactic coding was to extract the two subsets of the data that, when juxtaposed, represent maximal differentiation of boundary strength. It was not to exhaustively characterize each token’s syntactic structure. Table 1 lists and provides examples of the structural boundary types used to achieve this goal.

Following the results from Cooper et al. (1977), I compare two kinds of sentence. In weak boundary sentences, the following segment begins a direct object. In strong boundary sentences, there is a greater hierarchical distance between the coronal stop and the following segment. The sentence types included were chosen because they are relatively straightforward to define and detect, and because they lie near the endpoints of a possible continuum of boundary strengths.

The first two categories in Table 1 are those where the coronal stop falls right before a juncture between two separate independent (matrix) clauses, with or without a conjunction (Matrix CP + Matrix CP; Matrix CP + Conjunction + Matrix CP). The next two categories contain an adjunct, typically a prepositional phrase or adverbial phrase. The adjunct is either preposed (High adjunct + Matrix CP) or right-adjoined higher than the verb phrase containing the target word (Matrix CP + High adjunct). The latter are often similar to the sentence types in which Cooper et al. (1977) identified the syntactic blocking of trochaic shortening. Taken together, I treat the 269 utterances falling into these four categories as having strong syntactic boundaries. In the remaining category, I include only cases in which the coronal stop is at the end of a verb (whether monomorphemic or past tense) and the following segment begins a direct object of that verb.
This is the boundary type that Cooper et al. (1977) found did not block cross-word rule application. There are a total of 567 such utterances in the dataset, which I treat as having weak syntactic boundaries. The dataset for analysis, after these rounds of exclusions and subsetting, contains 836 observations from 96 unique speakers and 261 unique words. The distribution of word types and following segments across the syntactic boundary types are given in Table 2.

Structure Example

Strong syntactic boundary
Matrix CP + Matrix CP They go up the street too fast. ‖ I think we need speed bumps.
Matrix CP + Conj. + Matrix CP And then I make my crust ‖ and I fill it up.
High adjunct + Matrix CP When you get old, ‖ everything bothers you.
Matrix CP + High adjunct I thought he was a good friend ‖ until that point.
Weak syntactic boundary
Verb + Direct object You can’t find ‖ a cork today.

Table 1

Syntactic boundary types included in the analysis. Target CSD token in boldface.

Boundary type   Total tokens   Unique words   Pre-C tokens   Pre-V tokens

Strong          269            55             98             171
Weak            567            118            193            374

Table 2

Distribution of lexical items and following segments across syntactic boundary types.

An alternative characterization of the weak versus strong syntactic boundary distinction made here is that of syntactic locality versus non-locality. Framing the contrast in locality terms would be consistent with the discussion from Wagner (2012). However, it is beyond the scope of this paper to assess whether the active distinction is a binary local/non-local one or a more gradient measure of hierarchical distance. I do not attempt to quantify boundary strength (for example, by counting branches as Cooper et al. 1977 suggest) because to do so would require imposing a great deal of syntactic theory without knowing exactly what quantification of boundary strength is relevant to phonological planning. I avoid locality-related terminology because it implies a stricter claim than the boundary strength terminology: a non-local relationship could fairly be described as involving a strong syntactic boundary, but it may not be the case that all syntactically non-local configurations involve boundaries strong enough to interfere with production planning. It remains quite possible that locality is in fact the relevant distinction; future experimental work might pit these possibilities against each other rather than deliberately confounding them.

2.3 Extraction and normalization of other predictors

2.3.1 Following segment duration

Following segment duration was extracted automatically from the FAVE-aligned Praat TextGrids used in the original coding. Because different phonemes have different inherent lengths, I z-score normalized following segment duration by speaker and by segment. For example, a following primary-stressed /o/ from Speaker A has a long normalized duration only if it is a particularly long instance of stressed /o/ for Speaker A. It does not get a long value just because it is a tense vowel, or because Speaker A has an overall slow speech rate. The following segments in the dataset at hand are too sparse to allow well-estimated normalization, since most speakers contribute only a few observations each. To obtain enough data to z-score normalize segment duration within speaker, I used a Python script to extract all instances of every phoneme from the TextGrid of that speaker’s entire conversation transcript.
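The by-speaker, by-segment normalization can be sketched as follows. This sketch assumes that durations have already been read from the TextGrids into `(speaker, phone, seconds)` tuples. It also assumes that stress-marked ARPAbet labels such as `OW1` count as distinct segments, which the stressed /o/ example suggests but does not state. The sample standard deviation is used here as an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

def normalize_durations(all_phones, targets):
    """z-score each target duration against the same speaker's pool of the same phone.

    all_phones: (speaker, phone, seconds) for every phone in each speaker's transcript.
    targets:    (speaker, phone, seconds) for the segments following CSD tokens.
    """
    pools = defaultdict(list)
    for speaker, phone, dur in all_phones:
        pools[(speaker, phone)].append(dur)
    # Mean and sample SD per speaker-phone pair (needs at least two instances)
    params = {key: (mean(d), stdev(d)) for key, d in pools.items() if len(d) >= 2}
    z = []
    for speaker, phone, dur in targets:
        mu, sd = params[(speaker, phone)]  # KeyError if the pair is too sparse
        z.append((dur - mu) / sd)
    return z

phones = [("A", "OW1", 0.10), ("A", "OW1", 0.20), ("A", "OW1", 0.30),
          ("B", "OW1", 0.20), ("B", "OW1", 0.40), ("B", "OW1", 0.60)]
z = normalize_durations(phones, [("A", "OW1", 0.30), ("B", "OW1", 0.30)])
```

The same raw 300 ms vowel comes out long for Speaker A (z ≈ 1.0) and short for the slower Speaker B (z ≈ −0.5). This is the behavior the normalization is meant to produce.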

The FAVE aligner does not recognize pauses shorter than 30 milliseconds (Rosenfelder et al. 2014). As a result, true continuous speech and speech with extremely short pauses between words are not differentiated in this dataset. Cases with a pause longer than 30 milliseconds were excluded, because the pre-existing dataset from which these data are taken has no information about the word following the pause.

In the CSD literature, a following pause is traditionally treated as a distinct following environment, not as something intervening between the coronal stop and its following environment. The motivation for this treatment is that Guy (1980) found that American English dialects can differ in whether a following pause favors or disfavors deletion. The effect of a following pause is therefore arbitrary and must be learned. Given this traditional treatment, excluding pre-pausal CSD can be thought of as another simplifying move that removes a possible context from analysis. It limits attention to cases in which pause length is controlled to be negligible or zero.

It is worth bearing in mind, however, that pause presence and duration are expected to correlate with syntactic and prosodic boundaries. Omitting pre-pausal CSD should therefore disproportionately exclude data from the strong syntactic boundary condition. Tanner et al. (2017) found that pause length is gradiently associated with FSE reduction: the longer the pause, the smaller the FSE. The exclusion of pre-pausal tokens should therefore, if anything, lead me to underestimate the FSE-reducing effect of a strong syntactic boundary. The positive results on this predicted interaction that I report in Section 3 thus cannot be attributed to the exclusion of pre-pausal CSD.
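A minimal sketch of the pre-pausal exclusion follows. It assumes the word tier has already been read into a list of `(start, end, label)` intervals. It also assumes that silent stretches carry the label `sp` or an empty label; both labels are assumptions about the TextGrid conventions, not details given in the text. Because the aligner does not mark pauses under 30 ms, any marked pause longer than the threshold triggers exclusion.

```python
PAUSE_LABELS = {"sp", ""}  # assumed labels for silent intervals on the word tier

def following_pause(word_tier, i):
    """Total duration (s) of pause intervals immediately after word_tier[i]."""
    total = 0.0
    j = i + 1
    while j < len(word_tier) and word_tier[j][2] in PAUSE_LABELS:
        start, end, _ = word_tier[j]
        total += end - start
        j += 1
    return total

def keep_token(word_tier, i, threshold=0.030):
    """Keep a CSD token only if no pause longer than `threshold` seconds follows it."""
    return following_pause(word_tier, i) <= threshold

tier = [(0.00, 0.30, "FIND"), (0.30, 0.52, "sp"), (0.52, 0.70, "A"),
        (0.70, 0.95, "CORK")]
```

Here “find” would be dropped because a 220 ms pause follows it. A token directly followed by another word is kept.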

As discussed in Section 1.3, an issue of some confusion but also theoretical importance to the current study is whether the post-boundary lengthening effect is uniformly distributed over vowels and consonants. One way to investigate whether vowels and consonants differ in how they reflect prosodic boundaries in this data is to look at the distribution of durations across syntactic boundary types. Since syntactic structure is a major contributor to prosodic structure, it might be expected that strong syntactic boundaries correlate with longer following segment durations. Indeed, Wagner (2011) reports that the syntactically non-local condition and following word duration are highly correlated in his read speech data. If both consonants and vowels are longer after stronger prosodic boundaries, that effect may be visible in their apparent durational sensitivity to syntactic boundaries as well.

Figure 1 shows an interaction between following segment type, syntactic boundary, and following segment relative duration. For consonants, the relationship between boundary and duration is in the expected direction. The duration density distribution has a wider right tail in the strong boundary condition than in the weak boundary condition, indicating lengthening of consonants after strong syntactic boundaries. For vowels, the effect is the opposite. The weak boundary condition has a wider right tail, indicating lengthening, and the strong boundary condition has a wider left tail, indicating shortening.

I ran two Wilcoxon rank-sum tests (the non-parametric analogue of an independent two-sample t-test), comparing the syntactic boundary conditions within the consonant subset and within the vowel subset. Both indicate that these durational differences are statistically significant (consonants: p = 0.0012; vowels: p < 0.001). The vowel result may seem surprising, but it bears an intriguing resemblance to Lee’s (2007) finding that Korean vowels shorten domain-initially even while being strengthened in other respects.

This result complicates the interpretation of any observed effects of duration in this study, since vowels and consonants may differ in the direction of their cue to prosodic phrasing and production planning. Further pursuing the behavior of domain-initial vowels is beyond the scope of this study. However, to maintain the function of following segment duration as a control predictor, I add a three-way interaction to the model: syntactic boundary strength by following segment type by following segment duration.
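The rank-sum comparison can be sketched with the standard library alone. The text does not say which implementation or variant was used. This sketch applies the normal approximation with average ranks for ties and no tie or continuity correction. That is the same approach as SciPy's `scipy.stats.ranksums`; R's `wilcox.test` may instead use an exact test or a continuity correction.

```python
from math import erfc, sqrt

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    Returns (z, p), where z > 0 means x tends to be larger than y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return z, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
```

For a call such as `rank_sum_test(strong_durations, weak_durations)` (hypothetical names), each argument would hold the normalized durations of one boundary condition within the consonant or vowel subset.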

Figure 1 

Density of relative duration of following segment, by syntactic boundary strength and following segment type (observed data, N = 836).

2.3.2 Lexical frequency

Word frequencies are log-transformed whole-word frequencies from the SUBTLEX corpus (Brysbaert & New 2009). For example, the frequency used for “walked” does not incorporate the frequency of its stem “walk,” either additively or relatively. Log frequency is z-score transformed before inclusion in the model. As a result, the coefficients of other predictors are evaluated at an average log frequency value rather than an extreme one.
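The frequency transformation can be sketched in a few lines. Two details are assumptions, since the text does not specify them. The sketch z-scores across whatever values are passed in, without settling whether that should be tokens or word types. It also uses the sample standard deviation, as R's `scale()` does. The choice of logarithm base does not matter once the values are z-scored, because a change of base only rescales the log values linearly.

```python
from math import log, log10
from statistics import mean, stdev

def z_log_frequency(freqs, log_fn=log10):
    """Log-transform raw word frequencies (all > 0), then z-score them."""
    logs = [log_fn(f) for f in freqs]
    mu, sd = mean(logs), stdev(logs)
    return [(v - mu) / sd for v in logs]

z10 = z_log_frequency([10, 100, 1000])        # base-10 logs: 1, 2, 3
z_nat = z_log_frequency([10, 100, 1000], log)  # natural logs give the same z-scores
```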

2.3.3 Speech rate

Following the discussion in Tanner et al. (2017), I include two different types of speech rate measure in the model: characteristic speech rate and speech rate deviation. Characteristic speech rate reflects whether a speaker’s rate of speech in the interview is overall fast or slow compared to other speakers. Speech rate deviation reflects whether a particular CSD token is located in a stretch of speech that is fast or slow for that speaker.

Both measures are based on an automatic count of vowels per second in a seven-word window centered on the word containing the target underlying coronal stop. Syllables per second may be a more usual unit of speech rate, but the data in question contain no syllabification information. Since syllables are built around vowels, vowels per second and syllables per second should be closely related measures.

The raw vowels-per-second values are then normalized in two different ways to obtain the two speech rate measures. For characteristic speech rate, each speaker’s mean speech rate is calculated across all of their CSD tokens from the original dataset, which contained an average of 134 observations per speaker. Characteristic speech rate is this speaker-specific mean, z-score normalized across speakers. Speech rate deviation is the within-speaker z-score normalized speech rate for each observation, again using the full CSD dataset to calculate the speakers’ means and standard deviations.
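Both speech rate measures can be sketched as follows, under several assumptions that the text does not settle. Words are `(start, end, phones)` tuples with ARPAbet phone labels. The window is truncated at the edges of the transcript. Window duration runs from the first word's onset to the last word's offset, so any short pauses inside the window are included. Function names are hypothetical.

```python
from statistics import mean, stdev

# ARPAbet vowel labels; stress digits are stripped before lookup.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def window_rate(words, i, half_width=3):
    """Vowels per second in a (2 * half_width + 1)-word window centred on words[i]."""
    lo, hi = max(0, i - half_width), min(len(words), i + half_width + 1)
    window = words[lo:hi]
    n_vowels = sum(1 for _, _, phones in window
                   for p in phones if p.rstrip("012") in VOWELS)
    return n_vowels / (window[-1][1] - window[0][0])

def rate_measures(rates_by_speaker):
    """Return (characteristic, deviation).

    characteristic: each speaker's mean rate, z-scored across speakers.
    deviation:      each token's rate, z-scored within its speaker.
    """
    means = {s: mean(r) for s, r in rates_by_speaker.items()}
    values = list(means.values())
    mu, sd = mean(values), stdev(values)
    characteristic = {s: (m - mu) / sd for s, m in means.items()}
    deviation = {s: [(x - means[s]) / stdev(r) for x in r]
                 for s, r in rates_by_speaker.items()}
    return characteristic, deviation

# Nine 0.5 s words with one vowel each: 7 vowels / 3.5 s in a full window
words = [(k * 0.5, (k + 1) * 0.5, ["W", "AO1", "K"]) for k in range(9)]
char, dev = rate_measures({"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0]})
```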

3 Results

Data analysis was done with mixed effects logistic regression using the lme4 package (Bates et al. 2015). Models including by-speaker random intercepts failed to converge,¹ so the model includes only a by-word random intercept. The variance of this word-level random intercept turns out to be 0, however. This suggests that any apparent lexical variation can actually be captured by the other predictors. There are many possible predictors and interaction terms, but the model was aimed at testing the specific theoretically relevant predictions enumerated in Section 1.3. It therefore includes all of the relevant predictors and interaction terms, and only those.

The categorical predictors of morphological class and preceding segment are included as control predictors and are not entered into any interaction terms. I use sum-coded contrasts for the morphological class predictor so that the effects of interest are evaluated at the weighted grand mean of monomorphemes and preterites. Achieving the same effect for the six-level preceding segment predictor would require quite a complicated contrast structure, without being of particular interest here. I therefore use treatment coding instead, with stops as the reference level, because they are well represented and expected to have middling deletion rates. For the categorical predictors under theoretical investigation here, following segment and syntactic boundary, I use treatment contrasts. Their reference levels are consonant and weak boundary, respectively. As a reminder, the continuous predictors are transformed as described in Section 2, even though for brevity’s sake the predictor names in Table 3 do not refer to those transformations. The results of the regression model are shown in Table 3.
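The contrast scheme for the categorical predictors can be made concrete by encoding a single token. This is a sketch under one stated assumption: morphological class uses ±1 sum coding with monomorpheme = +1, which is what R's `contr.sum` assigns to the first of two alphabetically ordered levels. If that assumption holds, the log-odds difference between monomorphemes and preterites is twice the reported coefficient. The function name is hypothetical.

```python
PRECEDING = ["stop", "fricative", "liquid", "nasal", "sibilant", "affricate"]

def fixed_effect_row(morph, preceding, following, boundary):
    """Categorical fixed-effect columns for one token (intercept omitted)."""
    row = {"Monomorpheme": 1.0 if morph == "monomorpheme" else -1.0}  # sum coding
    for level in PRECEDING[1:]:                                      # treatment; ref = stop
        row[f"Preceding {level}"] = 1.0 if preceding == level else 0.0
    row["Following vowel"] = 1.0 if following == "vowel" else 0.0   # ref = consonant
    row["Strong boundary"] = 1.0 if boundary == "strong" else 0.0   # ref = weak
    row["Following vowel × strong boundary"] = (
        row["Following vowel"] * row["Strong boundary"])
    return row

reference = fixed_effect_row("preterite", "stop", "consonant", "weak")
marked = fixed_effect_row("monomorpheme", "nasal", "vowel", "strong")
```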

Predictor Estimate z value Pr(>|z|)

Intercept 1.34 4.71 <0.001
Monomorpheme 0.32 2.16 0.030
Following vowel –4.06 –12.72 <0.001
Preceding fricative –0.41 –0.92 0.355
Preceding liquid –0.10 –0.31 0.755
Preceding nasal 1.23 3.62 <0.001
Preceding sibilant 0.89 2.57 0.010
Preceding affricate 2.60 5.36 <0.001
Characteristic speech rate –0.01 –0.06 0.956
Speech rate deviation 0.10 0.60 0.549
Lexical frequency 0.18 1.10 0.272
Following segment duration –0.23 –0.89 0.375
Strong boundary –0.32 –0.86 0.387
Following vowel × strong boundary 1.12 2.50 0.013
Following vowel × following seg. dur. –0.02 –0.05 0.959
Following vowel × lexical frequency 0.39 1.71 0.088
Following vowel × speech rate dev. 0.00 0.01 0.996
Foll. vowel × strong boundary × foll. seg. dur. 0.09 0.18 0.855

Table 3

GLMM predicting coronal stop deletion, N = 836.

3.1 Main effects

The first seven lines of Table 3 after the intercept report the familiar effects of morphological class and phonological context on CSD. Monomorphemes show more deletion than preterite forms (β̂ = 0.32, p = 0.030). Following vowels have a large and significant negative effect on deletion rate (β̂ = –4.06, p < 0.001). This, of course, is the FSE, and it is by far the largest effect in the model.

Fricatives and liquids have previously been reported to be less favorable to deletion than stops. Here they do not differ significantly from stops (β̂ = –0.41, p = 0.355; β̂ = –0.10, p = 0.755, respectively), although their coefficients are still in the expected direction. The three preceding segment types that were expected to favor deletion more than stops do each have a significant positive effect on deletion rate compared to the reference stop category (nasals: β̂ = 1.23, p < 0.001; sibilants: β̂ = 0.89, p = 0.010; affricates: β̂ = 2.60, p < 0.001). These results are broadly consistent with many previous studies, and with prior analysis of the larger dataset that these data were drawn from (Tamminga 2014; 2016).

Neither characteristic speech rate nor speech rate deviation has a significant effect on CSD in this dataset (β̂ = –0.01, p = 0.956; β̂ = 0.10, p = 0.549, respectively). Lexical frequency also does not have a significant main effect on CSD (β̂ = 0.18, p = 0.272), nor does following segment duration (β̂ = –0.23, p = 0.375). Finally, there is no significant main effect of syntactic boundary strength (β̂ = –0.32, p = 0.387). Because following consonant is the reference level, this means there is no significant difference between pre-consonantal deletion rates across strong and weak syntactic boundaries.

3.2 Interactions

The model contains five theoretically motivated interaction terms. Four are two-way interactions of following segment with each of the four planning-relevant factors. The fifth is the three-way interaction of following segment, following segment duration, and syntactic boundary. Of these, only the interaction of following segment and syntactic boundary is significant: there is more deletion before a following vowel across a strong syntactic boundary than across a weak one (β̂ = 1.12, p = 0.013). Figure 2 shows the observed (not predicted) differences in CSD rate across the cross-tabulated categories of these two predictors.

None of the other interaction terms is statistically significant:

- following vowel × following segment duration: β̂ = –0.02, p = 0.959
- following vowel × lexical frequency: β̂ = 0.39, p = 0.088
- following vowel × speech rate deviation: β̂ = 0.00, p = 0.996
- following vowel × strong boundary × following segment duration: β̂ = 0.09, p = 0.855
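To see what the significant interaction means on the model's own scale, the Table 3 coefficients can be combined by hand. This worked sketch evaluates the fixed effects at the reference levels: preceding stop, morphology at the sum-coded midpoint, and the continuous predictors at 0 on their normalized scales. It ignores the word random intercept, whose estimated variance is 0. These reference-cell values are not meant to reproduce the raw observed rates in Figure 2.

```python
from math import exp

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + exp(-x))

# Fixed-effect estimates from Table 3 (log-odds scale)
INTERCEPT, VOWEL, STRONG, VOWEL_X_STRONG = 1.34, -4.06, -0.32, 1.12

linear = {
    ("weak", "consonant"): INTERCEPT,
    ("weak", "vowel"): INTERCEPT + VOWEL,
    ("strong", "consonant"): INTERCEPT + STRONG,
    ("strong", "vowel"): INTERCEPT + VOWEL + STRONG + VOWEL_X_STRONG,
}
predicted = {cell: inv_logit(eta) for cell, eta in linear.items()}

# FSE on the log-odds scale: consonant minus vowel, within each boundary type
fse = {b: linear[(b, "consonant")] - linear[(b, "vowel")] for b in ("weak", "strong")}
```

Across a weak boundary, the reference-cell deletion probabilities come out at about 0.79 before consonants and 0.06 before vowels. Across a strong boundary they come out at about 0.73 and 0.13. The FSE shrinks from 4.06 to 2.94 log-odds units, and the difference between these two values is exactly the interaction coefficient of 1.12.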

Figure 2 

Effect of following segment is reduced across stronger syntactic boundaries (observed data, N = 836).

3.3 Likelihood ratio tests

As a supplement to the Wald tests of the significance of individual predictors reported in Table 3, I also used a series of likelihood ratio tests (LRTs) to assess whether the model as a whole performed better with or without the interactions of theoretical interest. Each LRT compares a model with an excluded term to the full model. The first LRT compares the full model to a model excluding the three-way interaction. The subsequent LRTs test the effect of excluding each two-way interaction in turn, necessarily also excluding the three-way interaction when testing the two-way interactions that are nested in it. The chi-squared values and p-values from these LRTs, which are consistent with the results from the Wald tests reported in Table 3, are given in Table 4.

Term excluded χ² (d.f.) Pr(>χ²)

Following vowel × strong boundary 9.54 (4) 0.049
Following vowel × following seg. dur. 8.67 (4) 0.070
Following vowel × lexical frequency N/A N/A
Following vowel × speech rate dev. 0 (1) 0.996
Foll. vowel × strong boundary × foll. seg. dur. 0.71 (2) 0.702

Table 4

Likelihood ratio tests of reduced models compared to the full model. Model excluding the following vowel × lexical frequency term did not converge.
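As an arithmetic check, the p-values in Table 4 can be recovered from the reported χ² statistics and degrees of freedom. The sketch below relies on closed forms of the chi-squared upper tail. For d.f. = 1 the tail is `erfc(sqrt(x / 2))`. For even d.f. it is a finite Poisson sum, which avoids any dependency on a statistics library. The small gap for the three-way term (about 0.701 computed versus 0.702 reported) is consistent with rounding of the reported χ² value.

```python
from math import erfc, exp, sqrt

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Closed forms only: df = 1 via erfc, or any even df via a finite Poisson sum.
    """
    if df == 1:
        return erfc(sqrt(x / 2))
    if df % 2:
        raise ValueError("this sketch handles df = 1 or even df only")
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

# (chi-squared, d.f.) pairs as reported in Table 4
reported = {"vowel x boundary": (9.54, 4), "vowel x duration": (8.67, 4),
            "three-way": (0.71, 2)}
pvals = {name: chi2_sf(x, df) for name, (x, df) in reported.items()}
```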

4 Discussion

The premise of the PPH is that cross-word phonological processes may lack the relevant right-side information if production planning has not proceeded far enough to make that information available when the process applies in speech. Such co-presence failures are not directly observable. Their consequences may nonetheless be evident in reduced following context effects across junctures that are associated with planning delays. In Section 1.3, I discussed four proxies for the scope of production planning that might be expected to modulate the FSE on CSD: syntactic boundaries, prosodic boundaries (as reflected in domain-initial duration effects), lexical frequency, and speech rate deviation. Other corpus studies provide evidence for the utility and relevance of the latter three proxies. By contrast, evidence that syntactic boundaries modulate following segment effects has come only from laboratory-based read speech experiments. This study was designed to investigate the relationship between syntactic boundary strength and the FSE while still controlling for the other proxies. The results in Section 3 indicate that the FSE on CSD is significantly smaller when the following segment is separated from the coronal stop by a strong syntactic boundary.

The significant interaction between FSE and syntactic boundary is strikingly similar to Wagner’s (2011) experimental demonstration that /ɪŋ/ ∼ /ɪn/ variation shows weakened regressive assimilatory conditioning across strong syntactic boundaries. Wagner’s interpretation of his results as reflecting production planning constraints can plausibly be applied to this study as well. Under such an interpretation, it is not the syntactic boundaries themselves that interact with the FSE, à la Chomsky & Halle (1968). Instead, syntactic boundaries are correlated with the scope of production planning, with strong syntactic boundaries being associated with delayed planning of the upcoming word. This gives rise to co-presence failures in which the phonological makeup of the next word is not available early enough to influence CSD.

Under this interpretation, the sensitivity of the FSE to syntactic boundaries is also in line with the results of Tanner et al.’s (2017) similar CSD corpus study: in both, the FSE is weaker when co-presence failures are more likely. The specific proxies they found to be associated with a weaker FSE were intervening pause duration, lexical frequency, and speech rate. Because their corpus was not syntactically annotated, they were unable to directly test the effect of syntactic boundary strength.

In the current study, the effect of a syntactic boundary was statistically significant while controlling for speech rate deviation, lexical frequency, and following segment duration. In this dataset, then, the most effective planning proxy appears to be the strength of the syntactic boundary. The effects of speech rate, lexical frequency, and following segment relative duration were not significant. It would be misguided, however, to treat this null result as good evidence for a true lack of effect. This study has considerably less power than Tanner et al.’s. The lack of main effects of these predictors also makes it unreasonable to expect significant interactions with the following segment effect.

When considering the relative importance of syntactic and prosodic boundaries, it is also important to keep in mind that the syntactic boundary predictor has an advantage over the prosodic boundary predictor in this dataset. I coded the syntactic boundaries directly, but inferred the strength of prosodic boundaries through a boundary-adjacent durational measure. Any effect of prosodic boundaries is also presumably reduced by the exclusion of contexts with a non-negligible pause at the boundary. That environment drove a substantial portion of the effects in Tanner et al. (2017).

We should remain open to the possibility that the syntactic boundary predictor is actually serving as a prosodic proxy. That would be the case if the intended prosodic boundary proxy, following segment relative duration (interacted with segment type), failed to adequately capture the strength of the prosodic break. Conversely, it remains possible that Tanner et al.’s (2017) results reflect the same syntactic boundary effect seen in this study, through proxy measures that correlate with syntactic boundary strength.

It is also quite plausible, and consistent with the PPH, that syntactic structure and prosodic phrasing both contribute to the weakening of following context effects, since both are relevant to aspects of production planning. If so, the pattern across the two studies would be unsurprising. In this study, where syntactic context differences were maximized, the effect of syntax is most apparent. In Tanner et al. (2017), where prosodic differences were maximized, the effect of prosody is most apparent. Disentangling the effects of syntax and prosody will require a dataset that is both closer in size to the one from Tanner et al. (N = 11,504) and enriched with the type of syntactic information manually coded here. That is a daunting task without a syntactically parsed corpus of conversational English speech.

Under the PPH, though, all of these measures are just proxies for the variable, and invisible, span of the production planning window. From the perspective of understanding how planning constrains the production of variation, finding effects of any of these measures can be taken as support for the PPH. The more precise questions about which circumstances disproportionately cause co-presence failures can be thought of primarily as questions about the scope and nature of sentence planning. Suppose the PPH is on the right track, and future work can estimate the unique contributions of syntactic structure, predictability measures, and prosodic phrasing to the reduction of following context effects. In that case, quantitative data on phonological variation could become a novel naturalistic source of information about how sentence planning proceeds.

4.1 The nature of the following segment effect

Beyond observing the existence of an interaction between syntactic boundary strength and following segment, the results in Section 3 also provide evidence that this interaction is asymmetrical. The rate of CSD before consonants is minimally sensitive to boundary strength, as can be seen in the lack of significant main effect for boundary strength when following segment is at its reference level of consonant in the regression model. These statistical results reflect a pattern that is equally apparent in the observed data, as graphed in Figure 2. The observed rate of CSD before consonants is 81% across weak syntactic boundaries and 78% across strong syntactic boundaries. The interaction with syntactic boundary strength arises primarily in the context of following vowels: the observed rate of pre-vocalic CSD goes up from 11% across weak syntactic boundaries to 33% across strong ones.
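The asymmetry is also visible on the log-odds scale used by the model. The sketch below converts the rounded observed percentages quoted above into log-odds. These are raw cross-tabulated rates, not adjusted for the other predictors, so they are not expected to match the model coefficients.

```python
from math import log

def logit(p):
    """Log-odds of a proportion."""
    return log(p / (1 - p))

# Observed CSD rates quoted in the text (rounded percentages)
observed = {("weak", "C"): 0.81, ("strong", "C"): 0.78,
            ("weak", "V"): 0.11, ("strong", "V"): 0.33}

# Change in log-odds of deletion from weak to strong boundary, by following segment
shift = {seg: logit(observed[("strong", seg)]) - logit(observed[("weak", seg)])
         for seg in ("C", "V")}
# Observed FSE (consonant minus vowel, in log-odds) within each boundary type
fse = {b: logit(observed[(b, "C")]) - logit(observed[(b, "V")])
       for b in ("weak", "strong")}
```

Moving from a weak to a strong boundary shifts pre-consonantal log-odds by only about −0.18, but pre-vocalic log-odds by about +1.38. As a result, the observed FSE drops from about 3.54 to about 1.97 log-odds units.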

One way to think of this asymmetry is that the deletion-inhibiting effect of a following vowel is reduced in the context of a strong intervening syntactic boundary. This suggests interruption of a process that applies only in pre-vocalic environments and not in pre-consonantal environments, pointing us back towards syllabification-based explanations for the FSE (Guy 1991). To recap from Section 1.1, the essence of the proposal is that word-final coronal stops can be syllabified as onsets to a following vowel, which protects them from deletion. Stops preceding word-initial consonants cannot be resyllabified and therefore are always vulnerable to CSD. If this is correct, it is the pre-vocalic environments specifically that should be affected by variability in production planning. If a following vowel is not yet encoded (a co-presence failure), it cannot host the coronal stop as its onset, and a stop that might otherwise have been protected and therefore retained is instead subject to variable CSD. When there is a co-presence failure with a following consonant, however, it does not affect CSD rate because the consonant would have played no role in CSD rate even if it had been present at the right moment.
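The asymmetry argument can be made explicit with a deliberately simplified mixture. This is an illustration only, not a model fitted in this study. Assume three things. First, a co-presence failure occurs with probability $f$. Second, a stop with no vowel available to host it is deleted with probability $p_C$, the pre-consonantal rate, since under this account a following consonant plays no role. Third, a stop that can be syllabified as an onset is deleted with a lower probability $p_V$. Then:

```latex
\begin{aligned}
P(\text{del} \mid \_\#\mathrm{C}) &= p_C \\
P(\text{del} \mid \_\#\mathrm{V}) &= (1 - f)\,p_V + f\,p_C \\
P(\text{del} \mid \_\#\mathrm{C}) - P(\text{del} \mid \_\#\mathrm{V}) &= (1 - f)\,(p_C - p_V)
\end{aligned}
```

Raising $f$, as a strong boundary is hypothesized to do, leaves the pre-consonantal rate untouched. It moves only the pre-vocalic rate toward $p_C$, which shrinks the FSE in the asymmetric way described above.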

The other accounts for the FSE reviewed in Section 1.1 do not provide the same ready explanation for this asymmetry. Consider an analysis where a following consonant and a following vowel are equally “active” in influencing the likelihood of CSD, for example a variable rule with stipulated probabilities for those environments. Under such an analysis, we would expect co-presence failures to be distributed evenly across pre-consonantal and pre-vocalic deletion opportunities. The reduction of the FSE should then be symmetric, with both a higher deletion rate before vowels and a lower deletion rate before consonants in the strong syntactic boundary context.

In an Articulatory Phonology framework, we might first of all expect any boundary-interacting effects to be more responsive to prosodic boundary strength than to syntactic boundary strength. Articulatory Phonology easily encompasses phrasal phonology but has no mechanism for connecting to the syntax. This point, though, is subject to the caveat discussed in the previous section: the syntactic boundary may actually be serving as a prosodic boundary proxy. But even if the relevant boundary is the prosodic one, the gestural masking account of CSD in particular makes the wrong prediction about the asymmetry of the FSE’s sensitivity to boundary strength. Recall that gestural masking is consonant-driven rather than vowel-driven. Greater gestural overlap across weaker boundaries should therefore be associated with greater perceptual deletion in pre-consonantal environments, but not with any shift in perceptual deletion in pre-vocalic environments.

A wrinkle here is that it is possible to formulate a syllabification-like explanation for the FSE within Articulatory Phonology; as Cho et al. point out about C#V strings within a phrase, “C and V gestures may reorganize temporally as having an in-phase coupling relationship” (2014: 98). Furthermore, such an account can also capture the failure of syllabification across strong boundaries; Cho et al. argue that “an IP boundary is likely to block C and V gestures in the C#V context from reorganizing temporally” (2014: 98). To say that the basic gestural-masking explanation of CSD does not make the correct prediction for this data is not to argue that Articulatory Phonology as a framework cannot account for the set of facts here.

4.1.1 An objection to a role for syllabification

Syllabification-based explanations for the FSE on CSD have been called into question by Labov’s (1997) discussion of the phonetic allophones with which word-final prevocalic coronal stops are realized. Labov’s objection to what he calls “the myth of resyllabification” (1997: 152) rests on a specific prediction of the syllabification analysis. Upon syllabification as an onset, “a final allophone would be converted to an initial allophone” (1997: 154). He argues that the phonetic evidence from cases where CSD-vulnerable stops are not deleted does not provide strong support for this prediction.

For example, out of 61 cases of prevocalic /t/, nine instances are an unaspirated or glottalized stop before a stressed vowel. That is a coda-like allophone where an initial aspirated allophone is predicted. A further three are an unaspirated or glottalized stop before an unstressed vowel, a coda-like allophone where an ambiguous one is expected. In the remaining 49 cases, the realization is a flap or lenis consonant before an unstressed vowel, which Labov acknowledges is ambiguous between analysis of the consonant as coda, onset, or both. The strongest evidence for surface onset allophony comes from palatalization of the coronal stops before /j/, but such forms are also a minority in that context. Labov thus contends that “the process of resyllabification is an important part of the English phonology being examined, but that its frequency is much too low to serve as an explanation for the effects of following segments on (t, d)-deletion” (1997: 169).

Labov’s point might be thought of as an instance of the more general problem that word-final intervocalic consonants (at least in English) are not consistently realized with onset allophones, despite our strong theoretical understanding that, as Harris puts it, “any serious model of syllabification must accommodate an onset-first analysis of VCV” (2013: 362). The problem is most directly relevant to the current study in the case of liaison processes, in which a word-final consonant appears prevocalically but is absent before consonants or when the word is spoken in isolation. In various American and British English dialects that are generally non-rhotic, for example, /r/ can still surface before word-initial vowels, a phenomenon known as linking /r/ (when the /r/ is present in the underlying representation) or intrusive /r/ (when it is not). While it is common to suggest that this kind of process is linked in some way to syllable structure, liaison /r/s are not acoustically identical to onset /r/s, being generally more lenited along a number of dimensions. For example, Gick (1999) finds that the degree of tongue tip raising in /ar#a/ sequences is intermediate between that in /a#ra/ (more raising) and /ar#ha/ (less raising) sequences. Côté (2011) surveys similar evidence of non-homophony of onset, coda, and liaison consonants in French; Bermúdez-Otero (2007) surveys examples from several English processes. This type of phonetic fact, found across a range of phenomena, helped motivate the proposal of ambisyllabicity (Kahn 1980), in which a segment can be linked to both coda and onset simultaneously and thus exhibit phonetic characteristics of both.

CSD is not usually treated as liaison, since the words it targets are standardly pronounced with overt coronal stops in isolation. But the substance of the FSE makes for a plausible parallel to liaison processes, especially given that liaison is generally thought to arise diachronically from lenition and deletion processes that must pass through periods of stochastic or gradient behavior (Morin 1986; Côté 2011). The failure of prevocalic undeleted coronal stops to exhibit fully onset-like acoustic properties, then, is not especially surprising and need not rule out an analysis that makes some reference to syllable structure.

The point may be further clarified if we compare Labov’s (1997) observations about coronal stop phonetics to the French phenomenon known as liaison sans enchaînement: liaison without forward syllabification. While the early identification of coda-syllabified liaison consonants by Ågren (1973) and Encrevé (1988) proved controversial, liaison sans enchaînement does seem to be at least possible in high registers, if not common in conversational speech (Miller & Fagyal 2005; Durand & Lyche 2008). This suggests that the surfacing of liaison consonants before vowels can be dissociated from their syllabification as onsets. Another point of commonality between liaison and the FSE is that liaison is not blocked 100% of the time even by very strong prosodic boundaries (Miller & Fagyal 2005); this is qualitatively, if not quantitatively, parallel to the observation here that even across strong syntactic boundaries, the prevocalic environment is still a disfavored location for stop deletion. The comparison to liaison is also strengthened by the important role that external sandhi effects in general have played in the development of the PPH, most notably in Kilbourn-Ceron (2017b) and related publications. Kilbourn-Ceron’s demonstration that English /t/-flapping and French liaison are affected in parallel ways by the frequency of the following (triggering) word crucially distinguishes probabilistic reduction accounts from the PPH: if predictability matters only because it licenses reduction, then liaison, which is not a leniting process, should not be sensitive to the predictability of the following context.

I offer this comparison to liaison not because I believe CSD should be strictly equated with liaison, nor because the liaison literature provides a ready-made, clear-cut analysis of the FSE. Indeed, there are many theoretical treatments on offer for both liaison (see Côté 2011) and word-final prevocalic consonant allophony (see Bermúdez-Otero 2007); none could fairly be called uncontroversial. Nor does the CSD data at hand in the current study offer new insight into these problems, which it was not designed to address. Rather, I merely observe that Labov’s (1997) phonetically based objections to Guy’s (1991) syllabification analysis of the FSE might be bundled with a larger set of empirical facts that together constitute a tricky puzzle for phonological theory. Future work on CSD might thus profitably attend to the literatures on liaison and prevocalic word-final consonant allophony. The asymmetry of the FSE modulation seen in the current study points strongly towards a primarily vowel-driven locus of the FSE. Whatever the mechanism relating CSD to the following phonological environment, it should be thought of as a mechanism relating CSD to following vowels in particular.

5 Conclusion

The central empirical result presented in this paper is that the following segment effect on coronal stop deletion rate is smaller across a strong syntactic boundary than across a weak one. This result was obtained in data limited to monosyllabic target words containing homovoiced target clusters, and it holds in a statistical analysis that additionally controls for grammatical category, preceding segment, speech rate, target word lexical frequency, and following segment duration. The other predicted proxies for the likelihood of production-planning-induced co-presence failures (following segment scaled duration, target word scaled log lexical frequency, and scaled speech rate deviation) do not appear to attenuate the FSE as syntactic boundary strength does. They exhibit neither significant main effects when the following segment is a consonant nor significant interactions with a following vowel (relative to a following consonant).

I argued that this result is consistent with the Production Planning Hypothesis: a possible explanation for why syntactic boundary strength should weaken the FSE is that following segments are less likely to have been planned in time to influence CSD if they are in a new constituent. Furthermore, I argued that the asymmetry of this interaction is most obviously compatible with a syllabification-based analysis of why the FSE occurs in the first place. The reduction in the FSE across stronger syntactic boundaries comes from a reduction in the stop-preserving effect of vowels, not from a significant shift in CSD rates before following consonants. If the FSE derives from vowels “saving” coronal stops by hosting their syllabification as onsets, co-presence failures will bleed this effect in the case of following vowels but will be vacuous in the case of following consonants.
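The logic of this asymmetry argument can be made concrete with a toy mixture model, sketched below under loudly stated assumptions. It is an illustration, not the statistical model fitted in this paper. Each token's following segment is assumed to be either in the speech plan (probability 1 − p_fail), in which case a context-specific deletion rate applies, or absent from it (a co-presence failure, probability p_fail), in which case a single "unplanned" rate applies. All rates and failure probabilities, and the names `deletion_rate`, `fse`, `p_fail`, and `p_unplanned`, are hypothetical. Under a vowel-driven account, an unplanned segment behaves like a consonant (no onset is available), so `p_unplanned` equals the pre-consonantal rate. Under a consonant-driven alternative, it behaves like a vowel.

```python
"""Toy mixture model: co-presence failures bleeding the following segment effect.

All numeric values are hypothetical illustrations, not corpus estimates.
"""


def deletion_rate(following: str, p_fail: float, p_cons: float,
                  p_vowel: float, p_unplanned: float) -> float:
    """Expected CSD rate before a following 'C' or 'V'.

    With probability 1 - p_fail the following segment is in the plan and its
    own conditioning rate applies; with probability p_fail it is not (a
    co-presence failure) and p_unplanned applies instead.
    """
    planned = p_vowel if following == "V" else p_cons
    return (1 - p_fail) * planned + p_fail * p_unplanned


def fse(p_fail: float, p_cons: float, p_vowel: float, p_unplanned: float) -> float:
    """Following segment effect: deletion rate before C minus rate before V."""
    args = (p_fail, p_cons, p_vowel, p_unplanned)
    return deletion_rate("C", *args) - deletion_rate("V", *args)


P_CONS, P_VOWEL = 0.5, 0.2                 # hypothetical planned-context rates
BOUNDARIES = {"weak": 0.1, "strong": 0.5}  # hypothetical co-presence failure rates

# Hypothetical unplanned rate under each account of the FSE's locus.
SCENARIOS = {"vowel-driven": P_CONS, "consonant-driven": P_VOWEL}

if __name__ == "__main__":
    for name, p_unplanned in SCENARIOS.items():
        for boundary, p_fail in BOUNDARIES.items():
            c = deletion_rate("C", p_fail, P_CONS, P_VOWEL, p_unplanned)
            v = deletion_rate("V", p_fail, P_CONS, P_VOWEL, p_unplanned)
            print(f"{name:16s} {boundary:6s} C={c:.2f} V={v:.2f} FSE={c - v:.2f}")
```

In both toy scenarios the FSE shrinks by the same factor, 1 − p_fail, so the size of the attenuation alone cannot tell the accounts apart. What differs is which context moves. Under the vowel-driven assumption, only the prevocalic rate rises across the strong boundary, while the preconsonantal rate stays put. Under the consonant-driven assumption, only the preconsonantal rate falls.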

This type of reasoning offers intriguing possibilities in other cases as well. For example, the graphical presentation of Wagner’s (2011) results appears to show a similar asymmetry in the boundary-induced weakening of assimilatory conditioning on /ɪŋ/ ∼ /ɪn/ variation. The /ɪn/-favoring effect of a following coronal obstruent is smaller in the non-local condition than in the local condition, but the two syntactic conditions have equivalent rates of /ɪn/ before a vowel. One might interpret this pattern as evidence for a default status for /ɪŋ/. The PPH paradigm offers a way to observe how phonological or morphological processes can be disrupted as they take place in speech production. The shape of that disruption stands to teach us something about the content of those processes.

More generally, the results discussed here illustrate the value of adopting a dynamic perspective on sociolinguistic variation, one that makes room for the role of the speaker–hearer and her ever-shifting psychological (and social-contextual) state (Tamminga et al. 2016), rather than prioritizing the speech community as the sole target of investigation (Labov 2012). The reduction of the FSE by syntactic boundaries is comparable in magnitude to the heavily theorized difference between CSD rates in monomorphemic and past tense forms (e.g. Guy 1991); the effect at the heart of this paper is not a subtle one. Accounting for it, I have argued, is possible by making reference to the cognitive process of production planning, which should probably not be built into either a grammar in the mentalist sense or a community grammar in the variationist sense. Production planning is what Tamminga et al. (2016) refer to as a cognitive p-conditioning factor: an extragrammatical cognitive force that is not strictly linguistic, but that interacts with linguistic structure in interesting ways as it exerts an influence on quantitative patterns of linguistic variation. The study of how production planning interacts with phonological variation may still be in its infancy, as attested by the many unresolved questions in this paper and in the recent papers that inspired it. But the new empirical questions that the PPH allows us to pose promise to deepen our understanding of both the linguistic structure underpinning variation and the relationship between that structure and our broader linguistic and cognitive abilities.