1 Introduction

Almost a hundred years of research have consistently shown that consonantal voicing has an effect on preceding vowel duration: vowels followed by voiced obstruents are longer than when followed by voiceless ones (Meyer 1904; Heffner 1937; Belasco 1953; House & Fairbanks 1953; Peterson & Lehiste 1960; Halle & Stevens 1967; Chen 1970; Klatt 1973; Lisker 1974; Fowler 1992; Laeufer 1992; Hussein 1994; Lampp & Reklis 2004; Warren & Jacks 2005; Durvasula & Luo 2012). This so-called “voicing effect” has been found in a considerable variety of languages.1 These include (but are not limited to) English, German, French, Spanish, Hindi, Russian, Italian, Arabic, and Korean (see Maddieson & Gandour 1976 for a more comprehensive, but still not exhaustive list).2 Despite the plethora of evidence in support of the existence of the voicing effect, agreement has not been reached regarding its source.

Several proposals have been put forward in relation to the possible source of the voicing effect (see Sóskuthy 2013 and Beguš 2017 for an overview). Some of the proposed mechanisms for the emergence of the voicing effect refer to properties of speech production. A notable production account, which will be the focus of this study, is based on compensatory temporal adjustments (Lindblom 1967; Slis & Cohen 1969a; b; Lehiste 1970a; b). According to this account, the voicing effect follows from the reorganisation of gestures within a unit of speech the duration of which is not affected by stop voicing. The duration of such a unit is held constant across voicing contexts, while the duration of voiceless and voiced obstruents differs. The closure of voiceless stops is longer than that of voiced stops (Lisker 1957; Summers 1987; Davis & Summers 1989; de Jong 1991). As a consequence, vowels followed by voiceless stops (which have a long closure) are shorter than vowels followed by voiced stops (which have a short closure). Advocates of a compensatory mechanism propose two prosodic units as the scope of the temporal adjustment: the syllable (and, equivalently, the VC sequence or vowel-to-vowel interval, Lindblom 1967; Farnetani & Kori 1986), and the word (Slis & Cohen 1969a; b; Lehiste 1970a; b). However, the compensatory temporal adjustment account has been criticised in subsequent work.

Empirical evidence and logic challenge the proposal that the syllable or the word has a constant duration and hence drives compensation. First, Lindblom’s (1967) argument that the duration of the syllable is constant is not supported by the findings in Chen (1970) and Jacewicz et al. (2009). Chen (1970) rejects a syllable-based compensatory mechanism in the light of the fact that the duration of the syllable is affected by consonant voicing. Jacewicz et al. (2009) further show that the duration of monosyllabic words in American English changes depending on the voicing of the coda consonant. Second, although the results in Slis & Cohen (1969) suggest that the duration of disyllabic words in Dutch is constant whether the second stop is voiceless or voiced, it does not follow from this fact that compensation should necessarily target the vowel preceding the stop. Indeed, it is logically possible that the following unstressed vowel could be the target of the compensation; differences in preceding vowel duration would therefore still call for an explanation.

The compensatory temporal adjustment account has been further challenged on the basis of the so-called “aspiration effect” (Maddieson & Gandour 1976), by which vowels are longer when followed by aspirated stops than when followed by unaspirated stops. In Hindi, vowels before voiceless unaspirated stops are short, vowels followed by voiced aspirated stops are long, and vowels followed by voiced unaspirated and voiceless aspirated stops are in between and have similar durations. Maddieson & Gandour (1976) find no compensatory pattern between vowel and consonant duration. The consonant /t/, which has the shortest duration, is preceded by the shortest vowel, and vowels before /d/ and /tʰ/ have the same duration although the durations of the two consonants are different. Maddieson & Gandour (1976) argue that a compensatory explanation for differences in vowel duration cannot be maintained.

However, a re-evaluation of the way consonant duration is measured in Maddieson & Gandour (1976) might actually turn their findings in favour of a compensatory account. Due to difficulties in detecting the release of the consonant of interest, consonant duration in Maddieson & Gandour (1976) is measured from the closure of the relevant consonant to the release of the following one (e.g., in ab sāth kaho, the duration of /tʰ/ in sāth is calculated as the interval between the closure of /tʰ/ and the release of /k/). This measure includes the burst and aspiration (if present) of the consonant following the target vowel. Slis & Cohen (1969), however, state that the inverse relation between vowel duration and the following consonant applies to closure duration, and not to the entire consonant duration.3 If an inverse relation exists between vowel and closure duration, the inclusion of burst and/or aspiration clearly alters this relationship.

Indeed, the study on Hindi voicing and aspiration effects conducted by Durvasula & Luo (2012) indicates that closure duration, measured from closure onset to closure offset, decreases according to the hierarchy voiceless unaspirated > voiced unaspirated > voiceless aspirated > voiced aspirated, which closely resembles the order of increasing vowel duration in Maddieson & Gandour (1976). Nonetheless, Durvasula & Luo (2012) do not find a negative correlation between vowel duration and consonant closure duration, but rather a (small) positive effect. Vowel duration increases with closure duration when voicing and aspiration are taken into account. However, as noted in Beguš (2017), it is likely that this result is a consequence of not controlling for speech rate. A small negative effect of closure duration can turn positive if the effect of speech rate (which is positive) is greater, given the cumulative nature of these effects.
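The speech-rate argument can be illustrated with a toy calculation. The sketch below uses invented numbers (not data from any of the cited studies) and assumes that a global tempo factor scales both vowel and closure durations, while within each tempo the vowel shortens by 0.2 ms for each millisecond of closure lengthening. Under these assumptions, the slope pooled across tempos comes out positive even though the slope within each tempo is negative.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical values: tempo factors (larger = slower speech) and
# closure deviations in ms around the tempo-specific mean closure.
tempos = [0.8, 1.0, 1.2]
deviations = [-10.0, 0.0, 10.0]
TRUE_SLOPE = -0.2  # assumed within-tempo compensation (ms per ms)

tokens = []
for t in tempos:
    for d in deviations:
        closure = 80.0 * t + d               # tempo scales the closure
        vowel = 120.0 * t + TRUE_SLOPE * d   # tempo scales the vowel too
        tokens.append((t, closure, vowel))

# Slope when speech rate is ignored (all tokens pooled).
pooled_slope = ols_slope([c for _, c, _ in tokens],
                         [v for _, _, v in tokens])

# Slopes when speech rate is held constant (one slope per tempo).
within_slopes = [
    ols_slope([c for tt, c, _ in tokens if tt == t],
              [v for tt, _, v in tokens if tt == t])
    for t in tempos
]
```

The sign reversal arises only because tempo moves vowel and closure in the same direction; it says nothing about the size of the effects in real data.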

de Jong (1991) finds partial support for a compensatory mechanism between vowel and closure duration in an electro-magneto-articulometric study of two American English speakers. The duration of vowels in nuclear accented, pre-, and post-nuclear accented position is weakly negatively correlated with closure duration (the slope coefficients range between –0.12 and –0.35, meaning that the amount of durational compensation is between 10% and 35%). While the magnitude of the correlation is too weak to univocally support compensation, the direction of the correlation is correct (i.e., a negative correlation).

Further evidence for a compensatory account and a negative correlation between vowel and closure duration comes from the effect of a third type of consonant, namely ejectives. Beguš (2017) finds that in Georgian (which contrasts aspirated, voiced, and ejective consonants) vowels are short when followed by voiceless aspirated stops, longer before ejective stops, and longest when followed by voiced stops. Crucially, stop closure duration follows the reversed pattern: closure is short in voiced stops, longer in ejectives, and longest in voiceless aspirated stops. Moreover, vowel duration is inversely correlated with closure across the three phonation types. Beguš (2017) mentions the possibility that the negative correlation is an artefact of the vowel and closure intervals sharing a boundary. This annotation bias could generate negative correlations (by which the vowel would shorten and the closure would lengthen by the same amount when, for example, the boundary is placed to the left of the “actual” boundary). However, Beguš shows with a cross-annotator analysis that this was not the case. Beguš (2017) argues that these findings support temporal compensation (although not univocally, see Beguš 2017: Section V, and Section 4.2 of this paper).
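The annotation-bias concern can be made concrete with a small, fully deterministic sketch. All values are hypothetical: the "true" vowel and closure durations are crossed so that they are uncorrelated by construction, and a boundary placement error is then added to the vowel and subtracted from the closure, as happens when the two intervals share a boundary.

```python
from itertools import product

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical "true" durations (ms) and boundary placement errors (ms).
true_vowels = [90.0, 100.0, 110.0]
true_closures = [60.0, 70.0, 80.0]
boundary_errors = [-5.0, 0.0, 5.0]  # positive = boundary placed too late

design = list(product(true_vowels, true_closures, boundary_errors))

# The shared boundary moves both intervals in opposite directions.
measured_vowels = [v + e for v, _, e in design]
measured_closures = [c - e for _, c, e in design]

r_true = pearson_r([v for v, _, _ in design], [c for _, c, _ in design])
r_measured = pearson_r(measured_vowels, measured_closures)
```

In this toy design the true correlation is zero, while the measured one is negative purely because of the shared boundary, which is why a cross-annotator check of the kind reported by Beguš (2017) is informative.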

To summarise, a mechanism of compensatory temporal adjustment has been proposed as the pathway to the emergence of the voicing effect. According to such an account, the difference in vowel duration before consonants varying in voicing (and possibly other phonation types) is the outcome of a compensation between vowel and closure duration. After reviewing the critiques advanced by Chen (1970) and Maddieson & Gandour (1976), and in the face of the results in Slis & Cohen (1969), de Jong (1991) and Beguš (2017), a temporal compensation mechanism gains credibility. However, issues about the actual implementation of the compensation mechanism still remain. While compensatory temporal adjustments are plausible in light of the reviewed literature, we are still left with the necessity of identifying a speech interval the duration of which is not affected by the voicing of the post-vocalic consonant, and within which compensation can be logically implemented.

1.1 The present study

This paper reports on selected results from a broader exploratory study that investigates the relationship between vowel duration and consonant voicing from both an acoustic and articulatory perspective. Synchronised recordings of audio, ultrasound tongue imaging, and electroglottography were carried out to enable a data-driven approach to the analysis of features related to the voicing effect in the context of disyllabic (CV́CV) words in Italian and Polish.4 This study was not designed to test the compensatory account, but rather to collect synchronised articulatory and acoustic data on the voicing effect. Moreover, the design of the study has been constrained by the use of ultrasound articulatory techniques (see Section 2). Since the tongue imaging and electroglottographic data don’t bear on the main argument put forward here, only the results from acoustics will be discussed.

Italian and Polish reportedly differ in the magnitude (or presence) of the effect of stop voicing on vowel duration. On the other hand, the typical realisation of phonologically voiced stops in these languages is similar (but see Huszthy 2016 and Schwartz & Arndt 2018 for a phonological and phonetic discussion on laryngeal aspects of Italian and Polish respectively).5 Cyran (2011) argues for a distinction between voicing and aspirating varieties of Polish, based on phonological arguments. Waniek-Klimczak (2011), on the other hand, cautiously argues that a possible change in progress in Polish is affecting the VOT values of voiceless stops in pre-stressed position.

The unclear status of Polish laryngeal phonology/phonetics could be seen as a hindrance affecting the comparison with Italian. However, based on data from Italian, Kirby & Ladd (2016) propose that the distinction between voicing and aspirating languages itself (Beckman et al. 2013) cannot be straightforwardly mapped onto phonetics, and they remind us that “the production of laryngeal contrasts of all kinds are considerably more complex” than generally described in the phonological literature (Kirby & Ladd 2016: 2409). Since this study focusses on the effect of post-stressed stops on preceding vowel durations, we believe that the comparison between Italian and Polish is still feasible, even if Polish voiceless pre-stressed stops are articulated with longer VOT values. Given that Italian and Polish share some features of the segmental and prosodic make-up of their phonological systems, the design of the experimental material and comparison of the results were facilitated. For these reasons, these languages offer an opportunity to investigate differences that could reveal mechanisms underlying the voicing effect, at least on a general level.

Italian has been unanimously reported as a voicing-effect language (Magno Caldognetto et al. 1979; Farnetani & Kori 1986; Esposito 2002). The mean difference in vowel duration when followed by voiceless vs. voiced consonants ranges between 22 and 24 ms in these studies, with longer vowels followed by voiced consonants. The mean differences are based on 3 speakers in Farnetani & Kori (1986) and 7 speakers in Esposito (2002). Magno Caldognetto et al. (1979) don’t report estimates of vowel duration, just the direction of the effect, but the study is based on 10 speakers.

The results regarding the presence and magnitude of the effect in Polish are instead mixed. Slowiaczek & Dinnsen (1985) find that vowels followed by word-final underlyingly voiced stops are 10–15 ms longer in 5 Polish speakers, although Jassem & Richter (1989) did not replicate their results. Similarly, Keating (1984) reports a difference of 2 ms in the duration of stressed vowels in disyllabic words from 24 speakers, which the author argues to be non-significant. On the other hand, Nowak (2006) finds that vowels followed by voiced stops are 4.5 ms longer in the 4 speakers recorded. Malisz & Klessa (2008) argue based on data from 40 speakers that the magnitude of the voicing effect in Polish is highly idiosyncratic, and claim that their results are inconclusive on this matter. While they do not report estimates from the 40 speakers, a table with mean vowel durations from 4 suggests a mean difference before voiceless vs. voiced stops of 3.5 ms. Finally, Strycharczuk (2012) reports a non-significant effect in 6 speakers in pre-sonorant word-final position.

The variety of results concerning the voicing effect in Polish could be related to differences in methodology. However, no clear pattern between studies which find a voicing effect and those which don’t can be identified. For example, the studies reviewed here looked at either word-final or word-medial stops, controlled or read speech, speakers with a low or advanced proficiency in English. However, in all the individual cases both a positive and a negative result are reported depending on the study. What might be more relevant, though, is that the estimates of the difference in vowel duration are generally very low, between 3.5 and 15 ms. Given the small magnitude of the difference, it is likely that the failure to obtain significant p-values in some studies is due to low statistical power, rather than to the absence of the effect (as also hinted in Beguš 2017, see arguments in Roettger 2019 and Nicenboim et al. 2018).

The acoustic data from the study discussed here suggest that (1) a voicing effect can be detected both in Italian and Polish, and that (2) the duration of the interval between two consecutive stop releases (the release to release interval) is not affected by the voicing of the second consonant in either language. This finding is compatible with a compensatory temporal adjustment account by which the timing of the closure onset of the stop following the vowel within said interval determines the respective durations of vowel and closure.
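The logic of this account can be sketched with a minimal example. The function name and all numbers below are hypothetical and serve only as an illustration: if the release to release interval is held constant, then moving the closure onset of the second stop earlier shortens the vowel and lengthens the closure by the same amount.

```python
def split_interval(rel_to_rel_ms, c1_release_to_v1_onset_ms, closure_onset_ms):
    """Split a release to release interval at the C2 closure onset.

    All times are in ms measured from the C1 release.
    Returns a (vowel duration, closure duration) tuple.
    """
    vowel = closure_onset_ms - c1_release_to_v1_onset_ms
    closure = rel_to_rel_ms - closure_onset_ms
    return vowel, closure

# Hypothetical values, chosen only for illustration.
REL_TO_REL = 200.0  # same interval duration in both voicing contexts
LAG = 20.0          # C1 release to V1 onset

# Later closure onset (voiced C2) vs. earlier closure onset (voiceless C2).
voiced = split_interval(REL_TO_REL, LAG, closure_onset_ms=135.0)
voiceless = split_interval(REL_TO_REL, LAG, closure_onset_ms=120.0)

vowel_difference = voiced[0] - voiceless[0]      # vowel is longer before voiced
closure_difference = voiceless[1] - voiced[1]    # closure is longer if voiceless
```

With these values a 15 ms shift in closure onset yields a 15 ms difference in both vowel and closure duration, with opposite signs, while the sum of the two stays the same.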

2 Method

2.1 Participants

Participants were sought in Manchester (UK), and in Verbania (Italy). Seventeen subjects in total participated in this study. Eleven subjects are native speakers of Italian (5 female, 6 male), while six are native speakers of Polish (3 female, 3 male). The Italian speakers are from the North and Centre of Italy (8 speakers from Northern Italy, 3 from Central Italy). The Polish group has 2 speakers from Western Poland, 3 speakers from Central Poland, and 1 speaker from Eastern Poland. For more information on the sociolinguistic details of the speakers, see Supplementary file 1. Ethical clearance for this study was obtained from the University of Manchester (REF 2016-0099-76). The participants signed a written consent and received a monetary compensation of £10.

2.2 Equipment

The audio signal was acquired with the software Articulate Assistant Advanced™ (AAA, v2.17.2, Articulate Instruments Ltd™ 2011) running on a Hewlett-Packard ProBook 6750b laptop with Microsoft Windows 7. Audio recordings were sampled at 22050 Hz (16-bit) and saved in a proprietary format (.aa0). A Focusrite Scarlett Solo pre-amplifier and a Movo LV4-O2 Lavalier microphone were used for audio recording. The microphone was placed at the level of the participant’s mouth on one side, at a distance of about 10 cm. The microphone was clipped onto a metal headset worn by the participant, which was part of the ultrasonic equipment.

2.3 Materials

The target stimuli were disyllabic words with C1V1C2V2 structure, where C1 = /p/, V1 = /a, o, u/, C2 = /t, d, k, g/, and V2 = V1 (e.g. /pata/, /pada/, /poto/, etc.).6 Most are nonce words, although inevitably some combinations produce real words both in Italian (4 words) and Polish (2 words, see Supplementary file 1). The lexical stress of the target words was placed by speakers of both Italian and Polish on V1, as intended.

The make-up of the target words was constrained by the design of the experiment, which included ultrasound tongue imaging (UTI). Front vowels are difficult to image with UTI, since their articulation involves tongue surface positions which are particularly far from the ultrasonic probe, hence reducing the visibility of the tongue contour. For this reason, only central and back vowels were included. Since one of the variables of interest in the study was the closing gesture of C2, only lingual consonants were used. A labial stop was chosen as the first consonant to reduce possible coarticulation with the following vowel (although see Vazquez-Alvarez & Hewlett 2007). The number of target words was kept low to reduce the time required for completing the task, since the ultrasonic equipment can get very uncomfortable for the speaker when worn for more than 15–20 minutes.

The target words were embedded in a frame sentence. Controlling for meaning, segmental and prosodic make-up between languages proved to be difficult. The frames are Dico X lentamente ‘I say X slowly’ in Italian (following Hajek & Stevens 2008), and Mówię X teraz ‘I say X now’ in Polish. These sentences were chosen in order to maintain a similar intonation contour across languages.

2.4 Procedure

The participant was asked to read the sentences with the target words, which were presented on the computer screen. The order of the sentences was randomised for each participant. Participants read the list of randomised sentence stimuli 6 times. Due to software constraints, the order of the list was kept the same across the six repetitions within each participant. The reading task lasted between 15 and 20 minutes, with optional short breaks between one repetition and the next. The total session time was around 45 minutes. Before the start of the experiment, the participants were spoken to in their mother tongue to try and reduce exposure to English prior to being recorded. Instructions were also given in their respective mother tongues. Each speaker read a total of 12 sentences 6 times (with the exception of IT02, who repeated the 12 sentences 5 times), which yields a grand total of 1212 tokens (792 from Italian, 420 from Polish).

The experiment was carried out in two locations: in the sound attenuated booth of the Phonetics Laboratory at the University of Manchester, and in a quiet room in a field location in Italy (Verbania, Northern Italy). In both locations the equipment and procedures were the same. Data collection started in December 2016 and ended in March 2018.

2.5 Data processing and measurements

The audio recordings were exported from AAA in the .wav format for further processing. The sampling rate and bit depth were kept as upon recording (22050 Hz, 16-bit). A force-aligned transcription was obtained with the SPeech Phonetisation Alignment and Syllabification software (SPPAS, Bigi 2015). The outcome of the automatic annotation was manually corrected for the relevant boundaries, according to the criteria in Table 1 based on Machač & Skarnitzl (2009). Segmentation boundaries not used in the analyses have not been checked, to speed up processing. The releases of C1 and C2 were detected automatically by means of a Praat scripting implementation of the algorithm described in Ananthapadmanabha et al. (2014), and subsequently corrected if necessary. The identification of the stop release was not possible in 99 tokens (8%) of C1 and 265 tokens (22%) of C2 out of 1212. This was due either to the absence of a clear burst in the waveform and spectrogram, or to the realisation of voiced stops as voiced fricatives. Most of the fricativised tokens come from three speakers of Central Italian, IT12, IT13, and IT14, a variety of Italian known to show processes of lenition (Hualde & Nadeu 2011).

Table 1

Criteria for the identification of acoustic landmarks.

| landmark | criteria |
|---|---|
| vowel onset (V1 onset) | Appearance of higher formants in the spectrogram following the release of /p/ (C1) |
| vowel offset (V1 offset) | Disappearance of the higher formants in the spectrogram preceding the target consonant (C2) |
| consonant onset (C2 onset) | Corresponds to V1 offset |
| closure onset (C2 closure onset) | Corresponds to V1 offset |
| consonant offset (C2 offset) | Appearance of higher formants of the vowel following C2 (V2); corresponds to V2 onset |
| consonant release (C1/C2 release) | Automatic detection + manual correction (Ananthapadmanabha et al. 2014) |

Moreover, IT12 and IT14 produced several tokens of voiceless stops with voicing during closure (in some cases the closure was completely voiced). These tokens have been used in the analyses, because (1) the actual presence or absence of voicing during closure does not bear on the compensatory account discussed here (which concerns supraglottal gestures), and laryngeal gestures can be implemented almost entirely independently from oral gestures, and (2) the voicing effect has been shown to exist even in whispered speech, where vocal fold vibration is entirely absent (Sharf 1964).7

The durations in milliseconds of the following intervals were extracted with a series of custom Praat scripts from the annotated acoustic landmarks: word duration, vowel duration (V1 onset to V1 offset), consonant closure duration (V1 offset to C2 release), and release to release duration (C1 release to C2 release). Sentence duration was measured in seconds. Figure 1 shows an example of the segmentation of /pata/ (a) and /pada/ (b) from an Italian speaker. Syllable rate (syllables per second) was used as a proxy for speech rate (Plug & Smith 2018), and was calculated as the number of syllables divided by the duration of the sentence in seconds (8 syllables in Italian, 6 in Polish). All further data processing and visualisation was done in R v3.5.2 (Wickham 2017; R Core Team 2018).
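The interval definitions above can be summarised in a short sketch. This is not the Praat code used in the study; it is a Python illustration in which the class, field, and function names, as well as the landmark times, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    """Landmark times in seconds for one token (hypothetical field names)."""
    c1_release: float
    v1_onset: float
    v1_offset: float      # coincides with the C2 closure onset (Table 1)
    c2_release: float
    sentence_start: float
    sentence_end: float
    n_syllables: int      # 8 for the Italian frame, 6 for the Polish frame

def durations(tok):
    """Return interval durations in ms and syllable rate in syllables/s."""
    ms = 1000.0
    sentence_s = tok.sentence_end - tok.sentence_start
    return {
        "vowel_ms": (tok.v1_offset - tok.v1_onset) * ms,
        "closure_ms": (tok.c2_release - tok.v1_offset) * ms,
        "rel_rel_ms": (tok.c2_release - tok.c1_release) * ms,
        "syll_rate": tok.n_syllables / sentence_s,
    }

# Invented landmark times for a single Italian token.
example = Token(c1_release=0.50, v1_onset=0.52, v1_offset=0.63,
                c2_release=0.70, sentence_start=0.0, sentence_end=1.6,
                n_syllables=8)
result = durations(example)
```

For the invented token, the vowel is about 110 ms, the closure about 70 ms, the release to release interval about 200 ms, and the syllable rate 5 syllables per second.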

Figure 1

Segmentation example of the words pata and pada uttered by the Italian speaker IT09 (the times on the x-axis refer to the times in the concatenated audio file).

2.6 Statistical analysis

Given the data-driven nature of the study, all statistical analyses reported here are to be considered exploratory (hypothesis-generating) rather than confirmatory (hypothesis-driven, Kerr 1998; Gelman & Loken 2013; Roettger 2019). The durational measurements were analysed with linear mixed-effects models using lme4 v1.1-19 in R (Bates et al. 2015), and model estimates were extracted with the effects package v4.1-0 (Fox 2003). All factors were coded with treatment contrasts and the following reference levels: voiceless (vs. voiced), /a/ (vs. /o/, /u/), coronal (vs. velar), Italian (vs. Polish). Speech rate has been centred when included in the models to make the intercept estimates more interpretable. The models were fitted by Restricted Maximum Likelihood estimation (REML). The estimates in the results section refer to these reference levels unless interactions are discussed. P-values for the individual terms were obtained with lmerTest v3.0-1, which uses Satterthwaite’s approximation to degrees of freedom (Kuznetsova et al. 2017; Luke 2017). A result is considered significant if the p-value is below the alpha level (α = 0.05). The choice of not using likelihood ratio tests for statistical inference is based on Luke (2017), who argues that this approach can lead to inflated Type I error rates. In any case, Luke (2017: 1501) also warns that “results [from mixed-effects models] should be interpreted with caution, regardless of the method adopted for obtaining p-values”. Inspection of residual plots and QQ plots of the models described below indicated absence of patterns in the residuals.

Bayes factors were used to test whether word and release to release duration are not affected by C2 voicing (i.e., the effect of C2 voicing on duration is 0).8 For each set of null/alternative hypotheses, a full model (with the predictor of interest) and a null model (excluding it) were fitted separately using Maximum Likelihood estimation (ML, Bates et al. 2015: 34). The Bayesian Information Criterion (BIC) approximation was then used to obtain Bayes factors (Raftery 1995; 1999; Wagenmakers 2007; Jarosz & Wiley 2014). The approximation is calculated according to the equation in (1) (Wagenmakers 2007: 796).

(1) $\mathrm{BF}_{01} \approx \exp(\Delta\mathrm{BIC}_{10}/2)$

where $\Delta\mathrm{BIC}_{10} = \mathrm{BIC}_1 - \mathrm{BIC}_0$, $\mathrm{BIC}_1$ is the BIC of the full model, and $\mathrm{BIC}_0$ is the BIC of the null model. Values of $\mathrm{BF}_{01} > 1$ indicate a preference of H0 over H1. The interpretation of the Bayes factors follows the recommendations in Raftery (1995: 139): 1–3 = weak evidence, 3–20 = positive evidence, 20–150 = strong evidence, >150 = very strong evidence.
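As a minimal sketch of the approximation in (1), the Python functions below compute the Bayes factor from two BIC values and attach the Raftery (1995) label. The function names and the BIC values are hypothetical, and the assignment of the boundary values (3, 20, 150) to one category or the other is an assumption, since the ranges as stated share their endpoints.

```python
import math

def bf01_from_bic(bic_full, bic_null):
    """BIC approximation to the Bayes factor of H0 (null) over H1 (full)."""
    delta_bic10 = bic_full - bic_null
    return math.exp(delta_bic10 / 2.0)

def raftery_label(bf):
    """Evidence category for a Bayes factor, after Raftery (1995: 139).

    Boundary handling is an assumption made for this sketch.
    """
    if bf < 1:
        return "favours the other hypothesis"
    if bf < 3:
        return "weak"
    if bf < 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"

# Invented BIC values: the full model has a BIC about 5.89 units higher
# than the null model, which corresponds to a Bayes factor of 19.
bic_null = 1000.0
bic_full = bic_null + 2.0 * math.log(19.0)
bf = bf01_from_bic(bic_full, bic_null)
label = raftery_label(bf)
```

Because the BIC penalises the extra parameter of the full model, a full model that fits only marginally better than the null model ends up with the higher BIC, and the Bayes factor then favours H0.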

The extracted measurements were filtered before statistical analysis. Measures of vowel duration, closure duration, word duration, and release to release duration that are more than 3 standard deviations lower or higher than the respective means were excluded from the final dataset (this procedure generally corresponds to a loss of around 2.5% of the data). One sentence (sentence 48 of IT07, Dico pada lentamente) included a speech error and has been excluded. After excluding missing measurements, these operations yield a total of 920 tokens of vowel and closure durations, 1176 tokens of word duration, and 848 tokens of release to release duration.
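The exclusion criterion can be sketched as follows. This is a Python illustration rather than the R code used in the study; the function name and the data are hypothetical, and the use of the sample standard deviation computed over the whole set of values (rather than, say, by language or by speaker) is an assumption, since the grouping is not specified above.

```python
from statistics import mean, stdev

def within_n_sd(values, n_sd=3.0):
    """Keep the values lying within n_sd sample standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)
    lower = centre - n_sd * spread
    upper = centre + n_sd * spread
    return [v for v in values if lower <= v <= upper]

# Invented durations in ms: twenty typical tokens and one extreme token.
raw = [100.0] * 20 + [1000.0]
kept = within_n_sd(raw)
n_excluded = len(raw) - len(kept)
```

Note that the extreme token itself inflates the mean and the standard deviation, so with very small samples a single outlier may not exceed the 3 SD threshold at all.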

2.7 Open Science statement

Following recommendations for Open Science in Crüwell et al. (2018) and Berez-Kroeker et al. (2018), the data and code used to produce the analyses discussed in this paper are available on the Open Science Framework at https://osf.io/bfyhr/.

3 Results

The following sections report the results of the study in relation to the durations of vowels, consonant closure, word, and the release to release interval. When discussing the output of statistical modelling, only the relevant predictors and interactions will be presented. The full output of the statistical models (including confidence intervals and p-values) is given in Supplementary file 1.

3.1 Vowel duration

Figure 2 shows boxplots and raw data of vowel duration for the three vowels /a, o, u/ when followed by voiceless or voiced stops in Italian and Polish. Vowels tend to be longer when followed by a voiced stop in both languages. The effect appears to be greater in Italian than in Polish, especially for the vowels /a/ and /o/. There is no evident effect of C2 voicing in /u/ in Italian, but the effect is discernible in Polish /u/. In Italian, vowels have a mean duration of 106.16 ms (SD = 27.08) before voiceless stops, and a mean duration of 117.66 ms (SD = 34.63) before voiced stops. Polish vowels are on average 75.57 ms long (SD = 16.16) when followed by a voiceless stop, and 83.11 ms long (SD = 19.37) if a voiced stop follows. The difference in vowel duration based on the raw means is 11.5 ms in Italian and 7.54 ms in Polish.

Figure 2

Raw data and boxplots of the duration in milliseconds of vowels in Italian (top row) and Polish (bottom row), for the vowels /a, o, u/ when followed by a voiceless or voiced stop.

A linear mixed-effects model with vowel duration as the outcome variable was fitted with the following predictors: fixed effects for C2 voicing (voiceless, voiced), C2 place of articulation (coronal, velar), vowel (a, o, u), language (Italian, Polish), and speech rate (as syllables per second, centred); by-speaker and by-word random intercepts with by-speaker random slopes for C2 voicing. All possible interactions between C2 voicing, vowel, and language were included. The following terms are significant according to t-tests with Satterthwaite’s approximation to degrees of freedom: C2 voicing, C2 place, vowel, language, and speech rate. Only the interaction between C2 voicing and vowel is significant. Vowels are 16.28 ms longer (SE = 4.42) when followed by a voiced stop (C2 voicing), and 8 ms shorter (SE = 1.63) when followed by a velar stop. The effect of C2 voicing is smaller with /u/ (around 3 ms, β̂ = –13.1 ms, SE = 5.56). Polish has on average shorter vowels than Italian (β̂ = –24.05 ms, SE = 7.83), and the effect of voicing is estimated to be about 10.55 ms (although note that the interaction between language and C2 voicing is not significant). Speech rate has a negative effect on vowel duration, such that faster rates correlate with shorter vowel durations (β̂ = –16.23 ms, SE = 1.26).

3.2 Consonant closure duration

Figure 3 illustrates stop closure durations with boxplots and individual raw data points. A pattern opposite to that with vowel duration can be noticed: closure duration is shorter for voiced than for voiceless stops. The closure of voiced stops in Italian is 106.16 ms long (SD = 27.08), while the voiceless stops have a mean closure duration of 117.66 ms (SD = 34.63). In Polish, the closure duration is 75.57 ms (SD = 16.16) in voiced stops and 83.11 ms (SD = 19.37) in voiceless stops. The difference in closure duration based on the raw means is 13.33 ms in Italian and 10.87 ms in Polish. The same model specification as with vowel duration has been fitted with consonant closure duration as the outcome variable. C2 voicing, C2 place, and speech rate are significant. Stop closure is 17.5 ms shorter (SE = 4) if the stop is voiced and 3.5 ms longer (SE = 1.5) if velar. Finally, faster speech rates correlate with shorter closure durations (β̂ = –8.5 ms, SE = 1 ms).

Figure 3

Raw data and boxplots of closure duration in milliseconds of voiceless and voiced stops in Italian (top row) and Polish (bottom row) when preceded by the vowels /a, o, u/.

3.3 Vowel and closure duration

A model addressing the relationship between vowel and stop closure duration was fitted with the following terms and interactions: vowel duration as the outcome variable; as fixed effects, closure duration, vowel, speech rate (centred); all logical interactions between closure duration, vowel, and speech rate; by-speaker and by-word random intercepts. Closure duration has a significant effect on vowel duration (β̂ = –0.19 ms, SE = 0.06 ms). The effect with /u/ is greater than with /a/ and /o/ (β̂ = –0.23 ms, SE = 0.08 ms). In general, closure duration is inversely related to vowel duration. However, this relationship is quite weak, as shown by the small estimates. A 1 ms increase in closure duration corresponds to a 0.2–0.45 ms decrease in vowel duration. These estimates can be interpreted in terms of percentages of compensation, which range between 20 and 45%. Note, moreover, that the negative correlation found here could be a consequence of annotation bias, since the vowel and closure share a boundary. Faster speech rates elicit a bigger effect than slower speech rates, as indicated by the significant interaction between closure duration and speech rate (β̂ = –0.2 ms, SE = 0.06 ms). The effect of the interaction is reduced when the vowel is /u/ (β̂ = 0.17 ms, SE = 0.08 ms). Figure 4 shows for each vowel /a, o, u/ the individual data points and the regression lines with 95% confidence intervals extracted from the mixed-effects model.

Figure 4

Raw data, estimated regression lines, and 95 per cent confidence intervals of the effect of closure duration on vowel duration for the vowels /a, o, u/ (from a mixed-effects model fitted to data pooled from Italian and Polish, see text for details).

3.4 Word duration

Words with a voiceless C2 are on average 393.72 ms long (SD = 79.05) in Italian and 387.72 ms long (SD = 73.45) in Polish. Words with a voiced stop have a mean duration of 357.07 ms (SD = 39.14) in Italian and 361.87 ms (SD = 38.51) in Polish. The following full and null models were fitted to test the effect of C2 voicing on word duration. The full model is made up of the following fixed effects: C2 voicing, C2 place, vowel, language, and speech rate. The model also includes by-speaker and by-word random intercepts, and a by-speaker random slope for C2 voicing. The null model is the same as the full model with the exclusion of the fixed effect of C2 voicing. The Bayes factor of the null against the full model is 19. Thus, the null model (in which there is no effect of C2 voicing, β = 0) is 19 times more likely under the observed data than the full model. This indicates that there is positive evidence for a null effect of C2 voicing on word duration.

3.5 Release to release interval duration

In Figure 5, boxplots and raw data points show the duration of the release to release interval in words with a voiceless vs. a voiced C2 stop, in Italian and Polish. It can be seen that the distributions, medians, and quartiles of the durations in the voiceless and voiced condition do not differ much in either language. In Italian, the mean duration of the release to release interval is 209.88 ms (SD = 43.84) if C2 is voiceless, and 208.6 ms (SD = 41.34) if voiced. In Polish, the mean durations are respectively 173.13 (SD = 22.44) and 172.67 (SD = 20.47) ms. The specifications of the null and full models for the release to release duration are the same as for word duration. The Bayes factor of the null model against the full model is 21, which means that the null model (without C2 voicing) is 21 times more likely than the model with C2 voicing as a predictor. The Bayes factor suggests there is strong evidence that duration of the release to release interval is not affected by C2 voicing.
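The verbal labels attached to the two Bayes factors ("positive" for 19, "strong" for 21) can be made explicit with a small sketch. The cut-offs used below (3, 20, 150) are an assumed conventional scale that happens to be consistent with the wording above; this section does not state which scale was used. The BIC-based conversion is likewise only a common approximation and is not claimed to be the procedure followed in the study.

```python
import math


def evidence_label(bf01):
    """Label the evidence for the null model given a Bayes factor BF01 >= 1.

    The cut-offs (3, 20, 150) are an assumed conventional scale.
    """
    if bf01 < 1:
        raise ValueError("BF01 < 1 favours the full model; invert it first")
    if bf01 <= 3:
        return "weak"
    if bf01 <= 20:
        return "positive"
    if bf01 <= 150:
        return "strong"
    return "very strong"


def implied_bic_difference(bf01):
    """BIC(full) - BIC(null) implied by BF01 under the BIC approximation
    BF01 ~ exp((BIC_full - BIC_null) / 2). This approximation is an assumption."""
    return 2 * math.log(bf01)


for label, bf in [("word duration", 19), ("release to release", 21)]:
    print(label, evidence_label(bf), round(implied_bic_difference(bf), 2))
```

The two Bayes factors sit on either side of the assumed boundary at 20, which is why similar values receive different labels.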

Figure 5

Raw data and boxplots of the duration in milliseconds of the release to release interval in Italian (left) and Polish (right) when C2 is voiceless or voiced.

4 Discussion

A study of articulatory and acoustic aspects of the effect of consonant voicing on vowel duration in Italian and Polish has been carried out to look for a possible source of such an effect in speech production. Only the results from the acoustic part of the study bear on the main argument of this paper. The following sections discuss, in turn, the results regarding the effect of voicing on vowel duration in Italian and Polish, and how the finding that the duration of the interval between the two consecutive consonant releases in CV́CV words is not affected by C2 voicing is compatible with a compensatory temporal adjustment account of the voicing effect. The section concludes by discussing the limitations and open issues of this study.

4.1 Voicing effect in Italian and Polish

The results of vowel duration and C2 voicing indicate that vowels are longer when followed by voiced than when followed by voiceless stops both in Italian and Polish. The estimated effect is around 16 ms when C2 is voiced for Italian. This value is not too far from the estimates of previous works on this language (Magno Caldognetto et al. 1979; Farnetani & Kori 1986; Esposito 2002), the range of which is between 22 and 24 ms. The higher estimates of these studies compared to the one here could be related to differences in experimental design, or Type M (magnitude) errors due to low statistical power (see Kirby & Sonderegger 2018). The estimate of the effect of voicing on C2 closure duration is around –18 ms. Crucially, the effects of voicing on vowel and closure duration have very similar magnitudes and opposite signs. These results suggest a compensatory mechanism between vowel and closure duration.

Furthermore, the effect of voicing on the duration of Italian /u/ is smaller than with /a/ and /o/ (about 3 vs. 16 ms respectively), a fact already observed by Ferrero et al. (1978). While it is not clear why the duration of this particular vowel should not be affected by C2 voicing, the data reported here indicate that the magnitude of the difference in closure duration when the preceding vowel is /u/ is smaller than with /a/ and /o/ (about 7 vs. 17 ms respectively). If vowel duration compensates for closure duration, then a smaller difference in closure duration should correspond to a smaller difference in vowel duration, as the estimates seem to suggest.

The interpretation of the Polish results is less straightforward. Previous studies found either no voicing effect or a small effect in Polish (3.5–4.5 ms). In particular, Malisz & Klessa (2008) say that the effect seems to be very idiosyncratic in the 40 speakers of their analysis. The estimated effect found in the 6 Polish speakers of the present study is about 10.5 ms, and the difference based on the means of the raw vowel durations is 7.5 ms. Recall, however, that the interaction between language and C2 voicing (which gives the estimate of 10.54) is not significant (see the full model summary in Supplementary file 1). It is likely, though, that the non-significance is related to low power. Indeed, the raw mean difference of 7.5 ms in Polish, although still higher than what was found in previous studies, might be more informative.

More specifically, when one compares the raw mean duration differences of vowels with the raw mean duration differences of consonant closures, a pattern can be seen. The mean differences of Italian vowels and closures (11.5 and 13.33, respectively) are bigger than those of Polish (7.54 and 10.87), even if by just a small amount. It is plausible that the smaller effect of C2 voicing on preceding vowel duration in Polish is related to the smaller effect on closure duration, if we assume a temporal mechanism of compensation between the closure and the vowel. These patterns will need to be confirmed with a more balanced sample of Italian and Polish speakers.

On the other hand, while the estimated differences in vowel durations can be interpreted in reference to Italian and Polish as two independent linguistic objects, the patterns observed in the individual speakers do not indicate a systematic relation between magnitude of the effect and language. Figure 6 shows the random coefficients of the effect of C2 voicing on vowel duration for the individual speakers, extracted from the mixed-effects model presented in Section 3.1. Black indicates Italian speakers, while grey is for Polish speakers. As can be seen, speakers of both languages are scattered along the values of the voicing effect. These results are in agreement with the idiosyncrasy of the voicing effect of Polish found in Malisz & Klessa (2008). While large-scale studies could reveal clear language-level patterns, the data discussed here point to a scenario in which the speaker’s individual behaviour is substantial. Future studies could thus look into the respective role of individual-level and community-level factors and how these contribute to the magnitude of the durational differences across speakers and languages.

Figure 6

By-speaker random coefficients and error bars for the effect of C2 voicing on vowel duration, extracted from a mixed-effects model (Section 3.1).

4.2 Compensatory temporal adjustment

Vowels followed by voiced stops are long, while vowels followed by voiceless stops are short. The closure duration of voiced stops is short compared to that of voiceless stops. There seems to be an inverse relation between vowel duration and closure duration, by which a long vowel entails a short closure (and vice versa), and a short vowel entails a long closure (and vice versa).

The data and statistical analyses of this study suggest that the duration of the interval between the releases of two consecutive consonants in CV́CV words (the release to release interval) is not affected by the phonological voicing of the second consonant (C2) in Italian and Polish. In accordance with a compensatory temporal adjustment account (Slis & Cohen 1969b; Lehiste 1970b), the difference in vowel duration and closure durations before voiceless vs. voiced stops can be seen as the outcome of differences in timing of the vowel offset/closure onset (more neutrally, the VC boundary). In other words, the timing of the VC boundary within the temporally stable release to release interval determines the duration of both the vowel and the stop closure. An earlier VC boundary relative to the onset of the preceding vowel results in a shorter vowel and a longer stop closure. On the other hand, a later VC boundary produces a longer vowel and a shorter closure. Figure 7 illustrates this compensatory mechanism. Note that the term “temporal stability” (and “temporally stable”) as used here means that the underlying statistical distribution of the interval duration is stable across contexts of C2 voicing. No specific statement is implied about the variance of the duration around the mean, across or within phonological contexts.

Figure 7

A schematic representation of the oral cavity cross-sectional area, as inferred from acoustics. Design based on Esposito (2002). The top panel shows a CV́C sequence with a voiceless C2, the bottom panel with a voiced C2. Oral cavity aperture (on the y-axis, as the inverse of oral constriction) through time (on the x-axis) is represented by the black line. Lower values represent a more constricted oral tract (a contoid configuration), while higher values indicate a more open oral tract (a vocoid configuration). The black bars below the time axis represent voicing (vocal fold vibration). Various landmarks and intervals are indicated in the schematic.

The invariance of the release to release interval allows us to refine the logistics of the compensatory account by narrowing the scope of the temporal adjustment action. A limitation of this account, as proposed by Slis & Cohen (1969b) and Lehiste (1970b), is the lack of a precise identification of the word-internal mechanics of compensation. As already discussed in Section 1, it is not clear why the adjustment should target the preceding stressed vowel, rather than the following unstressed vowel or any other segment in the word. Since the release to release interval includes just the vocoid gesture between the release of C1 and the VC boundary, and the consonant closure, it follows that differences in the timing of the VC boundary must be reflected in differences in both vowel and closure durations.
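The logic of this argument can be sketched numerically. The example below assumes a hypothetical release to release interval of 200 ms and two hypothetical VC boundary timings; none of these numbers come from the data, and the function name is invented for illustration.

```python
def split_interval(release_to_release_ms, vc_boundary_ms):
    """Split a fixed release to release interval at the VC boundary.

    Times are measured from the release of C1. Returns the duration of the
    vocoid portion (C1 release to VC boundary) and of the C2 closure
    (VC boundary to C2 release).
    """
    if not 0 <= vc_boundary_ms <= release_to_release_ms:
        raise ValueError("the VC boundary must fall inside the interval")
    vocoid = vc_boundary_ms
    closure = release_to_release_ms - vc_boundary_ms
    return vocoid, closure


INTERVAL_MS = 200  # hypothetical interval, identical across voicing contexts

voiceless = split_interval(INTERVAL_MS, vc_boundary_ms=90)  # earlier VC boundary
voiced = split_interval(INTERVAL_MS, vc_boundary_ms=105)    # later VC boundary

# Differences between the voiced and the voiceless context.
vocoid_difference = voiced[0] - voiceless[0]
closure_difference = voiced[1] - voiceless[1]
print(voiceless, voiced, vocoid_difference, closure_difference)
```

Because the interval is held constant, any shift of the boundary lengthens one portion by exactly the amount by which it shortens the other. This is the idealised 100% compensation case, not the partial compensation estimated from the data.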

Under an account of temporal compensation, the voicing effect can be interpreted as a by-product of gestural phasing and mechanisms operating on the timing of the VC boundary. The temporal stability of the release to release interval across voicing contexts allows us to refine the compensatory mechanism by providing a temporal anchor. On the other hand, it is important to note that the release to release interval should not necessarily have a special status in such a compensatory account, but rather can be used as a proxy to the understanding of a full gestural mechanism of compensation. Indeed, the temporal stability of this interval should be derivable from a theory of gestural phasing, rather than one that simply states that the interval is stable across voicing contexts.

The non-exclusivity of the release to release interval is also shown by the fact that excluding the VOT from it still indicates that C2 voicing is not affecting the interval duration. The duration of the vowel onset to release interval (the release to release minus VOT) is stable across voicing contexts (Bayes factor = 9). However, the duration of release to release interval has relatively more cohesion than that of the vowel onset to release interval, as indicated by two measures of relative dispersion (the coefficient of variation CV and the coefficient of quartile variation CQV, see Bonett 2006).9 On the other hand, the duration of the interval between the vowel onset of V1 to the vowel onset of V2 does change depending on C2 voicing (the interval is around 20 ms longer if C2 is voiceless). This fact is simply a consequence of including the VOT of C2 in the measure. Voiceless stops have longer VOT values, which increases the duration of the interval. The difficulty in identifying a clear-cut time point corresponding to vowel onset could explain the relatively higher dispersion of the vowel onset to release interval duration. For these reasons, the release to release interval is probably a better measure of temporal stability than the vowel onset to release, given its inherently higher cohesion.
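The two measures of relative dispersion can be computed as in the following sketch. The durations are invented for illustration, and the quartile method is whatever Python's `statistics.quantiles` uses by default, which may differ from the method used in the study.

```python
import statistics


def coefficient_of_variation(values):
    """CV: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def coefficient_of_quartile_variation(values):
    """CQV: (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)


# Hypothetical interval durations in ms (not taken from the data set).
tight_interval = [190, 200, 210, 205, 195, 200, 198, 202]
loose_interval = [170, 200, 230, 215, 185, 200, 190, 210]

for name, durations in [("tight", tight_interval), ("loose", loose_interval)]:
    cv = coefficient_of_variation(durations)
    cqv = coefficient_of_quartile_variation(durations)
    print(name, round(cv, 3), round(cqv, 3))
```

Both measures are unit-free, so they allow intervals with different mean durations to be compared. Lower values indicate less dispersion, that is, more cohesion.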

It is possible that the temporal stability of the release to release interval is not an antecedent, but rather a consequence of manipulating vowel and closure durations. If this were the case, the differential duration of vowels and closures would not be the result of a compensatory mechanism. The present data cannot disambiguate between these two scenarios, and future studies should look into investigating independent reasons for the release to release interval stability across voicing contexts. The account of gestural phasing proposed by Tilsen (2013; 2016) is promising, in that the temporal stability of the release to release interval would directly follow from the relative phasing of the vowels in CVCV words (see Figure 6 in Tilsen 2013). Articulatory work on the gestural coordination of sequences besides the traditional syllable might reveal a principled organisation that results in the temporal patterns observed in this study and in other durational phenomena.

However, even if independent reasons for the interval stability can be identified, other mechanisms, unrelated to compensatory effects, would still be required to explain the differential timing of the VC boundary within that interval. Accounts compatible with other aspects of production and perception would not be ultimately ruled out, as thoroughly discussed in Beguš (2017). For example, the laryngeal adjustment hypothesis (Halle et al. 1967) states that adjustments of the glottis for obstruent voicing require more time to be implemented, so that stop closure onset (VC boundary) for a voiced stop will be achieved later than that of a voiceless stop, relative to the onset of the preceding vowel. Tongue root advancement (Rothenberg 1967; Westbury 1983; Ohala 2011) could also play a role in modulating the time required before closure can be implemented. Another account (Chen 1970) makes direct reference to velocity of the closing gesture, which is faster in voiceless than in voiced stops (Summers 1987; de Jong 1991), so that the VC boundary within the release to release would be timed earlier in the former than in the latter case. Moreover, perceptual explanations of the voicing effect have been proposed in Javkin (1976) and Kluender et al. (1988), and these perceptual factors might play a role in the enhancement of the effect (see Port & Dalby 1982; Luce & Charles-Luce 1985; Kingston & Diehl 1994; see Fowler 1992 for a critique of Kluender et al. 1988). Finally, whether the timing of the VC boundary depends on modulations of the vocalic or consonantal gesture, or both, is another aspect that should be investigated further (see de Jong 1991 for an example).

A comment is also due in relation to possible coexisting effects on vowel duration. Beguš (2017) finds that, even when C2 closure duration is controlled for, C2 phonation (ejective, voiceless, voiced) in Georgian is still a significant predictor. The author argues for a separate laryngeal features effect, which operates in addition to a closure duration effect. In the present study, C2 voicing (voiceless, voiced) and its interactions are not significant when included in the model discussed in Section 3.3, which has vowel duration as outcome and C2 closure duration as one of the predictors.10 However, even when multicollinearity between predictors is minimal, presence or lack of statistical significance of multiple terms cannot unequivocally inform us on the actual contribution of those terms, since it is possible that unknown relations between terms mask underlying mechanisms (for a discussion see McElreath 2015). The diachronic development of context-driven statistical sub-distributions can override the original causal link (Sóskuthy 2013). Under this scenario, it is not possible to discern which of the competing predictors is diachronically responsible for the relation, and either or both the compensatory mechanism and the laryngeal features could have had a role in generating the synchronic patterns (this kind of reasoning is compatible for example with exemplar theories of speech perception and production, see among others Johnson 1997; Ambridge 2018; Sóskuthy et al. 2018; Todd et al. 2019).

Since diverging results have been obtained in relation to the significance of C2 phonation in addition to C2 closure durations, these aspects need to be further investigated in future studies, although ascertaining whether they are artefacts of statistical procedures or reflect an underlying state of affairs might still prove difficult. To conclude, lack of significance of a separate laryngeal features effect in this study cannot be taken as evidence for its absence in the present data, and a compensatory mechanism could coexist with mechanisms directly related to laryngeal features, which would in turn explain the differential timing of the VC boundary.

4.3 Limitations and future work

The generalisations put forward in this paper strictly apply to disyllabic words with a stressed vowel in the first syllable, flanked by single stops. First, it is possible that the pattern found in this context does not occur in sequences including an unstressed vowel. For example, it is known that the difference in closure duration between voiceless and voiced stops is not stable when the stops precede a stressed vowel, although vowels preceding pre-stress stops have slightly different durations (Davis & Summers 1989). According to the mechanism proposed here, the absence of differences in closure duration should correspond to the absence of differences in vowel duration. Second, it is known that the magnitude of the effect of voicing is modulated by other prosodic characteristics, like the number of syllables in the word, presence/absence of focus, and position within the sentence (Sharf 1962; Klatt 1973; Laeufer 1992; de Jong 2004). Third, the constraints on experimental material enforced by the use of ultrasound tongue imaging have been previously mentioned in Section 2.3. Given these constraints, temporal information from other vowels (like front vowels), places and manners of articulation is a desideratum. Data from different contexts and different languages is thus needed to assess the generality of the claims put forward in this paper.

Another issue is the interaction of the temporal compensation and speech rate. The magnitude of compensation between vowel and closure duration found in de Jong (1991) and here is somewhat small (between 12% and 40%). Ideally, given the temporal stability of the release to release interval relative to C2 voicing, the compensation rates should approximate 100%. However, it is possible that the correlation between vowel and closure duration is modulated in complex ways by the individual effects of speech rate on the vowel and the closure. For example, Ko (2018) finds that the vowel/closure ratio differs depending on speaking rate and that there is an interaction between the voicing of the consonant and speaking rate. When the consonant is voiceless, the vowel/closure ratio is smaller when speaking rate is slow, while slow speaking rate induces larger vowel/closure values when the consonant is voiced. Experimental work is required which addresses the differential effect of speaking rate on vowel and consonant closures, and how these interact with a possible compensatory mechanism.

Some concern could be raised in relation to possible influences of English on the native productions of participants recorded in the English-speaking context of the University of Manchester Laboratory. However, as reported in Section 2.4, conversations during the session prior to the experiment and instructions were in the participant’s native language. Antoniou et al. (2010) show that, in a situational language context study of Greek-English bilinguals, being exposed to the native language during the experiment elicited Greek native-like phonetic values even when the dominant language at the time of recording was English (the bilingual speakers acquired English as a second language, Greek being their first). A small effect of L2 could persist in proficient L2 speakers, as found by Schwartz et al. (2015). The five Polish speakers with a highly proficient level of English investigated in that study showed a 10 ms increase in VOT values compared to the quasi-monolingual base level. While previous studies focussed on VOT, future work should directly test the influence of English on the magnitude of the voicing effect of one’s native language.

The compensatory temporal adjustment account presented here extends to other durational effects discussed in the literature. In particular, the account bears predictions on the direction of the durational difference led by phonation types different from voicing, like aspiration and ejection. For example, the mix of results with regard to the effect of aspiration (Durvasula & Luo 2012) suggests that the conditions for a temporal adjustment might differ across the contexts and languages studied. In light of the results in Beguš (2017), future studies will also have to investigate the durational invariance of speech intervals in relation to a variety of phonation contrasts.

5 Conclusions

The results of this exploratory study of the effect of voicing on vowel duration are congruent with a compensatory temporal adjustment account of such effect. Acoustic data from seventeen speakers of Italian and Polish show that the temporal distance between two consecutive stop releases is not affected by the voicing of the second stop in CV́CV words. The temporal invariance of the release to release interval, together with a difference in timing of the VC boundary, can cause vowels to be shorter when followed by voiceless stops (which have a long closure) and longer when followed by voiced stops (the closure of which is short).

As discussed in Section 4.2, the temporal patterns reported here do not univocally exclude other possible sources for the duration differential. Multiple mechanisms (both articulatory and perceptual) could conspire together to produce the observed patterns. Such a pluralist view has already been proposed for the voicing effect (for example, Beguš 2017 and Sanker 2018), and for other related phenomena, like vowel duration in incomplete neutralisation (Winter & Roettger 2011). For a review of explanatory pluralism in the cognitive sciences, see Dale et al. (2009) and references therein. Indeed, a hybrid account, which takes into consideration and synthesises aspects of multiple proposed accounts, is probably warranted, given the diversity of compatible results obtained so far. Future work will need to investigate further aspects of the patterns found in this study, with a particular focus on the effects of different segmental and prosodic structures and different laryngeal contrasts on the release to release interval, and in relation to other attributes of consonant effects on vowel duration.

Additional Files

The additional files for this article can be found as follows:

Supplementary file 1

Appendix with statistical output, participants’ socio-linguistic information, and list of target words. DOI: https://doi.org/10.5334/gjgl.869.s1

Supplementary file 2

Open Science Framework research compendium (data and code). DOI: https://doi.org/10.17605/OSF.IO/BFYHR

Notes

  1. One of the first attestations of the term “voicing effect” can be attributed to Mitleb (1982). Another term used to refer to the same phenomenon is “pre-fortis clipping”, probably introduced by Wells (1990).
  2. A typological note. Most languages reported as having a voicing effect come from the Indo-European family. Others are from a pool of widely studied languages. It is thus of vital importance that future studies look at other language families and underdocumented/underdescribed languages.
  3. In this paper, I use the term “relation” to mean a categorical pattern of entailment (like in “a long vowel entails a short closure”), while the term “correlation” is reserved to a statistical correlation of two continuous variables.
  4. As per Cysouw & Good (2013), the glossonyms Italian and Polish are used here to refer, respectively, to the languoids Italian [GLOTTOCODE: ital1282] and Polish [GLOTTOCODE: poli1260].
  5. Polish neutralises the voicing contrast word-finally, although the contrast is maintained word-medially (Gussmann 2007).
  6. Italian has both a mid-low [ɔ] and a mid-high [o] back vowel in its vowel inventory. These vowels are traditionally described as two distinct phonemes (Krämer 2009), although both their phonemic status and their phonetic substance are subject to a high degree of geographical and idiosyncratic variability (Renwick & Ladd 2016). As a rule of thumb, stressed open syllables in Italian (like the ones used in this study) have [ɔː] (vowels in penultimate stressed open syllables are long) rather than [oː] (Renwick & Ladd 2016). On the other hand, Polish has only a mid-low back vowel phoneme /ɔ/ (Gussmann 2007). For the sake of typographical simplicity, the symbol /o/ will be used here for both languages.
  7. A reviewer makes interesting phonological remarks. The presence of lenition and voicing of voiceless stops in some varieties of Italian and its absence in Polish could be related to differences in laryngeal phonology and prosodic structure between these languages, namely the absence of a feature [voice] in Italian and the absence of true trochees in Polish. This hypothesis is compatible with work by Schwartz & Arndt (2018) and Schwartz (2016), to which the reader is referred.
  8. The choice of Bayes factors over other information criteria, like AIC, is a practical one. First, Bayes factors can be used to identify the relative strength of the evidence for each hypothesis. The higher the Bayes factor of H01, the stronger the evidence for H0 according to the data. Second, a Bayes factor near 1 indicates that the data is compatible with both hypotheses (even when AIC indicates a preference of one over the other), in which case it is not possible to choose among them. Note that the AICs of the word duration and release to release duration models reported below are lower when C2 voicing is not included as a predictor than when it is included, although the difference in AIC between the null and full models is very small (below 2).
  9. The CV of the release to release duration is 0.203, while that of the vowel onset to release duration is 0.232. The CQV is 0.127 for the release to release and 0.136 for the vowel onset to release. Lower values mean less dispersion/more cohesion.
  10. Multicollinearity is not an issue here, since the VIFs are all below 3 (Zuur et al. 2010).

Acknowledgements

I am grateful to Ricardo Bermúdez-Otero and Patrycja Strycharczuk for their immense support and patience in providing feedback on this project. Thanks to the editor of Glossa and two anonymous reviewers for their feedback, which significantly improved the manuscript. I also want to thank the audience at the 16th Laboratory Phonology conference (LabPhon16) for their input, and Kenneth de Jong for comments on an early draft of this paper. Thanks also go to my colleagues at the Phonetics Laboratory of the University of Manchester, who provided help in different ways. Any remaining errors are my own.

Funding Information

This project has been funded by the School of Arts, Languages, and Cultures Graduate School at the University of Manchester.

Competing Interests

The author has no competing interests to declare.

References

Ambridge, Ben. 2018. Against stored abstractions: A radical exemplar model of language acquisition. Pre-print available at PsyArXiv. DOI:  http://doi.org/10.31234/osf.io/gy3ah

Ananthapadmanabha, Tirupattur V., Aragulla Prasad Prathosh & Angarai Ganesan Ramakrishnan. 2014. Detection of the closure-burst transitions of stops and affricates in continuous speech using the plosion index. The Journal of the Acoustical Society of America 135(1). 460–471. DOI:  http://doi.org/10.1121/1.4836055

Antoniou, Mark, Catherine T. Best, Michael D. Tyler & Christian Kroos. 2010. Language context elicits native-like stop voicing in early bilinguals’ productions in both L1 and L2. Journal of Phonetics 38(4). 640–653. DOI:  http://doi.org/10.1016/j.wocn.2010.09.005

Articulate Instruments Ltd™. 2011. Articulate Assistant Advanced user guide. Version 2.16.

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. DOI:  http://doi.org/10.18637/jss.v067.i01

Beckman, Jill, Michael Jessen & Catherine Ringen. 2013. Empirical evidence for laryngeal features: Aspirating vs. true voice languages. Journal of Linguistics 49(2). 259–284. DOI:  http://doi.org/10.1017/S0022226712000424

Beguš, Gašper. 2017. Effects of ejective stops on preceding vowel duration. The Journal of the Acoustical Society of America 142(4). 2168–2184. DOI:  http://doi.org/10.1121/1.5007728

Belasco, Simon. 1953. The influence of force of articulation of consonants on vowel duration. The Journal of the Acoustical Society of America 25(5). 1015–1016. DOI:  http://doi.org/10.1121/1.1907201

Berez-Kroeker, Andrea L., Lauren Gawne, Susan Smythe Kung, Barbara F. Kelly, Tyler Heston, Gary Holton, Peter Pulsifer, David I. Beaver, Shobhana Chelliah & Stanley Dubinsky. 2018. Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1). 1–18. DOI:  http://doi.org/10.1515/ling-2017-0032

Bigi, Brigitte. 2015. SPPAS – Multi-lingual approaches to the automatic annotation of speech. The Phonetician 111–112. 54–69.

Bonett, Douglas G. 2006. Confidence interval for a coefficient of quartile variation. Computational Statistics & Data Analysis 50(11). 2953–2957. DOI:  http://doi.org/10.1016/j.csda.2005.05.007

Chen, Matthew. 1970. Vowel length variation as a function of the voicing of the consonant environment. Phonetica 22(3). 129–159. DOI:  http://doi.org/10.1159/000259312

Crüwell, Sophia, Johnny van Doorn, Alexander Etz, Matthew Makel, Hannah Moshontz, Jesse Niebaum, Amy Orben, Sam Parsons & Michael Schulte-Mecklenbeck. 2018. 8 easy steps to open science: An annotated reading list. PsyArXiv. DOI:  http://doi.org/10.31234/osf.io/cfzyx

Cyran, Eugeniusz. 2011. Laryngeal realism and laryngeal relativism: Two voicing systems in Polish? Studies in Polish Linguistics 6(1). 45–80.

Cysouw, Michael & Jeff Good. 2013. Languoid, doculect, and glossonym: Formalizing the notion ‘language’. Language Documentation & Conservation 7. 331–359. https://doi.org/10125/4606.

Dale, Rick, Eric Dietrich & Anthony Chemero. 2009. Explanatory pluralism in cognitive science. Cognitive Science 33(5). 739–742. DOI:  http://doi.org/10.1111/j.1551-6709.2009.01042.x

Davis, Stuart & W. Van Summers. 1989. Vowel length and closure duration in word-medial VC sequences. Journal of Phonetics 17. 339–353. DOI:  http://doi.org/10.1121/1.2026892

de Jong, Kenneth. 1991. An articulatory study of consonant-induced vowel duration changes in English. Phonetica 48(1). 1–17. DOI:  http://doi.org/10.1121/1.2028316

de Jong, Kenneth. 2004. Stress, lexical focus, and segmental focus in English: Patterns of variation in vowel duration. Journal of Phonetics 32(4). 493–516. DOI:  http://doi.org/10.1016/j.wocn.2004.05.002

Durvasula, Karthik & Qian Luo. 2012. Voicing, aspiration, and vowel duration in Hindi. Proceedings of Meetings on Acoustics 18. 1–10. DOI:  http://doi.org/10.1121/1.4895027

Esposito, Anna. 2002. On vowel height and consonantal voicing effects: Data from Italian. Phonetica 59(4). 197–231. DOI:  http://doi.org/10.1159/000068347

Farnetani, Edda & Shiro Kori. 1986. Effects of syllable and word structure on segmental durations in spoken Italian. Speech Communication 5(1). 17–34. DOI:  http://doi.org/10.1016/0167-6393(86)90027-0

Ferrero, Franco E., Emanuela Magno Caldognetto, Kyriaki Vagges & Carlo Lavagnoli. 1978. Some acoustic characteristics of Italian vowels. Journal of Italian Linguistics Amsterdam 3(1). 87–94.

Fowler, Carol A. 1992. Vowel duration and closure duration in voiced and unvoiced stops: There are no contrast effects here. Journal of Phonetics 20(1). 143–165.

Fox, John. 2003. Effect displays in R for generalised linear models. Journal of Statistical Software 8(15). 1–27. DOI:  http://doi.org/10.18637/jss.v008.i15

Gelman, Andrew & Eric Loken. 2013. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf.

Gussmann, Edmund. 2007. The phonology of Polish. Oxford: Oxford University Press.

Hajek, John & Mary Stevens. 2008. Vowel duration, compression and lengthening in stressed syllables in Central and Southern varieties of standard Italian. In Proceedings of the 9th annual conference of the International Speech Communication Association, 516–519.

Halle, Morris & Kenneth Noble Stevens. 1967. Mechanism of glottal vibration for vowels and consonants. The Journal of the Acoustical Society of America 41(6). 1613–1613. DOI:  http://doi.org/10.1121/1.2143736

Halle, Morris, Kenneth Noble Stevens & Alan Victor Oppenheim. 1967. On the mechanism of glottal vibration for vowels and consonants. In Quarterly progress report 85. 267–277.

Heffner, R.-M. S. 1937. Notes on the length of vowels. American Speech 12. 128–134. DOI:  http://doi.org/10.2307/452621

House, Arthur S. & Grant Fairbanks. 1953. The influence of consonant environment upon the secondary acoustical characteristics of vowels. The Journal of the Acoustical Society of America 25(1). 105–113. DOI:  http://doi.org/10.1121/1.1906982

Hualde, José Ignacio & Marianna Nadeu. 2011. Lenition and phonemic overlap in Rome Italian. Phonetica 68(4). 215–242. DOI:  http://doi.org/10.1159/000334303

Hussein, Lutfi. 1994. Voicing-dependent vowel duration in Standard Arabic and its acquisition by adult American students. Columbus, OH: The Ohio State University dissertation.

Huszthy, Bálint. 2016. Italian as a voice language without voice assimilation. In Proceedings of ConSOLE XXIV, 428–452.

Jacewicz, Ewa, Robert Allen Fox & Samantha Lyle. 2009. Variation in stop consonant voicing in two regional varieties of American English. Journal of the International Phonetic Association 39(3). 313–334. DOI:  http://doi.org/10.1017/S0025100309990156

Jarosz, Andrew F. & Jennifer Wiley. 2014. What are the odds? A practical guide to computing and reporting Bayes factors. The Journal of Problem Solving 7(1). 2–9. DOI:  http://doi.org/10.7771/1932-6246.1167

Jassem, Wiktor & Lutosława Richter. 1989. Neutralization of voicing in Polish obstruents. Journal of Phonetics 17(4). 317–325.

Javkin, Hector R. 1976. The perceptual basis of vowel duration differences associated with the voiced/voiceless distinction. Report of the Phonology Laboratory, UC Berkeley 1. 78–92.

Johnson, Keith. 1997. Speech perception without speaker normalization: An exemplar model. In Keith Johnson & John W. Mullenix (eds.), Talker variability in speech processing, 145–165. San Diego, CA: Academic Press.

Keating, Patricia A. 1984. Universal phonetics and the organization of grammars. UCLA Working Papers in Phonetics 59. 35–49.

Kerr, Norbert L. 1998. HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review 2(3). 196–217. DOI:  http://doi.org/10.1207/s15327957pspr0203_4

Kingston, John & Randy L. Diehl. 1994. Phonetic knowledge. Language 70(3). 419–454. DOI:  http://doi.org/10.2307/416481

Kirby, James & Morgan Sonderegger. 2018. Mixed-effects design analysis for experimental phonetics. Journal of Phonetics 70. 70–85. DOI:  http://doi.org/10.1016/j.wocn.2018.05.005

Kirby, James P. & D. Robert Ladd. 2016. Effects of obstruent voicing on vowel F0: Evidence from “true voicing” languages. The Journal of the Acoustical Society of America 140(4). 2400–2411. DOI:  http://doi.org/10.1121/1.4962445

Klatt, Dennis H. 1973. Interaction between two factors that influence vowel duration. The Journal of the Acoustical Society of America 54(4). 1102–1104. DOI:  http://doi.org/10.1121/1.1914322

Kluender, Keith R., Randy L. Diehl & Beverly A. Wright. 1988. Vowel-length differences before voiced and voiceless consonants: An auditory explanation. Journal of Phonetics 16. 153–169.

Ko, Eon-Suk. 2018. Asymmetric effects of speaking rate on the vowel/consonant ratio conditioned by coda voicing in English. Phonetics and Speech Sciences 10(2). 45–50. DOI:  http://doi.org/10.13064/KSSS.2018.10.2.045

Krämer, Martin. 2009. The phonology of Italian. Oxford: Oxford University Press.

Kuznetsova, Alexandra, Per Bruun Brockhoff & Rune Haubo Bojesen Christensen. 2017. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82(13). DOI:  http://doi.org/10.18637/jss.v082.i13

Laeufer, Christiane. 1992. Patterns of voicing-conditioned vowel duration in French and English. Journal of Phonetics 20(4). 411–440.

Lampp, Claire & Heidi Reklis. 2004. Effects of coda voicing and aspiration on Hindi vowels. The Journal of the Acoustical Society of America 115(5). 2540–2540. DOI:  http://doi.org/10.1121/1.4783577

Lehiste, Ilse. 1970a. Temporal organization of higher-level linguistic units. The Journal of the Acoustical Society of America 48(1A). 111. DOI:  http://doi.org/10.1121/1.1974906

Lehiste, Ilse. 1970b. Temporal organization of spoken language. In OSU Working Papers in Linguistics 4. 96–114. https://linguistics.osu.edu/sites/linguistics.osu.edu/files/workingpapers/osu_wpl_04.pdf.

Lindblom, Björn. 1967. Vowel duration and a model of lip mandible coordination. Speech Transmission Laboratory Quarterly Progress Status Report 4. 1–29.

Lisker, Leigh. 1957. Closure duration and the intervocalic voiced-voiceless distinction in English. Language 33(1). 42–49. DOI:  http://doi.org/10.2307/410949

Lisker, Leigh. 1974. On “explaining” vowel duration variation. In Proceedings of the Linguistic Society of America, 225–232.

Luce, Paul A. & Jan Charles-Luce. 1985. Contextual effects on vowel duration, closure duration, and the consonant/vowel ratio in speech production. The Journal of the Acoustical Society of America 78(6). 1949–1957. DOI:  http://doi.org/10.1121/1.392651

Luke, Steven G. 2017. Evaluating significance in linear mixed-effects models in R. Behavior Research Methods 49(4). 1494–1502. DOI:  http://doi.org/10.3758/s13428-016-0809-y

Machač, Pavel & Radek Skarnitzl. 2009. Principles of phonetic segmentation. Praha: Epocha.

Maddieson, Ian & Jack Gandour. 1976. Vowel length before aspirated consonants. In UCLA Working papers in Phonetics 31. 46–52.

Magno Caldognetto, Emanuela, Franco Ferrero, Kyriaki Vagges & Maria Bagno. 1979. Indici acustici e indici percettivi nel riconoscimento dei suoni linguistici (con applicazione alle consonanti occlusive dell’italiano). Acta Phoniatrica Latina 2. 219–246.

Malisz, Zofia & Katarzyna Klessa. 2008. A preliminary study of temporal adaptation in Polish VC groups. In Proceedings of speech prosody, 383–386.

McElreath, Richard. 2015. Statistical rethinking: A Bayesian course with examples in R and Stan. Boca Raton, FL: CRC Press.

Meyer, Ernst Alfred. 1904. Zur Vokaldauer im Deutschen. In Nordiska studier tillegnade A. Noreen, 347–356. Uppsala: K.W. Appelbergs Boktryckeri.

Mitleb, Fares. 1982. Voicing effect on vowel duration is not an absolute universal. The Journal of the Acoustical Society of America 71(S1). S23–S23. DOI:  http://doi.org/10.1121/1.2019285

Nicenboim, Bruno, Timo B. Roettger & Shravan Vasishth. 2018. Using meta-analysis for evidence synthesis: The case of incomplete neutralization in German. Journal of Phonetics 70. 39–55. DOI:  http://doi.org/10.1016/j.wocn.2018.06.001

Nowak, Pawel. 2006. Vowel reduction in Polish. Berkeley, CA: University of California, Berkeley dissertation.

Ohala, John J. 2011. Accommodation to the aerodynamic voicing constraint and its phonological relevance. In Proceedings of the 17th International Congress of Phonetic Sciences, 64–67.

Peterson, Gordon E. & Ilse Lehiste. 1960. Duration of syllable nuclei in English. The Journal of the Acoustical Society of America 32(6). 693–703. DOI:  http://doi.org/10.1121/1.1908183

Plug, Leendert & Rachel Smith. 2018. Segments, syllables and speech tempo perception. In Proceedings of the 9th International Conference on Speech Prosody 2018, 279–283. DOI:  http://doi.org/10.21437/SpeechProsody.2018-57

Port, Robert F. & Jonathan Dalby. 1982. Consonant/vowel ratio as a cue for voicing in English. Perception & Psychophysics 32(2). 141–152. DOI:  http://doi.org/10.3758/BF03204273

R Core Team. 2018. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Raftery, Adrian E. 1995. Bayesian model selection in social research. Sociological Methodology 25. 111–163. DOI:  http://doi.org/10.2307/271063

Raftery, Adrian E. 1999. Bayes factors and BIC: Comment on “A critique of the Bayesian information criterion for model selection”. Sociological Methods & Research 27(3). 411–427. DOI:  http://doi.org/10.1177/0049124199027003005

Renwick, Margaret & D. Robert Ladd. 2016. Phonetic distinctiveness vs. lexical contrastiveness in non-robust phonemic contrasts. Laboratory Phonology: Journal of the Association for Laboratory Phonology 7(1). 1–29. DOI:  http://doi.org/10.5334/labphon.17

Roettger, Timo B. 2019. Researcher degrees of freedom in phonetic sciences. Laboratory Phonology: Journal of the Association for Laboratory Phonology 10(1). 1–27. DOI:  http://doi.org/10.5334/labphon.147

Rothenberg, Martin. 1967. The breath-stream dynamics of simple-released-plosive production 6. Basel: Biblioteca Phonetica.

Sanker, Chelsea. 2018. Effects of laryngeal features on vowel duration: implications for Winter’s Law. Papers in Historical Phonology 3. 180–205. DOI:  http://doi.org/10.2218/pihph.3.2018.2898

Schwartz, Geoffrey. 2016. On the evolution of prosodic boundaries – parameter settings for Polish and English. Lingua 171. 37–73. DOI:  http://doi.org/10.1016/j.lingua.2015.11.005

Schwartz, Geoffrey, Anna Balas & Arkadiusz Rojczyk. 2015. Phonological factors affecting L1 phonetic realization of proficient Polish users of English. Research in Language 13(2). 181–198. DOI:  http://doi.org/10.1515/rela-2015-0014

Schwartz, Geoffrey & Daria Arndt. 2018. Laryngeal Realism vs. Modulation theory – evidence from VOT discrimination in Polish. Language Sciences 69. 98–112. DOI:  http://doi.org/10.1016/j.langsci.2018.07.001

Sharf, Donald J. 1962. Duration of post-stress intervocalic stops and preceding vowels. Language and Speech 5(1). 26–30. DOI:  http://doi.org/10.1177/002383096200500103

Sharf, Donald J. 1964. Vowel duration in whispered and in normal speech. Language and Speech 7(2). 89–97. DOI:  http://doi.org/10.1177/002383096400700204

Slis, Iman Hans & Antonie Cohen. 1969a. On the complex regulating the voiced-voiceless distinction I. Language and Speech 12(2). 80–102. DOI:  http://doi.org/10.1177/002383096901200202

Slis, Iman Hans & Antonie Cohen. 1969b. On the complex regulating the voiced-voiceless distinction II. Language and Speech 12(3). 137–155. DOI:  http://doi.org/10.1177/002383096901200301

Slowiaczek, Louisa M. & Daniel A. Dinnsen. 1985. On the neutralizing status of Polish word-final devoicing. Journal of Phonetics 13(3). 325–341.

Sóskuthy, Márton. 2013. Phonetic biases and systemic effects in the actuation of sound change. Edinburgh: University of Edinburgh dissertation.

Sóskuthy, Márton, Paul Foulkes, Vincent Hughes & Bill Haddican. 2018. Changing words and sounds: The roles of different cognitive units in sound change. Topics in Cognitive Science 10(4). 1–16. DOI:  http://doi.org/10.1111/tops.12346

Strycharczuk, Patrycja. 2012. Sonorant transparency and the complexity of voicing in Polish. Journal of Phonetics 40(5). 655–671. DOI:  http://doi.org/10.1016/j.wocn.2012.05.006

Summers, W. Van. 1987. Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. The Journal of the Acoustical Society of America 82(3). 847–863. DOI:  http://doi.org/10.1121/1.395284

Tilsen, Sam. 2013. A dynamical model of hierarchical selection and coordination in speech planning. PLoS ONE 8(4). e62800. DOI:  http://doi.org/10.1371/journal.pone.0062800

Tilsen, Sam. 2016. Selection and coordination: The articulatory basis for the emergence of phonological structure. Journal of Phonetics 55. 53–77. DOI:  http://doi.org/10.1016/j.wocn.2015.11.005

Todd, Simon, Janet B. Pierrehumbert & Jennifer Hay. 2019. Word frequency effects in sound change as a consequence of perceptual asymmetries: An exemplar-based model. Cognition 185. 1–20. DOI:  http://doi.org/10.1016/j.cognition.2019.01.004

Vazquez-Alvarez, Yolanda & Nigel Hewlett. 2007. The ‘trough effect’: An ultrasound study. Phonetica 64. 105–121. DOI:  http://doi.org/10.1159/000107912

Wagenmakers, Eric-Jan. 2007. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review 14(5). 779–804. DOI:  http://doi.org/10.3758/BF03194105

Waniek-Klimczak, Ewa. 2011. Aspiration in Polish: A sound change in progress? In Mirosław Pawlak & Jakub Bielak (eds.), New perspectives in language, discourse and translation studies, 3–11. Heidelberg, Dordrecht, London, New York: Springer. DOI:  http://doi.org/10.1007/978-3-642-20083-0_1

Warren, Willis & Adam Jacks. 2005. Lip and jaw closing gesture durations in syllable final voiced and voiceless stops. The Journal of the Acoustical Society of America 117(4). 2618–2618. DOI:  http://doi.org/10.1121/1.4778168

Wells, John C. 1990. Syllabification and allophony. In Susan Ramsaran (ed.), Studies in the pronunciation of English: A commemorative volume in honour of A. C. Gimson, 76–86. New York: Routledge.

Westbury, John R. 1983. Enlargement of the supraglottal cavity and its relation to stop consonant voicing. The Journal of the Acoustical Society of America 73(4). 1322–1336. DOI:  http://doi.org/10.1121/1.389236

Wickham, Hadley. 2017. tidyverse: Easily install and load the ‘Tidyverse’. R package version 1.2.1.

Winter, Bodo & Timo B. Roettger. 2011. The nature of incomplete neutralization in German: Implications for Laboratory Phonology. Grazer Linguistische Studien 76. 55–74.

Zuur, Alain F., Elena N. Ieno & Chris S. Elphick. 2010. A protocol for data exploration to avoid common statistical problems. Methods in Ecology and Evolution 1(1). 3–14. DOI:  http://doi.org/10.1111/j.2041-210X.2009.00001.x