1 Introduction

In recent years there have been a number of sociophonetic studies investigating /s/-retraction in English, a process by which /s/ is realised as a more retracted [ʃ]-like variant in complex onsets. This is most widely reported in the word-initial cluster /stɹ/, as in street and strength, but is also found word-medially, as in district and frustrated.

Retraction in /stɹ/ has been widely attested in varieties of American English, as studied by, for example, Durian (2006; 2007) in Columbus, OH, Gylfadottir (2015) in Philadelphia, PA, Wilbanks (2017) in Raleigh, NC, Rutter (2011) in Louisiana, Phillips (2001) in Georgia and Ahlers (2020) in Austin, TX. By comparison, /s/-retraction is severely under-researched in varieties of British English. It has been attested in Estuary English (Altendorf 2003), Colchester (Bass 2009) and Edinburgh (Sollgan 2013); however, to date, there has been no detailed community-level study of the type seen for American English. While this dearth of research has been somewhat alleviated by a recent cross-dialectal study across varieties of Scotland (and North America) by Stuart-Smith et al. (2019), this work has focused primarily on comparisons of the status of /s/-retraction in a range of English varieties rather than a detailed investigation of change in a single speech community.

In this paper, we present the first evidence of /s/-retraction in Manchester English, spoken in the North West of England and, in doing so, we address the question of what phonetic factors motivate this process of retraction. It has often been claimed that /s/ retracts in these contexts due to long-distance assimilation to the rhotic segment in /stɹ/ clusters (see e.g. Shapiro 1995). An alternative account to this is that retraction is in fact local, arising as a consequence of the affrication of /t/ by /ɹ/ rather than by /ɹ/ directly (Lawrence 2000). The problem in determining which of these competing explanations receives the strongest empirical support is summarised perfectly by Wilbanks (2017: 302) who writes that “it may prove difficult to tease apart the effects of contact with affricated /t/ and variably-articulated /ɹ/ […] and isolate a single underlying cause”. That this affrication trigger is often overlooked in work on /s/-retraction is perhaps a consequence of the fact that work has primarily been conducted on yod-dropping varieties of American English; by turning the focus instead to yod-retaining varieties of British English we can consider the behaviour of /s/ in another environment, namely /stj/ clusters (e.g. student, stupid), in which /tj/ undergoes coalescence to [tʃ]. Crucially, this provides independent evidence of how /s/ is realised in a cluster with affrication but in the absence of a rhotic segment. To date, there is no quantitative evidence of the behaviour of /s/ in these clusters and consequently no comparison with the more widely-attested retraction in /stɹ/. In addition to this, we also provide the first acoustic comparison between these contexts and the /stʃ/ environment (e.g. mischief), where /s/ occurs before an underlying affricate rather than one derived through an independent (and still partially variable) phonological process.

Finally, we also address /s/-retraction in the context of the wider sibilant space, defined here as the speaker-specific range of typical spectral values for underlying /s/ and /ʃ/ segments in pre-vocalic environments, e.g. seep and sheep. By considering these two “end points” of the sibilant continuum, which may be changing independently of context-specific /s/-retraction, we can gain insight into how advanced this process is in Manchester English. In doing so, we ask the following question: is there evidence that /s/-retraction has become stabilised as a categorical phonological rule, with speakers for whom /s/ in /stɹ/ is phonetically identical to their realisation of an underlying /ʃ/?

In sum, the research described here is guided by the following questions:

  •    i.  How advanced is /s/-retraction in Manchester English and is there evidence of an apparent-time change in this community?

  •  ii.  Is retraction observed in /stj/ clusters and to what extent does /s/ show similar behaviour in /stj/, /stɹ/ and /stʃ/ contexts?

  • iii.  In light of this, which of the two competing accounts of /s/-retraction (local vs non-local assimilation) finds the strongest empirical support in Manchester English?

The results of this study provide the first robust empirical evidence of a community-level change in /stɹ/ in a British variety of English. This appears to be a regular coarticulatory sound change which is led by young women (and, possibly, working-class speakers) and in which higher frequency words are more advanced.

Our findings further provide new insight into the mechanisms of /s/-retraction. That is, in this first large-scale quantitative investigation of retraction in /stj/ and /stʃ/, we find that they are changing in parallel with /stɹ/ and this suggests that, although /ɹ/ and /j/ may have some direct effect on /s/, this is unlikely to be the primary cause that initiates this change. The proposed solution to the actuation problem advanced by Baker et al. (2011), which relies on covert articulatory variation in /ɹ/, is therefore unable to account for this particular instance of /s/-retraction. In addition to significant change in /stɹ/, /stj/ and /stʃ/, we also observe change in pre-vocalic /s/ and /ʃ/ in the form of a larger acoustic contrast between these sibilants among younger speakers. We discuss these results and their implications for the origins and underlying mechanisms of /s/-retraction, as well as variation in the wider sibilant space.

2 Background

2.1 Preliminaries on /s/ and /s/-retraction

Although this study is concerned with context-specific /s/-retraction, there is of course a wealth of evidence highlighting the degree of within- and between-speaker variability in the production of sibilants more generally and the factors that condition these patterns of synchronic variation (Newman et al. 2001; Stuart-Smith 2007; 2020; Levon et al. 2017). This is particularly the case for /s/, in which variation has been shown to have taken on socio-indexical meaning: more fronted /s/ realisations are perceived as less masculine and more gay and, in production, are more frequent among male speakers who are gay or bisexual (Campbell-Kibler 2011; Podesva & Van Hofwegen 2014; Levon 2014).

However, it is important to separate the realisation of /s/ in /stɹ/ from the wider spectral variation in /s/ production. /stɹ/ has been studied as a sociolinguistic variable in its own right, separate from the wider patterns of socio-indexical variation in /s/ found in other environments (see e.g. Durian 2007; Gylfadottir 2015; Wilbanks 2017). There is also strong perceptual evidence to support this distinction: Phillips & Resnick (2018; 2019) find that a retracted /s/ in /stɹ/ clusters does not carry the same socio-indexical meaning of masculinity, toughness or heterosexuality that has been observed for retracted /s/ in other /sCɹ/ environments.

There is a general lack of large-scale studies of /s/-retraction that combine robust acoustic analysis with community-level data in order to investigate its status within a given speech community in detail, especially with a view to investigating change. Existing work is also variable with respect to the method of coding /s/-retraction. Some studies have adopted a binary classification, coding tokens of /s/ as either retracted or non-retracted (e.g. Janda & Joseph 2003; Bass 2009); while this finds some support from a study by Rutter (2011), who reports that a majority of retracted forms fall within a speaker’s normal range for [ʃ], Labov (2001) argues that, in Philadelphia English, there are at least four variants differing in how [ʃ]-like they are. A fine-grained acoustic measure of retraction is vital given that this process is also argued to occur, though to a much lesser degree, in other contexts such as pre-vocalic /sp, st, sk/ and /spɹ, skɹ/ (Labov 1984; Janda & Joseph 2003; Baker et al. 2011).

The few large-scale studies that have been conducted concern almost exclusively varieties of English spoken in North America (e.g. Gylfadottir 2015 in Philadelphia, PA and Wilbanks 2017 in Raleigh, NC) and all show evidence of apparent-time change with increasing /s/-retraction among younger speakers.1 However, the role of other social factors in explaining variation in retraction, such as gender and socioeconomic status, is less consistent. Durian (2006) and Bass (2009) describe “rapid anonymous” surveys, in which /s/ is auditorily coded, but the former finds a female lead for /s/-retraction in American English (Columbus, OH) and the latter a male lead in British English (Colchester). Larger-scale studies based on acoustic data are similarly variable, finding either that women are leading in this change (Wilbanks 2017; Stuart-Smith et al. 2019) or detecting no gender effect at all (Gylfadottir 2015). Additionally, while Labov (2001) finds an association between /s/-retraction and working-class speech in subjective evaluation tests, there is not yet any clear evidence of a relationship between /s/-retraction and social class on the basis of actual production data. Although we plan to address these topics in future work that considers the wider status of /s/-retraction in the speech community, and we do discuss these factors briefly in Section 4 since they are included in the model, the primary focus of this paper is on the mechanisms by which /s/ undergoes retraction in this environment.

As discussed in Section 1, there are two major proposals that aim to explain why /s/ retracts in these contexts. These differ from one another with respect to the locality of the triggering mechanism:

  •   i.  /s/ undergoes long-distance assimilation to /ɹ/;

  • ii.  /s/ undergoes local assimilation to an adjacent affricated /t/, itself affricated locally under the influence of a following /ɹ/.

A fuller consideration of these competing explanations follows.

2.2 Arguments for /ɹ/ as the trigger

The first of these competing explanations, that retraction is caused directly by the /ɹ/ in these /stɹ/ clusters, was initially proposed by Shapiro (1995). This argument was motivated by the observation that /s/ retracts to a significantly lesser degree in /st/ clusters, e.g. steep, suggesting that the rhotic segment in /stɹ/ plays a crucial role and that the retraction process is therefore a case of assimilation “at a distance”. The presence of only low-level retraction in /st/ and other complex onsets has been widely attested and even varieties that are reportedly not undergoing the /stɹ/ change itself, such as Australian English, still show a slightly lower centroid frequency for /s/ in contexts such as /sp, st, sk/, described as the “phonetic pre-conditions” of the change in /stɹ/ (Stevens & Harrington 2016: 118).

While Shapiro (1995) relies on the inspection of secondary data and anecdotal evidence, later studies provide stronger quantitative and empirical support for these claims. An articulatory experiment by Baker et al. (2011), in which ultrasound tongue imaging is combined with simultaneous acoustic analysis, finds evidence of a coarticulatory bias towards retraction in other /sCɹ/ clusters in American English. In their study, /s/ is produced with a slight dampening of centre of gravity in all complex onset clusters, with a stronger lowering in centroid frequency observed when the complex onset contains /ɹ/, as in words such as spring and screech. Although the effect is registered most strongly in /stɹ/, which is clearly set apart from all other onset types, this result suggests that /ɹ/ has some long-distance lowering effect on the frequency of /s/ regardless of the intervening consonant.

Another line of argumentation provided by Baker et al. (2011) draws upon the inherent variability in the articulation of /ɹ/. It is well established that /ɹ/ can be produced with a range of lingual constriction types, ranging from more “tip-up” retroflex to more “tip-down” bunched articulations, often with little to no perceptible acoustic difference between them (Delattre & Freeman 1968; Twist et al. 2007). Baker et al. (2011) report that all speakers in their study used a bunched /ɹ/ variant in /stɹ/, but with variation within this category with respect to the similarity of the lingual shape in the articulations of /s/ and /ɹ/ in these clusters. Among speakers they classify as “non-retractors”, for whom retraction in /stɹ/ stems from gradient coarticulation rather than a distinct production target, the magnitude of retraction correlates with this inter-speaker variation in the tongue shape for /ɹ/ and specifically its similarity to the tongue shape for /s/.

It is also interesting to note the claim by Baker et al. (2011) that /stɹ/ clusters favour a bunched rather than retroflex tongue configuration for /ɹ/ as there is independent evidence (albeit from British, rather than American English) that bunched /ɹ/ is accompanied by more extreme lip protrusion relative to retroflex /ɹ/ (King & Ferragne 2020). This might suggest that any role of /ɹ/ may in part be due to regressive assimilation of labialisation and less so the lingual gesture, a possibility also raised by Janda & Joseph (2003: fn. 8) in a discussion of cross-linguistic differences in anticipatory labialisation.

One final piece of suggestive evidence regarding the role of /ɹ/ comes from a small-scale study of /s/-retraction in British English: Sollgan (2013) reports variability in tongue shape for /ɹ/ among speakers of Edinburgh English and, crucially, observes that alveolar realisations of /ɹ/ rarely co-occur with a retracted /s/ variant. Based on this, one might argue that there is a close relationship between the tongue shape of /s/ and of the rhotic segment in these onset clusters, mirroring the aforementioned observation in the Baker et al. (2011) study.

2.3 Arguments for affrication as the trigger

Competing accounts of /s/-retraction have proposed that affricated /tɹ/ clusters are responsible for the retraction of a preceding /s/. This argument was first made by Lawrence (2000), in response to the /ɹ/-centric explanation provided by Shapiro (1995) as detailed in the preceding subsection. He points out that when innovative [ʃ]-like variants are produced, they are always followed by an affricated /tɹ/ cluster and claims that the derivation follows a two-step process as follows: /stɹ/ → [stʃɹ] → [ʃtʃɹ] (Lawrence 2000: 83). Although this is based largely on anecdotal evidence, it is supported by several studies of /s/-retraction that note how participants who retract always have affricated /tɹ/ clusters and that /tɹ/-affrication predates /s/-retraction (Magloughlin & Wilbanks 2016; Smith et al. 2019). These results suggest a strong (though not necessarily unidirectional) link between these two processes: speakers may affricate the /tɹ/ cluster without also retracting the /s/ but they do not retract /s/ without affricating the following /t/. Affrication has also played a central role in explanations of /s/-retraction in other, lesser-studied varieties of English, such as Trinidadian English where it has been described as the catalyst for this change in /s/ centroid frequency (Ahlers & Meer 2019).

The affrication of such clusters, e.g. in words like train and try (similarly in its voiced counterpart /dɹ/, e.g. drink, dry), is well-accepted in descriptions of English, dating back to the relatively early reference by Jones (1956: §270) that “[t]he Southern English tr and dr seem to be intermediate between single affricates and sequences of two distinct sounds”. Its status is noted by textbooks on English pronunciation and highlighted for foreign learners of English (Cruttenden 2014; Lindsey 2019). Additional support for affrication of /tɹ/ clusters comes from children’s spellings, e.g. try as CHRIE and dragon as JRAGIN (O’Neil 2013: 222).

The extent to which /tɹ/-affrication is a stable phenomenon or worthy of study as a change in progress is not clear in existing descriptions and this is clouded by the lack of empirical studies, particularly when compared to /stɹ/. Wells (2011), at least, asserts its stability in Southern British English: “I don’t believe there is any such phonological change in progress.” This claim is potentially contradicted by Lindsey’s (2019: 61–2) recent description of Standard Southern British English in a dedicated chapter entitled A New chrend. He describes traditional Received Pronunciation as having had retracted [t̠ɹ] and [d̠ɹ] in words such as train and drain but modern-day Standard Southern British English as having [tʃɹ] and [dʒɹ]. Lindsey describes this as a phonological process of simplification and regularisation, indicating that a change has taken place. Jones’s mid-twentieth century observations could be interpreted as evidence in either direction: on the one hand, the existence of long-standing /tɹ/-affrication could be taken as evidence for its stability; on the other, Jones’s use of the description “intermediate” could support an argument whereby affrication was formerly something like [t̠ʃɹ] and is now more like [tʃɹ]. In other varieties of English at least, such as those of North America (Smith 2013: 71), Australia (Stevens & Harrington 2016: 119) and New Zealand (Gordon et al. 2004: 612), the process has been described as some kind of sound change (albeit without empirical data to demonstrate this). It is possible, however, that the British Isles are ahead on this trend.

Wells (2011) suggests a minimal pair as a diagnostic: sentry and the condensed form of century (i.e. with no intermediate schwa). If the pair are the same, it indicates phonological neutralisation. If not (which Wells claims is the common tendency), he asserts that -tr- goes with sentry and requires a broad phonetic transcription of [tɹ], not [tʃɹ].

Regrettably, as indicated, there is next to no empirical work demonstrating whether /tɹ/-affrication is a long-established stable phenomenon or, alternatively, a change in progress representing either an advancement from [tɹ] > [tʃɹ] or a gradient shift along a continuum. The major exception to this is Magloughlin’s (2018) thesis on /tɹ/-affrication in American English (see also Schwartz 2021 for perceptual tests). Magloughlin demonstrates that affrication from [tɹ] to [tʃɹ] is a phonetically gradual change in progress in Raleigh, NC well underway by the middle of the twentieth century. She presents articulatory data from ultrasound tongue imaging, supporting a situation whereby earlier generations displayed an epiphenomenal gradient effect of coarticulatory affrication, which was the diachronic precursor to a fully phonologised and subsequently stabilised realisation over time, in line with the predictions of the life cycle of phonological processes (Bermúdez-Otero 2015) (although Magloughlin does not herself reference this framework). She provides evidence from gestures such as lip rounding generalising to other positions in the word where the coarticulatory effect is not present, in accordance with the life cycle prediction of rule generalisation. Magloughlin proposes two types of speakers: those for whom /tɹ/ is identical to [tʃɹ] and those who have some kind of different affricate. Although this research illuminates the situation for North American varieties, and British English varieties may arguably be way ahead on the trajectory of the completion of change, Magloughlin’s findings demonstrate that within a given speech community, /tɹ/-affrication could be a sound change, even if Wells is right concerning present-day British English and it is stable.2

/tɹ/ clusters are certainly affricated in Manchester, although we cannot comment here on whether there is full affrication to [tʃɹ] or some kind of in-between variant. Three of the four authors are native Mancunians and report complete homophony between sentry and compressed century, perhaps suggesting a more advanced situation than that reported by Wells (2011).

An attractive aspect of the affricate-based explanation, as opposed to assimilation to /ɹ/, is that it would also capture the behaviour of /s/-retraction in /stj/ and /stʃ/ clusters, for which this study provides the first acoustic evidence in a large-scale investigation of the speech community. This is because /t/ also affricates before /j/ in a process often called “yod-coalescence”, e.g. tune [tʃʉːn], although there is not yet a detailed quantitative/acoustic study of this phenomenon.

/tj/-coalescence is discussed by Wells (1997), who claims that it was initially found in unstressed syllables, e.g. perpetual, before spreading to stressed contexts, e.g. tune, in the late twentieth century. Hannisdal (2006) provides a good overview of this widespread change in British English, with a focus on Received Pronunciation, by comparing various pronunciation dictionaries over the course of the twentieth century and the extent to which they list /tʃ/ as a possible variant for words historically containing /tj/. Hannisdal notes that coalesced forms are absent in the first edition of the English Pronouncing Dictionary (Jones 1917) but begin to appear sporadically by the later 16th edition (Jones 2003). In other such dictionaries published since the turn of the century, affricated variants are listed more consistently, e.g. in the Longman Pronunciation Dictionary (Wells 2000) and the Oxford Dictionary of Pronunciation for Current English (Upton et al. 2001). Increases in /tj/-coalescence have also been attested within individual speech communities, including the likes of Ipswich and the Fens in East Anglia (Britain et al. 2008), where it co-exists with variable yod-dropping, and across various locales in the East Midlands (Braber & Flynn 2016).

As with the discussion of /tɹ/-affrication earlier and the fact that speakers have to affricate /tɹ/ for /s/-retraction to be licensed, it is also the case that significant retraction of /s/ in /stj/ clusters is similarly limited to instances where the /tj/ cluster itself does actually undergo coalescence.

Although the coalesced form is now incredibly widespread across varieties of British English, some speakers do still retain the /tj/ pronunciation. However, this resistance to /tj/-coalescence is largely restricted to speakers of “conservative RP”, who see such realisations as a less formal variant, particularly in word-initial onsets of stressed syllables (Upton 2008: 229; see also Ramsaran 1990 and the Daily Telegraph quote reported by Kerswill 2001: 12 describing coalescence as an “insidious degradation of spoken English”). Anecdotally, coalescence is certainly predominant in Manchester English, applying consistently across the lexicon and throughout the whole social scale.

As outlined in Section 1, investigations of /s/-retraction have almost all focused exclusively on the /stɹ/ context. There is, however, some evidence from smaller-scale studies of how the affricate derived from /tj/-coalescence influences a preceding /s/ in words such as student. Retraction in this environment is discussed briefly by Glain (2014) in a study of what he terms “instances of contemporary palatalisation”, referring to the increasing occurrence of palato-alveolar sounds such as /ʃ, tʃ, ʒ, dʒ/ in certain segmental contexts in British English. However, it should be noted that Glain draws no causal link between affrication and retraction, suggesting instead that it is caused directly by the /ɹ/ and /j/ in these clusters. A retracted /stj/ variant is also mentioned briefly in the Longman Pronunciation Dictionary (Wells 2000: 50), where it is interesting to note that the derivation is given as /stj/ → [stʃ] → [ʃtʃ] but for a word like strong it is given as /stɹ/ → [ʃtɹ] with no equivalent affrication.

Finally, retraction in /stj/ has been attested in New Zealand English. In his response to Shapiro (1995), Lawrence (2000) explicitly mentions the behaviour of /stj/ sequences but only cites examples in word-medial position (e.g. moisture) and across word boundaries (e.g. last year). Like Shapiro (1995), this paper is also largely based on anecdotal evidence rather than a robust quantitative investigation. A study of this type was conducted by Warren (2006) using elicitation recordings from the New Zealand Spoken English Database (Warren 2002), though the /stj/ results are based solely on the words student and Stewart and tokens were coded only auditorily in a binary fashion (retracted vs non-retracted) rather than measured acoustically. The results suggest a strong gender divide with male speakers much more likely to retract in /stj/ relative to female speakers (42% vs 14%), with this stark divide leading to an overall significant difference in the rate of retraction between /stɹ/ and /stj/ among these speakers of New Zealand English. The behaviour of /s/-retraction in this study is also further complicated by the way in which the process interacts with variable /t/-deletion in the same clusters.

Overall, there is some evidence of retraction in /stj/ across varieties of English. However, the severe lack of large-scale acoustic analyses, particularly compared to the widely-studied retraction in /stɹ/, means that we know very little about its exact synchronic and diachronic behaviour and the extent to which these two environments of retraction behave in a similar fashion.

3 Methodology

The present study contributes to our understanding of /s/-retraction with a number of methodological strengths, including robust acoustic analysis rather than auditory coding and working with conversational sociolinguistic interview data with a large and balanced sample of a single speech community, including sociodemographic metadata for all speakers. The methods of analysis are detailed further in the following subsections.

3.1 Data collection

This study is based on a sample of 118 speakers (61 male, 57 female) who grew up in Manchester from the age of 3 or younger, with at least one local parent, stratified by age, gender, social class and ethnicity. For the purposes of the study, Manchester is defined as the urbanised area within the M60 ring-road motorway, including neighbourhoods immediately south of the M60, such as Sale, Wythenshawe, Northenden, Cheadle and Stockport.

The informants’ ages at interview range from 16 to 87. Social class is operationalised in terms of occupational levels as occupation has been shown to be the best single indicator of socio-economic status for the purposes of explaining linguistic variation, both in the US (Labov 2001) and in the UK (Baranowski & Turton 2018). Following Baranowski (2017), there are five occupational levels ranging from lower-working for unskilled workers to upper-middle class for occupations such as university professors and high-level managers and administrators. The assignment to a particular social class is based on the occupational history of a speaker rather than just the last job they held; children are assigned the social class of the parents. The coding of ethnicity is based on speakers’ self-identification, with 118 white British, 18 Pakistani and 13 Black Caribbean informants in the wider corpus, though in this study we focus on the white British speakers due to the relatively smaller samples in other groups. A more detailed breakdown by age and social class is given in Table 1.

Table 1

Breakdown of speaker numbers by birth year and social class. Each cell contains the total number of speakers in the corpus for that category followed by a further breakdown by gender (male|female).

                      lower working  upper working  lower middle  middle middle  upper middle  total
oldest (1907–1949)    6 (3|3)        4 (2|2)        2 (0|2)       2 (1|1)        2 (1|1)       16 (7|9)
middle (1950–1989)    14 (9|5)       19 (9|10)      17 (9|8)      12 (4|8)       4 (1|3)       66 (32|34)
youngest (1990–2001)  5 (4|1)        10 (6|4)       10 (7|3)      11 (5|6)       0 (0|0)       36 (22|14)
total                 25 (16|9)      33 (17|16)     29 (16|13)    25 (10|15)     6 (2|4)       118 (61|57)

The informants were recorded during sociolinguistic interviews, conducted mostly between 2011 and 2018 with only a small number of speakers recorded earlier than this. Some informants were recorded in their homes, some at their place of work and others on a university campus. All were recruited through a “friend of a friend” approach (after Tagliamonte 2006: 21–2). The interviews focused on eliciting narratives of personal experience, which are known to approximate speakers’ vernaculars (Labov 1984; Tagliamonte 2006). The interviews were supplemented with two formal elicitations: word-list reading and minimal pairs for a number of vocalic and consonantal contrasts. However, as these elicitation tasks do not contain sufficient tokens of /s/ in the target environments, the results in this paper are based solely on spontaneous speech. The recordings were forced-aligned using FAVE (Rosenfelder et al. 2014), the online Forced-Alignment and Vowel Extraction suite developed at the University of Pennsylvania, in order to produce a time-aligned transcription for efficient and automated extraction of the relevant acoustic measures. This process of acoustic data extraction and processing is described in the following section.

3.2 Acoustic analysis

Studies of /s/-retraction almost always characterise the fricative quality using its centroid frequency, commonly referred to as its centre of gravity (CoG), which has been shown to correlate with the size of the anterior cavity in sibilant production: more [s]-like realisations have a higher CoG and more [ʃ]-like realisations have a lower CoG (Jongman et al. 2000). Although other spectral moments such as standard deviation, skew and kurtosis have also been shown to be relevant in quantifying place of articulation along the front–back plane (Koenig et al. 2013), the analysis in this paper utilises CoG due to its widespread use in other sociolinguistic studies of this variable (see e.g. Baker et al. 2011; Wilbanks 2017; Stuart-Smith et al. 2019).
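For readers unfamiliar with the measure, the centroid can be illustrated as the power-weighted mean frequency of a spectrum. The following is only a minimal sketch under stated assumptions, not the extraction procedure used in this study: the function names are hypothetical, the power weighting is one common choice, and the two pure tones merely stand in for the broadband frication noise of real [s]-like and [ʃ]-like tokens.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Return (frequencies, powers) for one frame using a plain DFT."""
    n = len(frame)
    freqs, powers = [], []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def centre_of_gravity(freqs, powers):
    """Power-weighted mean frequency (Hz) of a spectrum."""
    total = sum(powers)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs, powers)) / total


def sine(freq_hz, sample_rate, n):
    """Toy signal: a pure tone standing in for frication noise."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 22050  # Hz (illustrative value)
N = 441              # 20 ms frame, so DFT bins are spaced 50 Hz apart

# Hypothetical stand-ins: energy concentrated higher for [s], lower for [ʃ]
s_like = sine(7000, SAMPLE_RATE, N)
sh_like = sine(3500, SAMPLE_RATE, N)

cog_s = centre_of_gravity(*power_spectrum(s_like, SAMPLE_RATE))
cog_sh = centre_of_gravity(*power_spectrum(sh_like, SAMPLE_RATE))
```

In this toy case the tone with energy at the higher frequency yields the higher centroid, mirroring the [s]-like vs [ʃ]-like relationship described above.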

All interviews in the corpus were recorded at a 44.1 kHz sampling rate with a Sony PCM-M10 recorder and Audio-Technica ATR3350 lavaliere microphone, though some were later downsampled to 22 kHz for the forced-alignment process; because of this, all interviews were downsampled to this same 22 kHz rate at the acoustic analysis stage in order to ensure comparability between speakers with respect to centroid frequencies. In addition to this, all recordings were high-pass filtered at 750 Hz to remove spectral information corresponding to residual voicing and environmental noise, as is common in sociophonetic studies of this variable (see e.g. Gylfadottir 2015; Wilbanks 2017).

Acoustic measurements were extracted using a custom Praat script (Boersma & Weenink 2018) and the speakr package (Coretta 2021) in R (R Core Team 2018) to automate extraction of spectral values for the whole corpus. As well as capturing all tokens in the two environments of interest, namely /stɹ/ and /stj/, the script also extracted acoustic measurements for underlying /s/ and /ʃ/ segments in all other environments, as well as the fricated part of /tʃ/ segments, in order to establish speaker- and community-wide norms for the wider sibilant space. CoG was measured at the midpoint of these fricatives and at five equally-spaced points across their duration based on the time-aligned intervals as determined by the forced alignment process. The script also extracted a smoothed transformation of the entire spectral profile. The analysis in this paper will focus on the static midpoint measurements for ease of comparison with earlier studies, but in future work we plan to look at the wider spectral profile of these tokens for greater insight into the dynamics of /s/-retraction.

In addition to these primary acoustic measures, other linguistic information was extracted including segment duration, position in the word (initial vs medial vs final), surrounding phonological environment (the segments immediately adjacent to the fricative and, in the case of /stɹ/ words, the vowel following this consonant cluster) and lexical frequency (measured on the Zipf scale and based on frequency counts in the SUBTLEX-UK corpus, see van Heuven et al. 2014).
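The Zipf scale referred to above expresses lexical frequency as the base-10 logarithm of a word's frequency per million words, plus 3 (van Heuven et al. 2014). A minimal sketch of that conversion follows; the function name, the raw counts and the round corpus size are hypothetical, and the smoothing applied to published SUBTLEX-UK values is deliberately omitted.

```python
import math


def zipf_value(count, corpus_size):
    """Zipf value: log10 of frequency per million words, plus 3.

    `count` is the raw number of occurrences and `corpus_size` the total
    number of word tokens; both are illustrative inputs, not SUBTLEX-UK data.
    """
    if count <= 0 or corpus_size <= 0:
        raise ValueError("count and corpus_size must be positive")
    per_million = count * 1_000_000 / corpus_size
    return math.log10(per_million) + 3.0


# Hypothetical corpus of 200 million tokens
rare_word = zipf_value(200, 200_000_000)        # 1 occurrence per million
common_word = zipf_value(200_000, 200_000_000)  # 1,000 occurrences per million
```

On this scale each additional unit corresponds to a tenfold increase in frequency, so the two hypothetical words above differ by three Zipf units.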

The script extracted acoustic measurements of just over 100,000 sibilant tokens but, due to the automated nature of forced-alignment and the resulting potential for erroneous measurements, the data extraction was followed by a data-cleaning process in R based on the practices followed in earlier work (see e.g. Stuart-Smith et al. 2019 on the automated analysis in SPADE). This involved the removal of statistical outliers, identified as tokens with a CoG value outside 1.5x the interquartile range for a particular speaker’s production of either /s/ or /ʃ/ (N = 4,327), tokens with a duration less than 50 ms (N = 10,385) and tokens with a CoG or peak frequency less than 2,400 Hz (N = 13,445). The final dataset contains 72,986 tokens, 1,019 of which are /stɹ/, 236 of which are /stj/ and 82 of which are /stʃ/.
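The three exclusion criteria described above can be sketched as follows. This is an illustrative reimplementation rather than the R code used for the analysis: the field names and toy tokens are hypothetical, and interpreting "outside 1.5x the interquartile range" as Tukey fences (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR), computed per speaker and per phoneme, is an assumption.

```python
from collections import defaultdict
from statistics import quantiles

MIN_DURATION_MS = 50   # duration threshold reported in the text
MIN_FREQ_HZ = 2400     # CoG / peak frequency threshold reported in the text


def clean_tokens(tokens):
    """Return tokens that survive the outlier, duration and frequency filters."""
    # Collect CoG values per (speaker, phoneme) to compute the fences.
    groups = defaultdict(list)
    for t in tokens:
        groups[(t["speaker"], t["phoneme"])].append(t["cog"])

    fences = {}
    for key, values in groups.items():
        # quantiles() needs at least two values per group.
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        fences[key] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    kept = []
    for t in tokens:
        low, high = fences[(t["speaker"], t["phoneme"])]
        if not low <= t["cog"] <= high:
            continue  # statistical outlier for this speaker and phoneme
        if t["duration_ms"] < MIN_DURATION_MS:
            continue  # too short to measure reliably
        if t["cog"] < MIN_FREQ_HZ or t["peak"] < MIN_FREQ_HZ:
            continue  # implausibly low frequency for a sibilant
        kept.append(t)
    return kept


def tok(cog, duration_ms, peak):
    return {"speaker": "A", "phoneme": "s", "cog": cog,
            "duration_ms": duration_ms, "peak": peak}


# Toy data (hypothetical): five plausible tokens plus three to be excluded.
toy = [tok(5800, 90, 5600), tok(5900, 100, 5700), tok(6000, 110, 6100),
       tok(6100, 95, 6000), tok(6200, 120, 6300),
       tok(12000, 100, 11000),  # CoG outlier
       tok(6050, 40, 6000),     # shorter than 50 ms
       tok(5950, 100, 2000)]    # peak frequency below 2,400 Hz
kept = clean_tokens(toy)
print(len(kept))
```

If the three reported exclusion counts do not overlap, they sum to 28,157 tokens, which together with the 72,986 retained tokens is consistent with the "just over 100,000" tokens initially extracted.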

Following work by Gylfadottir (2015) and Ahlers & Meer (2019), the CoG measure was normalised and scaled into z-scores based on each speaker’s individual distribution. This removes individual differences in absolute frequency range (e.g. due to physiological differences in vocal tract size) while maintaining the relative relationships between individual tokens, thus allowing for more reliable comparisons between speakers when investigating the community-wide behaviour of /s/-retraction. This is particularly important in the case of /s/-retraction, where interest lies primarily in the relationship between categories, the retracted /s/ and a speaker’s own sibilant range for /s/–/ʃ/, rather than the absolute frequency values of these tokens. The result is a z-score where higher/positive values correspond to a higher centroid frequency, i.e. more [s]-like, lower/negative values correspond to lower centroid frequencies, i.e. more [ʃ]-like, and values close to 0 represent the midpoint of a speaker’s sibilant frequency range. The results presented in the following section are based primarily on these normalised measures.
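The normalisation step can be illustrated with a short sketch. It assumes that each speaker's mean and standard deviation are computed over all of that speaker's sibilant tokens pooled together and that the sample standard deviation is used; neither detail is specified above, and the function name and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_speaker(tokens):
    """Convert (speaker, cog_hz) pairs into (speaker, z_score) pairs."""
    by_speaker = defaultdict(list)
    for speaker, cog in tokens:
        by_speaker[speaker].append(cog)

    # Per-speaker mean and sample standard deviation (assumed choice).
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, (cog - stats[s][0]) / stats[s][1]) for s, cog in tokens]


# Two hypothetical speakers with different absolute frequency ranges
# but the same relative pattern across three tokens.
toy = [("A", 4000), ("A", 5000), ("A", 6000),
       ("B", 5000), ("B", 6500), ("B", 8000)]
normalised = zscore_by_speaker(toy)
for speaker, z in normalised:
    print(speaker, round(z, 2))
```

Although speaker B's raw frequencies are higher and more widely spread, both speakers receive identical z-scores, which is the sense in which the transformation removes absolute differences while preserving relative relationships.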

4 Results

Although the primary focus of this paper is to establish the status of /s/-retraction in Manchester English and the extent to which /stɹ/, /stj/ and /stʃ/ pattern together, it is first important to establish an overall picture of the wider sibilant space in this community. Figure 1 shows the normalised CoG values for all tokens of underlying /s/ and /ʃ/ split by environment, ordered in the expected direction of retraction magnitude based on earlier work (Baker et al. 2011) with the addition of two contexts: /stj/, which is thus far understudied and here placed next to /stɹ/ for ease of comparison and /stʃ/ (e.g. mischief), which is similarly overlooked in the literature.

Figure 1

The distribution of centre of gravity values (normalised into z-scores) by type of cluster (non-word-final tokens only).

The overall picture appears to be quite comparable to that established by Baker et al. (2011) in their study of American English varieties: pre-vocalic /s/ unsurprisingly has the highest CoG and this is followed by /sp, sk, st/ clusters (e.g. spin, skin, sting) which demonstrate a very slight tendency for retraction. Interestingly, further along the hierarchy there is evidence from /spɹ/ and /skɹ/ clusters (e.g. spring, scream) that the presence of /ɹ/ in these clusters can lead to slightly more advanced retraction despite this not being an environment in which /t/-affrication takes place (see also Stuart-Smith et al. 2019 for similar results). However, crucially, it is evident that the /stɹ/ context is set apart from the other /sCɹ/ environments, demonstrating even more extreme retraction of /s/. Figure 1 also provides the first quantitative evidence of community-level change in /stj/ clusters (e.g. student), which appears to be at a similar level to /stɹ/.3 Furthermore, the /stʃ/ context provides an important point of comparison to /stɹ/ and /stj/ as here we see the realisation of /s/ before what is an indisputable underlying affricate rather than the derived affricate we see in the latter two environments. It is interesting to note that /s/-retraction is evident here too and, additionally, is slightly more advanced relative to /stɹ/ and /stj/.

Averages of the raw frequencies are additionally provided in Table 2 to facilitate comparison with earlier studies in which only non-normalised measures are reported.

Table 2

Average centre of gravity by cluster type.

Mean CoG (Hz)
Cluster    All speakers    Men     Women
/sV/       5897            5301    6724
/sp/       5778            5187    6543
/sk/       5660            4968    6429
/st/       5573            5025    6301
/spɹ/      5324            4959    6023
/skɹ/      5184            4681    5986
/stɹ/      4792            4466    5212
/stj/      4790            4459    5179
/stʃ/      4396            4219    4799
/ʃ/        3916            3681    4123

While there is still a sizeable gap between the centroid frequencies of /s/ in /stɹ/ and /stj/ words compared with the /ʃ/ end-point of this continuum, it is important to note that these values are aggregated over the entire community. As reported in Section 2, there is of course substantial evidence from other varieties of English demonstrating that /s/-retraction is an ongoing, or recently-completed, sound change. With this in mind, Figure 2 plots the distribution of CoG values separately for the youngest and oldest cohorts of speakers in the corpus, with speakers in the “youngest” group being born between 1990 and 2001 (aged 16–25 at time of interview) and speakers in the “oldest” group being born between 1907 and 1949 (aged 63–87 at time of interview). Overall, there is a great deal of stability in the wider sibilant space whereas a clear and striking change between generations can be found in the /stɹ/ and /stj/ contexts that, for the younger speakers of Manchester English, now partially overlap with the frequency range for underlying /ʃ/.

Figure 2

The distribution of centre of gravity values (normalised into z-scores) by type of cluster and split by age group (non-word-final tokens only).

A more in-depth analysis of the /stɹ/ and /stj/ contexts will be presented later in this section. Before then, we will briefly explore these potential changes across all environments to lend insight into the wider behaviour of /s/ across generations. A mixed-effects linear regression model was fitted using the lme4 package in R (Bates et al. 2015), with normalised centre of gravity as the dependent variable and an interaction between environment and age group as the only predictor (alongside random intercepts for word and speaker). The results point to stability in many of these contexts, with no significant difference between the age groups in /sp/ (β = 0.030, p = 0.644), /sk/ (β = 0.001, p = 0.986), /spɹ/ (β = 0.117, p = 0.642) and /skɹ/ (β = –0.212, p = 0.254). Unexpectedly, significant differences are observed at the two ends of this sibilant continuum, with pre-vocalic /s/ (β = 0.119, p = 0.003) and /st/ (β = 0.112, p = 0.007) both appearing to have increased in CoG over time while pre-vocalic /ʃ/ shows a decrease in frequency (β = –0.111, p = 0.021). While other studies have reported on a possible expansion of the sibilant space over time (Wilbanks 2017), it is also possible that this is simply an artefact of relying on apparent-time data for diagnosing change and that these results arise instead from the physiological effects of ageing, with the phonetic range of this /s/–/ʃ/ contrast diminishing within speakers as they age. While further work would be needed to tease apart these potential explanations, we will briefly return to this point in Section 5.

Unsurprisingly, the /stɹ/ and /stj/ contexts demonstrate the most striking change, with the magnitude of this difference being almost identical across the two environments (β = 0.879 and β = 0.889 respectively, p < 0.001 for both). It should be noted that while no significant change was found in /stʃ/ in this particular model, the estimate was the next largest in size (β = 0.269, p = 0.513) and the lack of statistical significance is likely attributable to the small number of observations of this context, particularly among the smaller cohort of older speakers. Nevertheless, it is interesting to note that /stʃ/ words already demonstrated a more retracted /s/ segment even for these oldest speakers born in the first half of the twentieth century and therefore a less dramatic change over time, another point that will be considered in Section 5.

Taking a closer look now at the three main sources of /s/-retraction, Figure 3 shows the change discussed thus far in a more fine-grained manner through the use of birth year rather than binned age groups. The wider fricative space among younger speakers is again evident based on the distance between pre-vocalic /s/ and /ʃ/ but the most crucial finding for the purposes of this study is the striking change observed in /stɹ/, /stj/ and /stʃ/. All three of these contexts appear to change in parallel, once again providing strong evidence that the retraction of /s/ before these affricates (whether underlyingly present or not) is governed by the same process and behaves in a unified manner. In all three cases, there are significant negative correlations between speakers’ birth years and mean CoG values: ρ = –0.468 (p < 0.001) for /stɹ/, ρ = –0.487 (p < 0.001) for /stj/ and ρ = –0.343 (p = 0.017) for /stʃ/. Although there is little data before 1925, extrapolation from the observed change also suggests that at the beginning of the twentieth century this process had not yet been initiated and that /s/ was very much [s]-like in these three contexts.
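A correlation of this kind between birth year and speaker means can be sketched as below. The use of ρ suggests a rank-based (Spearman) coefficient, but this is an assumption, as the type of correlation is not stated; the function names and the toy speaker means are hypothetical and the values are not taken from the corpus.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical speaker means: normalised CoG falls as birth year rises.
birth_years = [1930, 1950, 1970, 1990, 2000]
mean_cog_z = [0.9, 0.4, 0.7, -0.2, -0.6]
rho = spearman_rho(birth_years, mean_cog_z)
print(round(rho, 3))
```

A negative coefficient indicates that later-born speakers tend to have lower normalised CoG values, i.e. more retracted realisations.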

Figure 3

Normalised centre of gravity by birth year and sibilant environment, illustrating change in apparent time. Points reflect individual speaker means for each environment; glm smooths fitted to each environment with 95% confidence intervals (non-word-final tokens only).

A set of mixed-effects linear regression models were fitted to the data to explain the variation observed in these three environments, taking into account a wider set of social and language-internal predictors: birth year, gender (male vs female), social class (working class vs middle class vs upper middle class), word position (initial vs medial), segment duration (in ms), word frequency (measured on the Zipf scale, per van Heuven et al. 2014) and following vowel type (rounded vs unrounded). The decision to group vowels in the statistical model by the presence/absence of lip rounding, and not by the front–back dimension, was based on exploratory analysis of the data to determine the strongest correlate with CoG. There is of course a clear phonetic motivation for either vowel parameter having an effect on /s/-retraction due to more [ʃ]-like articulations involving both rounding of the lips (Ladefoged & Johnson 2014: 65) as well as a retracted tongue position and, in any case, the two factors are tightly linked in English as almost all back vowels are round and vice versa.4

A baseline model was fitted to all /stɹ/, /stj/ and /stʃ/ tokens containing this full range of predictors but excluding environment completely, thus not differentiating between these three groups. This was compared to a model with environment as a predictor to determine whether or not there is a significant difference between them and also a third model in which environment interacts with all of these other predictors to determine whether or not /stɹ/ and /stj/ behave differently with respect to their conditioning factors.

The coefficients table reported in Table 3, based on the baseline model without an environment factor, indicates that there are a number of significant factors involved in the variation in /s/-retraction. Younger speakers are significantly more retracted than their older counterparts (p < 0.001), confirming the pattern of change in progress illustrated earlier. Retraction is also significantly more advanced for female speakers (p = 0.008) and in word-medial position (p = 0.003). The nature of these effects is not surprising given the widespread nature of female-led change established across decades of sociolinguistic study in similar communities (Labov 2001), as well as previous reports that claim /s/-retraction actually started in word-medial position before spreading to initial positions (Durian 2007; see also Baker et al. 2011; Gylfadottir 2015; Wilbanks 2017 for similar reports of word-medial position leading the change). Retraction is also more advanced in segments of shorter duration (p < 0.001) and marginally so in words of higher token frequency (p = 0.04), which is also to be expected of an assimilatory sound change (see e.g. Bybee 2012).

Table 3

Output of the mixed-effects linear regression model; random intercepts for speaker and word (reference levels for dummy-coded categorical variables given in square brackets).

Fixed effects                         Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)                           –0.093     0.158        –0.586    0.559
Birth year (scaled)                   –0.281     0.045        –6.212    <0.001 ***
Duration (scaled)                     0.126      0.021        5.942     <0.001 ***
Gender [male]
  female                              –0.235     0.087        –2.688    0.008 **
Position [initial]
  medial                              –0.159     0.051        –3.136    0.003 **
Social class [working class]
  middle class                        0.075      0.090        0.834     0.406
  upper middle class                  0.351      0.198        1.776     0.079
Following vowel type [rounded]
  unrounded                           –0.047     0.054        –0.872    0.391
Word frequency (Zipf)                 –0.065     0.031        –2.116    0.040 *

Neither following vowel type (p = 0.391) nor social class is significant, although there is a weak trend within the latter for upper-middle-class speakers (the highest social group within our data) to be more conservative and produce less /s/-retraction compared with working-class speakers. It is interesting to note that the magnitude of this effect is larger than that of the other categorical variables included in the model, even the significant predictors of gender and word position (β = 0.351, cf. –0.235 and –0.159 respectively), suggesting that the lack of statistical significance stems largely from a small sample size: there are only 6 upper-middle-class speakers in the corpus (producing 81 tokens of /stɹ/, /stj/ and /stʃ/), compared with 54 lower-middle-class and 58 working-class speakers.

Crucially, environment is not significant when added to this model (p = 0.243 for /stj/; p = 0.608 for /stɹ/) nor do any of the other predictors change in their behaviour in terms of the direction or significance of their effect on CoG. ANOVA comparison between these nested models also leads to no significant increase in model fit (p = 0.345), with the full details reported in Table 4 suggesting that allowing the model to differentiate between the three environments of interest leads to barely any increase in explanatory power. A further comparison with the third model, which includes an interaction between environment and all other factors, is also reported in this table but this also leads to no significant improvement in the statistical modelling (p = 0.437).

Table 4

Results of the ANOVA comparison between nested models (a) without environment, (b) with environment and (c) with an environment interaction. All other fixed and random factors remain the same. Lower AIC/BIC values reflect a better statistical model.

Model   Parameters   AIC      BIC      Deviance   p
(a)     12           2807.2   2869.5   2783.2
(b)     14           2809.0   2881.8   2781.0     0.345
(c)     28           2822.8   2968.3   2766.8     0.437

Although the environment-sensitive models explain slightly more of the observed variation in the dataset (indicated by the deviance values), the baseline model in which /stɹ/, /stj/ and /stʃ/ are treated as a single group would win in a model selection process based on AIC and BIC, which evaluate the trade-off between goodness of fit and model simplicity. In other words, the slightly reduced “information loss” in the more complex models is not large enough to warrant the increased complexity in the model structure. As such, the results here suggest strongly that there are no significant differences in the behaviour of /s/-retraction across any of these three contexts under study.
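The trade-off between fit and complexity can be made concrete by recomputing the information criteria from the deviance and parameter counts in Table 4. The sketch below uses the standard definitions AIC = deviance + 2k and BIC = deviance + k ln(n), treating the reported deviance as −2 × log-likelihood; the value of n (the 1,019 + 236 + 82 tokens of the three environments) is an assumption about how the models were fitted, and the function names are hypothetical.

```python
from math import log


def aic(deviance, n_params):
    """Akaike information criterion from a -2 log-likelihood deviance."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n_obs):
    """Bayesian information criterion from a -2 log-likelihood deviance."""
    return deviance + n_params * log(n_obs)


# Assumed number of observations: all /stɹ/, /stj/ and /stʃ/ tokens.
N_TOKENS = 1019 + 236 + 82

# (number of parameters, deviance) as reported in Table 4.
models = {"a": (12, 2783.2), "b": (14, 2781.0), "c": (28, 2766.8)}

for name, (k, dev) in models.items():
    print(name, round(aic(dev, k), 1), round(bic(dev, k, N_TOKENS), 1))

best_by_aic = min(models, key=lambda m: aic(models[m][1], models[m][0]))
best_by_bic = min(models, key=lambda m: bic(models[m][1], models[m][0], N_TOKENS))
```

Under these assumptions the recomputed values agree with Table 4 to within rounding, and model (a) has the lowest AIC and BIC: the 2.2-point drop in deviance from (a) to (b) is outweighed by the penalty for two additional parameters.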

Though we do not provide robust quantitative evidence here of the status of /t/-affrication—and a thorough investigation of this lies beyond the scope of this current paper—we do see clear evidence of this in our data. To illustrate this, in Figure 4, we provide representative examples of /t/-affrication in word-initial /stɹ/ and /stj/ clusters. A future study of /t/-affrication in Manchester English is certainly warranted, both as an independent instance of change but also to potentially better inform our understanding of /s/-retraction, a point we will return to in the following discussion of these results.

Figure 4

Example spectrograms and waveforms for the words streets (left) and stupid (right) as produced by two female speakers (aged 23 and 20 respectively).

5 Discussion

The results here point to a number of interesting properties of /s/-retraction in Manchester English, including not only its synchronic behaviour but also its trajectory of change over the past century. At the start of this paper we set out three major research questions concerning (i) the presence of an ongoing change in /s/-retraction in Manchester English, (ii) the behaviour of /stj/ and /stʃ/ clusters specifically and the extent to which these pattern like /stɹ/ and (iii) the underlying mechanisms that motivate the change in these environments.

The results first provide strong evidence that Manchester English, like many varieties across the English-speaking world, is currently involved in an ongoing sound change of /s/-retraction. The speakers in this corpus cover birth years ranging from 1907 to 2001 (although most speakers are born after 1925) and, in this time frame, we observe a notable change with the average centroid frequency of /s/ in retracting environments starting out at levels closer to a baseline pre-vocalic /s/ and ending up at a level that sees significant overlap with /ʃ/. This would suggest that the change is at a relatively advanced stage, nearing completion in this community, assuming of course that a speaker’s own target for /ʃ/ is a natural endpoint for the change in this sibilant. This would be a sensible assumption in a framework that sees sound change progressing from phonetic rules that involve gradient coarticulatory pressures to a stabilised phonological rule in which /s/ is categorically changed to a discrete /ʃ/ target as part of the phonological derivation (see e.g. Bermúdez-Otero & Trousdale 2012 on the life cycle of phonological processes). Empirically, this might be reflected by a unimodal distribution in which there is no significant differentiation between the centroid frequencies of /s/ (in these retracting environments) and /ʃ/.

5.1 On the triggering mechanisms of /s/-retraction

This paper is the first to report a comparable apparent-time change also occurring in the /stj/ (e.g. student) and /stʃ/ (e.g. mischief) environments. Crucially, these three contexts appear to behave as one with respect to their current rate of retraction and there are no significant differences between their rates of change in this time period either. In Section 2, we outlined the two major proposals that have been put forward concerning the mechanisms behind /s/-retraction, specifically whether it is caused directly through a long-distance effect of /ɹ/ (see e.g. Shapiro 1995; Baker et al. 2011) or alternatively through contact with an adjacent affricated /t/ (see e.g. Lawrence 2000). Returning now to this question, the new evidence presented here lends strong support to the latter proposal. While it is not impossible that retraction is triggered by three distinct mechanisms in /stɹ/, /stj/ and /stʃ/, the fact that their behaviour is so similar and that they change in tandem suggests that we should instead appeal to a single unifying explanation that invokes affrication as the cause of retraction.

Of course, under a frequentist statistics framework centred around null hypothesis significance testing, it is perfectly possible that the lack of significant difference between these three environments is in part due to issues with sample size and statistical power. Indeed, while the three environments are broadly similar among the youngest speakers, visual inspection of the frequency distributions from Figure 2 indicated that /stʃ/ was perhaps slightly ahead of /stɹ/ and /stj/ at an earlier point of the change. While it would be difficult to account for a difference in the opposite direction, there is in fact an obvious reason why at one point /stʃ/ may have been ahead of the other two environments in this change: the affrication of /t/ before /ɹ/ and /j/ was (and to some extent still is) variable and, as a result, the change in these environments could be underestimated due to the inclusion of older speakers who may not actually take part in this /t/-affrication process at all. The change will naturally appear to be further ahead in /stʃ/ than in /stɹ/ and /stj/ while affrication remains optional in the latter two compared to its obligatory presence in the former, at least for older speakers who acquired the dialect at an earlier stage while affrication in /tɹ/ and /tj/ was still an ongoing change. The inclusion of “underlying” /stʃ/ words such as mischief in this analysis brings with it an obvious benefit, namely it allows us to isolate the effects of /s/-retraction from the interacting changes affecting the adjacent clusters, since these words have, and have always had, an affricate present underlyingly. A follow-up study should be conducted with a specific focus on apparent-time change in /tɹ/-affrication and /tj/-coalescence. This would shed light on whether these two changes have been taking place alongside /s/-retraction (such that the three sound changes have been working in tandem) or whether they have in fact been stable for a long time in this speech community, laying the foundations for /s/-retraction from an early stage.

It is important to note that the results may not necessarily be generalisable to all instances in which /s/-retraction has developed and propagated through a speech community. While this change appears to have been initiated in a number of varieties at a roughly similar time, it is possible that they are all independently triggered by a range of different mechanisms. This is especially pertinent given the fact that most varieties of American English are yod-dropping: given the absence of /j/, they demonstrate no comparable affrication or retraction in /stj/ words like student and stupid and, as such, the distribution of retracted /s/ tokens is quite different. Children acquiring the language are exposed to fewer instances of retracted /s/ before an affricate and therefore might not receive the kind of input required to reanalyse the segmental conditions of /s/-retraction in this way.5

However, even setting aside our data for a moment, there are still questions that can be raised regarding the proposal put forward by Baker et al. (2011). Most notably, if retraction is caused directly by assimilation in tongue shape to /ɹ/, why is retraction consistently so much further advanced in /stɹ/ relative to /spɹ/ and /skɹ/ in all varieties of English affected by this change? This is particularly surprising for /spɹ/, where the intervening consonant has a labial place of articulation and as such the tongue has no distinct target to hit between the /s/ and /ɹ/. In a study of /s/-retraction in Philadelphia English, Gylfadottir (2015) raises a similar concern and in doing so also argues against this being a case of “assimilation at a distance” to the non-local /ɹ/ in these clusters. This question could be further illuminated with articulatory data on this covert variability in rhotic production in different segmental environments and across a range of speech communities but, in the absence of this, it remains the case that an explanation in terms of adjacency to /tʃ/ more closely matches the data.

5.2 On the origins of /s/-retraction

Alongside the disagreement over why /s/-retraction takes place in these environments, there are also conflicting reports over the origins of /s/-retraction, specifically with respect to the initiation of change in /stɹ/ relative to the other environments that demonstrate some low-level retraction.

Janda & Joseph (2003) argue that retraction started in /stɹ/, where today it is registered most strongly and later “spread” to the other contexts such as /sp, st, sk/. While this kind of rule generalisation is entirely possible, in which the target environment of the change is reanalysed to encompass a wider range of segmental contexts, this claim is only based on the fact that retraction is “sporadic” and less advanced in these contexts. In other words, it is based on neither real-time nor apparent-time data from which change can be observed or inferred. As such, there are obvious limitations to relying on these differences in effect magnitude from synchronic data as a tool for estimating the initiation of change.

Instead, it is entirely plausible that retraction began in all complex onset clusters simultaneously, before advancing at a faster rate of change in /stɹ/ (and /stj/, as demonstrated in this paper for Manchester English), leading to the clear separation of contexts we observe in many varieties of English spoken today. Indeed, this has been suggested by Stevens & Harrington (2016) for Australian English, in which low-level retraction is observed in /sp, st, sk/ and even /stɹ/. This has been described as the “phonetic pre-conditions” for the more advanced retraction in /stɹ/ that has developed in most varieties of English, suggesting a trajectory of change that sees retraction starting off in all complex onset clusters first before advancing more rapidly in /stɹ/.

In our data, we observe that /st/ is significantly more retracted than pre-vocalic /s/ but that it actually shows very little change over time. Moreover, the minor change we do find is actually in the opposite direction, with younger speakers producing more [s]-like tokens with a higher CoG in /st/ relative to older speakers. This would not be the direction of change we expect if retraction in /st/ represented a later stage of the wider retraction process after spreading to more contexts.

5.3 On variation and change in the wider fricative space

This increase in the CoG of /st/ in apparent time mirrors the same change observed in our data for pre-vocalic /s/, which is also produced with a higher CoG and therefore a “hissier” quality, among younger speakers. Conversely, these same speakers produce a “hushier” /ʃ/ with a small but significant decrease in its CoG in apparent time. Taken together, these results illustrate an apparent expansion of the sibilant space over time leading to a greater acoustic contrast between /s/ and /ʃ/ for younger speakers. Interestingly, a similar result appears in work conducted by Wilbanks (2017) on the variety of American English spoken in Raleigh, NC although there it is restricted to male speakers.

There are two possible interpretations of this finding: (i) that it represents a change in progress involving more distinctive productions of /s/ and /ʃ/ targets and therefore an expansion of the fricative space over time or (ii) that it actually represents the physiological effects of ageing and a reduction in the acoustic /s/–/ʃ/ contrast within speakers’ own productions as they age. There are also two possible pathways by which the latter might occur, both of which find support in previous literature (Matthies et al. 1994; Perkell et al. 2004; Koch & Janse 2015). An articulation-centric explanation might foreground the effects of reduced motor control in old age and by consequence the greater difficulty in producing the precise articulations of /s/ and /ʃ/, which already involve a combination of gestures such as tongue grooving and lip rounding in addition to the midsagittal tongue shape (Rutter 2011). There is an alternative explanation routed through perception and the feedback loop: speakers with hearing loss, which disproportionately affects older adults (Bowl & Dawson 2019), would suffer reduced auditory feedback, which may in turn affect their own production of these segments. This would likely be registered most strongly in sibilant production, and particularly /s/ itself, since hearing loss primarily affects the higher frequency range where more of the energy in /s/ is concentrated.

Further work should be conducted in other speech communities to establish how geographically widespread this phenomenon is beyond the disparate locales of Manchester and North Carolina (Wilbanks 2017) and also to shed more light on the exact status of this change. The difficulty in identifying its direction—a community-level increase in sibilant contrast over time or an individual-level decrease as speakers age—is a natural limitation of relying on apparent-time data to infer language change. As such, future work would ideally draw upon a longitudinal or real-time approach in order to tease apart the various temporal dimensions of birth year, age at interview and time of interview (see Fruehwald 2017).

Regardless of the direction of change, the fact that pre-vocalic /s/ and /ʃ/ are not themselves stable highlights the importance of interpreting the degree of change in /s/-retraction with respect to the wider sibilant space itself rather than in isolation.

6 Conclusion and thoughts for future work

In this paper, we have provided evidence of /s/-retraction in Manchester English, marking the first time this variable has been studied in detail within a single British English speech community. In doing so, we identify significant change in apparent time not only in the widely-studied /stɹ/ context but also in /stj/ and /stʃ/ words, all of which now approximate /ʃ/, suggesting that the changes are near completion. These latter two contexts had not previously been tracked in apparent time within a speech community but all three cases of retraction appear to be changing in parallel and are also comparable synchronically in terms of their absolute rates of retraction in the present day. Our results therefore speak to ongoing questions regarding the triggering mechanisms of this process, which focus either on the role of /ɹ/ in a case of long-distance assimilation or on the role of the adjacent affricate. Given this new evidence of parallel change in /stj/ and /stʃ/, in which /s/ appears adjacent to an affricate but in the absence of /ɹ/, we argue that a “retraction by affrication” explanation more accurately captures the relevant segmental environments involved in this process. These results are of course based on a single speech community and so we encourage similar work to be conducted on this full range of contexts in other varieties of English.

While the primary focus of this investigation concerned the triggering mechanisms of retraction, we also report a change in the wider sibilant space that sees a more acoustically-distinct /s/–/ʃ/ contrast among younger speakers. This has been reported independently in other varieties of English but there is not yet a consensus on whether this constitutes change in progress or simply a case of physiologically-motivated age-graded variation with speakers producing less extreme articulations of /s/ and /ʃ/ as they age. Future work is planned to lend further insight into this question and to help tease apart these two opposing vectors of change.

Future work is also needed to provide a more focused and dedicated investigation of variation and change in /t/-affrication in the same community, in the vein of Magloughlin (2018), as well as /tj/-coalescence. This will shed light on how these processes have developed together and the patterns of co-variation they exhibit not just with each other but also with /s/-retraction itself. In addition to this line of inquiry, it would be fruitful to incorporate lab speech to complement the existing sociophonetic studies of /s/-retraction. While conversational data is excellent for sociolinguistic purposes, it would be beneficial to analyse controlled elicitations in order to conduct a more phonetically-detailed analysis including dynamic spectral measures to track retraction across the duration of individual /s/ tokens. Not only would this provide a closer look at the exact phonetic realisation and the coarticulatory nature of this change in relation to adjacent sounds, it would also open up the opportunity to analyse rates of retraction across various morphological, syntactic and prosodic boundaries to better understand how the change may interact with other elements of the grammar.6

On a general note, the results presented here illustrate the importance of tracking /s/-retraction—and any sound change for that matter—in a more holistic way, which in this case involves considering its implementation across a range of segmental contexts and its status within the wider sibilant space.


  1. One exception to this bias towards American English is Ahlers & Meer (2019), a large-scale corpus study of /s/-retraction in /stɹ/ in Trinidadian English which also found retraction among younger speakers.
  2. Note that Prokofieva (2021) claims that /tɹ/-affrication is stable in Canada. However, she only records 18–21 year olds, using a male-led effect in terms of relative advancement as a proxy for stability. This is an unreliable assumption: it rests on the observation that in stable variation men show more of the non-standard variant than women, but stability should not be deduced backwards from a male lead. Thus, Canadian affrication may be stable but we cannot tell from the data of the 18–21 year olds surveyed in her investigation.
  3. Retraction in /stj/ clusters has previously been reported in studies by Warren (2006) and Nichols & Bailey (2018) but these are based on elicited lab speech and neither investigates potential apparent-time change in these clusters.
  4. There is already some consideration of the coarticulatory effect of vowels in the existing literature on /s/-retraction, with mixed results. Rutter (2011) discusses the possibility that anticipatory lip-rounding from a following rounded vowel can influence the spectral profile of /s/ and trigger a more extreme retraction, which he observes in words such as strudel, although this is not consistent across all speakers. On the other hand, Durian (2007) and Gylfadottir (2015) find no effect of the following vowel and argue that it is too distant to have a coarticulatory effect on the sibilant in these /stɹ/ clusters.
  5. This is not dissimilar to the case of competing systems of /æ/ allophony in Philadelphia English, where quantitative properties of the input children receive can predict the likelihood of allophonic restructuring (Sneller et al. 2019).
  6. Another advantage of looking at further environments is that it will open up analysis of retraction and affrication involving additional segments, e.g. /z, d, dʒ/, which may take part in similar processes to /s, t, tʃ/ but in more limited contexts, such as across word boundaries, e.g. these jars and his drink.


We would like to thank two anonymous Glossa reviewers for their helpful comments on this paper, the audiences at UKLVC12 in London and NWAV48 in Eugene, OR, and finally the Mancunian speakers who gave up their time in the name of this research.

Funding information

The corpus of recordings used for this research was created with financial support from the UK Economic and Social Research Council (ESRC, Grant ES/I009426/1).

Competing interests

The authors have no competing interests to declare.


Ahlers, Wiebke. 2020. Palatization in Austin. A sociophonetic analysis of sibilants. Osnabrück: Universität Osnabrück dissertation.

Ahlers, Wiebke & Meer, Philipp. 2019. Sibilant variation in New Englishes: A comparative sociophonetic study of Trinidadian and American English /s(tr)/-retraction. In Proceedings of Interspeech 2019, 291–5. DOI:  http://doi.org/10.21437/Interspeech.2019-1821

Altendorf, Ulrike. 2003. Estuary English: Leveling at the interface of RP and South-Eastern British English. Tübingen: Gunter Narr.

Baker, Adam & Archangeli, Diana & Mielke, Jeff. 2011. Variability in American English s-retraction suggests a solution to the actuation problem. Language Variation and Change 23(3). 347–74. DOI:  http://doi.org/10.1017/S0954394511000135

Baranowski, Maciej. 2017. Class matters: The sociolinguistics of GOOSE and GOAT in Manchester English. Language Variation and Change 29(3). 301–39. DOI:  http://doi.org/10.1017/S0954394517000217

Baranowski, Maciej & Turton, Danielle. 2018. Locating speakers in the socioeconomic hierarchy: Towards the optimal indicators of social class. Talk given at New Ways of Analyzing Variation 47, New York, NY, US, 20 October.

Bass, Michael. 2009. Street or shtreet? Investigating (str-) palatalisation in Colchester English. Estro: Essex Student Research Online 1(1). 10–21.

Bates, Douglas & Mächler, Martin & Bolker, Benjamin M. & Walker, Steven C. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. DOI:  http://doi.org/10.18637/jss.v067.i01

Bermúdez-Otero, Ricardo. 2015. Amphichronic explanation and the life cycle of phonological processes. In Honeybone, Patrick & Salmons, Joseph C. (eds.), The Oxford handbook of historical phonology, 374–99. Oxford: Oxford University Press.

Bermúdez-Otero, Ricardo & Trousdale, Graeme. 2012. Cycles and continua: On unidirectionality and gradualness in language change. In Nevalainen, Terttu & Traugott, Elizabeth Closs (eds.), The Oxford handbook of the history of English, 691–720. New York: Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780199922765.013.0059

Boersma, Paul & Weenink, David. 2018. Praat: Doing phonetics by computer [computer program]. Version 6.0.37. Retrieved 3 February 2018 from http://www.praat.org/.

Bowl, Michael R. & Dawson, Sally J. 2019. Age-related hearing loss. Cold Spring Harbor Perspectives in Medicine 9(8). 1–14. DOI:  http://doi.org/10.1101/cshperspect.a033217

Braber, Natalie & Flynn, Nicholas. 2016. What’s n(j)ew in the East Midlands? An investigation into yod-dropping. Talk given at the 7th Northern Englishes Workshop, Edinburgh, UK, 14–15 April.

Britain, David & Amos, Jennifer Clare & Spurling, Juliette. 2008. Yod-dropping on the East Anglian periphery. Talk given at the 17th Sociolinguistics Symposium, Amsterdam, Netherlands, 5 April.

Bybee, Joan. 2012. Patterns of lexical diffusion and articulatory motivation for sound change. In Solé, Maria-Josep & Recasens, Daniel (eds.), The initiation of sound change: Perception, production, and social factors, 211–34. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/cilt.323.16byb

Campbell-Kibler, Kathryn. 2011. Intersecting variables and perceived sexual orientation in men. American Speech 86(1). 52–68. DOI:  http://doi.org/10.1215/00031283-1277510

Coretta, Stefano. 2021. speakr: A wrapper for the phonetic software Praat. R package version 3.0.0. URL: https://CRAN.R-project.org/package=speakr.

Cruttenden, Alan. 2014. Gimson’s pronunciation of English. Oxford: Routledge. DOI:  http://doi.org/10.4324/9780203784969

Delattre, Pierre & Freeman, Donald C. 1968. A dialect study of American R’s by X-ray motion picture. Linguistics 6(44). 29–68. DOI:  http://doi.org/10.1515/ling.1968.6.44.29

Durian, David. 2006. Urbanization, social class, and the spread of linguistic variation: (str) in Columbus, OH. Ohio State University, ms.

Durian, David. 2007. Getting [ʃ]tronger every day?: More on urbanization and the sociogeographic diffusion of (str) in Columbus, OH. University of Pennsylvania Working Papers in Linguistics 13(2). 65–79.

Fruehwald, Josef. 2017. Generations, lifespans, and the zeitgeist. Language Variation and Change 29(1). 1–27. DOI:  http://doi.org/10.1017/S0954394517000060

Glain, Olivier. 2014. Introducing contemporary palatalisation. York Papers in Linguistics: Proceedings of PARLAY 2013 1(1). 16–29.

Gordon, Elizabeth & Campbell, Lyle & Hay, Jennifer & Maclagan, Margaret & Sudbury, Andrea & Trudgill, Peter. 2004. New Zealand English: Its origins and evolution. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511486678

Gylfadottir, Duna. 2015. Shtreets of Philadelphia: An acoustic study of /str/-retraction in a naturalistic speech corpus. University of Pennsylvania Working Papers in Linguistics 21(2). 89–97.

Hannisdal, Bente Rebecca. 2006. Variability and change in Received Pronunciation: A study of six phonological variables in the speech of television newsreaders. Bergen: University of Bergen dissertation.

Janda, Richard D. & Joseph, Brian D. 2003. Reconsidering the canons of sound-change: Towards a “Big Bang” theory. In Blake, Barry & Burridge, Kate (eds.), Selected papers from the 15th International Conference on Historical Linguistics, 205–19. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/cilt.237.14jan

Jones, Daniel. 1917. English pronouncing dictionary. London: J. M. Dent & Sons.

Jones, Daniel. 1956. The pronunciation of English. Cambridge: Cambridge University Press.

Jones, Daniel. 2003. English pronouncing dictionary. Cambridge: Cambridge University Press.

Jongman, Allard & Wayland, Ratree & Wong, Serena. 2000. Acoustic characteristics of English fricatives. Journal of the Acoustical Society of America 108(3). 1252–63. DOI:  http://doi.org/10.1121/1.1288413

Kerswill, Paul. 2001. Mobility, meritocracy and dialect levelling: The fading (and phasing) out of Received Pronunciation. In Rajamäe, P. & Vogelberg, K. (eds.), British studies in the new millennium: Challenge of the grassroots, 45–58. Tartu: University of Tartu.

King, Hannah & Ferragne, Emmanuel. 2020. Loose lips and tongue tips: The central role of the /r/-typical labial gesture in Anglo-English. Journal of Phonetics 80. 1–19. DOI:  http://doi.org/10.1016/j.wocn.2020.100978

Koch, Xaver & Janse, Esther. 2015. Effects of age and hearing loss on articulatory precision for sibilants. In Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association. DOI:  http://doi.org/10.1121/1.4968781

Koenig, Laura L. & Shadle, Christine H. & Preston, Jonathan L. & Mooshammer, Christine R. 2013. Towards improved spectral measures of /s/: Results from adolescents. Journal of Speech Language and Hearing Research 56(4). 1175–89. DOI:  http://doi.org/10.1044/1092-4388(2012/12-0038)

Labov, William. 1984. Field methods of the project on language change and variation. In Baugh, John & Sherzer, Joel (eds.), Language in use, 28–53. Englewood Cliffs, NJ: Prentice Hall.

Labov, William. 2001. Principles of linguistic change: Social factors. Oxford: Blackwell.

Ladefoged, Peter & Johnson, Keith. 2014. A course in phonetics. Boston, MA: Cengage.

Lawrence, Wayne P. 2000. /str/ → /ʃtr/: Assimilation at a distance? American Speech 75. 82–7. DOI:  http://doi.org/10.1215/00031283-75-1-82

Levon, Erez. 2014. Categories, stereotypes, and the linguistic perception of sexuality. Language in Society 43(5). 539–66. DOI:  http://doi.org/10.1017/S0047404514000554

Levon, Erez & Maegaard, Marie & Pharao, Nicolai. 2017. Introduction: Tracing the origin of /s/ variation. Linguistics 55(5). 979–92. DOI:  http://doi.org/10.1515/ling-2017-0016

Lindsey, Geoff. 2019. English after RP: Standard British pronunciation today. Cham, Switzerland: Palgrave Macmillan. DOI:  http://doi.org/10.1007/978-3-030-04357-5

Magloughlin, Lyra. 2018. /tɹ/ and /dɹ/ in North American English: Phonologization of a coarticulatory effect. Ottawa: University of Ottawa dissertation.

Magloughlin, Lyra & Wilbanks, Eric. 2016. An apparent time study of (str) retraction and /tɹ/ – /dɹ/ affrication in Raleigh, NC English. Talk given at New Ways of Analyzing Variation 45, Vancouver, BC, Canada, 3–6 November.

Matthies, Melanie L. & Svirsky, Mario A. & Lane, Harlan L. & Perkell, Joseph S. 1994. A preliminary study of the effects of cochlear implants on the production of sibilants. Journal of the Acoustical Society of America 96(3). 1367–73. DOI:  http://doi.org/10.1121/1.410281

Newman, Rochelle S. & Clouse, Sheryl A. & Burnham, Jessica L. 2001. The perceptual consequences of within-talker variability in fricative production. Journal of the Acoustical Society of America 109(3). 1181–96. DOI:  http://doi.org/10.1121/1.1348009

Nichols, Stephen & Bailey, George. 2018. Revealing covert articulation in s-retraction. Talk given at the Annual Meeting of the Linguistics Association of Great Britain, Sheffield, UK, 11–14 September.

O’Neil, Wayne. 2013. The phonology of invented spelling. In Piattelli-Palmarini, Massimo & Berwick, Robert C. (eds.), Rich Languages from Poor Inputs, 220–6. Oxford: Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199590339.003.0015

Perkell, Joseph S. & Matthies, Melanie L. & Tiede, Mark & Lane, Harlan & Zandipour, Majid & Marrone, Nicole & Stockmann, Ellen & Guenther, Frank H. 2004. The distinctness of speakers’ /s/–/ʃ/ contrast is related to their auditory discrimination and use of an articulatory saturation effect. Journal of Speech, Language, and Hearing Research 47(6). 1259–69. DOI:  http://doi.org/10.1044/1092-4388(2004/095)

Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Bybee, Joan L. & Hopper, Paul (eds.), Frequency and the emergence of linguistic structure, 123–36. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/tsl.45.07phi

Phillips, Jacob B. & Resnick, Paige. 2018. Listeners’ social attributes influence sensitivity to coarticulation in the perception of sibilants in nonce words. Talk given at New Ways of Analyzing Variation 47, New York, NY, US, 18–21 October.

Phillips, Jacob B. & Resnick, Paige. 2019. Listeners’ social attributes influence sensitivity to coarticulation in the perception of sibilants in nonce words. Journal of the Acoustical Society of America 145. EL574. DOI:  http://doi.org/10.1121/1.5113566

Podesva, Robert J. & Van Hofwegen, Janneke. 2014. How conservatism and normative gender constrain variation in Inland California: The case of /s/. University of Pennsylvania Working Papers in Linguistics 20(2). 129–37.

Prokofieva, Anna. 2021. Social factors affecting /tr/ affrication. Poster presented at Arts Undergraduate Research Event, McGill University, Montreal, QC, Canada, 1 February.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Version 3.5.1. Vienna: R Foundation for Statistical Computing. URL: http://www.R-project.org/.

Ramsaran, Susan. 1990. RP: Fact and fiction. In Ramsaran, Susan (ed.), Studies in the pronunciation of English, 178–90. London: Routledge.

Rosenfelder, Ingrid & Fruehwald, Josef & Evanini, Keelan & Seyfarth, Scott & Gorman, Kyle & Prichard, Hilary & Yuan, Jiahong. 2014. FAVE (Forced Alignment and Vowel Extraction) Program Suite. https://doi.org/10.5281/zenodo.22281. Version 1.2.2.

Rutter, Ben. 2011. Acoustic analysis of a sound change in progress: The consonant cluster /stɹ/ in English. Journal of the International Phonetic Association 41(1). 27–40. DOI:  http://doi.org/10.1017/S0025100310000307

Schwartz, Geoffrey. 2021. All TRs are not created equal: L1 and L2 perception of English cluster affrication. Unpublished manuscript, Uniwersytet im. Adama Mickiewicza w Poznaniu.

Shapiro, Michael. 1995. A case of distant assimilation: /str/ → /ʃtr/. American Speech 70. 101–7. DOI:  http://doi.org/10.2307/455876

Smith, Bridget J. 2013. The interaction of speech perception and production in laboratory sound change. Columbus, OH: Ohio State University dissertation.

Smith, Bridget J. & Mielke, Jeff & Magloughlin, Lyra & Wilbanks, Eric. 2019. Sound change and coarticulatory variability involving English /ɹ/. Glossa: A Journal of General Linguistics 4(1). 63. DOI:  http://doi.org/10.5334/gjgl.650

Sneller, Betsy & Fruehwald, Josef & Yang, Charles. 2019. Using the Tolerance Principle to predict phonological change. Language Variation and Change 31. 1–20. DOI:  http://doi.org/10.1017/S0954394519000061

Sollgan, Laura. 2013. STR-palatalisation in Edinburgh accent: A sociophonetic study of a sound change in progress.

Stevens, Mary & Harrington, Jonathan. 2016. The phonetic origins of s-retraction: Acoustic and perceptual evidence from Australian English. Journal of Phonetics 58. 118–34. DOI:  http://doi.org/10.1016/j.wocn.2016.08.003

Stuart-Smith, Jane. 2007. Empirical evidence for gendered speech production: /s/ in Glaswegian. In Cole, Jennifer & Hualde, José Ignacio (eds.), Laboratory Phonology 9, 65–86. Berlin: Mouton de Gruyter.

Stuart-Smith, Jane. 2020. Changing perspectives on /s/ and gender over time in Glasgow. Linguistics Vanguard 6(1). 1–13. DOI:  http://doi.org/10.1515/lingvan-2018-0064

Stuart-Smith, Jane & Sonderegger, Morgan & Macdonald, Rachel & Mielke, Jeff & McAuliffe, Michael & Thomas, Erik. 2019. Large-scale acoustic analysis of dialectal and social factors in English /s/-retraction. In Calhoun, Sasha & Escudero, Paola & Tabain, Marija & Warren, Paul (eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019. Canberra, Australia: Australasian Speech Science and Technology Association Inc.

Tagliamonte, Sali. 2006. Analysing sociolinguistic variation. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511801624

Twist, Alina & Baker, Adam & Mielke, Jeff & Archangeli, Diana. 2007. Are “covert” /ɹ/ allophones really indistinguishable? University of Pennsylvania Working Papers in Linguistics 13(2). 207–16.

Upton, Clive. 2008. Received Pronunciation. In Kortmann, Bernd & Schneider, Edgar W. (eds.), A handbook of varieties of English, Vol 1: Phonology, 217–30. Berlin: Mouton de Gruyter.

Upton, Clive & Kretzschmar, W. A. & Konopka, R. 2001. Oxford dictionary of pronunciation for current English. Oxford: Oxford University Press.

van Heuven, Walter J. B. & Mandera, Pawel & Keuleers, Emmanuel & Brysbaert, Marc. 2014. SUBTLEX-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology 67(6). 1176–90. DOI:  http://doi.org/10.1080/17470218.2013.850521

Warren, Paul. 2002. NZSED: Building and using a speech database for New Zealand English. New Zealand English Journal 16. 53–58.

Warren, Paul. 2006. /s/-retraction, /t/-deletion and regional variation in New Zealand English /str/ and /stj/ clusters. In Proceedings of the 11th Australian International Conference on Speech Science & Technology, 466–71.

Wells, John C. 1997. Whatever happened to Received Pronunciation? In Medina Casado, Carmelo & Soto Palomo, Concepción (eds.), II Jornadas de Estudios Ingleses, 19–28. Jaén: Universidad de Jaén.

Wells, John C. 2000. Longman pronunciation dictionary. London: Longman.

Wells, John C. 2011. How do we pronounce train? John Wells’ Phonetic Blog. Posted 22 March. URL: http://phonetic-blog.blogspot.com/2011/03/how-do-we-pronounce-train.html. Last accessed 30 June 2021.

Wilbanks, Eric. 2016. “SHtriking” change in Raleigh’s speech: Acoustic analysis of (str) retraction. Poster presented at the 11th Annual North Carolina State Graduate Research Symposium, Raleigh, NC, United States, 23 March.

Wilbanks, Eric. 2017. Social and structural constraints on a phonetically-motivated change in progress: (str) retraction in Raleigh, NC. University of Pennsylvania Working Papers in Linguistics 23(1). 301–10.