Expressive Sibilant Retraction (ESR) in North Norwegian changes an /s/ to the corresponding postalveolar fricative [ʂ] in certain pragmatic contexts. For example, the verb skubbe, ‘to shove’, may alternate as shown in (1). In Supplementary file 1, I provide for each North Norwegian example the nearest equivalent in bokmål, the most widely used standard written variety of Norwegian.
Whereas [skʉba] in (1a) is neutral, the realization [ʂkʉba] in (1b), with the postalveolar fricative, dramatizes the action as involving significant acceleration or force.1 The two examples are otherwise structurally identical.
A similar expressive process has been described for the Urban East Norwegian (UEN) variety in the earlier literature by Larsen (1907) and Broch (1927). Exactly as these researchers describe for UEN, ESR in North Norwegian may only apply where the phonological contrast between /s/ and /ʂ/ is otherwise neutralized. In word-initial position before a vowel, where the two sounds contrast, ESR cannot apply. Thus, marking infelicitous applications of ESR with an exclamation mark <!>, /saːɡa/, ‘sawed’, cannot be realized as ! [ʂaːɡa] (with the intended meaning ‘sawed forcefully’). An adequate account of ESR must first explain the connection between these phonological restrictions on its application (form) and its expressive function. Second, it must provide an account of what ESR expresses, and how. For example, it is clear that ESR in North Norwegian is not an all-purpose intensive. Thus, (2b) is not an acceptable intensive version of (2a).
Neither Larsen nor Broch provides an account of the interpretations of ESR for UEN beyond simply listing certain emotional nuances, which in the case of Broch (1927: 154) include “contempt, anger, a feeling of power, boldness or admiration, a hint of intimacy, degrees of emphasis”.2 The present study shows that the interpretations of ESR are in fact constrained, and vary depending on whether they attach to actions/events, objects or states/properties. The pattern of variation allows us to crystallize a core meaning and an account of the form-function relation.
An immediate question that ESR raises is whether it is part of the grammar or not. On the face of it, ESR resembles a morphological process that introduces, or perhaps reintroduces, a contrast between /s/ and /ʂ/ in word-initial position before a consonant. This account would make ESR the source of a marginal contrast, since there is no lexical contrast between these sibilants in this environment. However, I shall argue that ESR does not belong to the grammar (morphology or phonology), but is instead part of the communication system, specifically, a multimodal post-phonological component that integrates speech with conversational gestures, including manual, facial and spoken gestures. The current account of ESR may thus cast light on expressive phenomena in general, suggesting a way of dealing with at least one type of apparent marginal contrast.
Although ESR is found in UEN and other varieties throughout Norway, this article draws on examples from North Norwegian. I offer a few brief comments here to place North Norwegian in its wider sociolinguistic setting. UEN is the dominant spoken variety of Norwegian, the educated varieties of which are described by Kristoffersen (2000). Although centered on Oslo, UEN is widely spoken in population centers throughout Norway and many towns have significant clusters of UEN speakers. The social dominance of UEN is such that children of UEN-speaking parents throughout Norway frequently target, acquire and grow up using UEN regardless of the ambient variety, despite the comparatively high traditional regard that regional dialects in Norway enjoy. Most regional urban centers have developed varieties with pronounced regional features, but which are increasingly leveled towards UEN in terms of lexis and idiom, morphology and segmental phonology.
The organization of the rest of this paper is as follows. Section 2 lays out the phonological background, the lexical and phonological sources of postalveolar consonants in North Norwegian, and an analysis framed in Optimality Theory (OT). Section 3 describes the problems of a morphological analysis of ESR and lays the foundations for an account of ESR as a spoken gesture. Section 4 provides an account of the different interpretations of ESR depending on whether it indexes an action or event, object, or state/property. Section 5 attempts to trace the relationships between these interpretations and identify a semantic core for ESR. Finally, Section 6 concludes.
There are a number of descriptions of North Norwegian, including dialect surveys (Elstad 1982; Jahr & Skare 1996), grammatical descriptions of specific varieties (Iversen 1918; Christiansen 1933; Brekke 2000), sociolinguistic work (Nesse 2008), and vocabularies (Dragøy 2001). For information on Norwegian dialects in general, I refer the reader to the works by Christiansen (1946–1948), Haugen (1976), Sandøy (1996), Papazian & Helleland (2005), and Skjekkeland (2005).
The table in (3) shows a representative consonant inventory for North Norwegian.
|(3)||North Norwegian consonant inventory|
Although phonetic realization and lexical distribution vary, the North Norwegian vowel inventory in (4) is, systemically speaking, little different from that of UEN. The vowel transcribed here as /a/ is a low, open, front vowel, generally rendered with the symbol <æ> in works dealing with Norwegian. The North Norwegian vowel, however, is closer to cardinal vowel 4 (Jones 1967), warranting a departure from this tradition.
|(4)||North Norwegian vowel inventory|
|i, iː||y, yː||ʉ, ʉː||u, uː|
|a, aː||ɑ, ɑː|
Norwegian has a contrast between three coronal fricatives, alveolar /s/, postalveolar /ʂ/ (inaccurately dubbed ‘retroflex’), and a ‘palatal’ fricative, transcribed here as /ɕ/.3 Minimal triplets are shown in (5).
|(5)||Sibilant contrasts in North Norwegian|
While there is agreement that /s/ is lamino-alveolar, Kristoffersen (2000: 23) writes regarding UEN that “the precise articulatory properties of [ʂ] […] are somewhat unclear”. Historically, the postalveolar fricative /ʂ/ derives from two sources: s in a palatal environment, and the cluster rs, also a synchronic source of [ʂ] when r+s results from combining morphemes. The former includes sk before a front vowel (e.g. ski /ʂiː/, ‘ski’), and the clusters sj (e.g. sjø /ʂøː/, ‘sea’), and skj (e.g. skjå /ʂoː/, ‘shed’). Kristoffersen then raises the question whether speakers in fact distinguish the postalveolar fricatives that derive from these two different sources, for example as [ʃ], from s+palatal, and [ʂ], from rs. Although Larsen (1907) claims the existence of a distinction, which was still made sixty years later by older speakers according to Sivertsen (1967: 79), present-day UEN and North Norwegian would appear to have merged the two. Thus, for Endresen (1985: 77; 1991: 54), there is only one postalveolar fricative, which he describes as having apico-postalveolar place of articulation and tongue grooving.
Although traditionally designated ‘retroflex’, there is in general no curling of the tongue tip upwards in the articulation of this sound. The apico-postalveolar constriction is instead achieved by withdrawing the tongue tip into the front of the body of the tongue, which results in anterior bunching and additional narrowing between the lamina and the palato-alveolar region (see Laver 1994: 141). The postalveolar fricative is also enhanced by lip protrusion, as is the case for /ʃ/ in English and German (cf. Stevens & Keyser 1989).
Now we turn to the phonological distribution of /s/ and /ʂ/. Word-initially before a consonant the distinction between /s/ and /ʂ/ is neutralized (Sibilant Place Neutralization). In general, only lamino-alveolar /s/ is permitted in this environment, as shown in (6).
|[skuːɾ]||skor||‘support, lean (IMP)’ (dial.)|
Before the nonpalatal lateral,4 however, which in the present variety of North Norwegian is realized across the board as postalveolar [ɭ], the contrast is neutralized to /ʂ/. Prelateral Sibilant Retraction is illustrated in (7).
|(7)||Prelateral Sibilant Retraction|
|[ʂɭap]||slapp||‘lacking in energy’|
An OT analysis of these facts is straightforward (for introductions to this framework, see Prince & Smolensky 2004; McCarthy 2008). Since there is in general a distinction between alveolar and postalveolar consonants, the faithfulness constraint that requires preservation of the underlying contrast between them, IDENT[postalveolar] in (8), must outrank the markedness constraint that penalizes postalveolar consonants in the output, *[postalveolar] in (9).
|(8)||IDENT[postalveolar]|
|Let input segments = i1, i2, i3, …, im and output segments = o1, o2, o3, …, on.|
|Assign one violation mark for every pair (ix, oy), where|
|ix is in correspondence with oy, and|
|ix and oy have different specifications for [postalveolar].|
|(9)||*[postalveolar]|
|Assign one violation mark for every segment with the specification [postalveolar].|
The tableaux in (10) and (11) show how this works for underlying alveolar and postalveolar sibilants in pre-vocalic position, where the contrast is preserved.
To deal with the neutralization pattern in (6), there must be some other markedness constraint *[σʂC, shown in (12), that dominates IDENT[postalveolar].
|(12)||*[σʂC|
|Assign one violation mark for every segment S, where (a) S is [ʂ], (b) S is in syllable onset position, and (c) S is followed by a consonant.|
Tableaux for alveolar and postalveolar sibilants in this position are shown in (13) and (14). Both /sC-/ and /ʂC-/ in the input are mapped to [sC-] in the output, the input ʂC-cluster unfaithfully so.
In order to account for the neutralization pattern in (7), the specific markedness constraint *[σsɭ given in (15) must dominate the more general constraint *[σʂC, as shown in the tableaux in (16) and (17).
|(15)||*[σsɭ|
|Assign one violation mark for every segment S, where (a) S is [s], (b) S is in syllable onset position, and (c) S is followed by the lateral [ɭ].|
As (16) and (17) show, both /s/ and /ʂ/ are mapped by the grammar onto [ʂ] preceding the lateral.
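The ranking argued for above, *[σsɭ >> *[σʂC >> IDENT[postalveolar] >> *[postalveolar], can be checked mechanically. The following is only a minimal sketch under several simplifying assumptions that are not part of the analysis itself: the candidate set is restricted to the two realizations of the initial sibilant, word-initial position stands in for syllable onset position, correspondence is taken to be position by position, and all function names and segment sets are hypothetical.

```python
# Toy evaluation of the ranking discussed in the text (highest-ranked first):
#   *[σsɭ >> *[σʂC >> IDENT[postalveolar] >> *[postalveolar]
# Segment classes are illustrative assumptions, not a full inventory.
POSTALVEOLAR = set("ʂɭʈɖɳ")
VOWELS = set("aɑeɛiyʉuoøœ")


def is_consonant(seg):
    # The length mark is treated as part of the preceding vowel.
    return seg not in VOWELS and seg != "ː"


def star_s_lateral(inp, out):
    """*[σsɭ: word-initial [s] directly followed by the lateral [ɭ]."""
    return int(out[:2] == "sɭ")


def star_retroflex_sib_c(inp, out):
    """*[σʂC: word-initial [ʂ] directly followed by a consonant."""
    return int(len(out) > 1 and out[0] == "ʂ" and is_consonant(out[1]))


def ident_postalveolar(inp, out):
    """IDENT[postalveolar]: one mark per corresponding pair that differs."""
    return sum((i in POSTALVEOLAR) != (o in POSTALVEOLAR) for i, o in zip(inp, out))


def star_postalveolar(inp, out):
    """*[postalveolar]: one mark per postalveolar segment in the output."""
    return sum(seg in POSTALVEOLAR for seg in out)


RANKING = [star_s_lateral, star_retroflex_sib_c, ident_postalveolar, star_postalveolar]


def candidates(inp):
    # Toy GEN: only the place of the initial sibilant varies.
    return ["s" + inp[1:], "ʂ" + inp[1:]]


def optimal(inp):
    # Violation profiles are compared lexicographically, i.e. by strict domination.
    return min(candidates(inp), key=lambda out: tuple(c(inp, out) for c in RANKING))
```

Under these assumptions, `optimal("ʂkuːɾ")` returns `"skuːɾ"` (Sibilant Place Neutralization), `optimal("sɭap")` returns `"ʂɭap"` (Prelateral Sibilant Retraction), and prevocalic inputs such as `"saːɡa"` and `"ʂøː"` surface faithfully.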
Another source of postalveolar [ʂ] is the application of the Retroflex Rule (Kristoffersen 2000: 87ff.; Stausland Johnsen 2012), a coalescence process found throughout most of Central Scandinavia, an area that includes the East and, for certain shared innovations, the North of Norway. Alveolar consonants /t d n s (l)/ coalesce with a preceding /ɾ/ to give the corresponding postalveolars [ʈ ɖ ɳ ʂ (ɭ)]. In the absence of a following alveolar consonant, an underlying /ɾ/ surfaces faithfully, as shown in the examples in (18).
|(18)||/ɡaːte+a/||gata||/deɲ daːɾ ɡaːta/||den der gata|
|[ɡaːta]||‘the street’||[dɛɲ daːɾ ɡaːta]||‘that street there’|
|/buːɾ+e/||bordet||/de daːɾ buːɾe/||det der bordet|
|[buːɾɛ]||‘the table’||[dɛ daːɾ buːɾɛ]||‘that table there’|
|/ʉːɾ+e/||uret||/de daːɾ ʉːɾe/||det der uret|
|[ʔʉːɾɛ]||‘the watch’||[dɛ daːɾ ʔʉːɾɛ]||‘that watch there’|
The coalescence pattern in North Norwegian is illustrated in (19). In the first example, the /s/ stands in prevocalic position, where it contrasts with /ʂ/.
|(19)||/saːɡ+a/||saga||/deɲ daːɾ saːɡ+a/||den der saga|
|[saːɡa]||‘the saw’||[dɛɲ daː ʂaːɡa]||‘that saw there’|
|/taɲ+a/||tanna||/deɲ daːɾ taɲ+a/||den der tanna|
|[taɲa]||‘the tooth’||[dɛɲ daː ʈaɲa]||‘that tooth there’|
|/dœɾ+a/||døra||/deɲ daːɾ dœɾ+a/||den der døra|
|[dœɾa]||‘the door’||[dɛɲ daː ɖœɾa]||‘that door there’|
|/noːɭ+a/||nåla||/deɲ daːɾ noːɭ+a/||den der nåla|
|[noːɭa]||‘the needle’||[dɛɲ daː ɳoːɭa]||‘that needle there’|
The Retroflex Rule also applies where the /s/ is the first member of a cluster, as shown in (20).
|(20)||/spuːɾ+e/||sporet||/de daːɾ spuːɾe/||det der sporet|
|[spuːɾɛ]||‘the track’||[dɛ daː ʂpuːɾɛ]||‘that track there’|
|/styːɾ+e/||styret||/de daːɾ styːɾe/||det der styret|
|[styːɾɛ]||‘the palava’||[dɛ daː ʂtyːɾɛ]||‘that palava there’|
|/skʉːɾ+e/||skuret||/de daːɾ skʉːɾe/||det der skuret|
|[skʉːɾɛ]||‘the shed’||[dɛ daː ʂkʉːɾɛ]||‘that shed there’|
|/smœɾ+e/||smøret||/de daːɾ smœɾe/||det der smøret|
|[smœɾɛ]||‘the butter’||[dɛ daː ʂmœɾɛ]||‘that butter there’|
|/skɾoːɡ+e/||skroget||/de daːɾ skɾoːɡe/||det der skroget|
|[skɾoːɡɛ]||‘the hull’||[dɛ daː ʂkɾoːɡɛ]||‘that hull there’|
Coalescence also applies preceding /ʂ/, as shown in (21). The process is not vacuous, since the /ɾ/ fails to surface.
|(21)||/ʂœyte+a/||skøyta||/deɲ daːɾ ʂœyta/||den der skøyta|
|[ʂœyta]||‘the small boat’||[dɛɲ daː ʂœyta]||‘that small boat there’|
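The coalescence pattern in (18) to (21) can be summarized procedurally. The sketch below is an illustration only, under stated assumptions: it operates on a list of broadly transcribed words, applies only across word boundaries as in the examples above, leaves out the optional lateral target, and does not model vowel quality. The function name `retroflex_rule` and the mapping table are hypothetical.

```python
# Targets of coalescence and their postalveolar counterparts.
# /ʂ/ maps to itself: the preceding /ɾ/ still fails to surface, as in (21).
RETROFLEX = {"t": "ʈ", "d": "ɖ", "n": "ɳ", "s": "ʂ", "ʂ": "ʂ"}


def retroflex_rule(words):
    """Coalesce a word-final /ɾ/ with a following word-initial target consonant."""
    out = list(words)
    for i in range(len(out) - 1):
        left, right = out[i], out[i + 1]
        if left.endswith("ɾ") and right and right[0] in RETROFLEX:
            out[i] = left[:-1]  # the tap does not surface
            out[i + 1] = RETROFLEX[right[0]] + right[1:]  # the target becomes postalveolar
    return out
```

For instance, `retroflex_rule(["deɲ", "daːɾ", "saːɡa"])` yields `["deɲ", "daː", "ʂaːɡa"]`, whereas `["de", "daːɾ", "buːɾe"]` is returned unchanged because /b/ is not a target.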
In sum, the postalveolar fricative [ʂ] has both lexical and phonological sources in North Norwegian. There is a contrast between /s/ and /ʂ/ in prevocalic position, but in syllable onset position before a consonant, this distinction is neutralized, in general to [s] (Sibilant Place Neutralization), but to [ʂ] before a lateral (Prelateral Sibilant Retraction). Postalveolar [ʂ] may also result from coalescence with a preceding /ɾ/ (Retroflex Rule).
The third source of [ʂ], Expressive Sibilant Retraction (ESR), is neither lexical nor phonological. This leaves two possibilities: ESR is either a morphological process or a communicative spoken gesture.
ESR may index actions/events, objects and states/properties. One of each is illustrated in examples (22) to (24), with the indexed word shown in bold.
We might preliminarily gloss the meaning of ESR as ‘intensive’, but this is very misleading, since it suggests rather freer distribution than we in fact find. An ‘intensive’ gloss gives the impression that ESR is a type of modifier, when its essential nature is performative.
In this section I seek to establish the gestural nature of ESR against the alternative view that it is an expressive morphological process. On the gestural interpretation, ESR is a communicative phenomenon that recruits phonetic resources in such a way as to appear to reverse or violate phonological rules. On the morphological interpretation, ESR interacts with the phonological grammar, overriding phonological neutralization rules. My argument for a gestural account leads into a discussion of the relation between the form and function of ESR and an attempt to identify a core ‘meaning’ underlying the patterned variation of its interpretations. The morphological account may stipulate this variation, but it is not equipped to explain it.
If ESR were a morphological process, this would entail that certain “expressive” morphemes could be spelled out after all phonological processes had applied, or that morphological factors could override phonological ones, introducing new contrasts. Neither of these can be squared with modular approaches. Situating ESR in a post-linguistic communicative component avoids this problem.
One case analysed as an example of override is the Javanese ‘elative’, first brought to the attention of generative phonologists by Benua (1999). The Javanese data in fact provide an interesting contrast, since they appear to involve a partial reversal of a pattern of complementary distribution (allophony), as opposed to neutralization as in the Norwegian case. As described by Benua, high vowels in Javanese are tense in open syllables but lax in closed syllables. Formation of the elative, however, involves tensing the final vowel of the stem regardless of whether the final syllable is open or closed, in violation of the canonical allophonic pattern. Benua interprets the tensing of vowels in closed syllables in the elative as a case of ‘morphological override’: the noncanonical pattern surfaces under compulsion from highly ranked constraints requiring that the elative morpheme be realized (MORPHREAL). In her parallelist Optimality-theoretic framework, morphological structure is entirely transparent to phonology. An analysis of ESR along similar lines might postulate a floating [postalveolar] feature as the exponent of the hypothetical expressive morpheme. Applying the same logic, MORPHREAL would then have to outrank *[σʂC from (12) in order to produce the desired result, as shown in the tableau in (25).
We would still have to provide an account of why ESR only ever applies in an sC-cluster, and not just to any word that begins with /s/. This can of course simply be stipulated, for example, by invoking phonologically conditioned suppletive allomorphy (Paster 2006; Bye 2007). On such an account, the floating [postalveolar] featural affix would compete for insertion with a zero allomorph whenever the stem did not begin with an sC-cluster. However, this misses the obvious generalization that ESR specifically alters the output of the Sibilant Place Neutralization rule, while leaving the lexical contrast intact. Yet there is no way to allow the grammar to reflect this. Reanalysing ESR as a communicative rather than a morphological or phonological phenomenon, on the other hand, allows us to see the phonological distribution as part of ESR’s design, rather than as an idiosyncratic restriction.
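The hypothetical override analysis, and the overgeneration it runs into without further stipulation, can be made concrete. The sketch below is an illustration of the argument, not a proposal: it assumes that MORPHREAL is satisfied only when the initial sibilant of an expressive form surfaces as [ʂ], places MORPHREAL at the top of a reduced ranking (the text requires only that it outrank *[σʂC), and considers just two candidates per input. All names are hypothetical.

```python
POSTALVEOLAR = set("ʂɭʈɖɳ")
VOWELS = set("aɑeɛiyʉuoøœ")


def morph_real(inp, out, expressive):
    """MORPHREAL: the floating [postalveolar] exponent must be realized."""
    return int(expressive and out[0] != "ʂ")


def star_retroflex_sib_c(inp, out, expressive):
    """*[σʂC: word-initial [ʂ] directly followed by a consonant."""
    return int(len(out) > 1 and out[0] == "ʂ" and out[1] not in VOWELS)


def ident_postalveolar(inp, out, expressive):
    """IDENT[postalveolar], with position-by-position correspondence."""
    return sum((i in POSTALVEOLAR) != (o in POSTALVEOLAR) for i, o in zip(inp, out))


# Assumed reduced ranking: MORPHREAL >> *[σʂC >> IDENT[postalveolar]
RANKING = [morph_real, star_retroflex_sib_c, ident_postalveolar]


def optimal(inp, expressive):
    cands = ["s" + inp[1:], "ʂ" + inp[1:]]
    return min(cands, key=lambda out: tuple(c(inp, out, expressive) for c in RANKING))
```

With these assumptions, `optimal("skʉba", True)` returns `"ʂkʉba"`, as desired, but `optimal("saːɡa", True)` returns `"ʂaːɡa"`, the infelicitous form. The restriction to sC-clusters therefore does not follow from the ranking and would have to be stipulated separately.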
The morphological analysis raises other problems of a more general architectural kind. In particular, it does not square with the idea that phonology is ordered following spell-out (e.g. Bye & Svenonius 2012) or that, beyond this, phonology only has limited access to morphosyntactic information (e.g. Selkirk 2011). One way of squaring cases like the Javanese elative with a modular approach is to look for evidence that the contrast is already at least marginally present in the lexicon, rather than derived, so that morphological processes remain structure-preserving. Thus, Bye (2013: 51) argues that, despite the overwhelming restriction of closed syllable tensed vowels to morphologically derived elative forms, there is sufficient leakage of the putatively derived contrast into morphologically simple words. It is possible to make a similar case for ESR as well, although not a strong one. For example, if Larsen (1907: 74) is correct, ESR began as a deviant realization of a single lexical item, or a small group of lexical items, before it was adopted as a general process.5
Colloquially, s in the adjective ṣvǣr (or švǣr?) [‘big’] has a similar, although perhaps somewhat different sound, which has a particular psychological motivation: the adjective stor [‘big’] is beginning to ring very flat in this dialect, as in several other eastern dialects, and svær has been chosen as a substitute. In order to depict the size one literally fills one’s mouth, starting the word with one or other sch-sound; likewise now and then with the word svīn [‘swine’], in part also in snē [‘snow’] and stygg [‘ugly’].
The unmarked realization of svær, ‘huge’, is [ʂʋaːɾ] in the present variety as well. Two of the other words mentioned by Larsen in the quoted passage, stygg ‘ugly’ and sne~snø ‘snow’, evince allomorphy in North Norwegian with ESR, suggesting that these forms are stored along with the postalveolar fricative. In the case of the adjective stygg [styɡ] ‘ugly’, the ESR form in North Norwegian optionally has a central rounded or front mid rounded vowel instead, giving [ʂʈʉɡ] or [ʂʈœɡ], ‘(offensively) ugly’. A similar example is snø, ‘snow’, which is /snœː/ in neutral contexts, but may be encountered as /ʂɳyː/ as in (26), when it has the sense of being an inconvenience, because it impedes movement, requires effortful removal, and so on.
In all of these putatively lexicalized cases, the characteristic meaning that ESR introduces is also present, if perhaps not as strongly. This makes the evidence for a marginal lexical contrast weak.
An alternative way to understand ESR, which I argue for here, is that it represents a ‘spoken gesture’, a term introduced by Okrent (2002).6 An example would be the iconic use of speech rate to express temporal or spatial extension (Okrent 2002; Feist 2013; Perlman, Clark & Johansson Falck 2015), as shown in (27).
The study of spoken gesture is still recent (see e.g. Perlman, Clark & Johansson Falck 2015), although it emerges out of the more established traditions of paralinguistics (e.g. Trager 1958; Poyatos 1993; 2002) and conversational gesture (Kendon 2004). Most research on conversational gesture has focused on manual and, more recently, facial gestures as ‘co-speech’ acts whose timing depends on events in speech, and which contribute to the meaning of the utterance as a whole. Since they recruit the same vocal channel as spoken linguistic units, spoken gestures cannot strictly be described as ‘co-speech’ (cf. Okrent 2002: 188). They may nonetheless be placed with respect to the same techniques of representation and pragmatic functions as manual and facial gestures. Techniques of representation will be dealt with in this section; we return to the pragmatic functions of gestures in Section 4.
Kendon (2004) distinguishes between three techniques of representation: modelling, enactment and depiction. Modelling and depiction have in common that they describe objects, while enactment describes actions. Modelling involves the use of a body part to suggest an object’s shape, for example, making a ‘mouth’ or ‘beak’ with the hand, while depiction entails moving some part of the body, generally the hands, to draw an object. An example would be using both index fingers to trace the outline of a box. In the case of enactment, “the gesturing body parts engage in a pattern of action that has features in common with some actual pattern of action that is being referred to” (p. 160). An example of enactment would be using the flat hand in a chopping motion. The same gesture may be enactive or depictive depending on context. Thus, making a spiral motion with the index finger may depict an object with that shape, or something moving along a spiral path. I argue that enactment seems to provide the best account of the semantics of ESR.
Ekman (1997) was the first to point out that facial gestures in conversation serve a communicative function, rather than being directly expressive of emotions. For Kendon (2004: 310), these include “eyebrow movements or positioning, movements of the mouth, head postures and sustainments and changes in gaze direction”. Discussing manual and facial gestures, Bavelas, Gerwing & Healing (2014) note that the techniques of representation each employ are typically different. Whereas hand gestures may be used for modelling or depiction, facial gestures generally enact emotional responses, either the speaker’s or those of some other individual. Spoken gestures do not obviously allow for modelling as a technique of representation. The iconic use of speech rate mentioned above is at first glance ambiguous between depictive and enactive, since it may be understood as referring to a long object, or enacting something that takes a long time. However, lowering speech rate to ‘depict’ the length of a long object is probably better understood as a metaphor: the gesture enacts the physical experience of tracing the object along its length.
In any case, the ability to recognize spoken gestures as representing something other than speech depends on access to evidence for the relation between the acoustic signal and the articulatory gestures involved in producing it. In some cases, this evidence may be present visually. More generally, hearers may rely on their phonetic map of the links between auditory experience and proprioceptive feedback from articulation (e.g. Rummer et al. 2014).7 For example, pitch excursion (up or down) may be used to indicate vertical movement (up or down). The technique involved is enactment, since raising pitch is achieved partly by elevating the larynx, which increases the tension of the vocal folds, while lowering pitch is accompanied by depressing the larynx, which causes greater slackness (Ohala 1978). If laryngeal position can be inferred from pitch by accessing memories in which proprioceptive feedback from larynx articulation is cross-modally linked with its acoustic effects, pitch excursion should be available to enact vertical movement, even in the absence of visual evidence. ESR also enacts something which the listener is able to infer from their own experience of operating their tongue as articulator. This semantic core will be discussed in Section 5. Before we get to that point, though, we must consider what interpretations ESR has in context. This is the subject of the next section.
Kendon (2004; 2017) distinguishes five pragmatic functions of conversational gestures, two of which are relevant here: referential and modal. A referential gesture bears on the proposition expressed in the utterance, while a modal gesture enacts a response: generally, but not necessarily, that of the speaker. An example of the latter would be the facial shrug (Bavelas et al. 2014), which signals personal disengagement (Debras 2017). The function of ESR is, I claim, essentially modal, although something close to referential interpretations may arise in context.
In the most general sense, ESR serves to amplify the performance of an utterance. It generally signals heightened engagement or vehemence, with one exception described in Section 4.4, where we discuss ESR as a gestural marker of ‘reflective distance’. Within the ‘heightened engagement’ context it is possible to distinguish two more specific meanings, one associated with actions or events, the other with objects. ESR indexes actions as carried out with accelerated movement or impressive force, and objects as obtrusive. In quasi-referential terms, these meanings resemble, respectively, adverbial and adjectival modifiers. However, this is not quite right, and is perhaps fundamentally wrong. The essentially modal nature of ESR consists in its being a performance of acceleration/force or obtrusive presence, not an intensive modifier. Actions/events are dealt with in Section 4.1, and objects in 4.2. Section 4.3 deals with states and properties.
Because of its use in expressive contexts, ESR is difficult to elicit in a controlled way. The examples presented draw on observations made in over twenty years of living and working as a linguist in North Norway and represent a combination of spontaneously heard utterances, examples elicited during interviews with native speakers, and constructed utterances devised by me to test the limits of its use. All examples have been checked with native speakers of North Norwegian in my circle.
ESR indexes actions to give the impression of accelerated movement or forcefulness. The examples in (28) convey, even dramatize, a build-up of energy prior to the action and, by implication, the speed, force or vigor of its execution. In addition to heightened engagement, the valuation communicated is generally that the action is impressive in some way, but not necessarily either positive or negative.
The examples in (28) all illustrate verbs of contact by impact (see, for example, Levin 1993). Since ESR conveys the impression of accelerated movement, these examples also invite the inference of forceful impact.
Contact by impact need not be part of the semantics of the verb in order to generate this inference. The verbs shown in (29), for example, do not belong to this class. ESR is nevertheless felicitous here.
ESR is also possible with verbs of the ‘break’ class (see Fillmore 1970), if not as readily. The implication is nevertheless again that accelerated movement is involved in producing the result, increasing the impact.
‘Break’ verbs encode the result, but not the manner, since glass may be smashed by an ultrasonic device, and logs split with a laser. Native speakers rejected my attempts to elicit examples equivalent to (30) in which the instrument was spelled out as one of these. It seems to be a strong implicature that contact by impact is involved, which is presumably the reason the utterances in (30) are acceptable.
Explicit representation of the agent is not necessary for the felicity of ESR, since it is also possible with the passive construction. The examples in (31) convey the same accelerative meaning despite the fact that they show suppression of the agent.
Where no acceleration is encoded or implied, the use of ESR is infelicitous. Consider the pair of examples in (32), where the first is infelicitous with ESR.
Example (32a) does not say what initiated the ball’s movement towards the wall, and so ESR is not acceptable. Example (32b), on the other hand, is fine, since the accelerated movement is self-initiated.
The felicity of ESR is nonetheless not strictly constrained by considerations of animacy or volition. The examples in (33) are deemed felicitous, since they convey a build-up of internal forces before an accelerated release phase.
In (34), ESR may also dramatize an action as an intrusion or penetration involving forceful initiation.
In at least a few cases, acceleration may not be a strong implicature, at least objectively speaking. It is initially surprising to find examples such as those in (35), involving ‘bumping’ and ‘grazing’, judged as felicitous with ESR. As in (28), these are contact-by-impact verbs, but they are likely to imply that the contact was accidental or unintended. ESR dramatizes these eccentric movements as somehow brought about or impelled to go off course, whether on purpose or not.
Where acceleration cannot be a strong implicature of the verb, ESR is infelicitous. For example, while contact by impact is a strong implicature of ‘smash’ or ‘split’, it is by no means a strong implicature of ‘damage’, since this may take a wider variety of forms. This explains the infelicity, with ESR, of the examples in (36), which illustrate result verbs in ‘intensive’ contexts. Since it was not possible to elicit these with ESR, the examples are given in written bokmål form.
Manner verbs do not in general undergo ESR unless the context is an inchoative or momentary one, making acceleration a natural strong implicature. Thus, (37a) cannot be used with ESR to mean something like ‘He paraded around in an expansive manner’.
Compare the inchoative or momentary context in (38), where ESR with the same verbs becomes felicitous.
The examples in (39) show that iterative semantics are also compatible with ESR, where the verb encodes more than one cycle of accelerated movement.
In the following cases in (40), ESR fails to enact an accelerated movement, and so is infelicitous. The actions designated are presumably of insufficient magnitude for an acceleration phase to have salience.
Accelerated movement is likewise not a natural implicature of the examples in (41), making ESR infelicitous.
As can be expected, function words also resist ESR. This is shown in (42).9
In sum, ESR does not result in a derived verb with a general intensive meaning. ESR indexes actions and events as having accelerated movement. For verbs of contact there is a further implicature of forceful impact.
ESR is also conventionally present in the verb steike ‘roast’ in the common oath [daːʋɛn hɑn ʂʈɛikɛ] Dæven han steike! ‘The Devil, (may) he roast!’ However, this usage does not appear to generalize to other contexts.
As a final point, verbs relating to suddenly perceived offensive smells may be realized with ESR. Examples are shown in (43).
The felicity of these examples can be understood with reference to the finding by Digonnet (2018) that the experience of an obtrusive smell is commonly understood in terms of a conceptual metaphor of invasion, cf. the examples in (34).
ESR indexes an object as especially noticeable, usually unwelcomely so. The object may be inconvenient, in the way, and liable to induce aversion, rejection, occasionally awe. Consider the examples in (44).
Again, ESR does not have a general intensive or pejorative meaning with nouns. If aversion to, or rejection of, an obtrusive object is not a plausible interpretation of the speaker’s stance, ESR is infelicitous. The examples in (45) resist ESR for this reason.
ESR is frequent with compound adjectives whose first component independently has intensive force. Examples are given in (46). Here, ESR enacts increased engagement (arousal, surprise) or commitment of the speaker. The accelerative and aversive interpretations are absent in these cases. The implied valuation may be positive or negative.
ESR with simplex adjectives seems to have the same force, as shown in (47).
Finally, I will illustrate uses of ESR which seem to lack the meaning of heightened engagement. The basic meaning in such cases seems to be a reflexive gesture to the speaker him/herself, such that attention is diverted away from the speaker’s positive evaluation of some state of affairs. This introduces a note of reflection, distance, self-consciousness, sometimes irony or even “heteroglossia”. Since ESR is a performance, it allows for the inference that it is an enactment of an utterance by someone else. Consider the examples in (48).
Examples (48a) to (48c) might be taken to be self-conscious compliments that soften the impression of any emotional involvement, while (48d) could be intended ironically.
In their accounts of ESR in UEN, both Larsen (1907) and Broch (1927) attempt to explain the link between ESR’s distribution and its expressive function. Larsen was the first to propose that ESR may be related in some way to the non-expressive, phonologically regular retraction of /sl/ to [ʂɭ] (Prelateral Sibilant Retraction) illustrated in (7) above, which is also characteristic of this variety (see Haugen 1942; Jahr 1985; and Kristoffersen 2000: 102ff. for recent discussion). Broch follows him in this assessment, but the two differ in how they see ESR acquiring its meaning. While Larsen sees a role for sound symbolism or articulatory feedback, Broch (1927: 155f.) favors a social constructionist account. Broch proposes that ESR be understood as the generalization, to sC-clusters, of Prelateral Sibilant Retraction, which was a salient group marker of the Oslo working class at the time he wrote. Broch claims that it is this, rather than anything sound symbolic, that is exploited in the expressive extension of the postalveolar fricative to other word-initial preconsonantal environments. The North Norwegian facts cast doubt on both these accounts, since they indicate that the use of ESR is highly constrained even when phonological factors are taken into account. Broch’s account in particular does not lead us to expect the restrictions that we in fact find. ESR is not a general-purpose intensive, but affords a small range of context-dependent meanings. This fact, I argue, is best explained by a gestural account.
In Section 4 I showed how ESR may index actions/events, objects, states/properties, and evaluations of states of affairs. The interpretations that attach to each differ. With the exception of the ‘reflective distance’ interpretation discussed in Section 4.4, these meanings have in common that they signal heightened engagement on the part of the speaker. With states, this seems to be the only meaning. With actions, however, ESR conveys an impression of acceleration or force, and with objects, a sense of obtrusive presence. These interpretations would furthermore appear to be complementary. For example, certain actions, such as those expressed by the verbs shown in (49), may characteristically trigger aversion, and yet ESR is not possible here.
The complementarity raises the question of whether these meanings of ESR are conventionalized separately for each context, or whether it is possible to identify a semantic core underpinning them all. What follows is somewhat speculative, but I will argue that such a core can be identified.10
Since spoken gestures utilize the same channel as spoken linguistic units, our analysis of the meaning should consider how the gesture relates to speech norms as well as its inherent properties, making the question of a semantic core a two-dimensional one.
First, ESR constitutes a deviation from a particular communicative norm, in this case, conformity with the phonetic targets given by Sibilant Place Neutralization. An adequate account of ESR must be able to relate its expressive function to the fact that it may only apply where the distinction between alveolar /s/ and postalveolar /ʂ/ is neutralized. As mentioned above, a verb form like [saːɡa] saga, ‘sawed’, may not undergo ESR to give ! [ʂaːɡa], with the intended meaning ‘sawed forcefully’. ESR also never applies word-internally, where before a consonant the contrast between /s/ and /ʂ/ is largely preserved, e.g. /ʋast/, ‘waistcoat’ vs. /ʋaʂt/, ‘worst’; /bask/, ‘flail, flap (IMP)’ vs. /baʂk/, ‘inhospitable, hardened’. I argue that ESR’s very deviation from the targets given by Sibilant Place Neutralization momentarily directs attention to the non-linguistic gesture, signaling an intent on the part of the speaker to communicate something of heightened significance.11
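The distributional restriction just described can be stated as a simple check on the phonological form. The following is a minimal sketch only, assuming broad phonemic strings as input; the names `VOWELS`, `esr_environment` and `apply_esr` are hypothetical, and the vowel set is an illustrative, incomplete inventory rather than a description of North Norwegian. The sketch covers only the phonological precondition: the pragmatic conditions on felicity, and the resistance of function words, are not modeled.

```python
# Hypothetical, deliberately incomplete vowel inventory (assumption for illustration).
VOWELS = set("aeiouyæøåɑɛɪɔʉʊœə")


def esr_environment(form: str) -> bool:
    """Return True if the form begins with /s/ directly followed by a consonant.

    This is the environment of Sibilant Place Neutralization, the only
    environment in which ESR is said to be able to apply.
    """
    if len(form) < 2 or form[0] != "s":
        return False
    # Any second segment outside the vowel set is treated as a consonant.
    return form[1] not in VOWELS


def apply_esr(form: str) -> str:
    """Replace the initial /s/ with [ʂ]; reject forms outside the ESR environment."""
    if not esr_environment(form):
        raise ValueError(f"ESR cannot apply to {form!r}")
    return "ʂ" + form[1:]


if __name__ == "__main__":
    for form in ["skʉba", "saːɡa", "ʋast", "bask"]:
        print(form, esr_environment(form))
```

On this sketch, /skʉba/ meets the precondition, whereas /saːɡa/ (prevocalic /s/) and /ʋast/ and /bask/ (word-internal /s/) do not, in line with the examples given in the text.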
Second, we must consider the intrinsic properties of the gesture itself. The most straightforward possible interpretation of retraction of the tongue tip is that it enacts backward movement or withdrawal, for example, of some other part of the body, in much the same way that raising or lowering the larynx may iconically enact upward and downward movement of another object. Additional plausibility for this claim comes from recent evidence of a neurally encoded congruence in the direction of manual and mouth actions, such that backward hand movements are preferentially associated with retraction of the tongue (Vainio et al. 2018). See also Sidhu & Pexman (2017) for further relevant discussion.
It is possible that the accelerative and obtrusive interpretations that attach to actions/events and objects derive from this more basic meaning of ‘withdrawal’. Retraction of the tongue tip may enact preparatory movement for using, say, the hand or arm in a throwing or striking action. The accelerative interpretation would thus result from a metonymic connection between the biomechanical phases of movement. When it indexes an object, ESR does not map onto an action or event in the world. Under this condition, ESR may instead enact rejection behavior in the speaker with respect to the indexed object, or perhaps withdrawal in aversion. In this context, negative implicatures become highly relevant in a way that they do not with actions/events or states. If ESR enacts physical repulsion of an obtrusive object, it is possible to see how this meaning derives from the accelerative one by adding a further metonymic connection.
With a state or property, neither the accelerative nor obtrusive interpretation is relevant, with the result that the interpretation defaults to one of ‘heightened engagement’, which is implied in both the accelerative and obtrusive interpretations. Preparatory movement, preparation to engage in the world in some unspecified way, may thus be the first link in the chain.
This leaves the ‘reflective distance’ interpretation discussed in Section 4.4. In this case, ESR does not seem to index a particular object or predicate, but a speaker’s positive (but self-distancing) evaluation of a state of affairs. It is possible that tongue-tip retraction enacts no more than a withdrawal, which may lead to inferences that the speaker is distancing themselves from what they are saying, engaging in reflection, irony, and so on. Withdrawal, then, may constitute the most basic meaning of the ESR gesture: tongue-tip withdrawal iconically enacts the withdrawal of some other part of the body.
Table 1 summarizes the proposed relationships between context (state of affairs, state/property, action/event, and object) and meaning, and the relation between the hypothesized core iconic meaning of ESR, its metonymically derived context-dependent meanings, and additional implicatures that may derive from these.
|Semantic core|Context|(withdrawal is a) metonym for|||Interpretation(s)|
|withdrawal|state of affairs||||reflective distance|
||action/event|✓|✓||accelerated movement, forcefulness|
||object|✓|✓|✓|aversion, rejection, negative affect|
At least one key question remains unanswered, however. It is striking that for each type of context, it is the most specific meaning that is required. Thus, when ESR indexes actions/events and objects, the meaning obtained is never simply heightened engagement, although there is nothing in what we have said that rules this out. Actions/events must trigger the accelerative interpretation, and objects must trigger the obtrusive one (which I have argued here may be the enactment of motor repulsion, which would include within it the enactment of accelerated movement). I leave the resolution of this issue to future research.
Alveolar /s/ and postalveolar /ʂ/ contrast in North Norwegian, but the contrast is neutralized word-initially before another consonant (Sibilant Place Neutralization). In an apparent reversal, or violation, of the phonological rule, Expressive Sibilant Retraction (ESR) maps /s/ onto the corresponding postalveolar fricative [ʂ] in expressive contexts in precisely the environment where neutralization applies. Since ESR adds meaning, it is tempting to analyse it as a morphologically derived marginal contrast. In this paper, however, I have argued that this phenomenon is not linguistic, but communicative, and best understood as a ‘spoken gesture’ (Okrent 2002). ESR nevertheless exploits deviation from canonical phonological structure to draw attention to the tongue-tip retraction gesture. The core meaning of ESR proposed here is ‘withdrawal’, which gives rise to more specific interpretations depending on whether the gesture is used to index an action/event, object, state/property, or state of affairs. This paper provides an account of the relationship between these interpretations as well as the relationship between its form and function, substantially adding to the early accounts of more or less the same phenomenon in Oslo Norwegian by Larsen (1907) and Broch (1927). My hope is that this account of ESR may be a possible model for describing and explaining other cases of “expressive phonology” in other languages by enriching our understanding of spoken gesture and the relations between linguistic and post-linguistic communicative processes.
1There is a process similar to ESR in Italian described briefly by Ochs and Schieffelin (1989: 15), whereby [s] is similarly changed to [ʃ] to derive some kind of ‘intensive’, e.g. [ʃ]tupido, ‘stupid’, ti [ʃ]paka la testa, ‘I’ll crack your head’, ti di[ʃ]truggo, ‘I’ll destroy you’.
2Original text: Der kan ligge forakt, sinne, kraftfølelse og kjækhet eller beundring, et præg av intimitet, grader av forsterkning i et sådant «š».
3Endresen (1991: 75f.) claims that the palatal fricative, usually transcribed as [ç], is most frequently an alveolo-palatal [ɕ], at least in word-initial position. Although this claim relates to UEN, it applies equally well to the North Norwegian variety described here. In many younger speakers the distinction between /ʂ/ and /ɕ/ is undergoing merger to /ʂ/ (e.g. Simonsen & Moen 2004). Since /ɕ/ is phonologically inert, it will play no further role in this paper.
4The palatal lateral /ʎ/ does not occur in word-initial onsets.
5Original text: En lignende, men måske litt forskjellig lyd, har s sedvanlig i adj. ṣvǣr (eller švǣr?), hvilket har en særlig psykologisk foranledning: adjektivet stor begynder i denne som i flere østlandske dialekter at få en meget slap klang, og her har man valgt svær til stedfortreder; for at utmale størrelsen, tar man også bogstavelig munden fuld, og begynder ordet med én eller anden sch-lyd. Likeså undertiden med ordet svīn, tildels også i snē og stygg.
6Other terms for the same kind of phenomenon include ‘iconic modulation of speech’, and ‘analog acoustic expression’ (Shintel, Nusbaum & Okrent 2006).
7Note that this is not equivalent to saying that speech is perceived as articulatory gestures as maintained in the Motor Theory of Speech Perception (Fowler 1986; Galantucci, Fowler & Turvey 2006). It is sufficient that cross-modal connections between auditory and proprioceptive experiences are stored in memory and accessed by listeners.
8In the standard language skamfere has the more general sense of ‘to disfigure’.
9Broch (1927: 152) actually supplies an example of the same modal verb with ESR, but the pragmatics of ESR in North Norwegian and UEN cannot be assumed to be identical.
10See Debras (2017) for an analysis of the shrug that works similarly. Debras identifies a semantic core (personal disengagement) that is common to attitudinal, affective and epistemic meanings, which arise in context.
11It may be instructive to draw a parallel to Jakobson’s (1960) discussion of the poetic function of language as “focus on the message for its own sake” (p. 357). While Jakobson focuses on recurrence (e.g. parallelism) as the principal device for achieving this focus, flouting the canons of the language is clearly exploited in literature as well, as shown by Cureton’s (1979; 1981) analysis of deviant morphology and syntax in the poetry of E. E. Cummings.
DEF = definite, DEM = demonstrative, ESR = Expressive Sibilant Retraction, F = feminine, IMP = imperative, INF = infinitive, M = masculine, N = neuter, OBJ = object, OT = Optimality Theory, PL = plural, POSS = possessive, PRS = present, PST = past, PTCP = participle, REFL = reflexive, SG = singular, SBJ = subject, UEN = Urban East Norwegian
I would like to thank three anonymous reviewers for substantial comments on this paper.
The author has no competing interests to declare.
Bavelas, Janet, Jennifer Gerwing & Sara Healing. 2014. Including facial gestures in gesture-speech ensembles. In Mandana Seyfeddinpur & Marianne Gullberg (eds.), From gesture in conversation to visible action as utterance: Essays in honor of Adam Kendon, 15–34. Amsterdam: John Benjamins. DOI: https://doi.org/10.1075/z.188.02bav
Benua, Laura. 1999. Identity and ablaut in Javanese elatives. In Sachiko Aoshima, John Drury & Tuomo Neuvonen (eds.), University of Maryland working papers in linguistics 8. 1–31. College Park, MD: UMDWPL.
Broch, Olaf. 1927. Lyden [š] som ekspressivt middel i Oslo-målet. In Festskrift til Hjalmar Falk. 30. desember 1927. Fra elever, venner og kolleger, 1–12. Oslo: H. Aschehoug & Co. (W. Nygaard). Reprinted (1981) in Ernst Håkon Jahr & Ove Lorentz (eds.), Fonologi/Phonology. Studier i norsk språkvitenskap/Studies in Norwegian Linguistics 1. Oslo: Novus.
Bye, Patrik. 2013. The lexicon has its grammar, which the grammar knows nothing of: Marginal contrast and phonological theory. Nordlyd 40(1). 41–54. http://www.ub.uit.no/baser/nordlyd/. DOI: https://doi.org/10.7557/12.2500
Bye, Patrik & Peter Svenonius. 2012. Non-concatenative morphology as epiphenomenon. In J. Trommer (ed.), The morphology and phonology of exponence. Oxford: Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199573721.003.0013
Cureton, Richard. 1979. E. E. Cummings: A study of the poetic use of deviant morphology. Poetics Today 1(1/2). 213–244. DOI: https://doi.org/10.2307/1772048
Debras, Camille. 2017. The shrug. Gesture 16(1). 1–34. DOI: https://doi.org/10.1075/gest.16.1.01deb
Diffloth, G. 1979. Expressive phonology and prosaic phonology in Mon-Khmer. In T. L. Thongkum, et al. (eds.), Studies in Tai and Mon-Khmer phonetics and phonology in honour of Eugénie J. A. Henderson, 49–59. Bangkok: Chulalongkorn University Press.
Digonnet, Rémi. 2018. The linguistic expression of smells: from lack to abundance? In A. Baicchi, R. Digonnet & J. L. Sandford (eds.), Sensory perceptions in language, embodiment and epistemology, 177–191. Cham: Springer. DOI: https://doi.org/10.1007/978-3-319-91277-6_10
Ekman, Paul. 1997. Should we call it expression or communication? European Journal of Social Sciences 10. 333–359. DOI: https://doi.org/10.1080/13511610.1997.9968538
Feist, Jim. 2013. ‘Sound symbolism’ in English. Journal of Pragmatics 45. 104–118. DOI: https://doi.org/10.1016/j.pragma.2012.10.008
Fowler, Carol A. 1986. An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics 14. 3–28. DOI: https://doi.org/10.1016/S0095-4470(19)30607-2
Galantucci, Bruno, Carol A. Fowler & Michael T. Turvey. 2006. The motor theory of speech perception reviewed. Psychonomic Bulletin & Review 13(3). 361–377. DOI: https://doi.org/10.3758/BF03193857
Haugen, Einar. 1942. Analysis of a sound group: sl and tl in Norwegian. Publications of the Modern Language Association of America 57. 879–907. DOI: https://doi.org/10.2307/458776
Jahr, Ernst Håkon. 1985. Another explanation for the development of s before l in Norwegian. In Jacek Fisiak (ed.), Papers from the VIth International Conference on Historical Linguistics, Poznań, 22–26 August 1983, 290–300. Amsterdam: John Benjamins. DOI: https://doi.org/10.1075/cilt.34.21jah
Kendon, Adam. 2004. Gesture: Visible action as utterance. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511807572
Kendon, Adam. 2017. Pragmatic functions of gestures: Some observations on the history of their study and their nature. Gesture 16(2). 157–175. DOI: https://doi.org/10.1075/gest.16.2.01ken
McCarthy, John J. 2008. Doing Optimality Theory: Applying theory to data. Oxford: Blackwell. DOI: https://doi.org/10.1002/9781444301182
Ochs, Elinor & Bambi Schieffelin. 1989. Language has a heart. Text 9(1). 7–25. DOI: https://doi.org/10.1515/text.1.1989.9.1.7
Okrent, Arika. 2002. A modality-free notion of gesture and how it can help us with the morpheme vs. gesture question in sign language linguistics (Or at least give us some criteria to work with). In Richard P. Meier, Kearsy Cormier & David Quinto-Pozos (eds.), Modality and structure in signed and spoken languages, 175–198. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511486777.009
Perlman, Marcus, Nathaniel Clark & Marlene Johansson Falck. 2015. Iconic prosody in story reading. Cognitive Science 39(6). 1348–1368. DOI: https://doi.org/10.1111/cogs.12190
Poyatos, Fernando. 1993. Paralanguage: A linguistic and interdisciplinary approach to interactive speech and sound. Amsterdam: John Benjamins. DOI: https://doi.org/10.1075/cilt.92
Poyatos, Fernando. 2002. Nonverbal communication across disciplines. Vol. 1: Culture, sensory interaction, speech, conversation; Vol. 2: Paralanguage, kinesics, silence, personal and environmental interaction. Amsterdam: John Benjamins. DOI: https://doi.org/10.1075/z.ncad2
Prince, Alan S. & Paul Smolensky. 2004. Optimality Theory: Constraint interaction in generative grammar. Oxford: Blackwell. DOI: https://doi.org/10.1002/9780470759400
Rummer, Ralf, Judith Schweppe, René Schlegelmilch & Martine Grice. 2014. Mood is linked to vowel type: The role of articulatory movements. Emotion 14(2). 246–250. DOI: https://doi.org/10.1037/a0035752
Selkirk, Elisabeth. 2011. The syntax-phonology interface. In John Goldsmith, Jason Riggle & Alan C. L. Yu (eds.), The Handbook of Phonological Theory, 435–484. Malden, MA: Wiley-Blackwell. DOI: https://doi.org/10.1002/9781444343069.ch14
Shintel, Hadas, Howard C. Nusbaum & Arika Okrent. 2006. Analog acoustic expression in speech communication. Journal of Memory and Language 55(2). 167–177. DOI: https://doi.org/10.1016/j.jml.2006.03.002
Sidhu, David M. & Penny M. Pexman. 2017. Five mechanisms of sound symbolic association. Psychonomic Bulletin & Review 25(5). 1619–1643. DOI: https://doi.org/10.3758/s13423-017-1361-1
Simonsen, Hanne Gram & Inger Moen. 2004. On the distinction between Norwegian /ʃ/ and /ç/ from a phonetic perspective. Clinical Linguistics & Phonetics 18. 605–620. DOI: https://doi.org/10.1080/02699200410001703664
Stausland Johnsen, Sverre. 2012. Variation in Norwegian retroflexion. Nordic Journal of Linguistics 35(2). 197–212. DOI: https://doi.org/10.1017/S0332586512000194
Stevens, Kenneth N. & Samuel Jay Keyser. 1989. Primary features and their enhancement in consonants. Language 65(1). 81–106. DOI: https://doi.org/10.2307/414843
Vainio, Lari, Kaisa Tiippana, Mikko Tiainen, Aleksi Rantala & Martti Vainio. 2018. Reaching and grasping with the tongue: Shared motor planning between hand actions and articulatory gestures. Quarterly Journal of Experimental Psychology 71(10). 2129–2141. DOI: https://doi.org/10.1177/1747021817738732