1 Arbitrariness and language boundaries

1.1 Introduction

Current research in several not immediately related branches of linguistics is expanding the “boundaries of language”, revising both form-related and function-oriented aspects of the phenomena regarded as linguistic. On the form-related side, research in recent decades has challenged the strict application of arbitrariness as the criterion for drawing the line between linguistic and para-linguistic phenomena. It argues instead that non-arbitrary signs constitute an inherent part of language (Perniss et al. 2010; Dingemanse et al. 2015; Ferrara & Hodge 2018). Widespread and well-studied topics of this research are sound symbolism (cf. e.g. Alderete & Kochetov 2017 for a case study and an overview), ideophones (Dingemanse 2018) and various topics in sign language (e.g. Schembri et al. 2018; Goldin-Meadow & Brentari 2017 for review).

From the function-oriented point of view, different theoretical approaches expand the traditional domains of grammar and semantics, attributing a central role to such phenomena as argumentative meaning, attention-guidance, stance-expression and interaction-management. For instance, Verhagen (2008) argues for a primary function of intersubjective coordination in language, demonstrating how grammatical and lexical items have an inherent argumentative semantic component. Engagement – the grammatical expression of “the speaker’s assumption about the degree to which their attention or knowledge is shared by the addressee” (Evans et al. 2018: 110) – is also gaining ground as a central concept in grammar. These frameworks view influencing interlocutors’ beliefs, attitudes and attention states as having a central place in language structure. Conversation Analysis and Interactional Linguistics introduce the management of interaction and discourse into the core domain of language (Couper-Kuhlen & Selting 2018). Finally, depictive and expressive meaning have received scholarly attention not as para-linguistic phenomena but as distinct modes of representation (Clark & Gerrig 1990; Nuckolls 1996).

Notably, it is often the case that devices marginalised due to their interactional or depictive function are also expressed by markers formerly disregarded due to their non-arbitrary nature. For instance, indexical phonetic means in English are used in stance-taking (Ogden 2012), depiction resorts to indexical and iconic devices (Clark 2016; Dingemanse & Akita 2017) and an appeal to an addressee often relies on prosodic indexing (Smith 2010; Noel Aziz Hanna & Sonnenhauser 2013).

1.2 Non-arbitrary forms and interactional functions in prosody research

The study of prosody and intonation appears to be particularly susceptible to the changing views on non-arbitrariness and interactional-expressive meaning (cf. Couper-Kuhlen 2001 for a succinct summary). The original position in contemporary research situated intonation “around the edge of language” (Bolinger 1964), due both to its non-discrete nature and to its attitude-oriented function. In this approach, prosodic features are considered to be beyond grammar and linked directly to emotion-signalling (Bolinger 1989). In Gumperz’ view (1992), most prosodic signs are “contextualisation cues”, the import of which is determined inferentially and jointly with other cues. In these approaches, even such markers as accent placement and boundary tones in English are regarded as emotion-signalling and/or underspecified cues which do not form part of the grammar of the language.

An opposite position treats intonation as a grammatical phenomenon decomposable into primitive arbitrary and discrete intonational phonemes (Beckman & Pierrehumbert 1986; Ladd 2008). Although intonational marking at times appears to have an underlying iconic motivation, this is analysed as an outcome of the grammaticalization of biological and physiological factors, which do not synchronically affect the grammatical status of intonational phonology (Gussenhoven 2004; Ladd 2008). Non-arbitrary properties of intonation are regarded as para-linguistic (Chen et al. 2004).

The contextualisation-oriented perspective is maintained in Interactional Linguistics, which combines it with functional approaches to intonation. This research identifies systematic correspondences between prosodic marking and interaction management. Some of the identified markers function as classic contextualising devices, providing interpretation-guiding cues which, jointly with other cues, help speakers achieve their communicative goals. Clearly non-arbitrary devices are often shown to have regular context-dependent contributions (cf. e.g. Hoey 2014 for sighing). However, other prosodic means have a highly specific form–function correspondence, although non-arbitrary properties remain indispensable for their function. For instance, Sicoli (2010) shows how voice quality in languages of Mesoamerica underwent “processes of formalization” (p. 546), becoming an iconic marking of speech registers and social roles. As such, this feature constitutes a conventional category of form–function oppositions. It indeed shares many characteristics with contextualisation cues due to its non-arbitrary nature and affectual content (Levinson 2003). However, these properties notwithstanding, from the structural point of view it is a paradigmatic set of specific form–meaning pairings with a conventionalised ancillary meaning contribution, and thus can be regarded as a grammatical category (Boye & Harder 2012). It is consequently more comparable to grammatical devices such as the interactional particles typical of East Asian languages (Morita 2015) than to clusters of underspecified contextualisation cues.

This paper presents two closely related cases of non-arbitrary prosodic marking in Anal Naga, a Trans-Himalayan (Tibeto-Burman) language of Northeast India. The two markers similarly resist an easy classification on the scale of their linguistic nature and grammatical status. The non-arbitrary suprasegmental form of both markers is indispensable for their interaction-managing and interpretation-guiding functions. However, they also exhibit properties of grammatical constituents: their form-function relation is consistent and conventional, and they have distinct distribution and position rules.

The first marker is a strong accent on the last syllable of the Intonation Unit (IU), dubbed here a Response-Seeking contour (RS). It is a deviant rising-falling pitch on a lengthened vowel, characterised by high intensity and precise articulation otherwise atypical for a grammatical constituent in the final position. Functionally, it is partly parallel to the final rise in many better-described European languages. The final rise is often analysed as a grammatical marking of questions (e.g. Truckenbrodt 2012 for English), although it typically has broader phatic functions (Smith 2010 for French; Couper-Kuhlen 2012).

The second device is Prosodic Intensification (PI), whose cross-linguistic parallels are at times dubbed “Intensifying Emphasis” (Niebuhr 2010; Ogden 2012). In this case, one lexical item is characterised by highly distinct prosodic properties such as rhyme lengthening and/or a notable pitch level. The non-arbitrary nature of PI often results in its analysis as a para-linguistic product of the effort code (Gussenhoven 2004). PI is also mentioned for various Trans-Himalayan languages. For example, this kind of “para-linguistic emphasis” is found in Lahu (Matisoff 1994: 117), Wǎdū Pǔmǐ (Daudey 2014: 123) and Yongning Na (Michaud 2017: 375–377). This study analyses a parallel prosodic contour in Anal Naga. In its basic form, it has the properties of RS, namely a deviant strong accent. However, these properties are often exaggerated, producing an abrupt deviation into a very high pitch or falsetto voice and/or an extreme elongation of the rhyme.

The paper is structured as follows. Section 2 provides background on the language and the data for the study. Section 3 is dedicated to RS: its form, function and distribution, as well as some cross-linguistically relevant phenomena. Section 4 follows the same structure to address PI. Section 5 summarises the findings, concluding with a discussion regarding the place and role of this kind of non-arbitrary prosodic signs in language.

2 Language background and data

The Anal Naga language is spoken by around 20,000 speakers in Chandel District of the state of Manipur in north-eastern India. It is a Trans-Himalayan (Tibeto-Burman) language of the North-Western group (formerly dubbed Old Kuki) of the South-Central branch. Despite the outdated linguistic terminology, the Anal Naga people do not affiliate ethno-politically with Kuki-Chin groups but are one of the Naga communities of Manipur.

The language has short and long vowels, which distinguish two tones: high and falling-low. Lexical stems and some grammatical morphemes carry an inherent tone. Other grammatical morphemes are assigned a tone by rules that involve a combination of tonal polarity and spreading (Ozerov 2018). Utterance-final particles have no inherent tone and exhibit the final prosodic contour. The maximal syllable structure in the language consists of an onset consonant, a vowel and an optional final nasal or liquid consonant. Verbs are morphologically complex: in addition to the root, they exhibit a hierarchical system of person indexation as well as multiple modifying suffixes, directional prefixes, stem-alternation, a large inventory of TAM-markers and other grammatical affixes. The alignment of nominal arguments is pragmatic ergative; the syntax is verb-final. The prosodic system of the language requires further research. A preliminary study of 200 examples of Intonation Unit (IU) final contours revealed an utterance-terminating falling contour and a continuing-level contour, as well as an infrequent level elongated hesitation contour. IU-final syllables where these contours occur are typically characterised by reduced intensity and imprecise articulation.

The contours analysed in this study are set apart by their distinct properties both from the lexical tones and from the prosodic contours of Anal Naga. These are a rising-falling contour (marked â in the transcription) or a contour of an extra-high pitch value (marked a̋). The Response-Seeking marking (RS) occurs on the final syllable of the IU. RS is deviant relative to both tonal and prosodic contours in the language due to its abrupt rising-falling shape as well as its accent-like characteristics and precise articulation, atypical for IU-final syllables. Prosodic Intensification (PI) occurs within an IU on items that do not otherwise exhibit any salient properties. In the case of PI, they receive a deviant strong accent, and the pitch shifts above the otherwise highest pitch level of the IU, exhibiting an abrupt rise-fall within a single syllable. The pronunciation of the PI can also shift into a falsetto voice. Both RS and PI override the lexical or grammatically assigned tone of the syllable.

Hence, PI and RS are distinct both from the tonal inventory of Anal Naga and from its prosodic system. They are characterised by a rising-falling contour, otherwise alien to the tone system of the language (which lacks rising and contour tones) and distinct from the otherwise falling and level IU-final prosody. Both RS and PI exhibit an array of naturally salient properties: clear articulation, deviant accent, extra-lengthening and effort-requiring pitch deviation into levels that exceed the pitch of the surrounding material, often above the speaker’s regular register. These characteristics suggest their non-arbitrary nature and hence would potentially result in their classification as “para-linguistic” markers.

The examined examples of the two contours were manually collected from the recordings. They were coded for the length of the vowel and the rhyme, maximum pitch, semantics of the host morpheme, position in the IU and the interpretive effect produced by the marker. The study was complemented by a preliminary exploration of IU-final prosody based on 200 examples, which corroborated the distinct status of RS. Overall, examples of both RS and PI demonstrate core groups characterised by deviant properties of pitch, length and articulation (illustrated for PI in Figure 18 in Section 4.7), as well as consistent morphological distribution for PI. The relationship between these two non-arbitrary markers and the main prosodic system of the language, as well as the status of the small number of transient examples, are left for future research.

The recordings included in the data represent a few different genres: natural conversation of multiple participants in their usual everyday settings (2 recordings, 23 minutes, 9 female and 3 male speakers in total), one interview-style conversation (12 minutes, 2 female speakers) and storytelling (5 recordings, 43 minutes, 3 female and 2 male speakers). The narratives and the interview were recorded by community members in settings staged for the recording, where the speakers sat close to a pair of shotgun Røde NTG2 microphones paired with a video-camera. In these settings some of the speakers could naturally interact with the data collector over the course of the recording. Natural conversations were recorded with a Zoom Q8 camera and an SSH-6 microphone, which were positioned on the porch of participants’ homes. The data were further transcribed and translated by a team of community members and complemented by elicitation during 9 months of fieldwork with the community. The data collection was approved and supervised by the community authorities and all data processing was carried out by community members. All participants provided their consent for the recordings and the subsequent usage of the data for research, and were reimbursed for their participation in accordance with community customs.

There are 96 examples of the RS-contour in the data. Since the function of this contour is primarily interactional, some aspects of its usage in real interaction are left for further research for two reasons: (a) many of the recordings used for this study are storytelling and the examples are found in constructed dialogues, and (b) an important interactional effect of the marker is a backchannelling response m̩ː ‘yes’, the systematic detection of which requires more sophisticated audio equipment than was used for this collection (an individual clip-on microphone for each participant). There are 173 examples of PI in the examined 78 minutes of recordings. Out of these, 161 were used for the phonetic analysis, as the rest occurred in overlapping speech, had background noises, or presented other measurement challenges. The examples are accompanied by the indication of the gender of the speaker, the data source, a figure with a spectrogram and a pitch track. Sound files are available as supplementary files. The program used for the prosodic analysis is Praat 6.0.28 (Boersma & Weenink 2017). The figures were generated with a Praat script by Elvira-García (2017).
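The corpus figures reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the variable names are hypothetical, and the numbers are the recording durations and example counts stated in this section.

```python
# Recording durations in minutes, as reported in Section 2.
recordings = {
    "natural conversation": 23,
    "interview-style conversation": 12,
    "storytelling": 43,
}
total_minutes = sum(recordings.values())

# PI examples: all attested tokens vs. those usable for phonetic measurement.
pi_total = 173
pi_measured = 161
pi_excluded = pi_total - pi_measured  # overlapping speech, noise, etc.

print(f"Total duration: {total_minutes} minutes")
print(f"PI tokens excluded from the phonetic analysis: {pi_excluded}")
```

The per-genre durations add up to the 78 minutes mentioned in the text, and 12 PI tokens fall outside the phonetic analysis.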

3 Response-Seeking IU-final contour (RS)

The Response-Seeking contour (RS) discussed in this section occurs at the end of Intonation Units (IU). It is an abrupt upshift in pitch on the last syllable of the IU followed by a fall, and is often accompanied by the lengthening of the syllable. It is additionally characterised by clear articulation and high intensity, otherwise atypical for grammatical utterance-final constituents. Most frequently, it takes place on dedicated utterance-final interactional particles, which have no inherent tone. However, it can also occur on lexical or grammatical morphemes that occupy the IU-final position. In the latter case, it overrides the tone of the constituent. Functionally, it has the primary characteristics of a phatic device: “serving the purpose of establishing and maintaining contact” and “soliciting the hearer’s cooperation in the production of discourse” (Smith 2010: 292–293, original emphasis). However, this view of phaticity is to be broadened from a discourse-framed to an action-oriented perspective, as the hearer’s cooperation in this case encompasses various kinds of responsive actions, discourse production being only one of them. The following sub-sections describe the specific characteristics and the central functions of this contour listed in Table 1 (due to the paucity of examples, the function ‘end of narrative’ is not discussed below). All these interpretations can be analysed as arising from the primitive function of RS, which is analysed here as a direct appeal to the interlocutor requesting a cooperative response.

Table 1

Distribution and functions of RS.

Distribution                        Function                                N (%)

• particles ne, no                  request of comprehension approval       50 (52%)
• any final constituent             negotiated turn-taking

• particles ve, vo                  proposal/offer/request for action:      11 (11%)
• any final constituent             proposal-making assertion, hortative,
                                    proposal-making imperative,
                                    request of stance-alignment

• dubitative-question marker mo     information request, involved/puzzled   19 (20%)
• any final constituent             question, rhetorical question

• vocative marker =o                address                                 14 (15%)
• any final constituent

• any final constituent             end of narrative                        2 (2%)

Total:                                                                      96 (100%)
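
The percentages in Table 1 follow directly from the raw counts. As a minimal sketch (in Python, with hypothetical short labels standing in for the functional groups of the table), the shares can be recomputed as follows:

```python
# Raw counts of RS examples per functional group, as given in Table 1.
# The dictionary keys are shorthand labels, not terms from the table itself.
rs_counts = {
    "comprehension approval / turn-taking": 50,
    "proposal / request for action": 11,
    "questions": 19,
    "address (vocative)": 14,
    "end of narrative": 2,
}

total = sum(rs_counts.values())

# Share of each group, rounded to whole percentages as in the table.
rs_shares = {label: round(100 * n / total) for label, n in rs_counts.items()}

for label, n in rs_counts.items():
    print(f"{label}: {n} ({rs_shares[label]}%)")
print(f"Total: {total}")
```

The rounded shares reproduce the values in the table and happen to add up to exactly 100%.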

3.1 Back-channelling request

RS commonly occurs with a range of utterance-final particles that call for the interlocutor to confirm that the speaker has achieved their discourse goal and can proceed further with the talk. It is used to verify that the listener is attending to the speaker, to verify that the interlocutor agrees with the speaker’s statement or evaluation, to ensure that the interlocutor has understood the discourse contribution of the content and/or that they have drawn the necessary inferences. This usage constitutes the largest group of RS with 50 (52%) examples. Most frequently it is found on IU-final discourse-managing particles. It typically triggers an acknowledgement in the form of a back-channelling response m̩ː ‘yes’. This can be seen in (1) which conveys the complicating action of the narrated story, as two brothers leave their sister alone and depart on a long journey. Addressing this utterance to the listener with the RS-contour on the final particle =ne, the storyteller seeks to verify that the listener comprehends its crucial contribution to the overall plot.

    (1)
    va-ʈá-hín=tũ        a.kàm-mòl-jé-nʉ́         tɕə́ː-tɕá-je=nêːː
    3-brother-PL=ERG    lock-ENTIRELY-3PL-NF    go-PERF-3PL=ACKNW
    ‘Her brothers locked [her at home] and left, eh?’ (f) (anm_20160220_Thum_PO_1 4’56’’) (Figure 1) (audio: Example 1)
Figure 1

Example (1), RS-contour on =ne; back-channelling response.

Figure 1 shows the typical properties of the RS-contour. The pitch movement of =ne is clearly salient against the low preceding pitch: it abruptly climbs to 315 Hz from 160 Hz. In addition, the intensity of =ne is higher and it is more clearly articulated than the preceding verb. Following a brief pause (150 ms), the listener of the story provides the expected back-channelling response m̩ː, also shown in Figure 1.
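
To give a sense of the size of this pitch excursion, the two reported values can be converted into a musical interval. The conversion to semitones is a standard formula and is offered here only as an illustration; it is not part of the original analysis, and the function name below is hypothetical.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12 * math.log2(f_to_hz / f_from_hz)


# Pitch values reported for =ne in Figure 1: a climb from 160 Hz to 315 Hz.
rise = semitones(160, 315)
print(f"Rise on =ne: {rise:.1f} semitones")
```

Since doubling the frequency corresponds to 12 semitones, a climb from 160 Hz to 315 Hz amounts to just under one octave.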

Importantly, RS can also occur on the last morpheme of an IU that lacks a final particle, in which case it overrides the tone of this morpheme. In (2) it occurs on the NP-final absolutive marker -to, inviting the listener to identify a newly introduced referent and acknowledge its acceptability. Notice how, instead of the expected high tone on this marker, the pitch abruptly climbs from 150 Hz to over 200 Hz and the speaker briefly pauses awaiting the interlocutor’s acknowledgement. She reinitiates the narration only after obtaining the response m̩ː (Figure 2).

Figure 2

Example (2), RS on -to ABS calling for referent identification; back-channelling.

    (2)
    Narrator (f):
    va-pàːn-tʰùŋ-hín=tũː        lapanʉ́       akʰéː-tôː    …150…
    3-village-inside-PL=BKGR    old.woman    one-ABS
    Listener (m):
    m̩ː
    Narrator:
    bù       akʰõ-reréː-nʉ́         i-ám-jáː-pá
    cloth    weave-IDEO.DIM-SEQ    NMLZ-be-JUST-COP
    ‘- In their village, an old woman (eh?)’
    ‘- Mhm.’
    ‘- there was one who lived there and used to weave clothes.’ (anm_20160220_Thum_PO_1 4’49’’) (Figure 2) (audio: Example 2)

These occurrences emphasise that RS does not merely piggyback on the function of the interactional particles. Instead, it has its own function, which combines with the function of the IU-final particles. The RS-contour in (2) can be minimally compared with the same absolutive marker -to, likewise IU-final, in (3). As can be seen in (3), the final contour exhibits a steady fall. The speaker pauses in this case as well, yet unlike (2), where the speaker waited for the back-channelling response, here the pause is a result of replanning (as is evidenced by the reinitiated syntactic structure of the following clause, in which the initial constituent has no role). There is no response from the interlocutor during this pause.

    (3)
    va.də̀.náː.te    sá-pá-tã̀ː-tò | ..215..    tɕa̯kʰə̀-to    a.mín-nʉ́
    so.to.say       animal-AUG-trap-ABS       deer-ABS     be.trapped-NFUT
    ‘So to say, the animal trap, a deer was caught.’ (f) (anm_20160924_Th_gr_1 12’20’’) (Figure 3) (audio: Example 3)
Figure 3

Example (3), no RS on -to ABS before a pause, no back-channelling.

A common interactional function of RS that similarly employs =ne and triggers an acknowledging, preferably back-channelled response is a negotiation of turn-taking. The contour is used in self-selection by a “dispreferred” participant, who attempts to grab the turn or has been non-dominant in the preceding discourse. In this case, a speaker who does not actively participate in the conversation employs RS on the first constituent to obtain the others’ attention and approval for proceeding with the talk.

3.2 Proposal/request for action

Hortative and proposal-making utterances are similarly characterised by RS on the final discourse particle (11 examples, 11.5%). This usage appeals to the interlocutor, inviting them to consider performing the requested action. Importantly, RS is not found with plain orders, which do not appeal for consideration and negotiation and end with a fall. As such, it can be compared to the “friendly rise” commonly found in proposal-making imperatives in various European languages (Truckenbrodt 2012: 2063). A recurrent usage of RS in this function can be seen in (4): the narrated dialogue enacts a proposal (a hortative Let’s go, OK?) and the counter-alternatives proposed by the interlocutor. The three verbs used to appeal for actions are marked by RS (Figure 4 illustrates only the latter two).

Figure 4

Example (4), RS on action-proposals.

    (4)
    vá-tɕəká=vêːː    va-də́ː-náː=te
    go-HORT=HORT     3-say-LOC=CNTR
    ‘He₁ said: “Let’s go (to check the bird traps in the forest), OK?”’
    anì    sá=so          vá-péː-ká-níŋ=vêːː          vá-tʰe.tʰe=ôːː       də̀-nʉ́
    sun    hot=IRR.SUB    go-just\FUT-AFF-1SG=HORT    go-ahead.RDP=ADDR    say-NFUT
    ‘He₂ said: “When the sun rises, I will go, OK? Go ahead, OK?”.’ (f) (anm_20160221_Shar_PO_1 38’’–49’’) (Figure 4) (audio: Example 4)

3.3 Questions

The RS-contour occurs nearly obligatorily with polar questions. It is also occasionally attested on the last syllable of content (“wh”) questions, as (5) demonstrates.

    (5)
    dáː     vájə́l    ka-ʈóː=rʰaŋ    vâːː
    what    issue    1-do=PURP      COP
    ‘What case is there for me to settle?’ (f) (anm_20160221_Shar_PO_1:2’02’’) (Figure 5) (audio: Example 5)
Figure 5

Example (5), RS with a question.

The usage of RS with content questions is optional, as can be seen in (6), where it is absent and the same constituent (the copula vá) exhibits a steady fall, a regular short vowel and imprecise articulation.

    (6)
    va-mʰĩ̀ː=te     dáː     vá
    3-name=CNTR    what    COP
    ‘What is its name?’ (m) (anm_20160220_Thum_PO_1 12’36’’) (Figure 6) (audio: Example 6)
Figure 6

Example (6), no RS in question.

With content questions, the falling contour is used when the communicative channel is already established and the question is a plain request for information. The RS-contour is found when the speaker appeals to the interlocutor with the question while simultaneously establishing the communicative channel, or when the speaker is surprised, puzzled or has the utmost interest in the questioned information.

3.4 Vocative

RS is inherently found on vocatives (14 examples, 15%): it either occurs on the vocative particle =óː or is found directly on the last syllable of the noun. This can be seen in (8) and (7) respectively; both examples are produced by the same speaker within the same narrative and are shown jointly in Figure 7.

Figure 7

Examples (7) and (8), RS as vocative.

(7) pʉ̂ːː ‘Grandfather!’ (m) (anm_20151202_PO_Anthung_2_Folkstory 4’14’’)
(8) pʉ́=ôːː ‘Grandfather!’ (m) (anm_20151202_PO_Anthung_2_Folkstory 4’07’’) (Figure 7) (audio: Example 7–8)

3.5 Discussion: form and function of the Response-Seeking contour

Occurring at the end of the IU and in Transition Relevance Places, RS constitutes an interaction-managing marker. It fits the definitions of classic phatic devices as a marker that solicits contact with the interlocutor and their cooperation in the production of discourse (Smith 2010). In fact, the requested response is not limited to coordinated discourse production, but is aimed at achieving cooperation with respect to the action-oriented purpose of the speaker’s utterance. These actions can be classified into four types: (i) an appeal for the interlocutors’ approval of their comprehension and acceptance of information, (ii) a proposal and negotiation of actions, (iii) information requests and (iv) contact-establishing calls and centring attention on the speaker in turn-taking. Some of these actions indeed belong to the management of verbal interaction, while others are aimed at coordinating physical activity (such as going jointly to the forest). It can thus be concluded that RS is an appeal for the addressee’s responsive action (Stivers & Rossano 2010) with respect to the goal pursued by the utterance.

As for the form, the pitch of RS is characterised by an abrupt shift away from the unfolding and expected falling or level intonation trajectory at the end of an IU. It is characterised by an otherwise unusual rising-falling contour. Its articulation and intensity are also atypical for IU-final syllables. Moreover, the sudden rise reaches a peak which often exceeds the highest pitch level within the IU. The contour, the pitch height and the articulation are highly dissimilar from the rest of the IU and are unusual for the final position. As such, the contour is intrinsically non-integrated in its prosodic environment and consequently is salient – in the sense of its inherent distinctness relative to the surrounding items. Moreover, the non-integration of the RS pitch contour is a direct product of the speaker’s additional investment of effort. Notably, research has directly linked non-integration to the effect of attention-drawing (Dingemanse & Akita 2017: 505), and as a result to the vocative function (Noel Aziz Hanna & Sonnenhauser 2013).

Thus, the form of RS is a naturally salient attention-drawing device. It inherently involves a deviant effort expenditure on the side of the speaker, which is counter-expected in the utterance-final position. Hence, the contour itself is a natural sign of the fact that the speaker invests an additional production effort, and therefore constitutes an index thereof in the Peircean sense. The deviance of the contour inherently draws the interlocutors’ attention to the speaker and to the utterance, making the interlocutor establish contact with the speaker and/or evaluate the contribution in search of the motivation for the expenditure of additional effort. As such, the form of RS is directly responsible for its function as an appeal for a responsive action with respect to the utterance. RS is a non-arbitrary, indexical marker with a direct, natural relation between its form and its function.

And yet, this naturally oriented explanation overgeneralises the actual form–function relation in the case of RS and misses its language-specific details. On the side of the form, any deviant form such as a prosodic break or a rising tone would constitute an intonational deviation and consequently could be expected to achieve the same result. In fact, these devices are indeed found cross-linguistically to mark vocatives (Noel Aziz Hanna & Sonnenhauser 2013 for prosodic breaks), phatic appeals (Smith 2010 for rising intonation in French and German) and requests for a responsive action (e.g. Ozerov 2019 for final rises in Israeli Hebrew). Parallel functions are also reported for the rising tone in Naxi, a very distantly related Trans-Himalayan language: since the language has otherwise only level tones, a rising contour is an intrinsically deviant marker used for a range of interactional functions (Michaud 2006). Hence, despite its inherent distinctness, Anal Naga RS is in fact a highly conventionalised language-specific form.

The corresponding function of RS is similarly conventional and language-specific. The analysis of the natural outcome produced by RS as “addressee-engaging” or “response-seeking” would be an over-generalisation of the precise set of functions that this marker has. For instance, among content questions in Anal Naga, RS occurs only with non-trivial ones, such as contact-establishing or surprised questions. This is unlike the phatic final rise in other languages, such as Israeli Hebrew, where it is commonly found in prototypical content questions (Ozerov 2019). Similarly, RS is not regularly found with imperatives, although these inherently appeal for a responsive action.

The role of RS in Anal Naga and the distribution of its usage are not as broad as an underspecified ad hoc product of natural indexing would predict. Instead, RS has a consistent language-specific function (“an appeal for a responsive action”) with specific distribution rules: it is pertinent to a few identifiable constructions with a distinct conventionalised opposition for each of them. It can be concluded that, on the one hand, the set of the functions of RS in Anal Naga is naturally related to its form, and this form is synchronically indispensable for its attention-drawing and response-mobilising function. It is a non-arbitrary, indexical sign. On the other hand, RS is a conventionalised language-specific marker with a defined form, a closed set of functions and a specific distribution.

Consequently, RS is a non-arbitrary “symbolic indexical” (Enfield 2009), which has grammatical components, necessarily enriched by indexical context-bound interpretation. It can be regarded as a vocal gesture (Okrent 2002), as its conventional, pointing form naturally triggers a closed set of attention-drawing effects, while the precise contribution in each case is subject to a contextual resolution.

4 Prosodic Intensification (PI)

4.1 Introduction

The second deviant contour addressed in this study is Prosodic Intensification (PI). The forms of PI and RS share many properties and exhibit a parallel relationship between their non-arbitrariness and their function. While a broader generalisation of the two markers is not impossible (see Section 5), they are kept apart in this study for both formal and functional reasons. From the point of view of the form, the minimal shape of PI is identical to that of RS but occurs IU-internally. It is a strongly accented and clearly articulated syllable that abruptly deviates from the expected pitch pattern, rising slightly above the pitch range of the IU and falling to the baseline. The minimal form of PI is found in (9), which additionally terminates with RS, thus allowing the comparison of the two contours.

    (9)  atúŋ-káːl=te   amá-lulu  mí.râː  i-tʰùŋ=so       a.tʰùŋ-mîː
         now-time=CNTR  3-like    nobody  NMLZ-dress=ADD  dress-3NEG
         ‘Nowadays, nobody dresses like him, eh?’ (f) (anm_20160924_Th_gr_1 6’31’’) (Figure 8) (audio: Example 9)
Figure 8

Example (9), minimal PI on mirâː ‘nobody’ vs. RS on the final syllable.

As Figure 8 demonstrates, mirâː ‘nobody’ is characterised by a minimal shift above the pitch level of the IU and a rising-falling contour. This minimal form of PI is recurrently found with the adnominal clitic ráː, which forms part of the words ‘nothing’ and ‘nobody’ (see Section 4.2). The IU in (9) terminates with a similar contour (adjusted to the lower pitch of the subsequent material), which is an instance of RS requesting stance-alignment.

The minimal form of PI can be augmented by a pitch excursion to extraordinarily high levels or a falsetto voice, as well as by an extreme elongation of the rhyme, illustrated in (10).

    (10) pəhàŋ-hàŋ-ʈə̋ːː-má=nôːː
         ‘Don’t open the door at all, OK?’ (f) (anm_20160220_Thum_PO_1 4’44’’) (Figure 9) (audio: Example 10)
Figure 9

Example (10), PI with augmentative meaning ‘at all’.

The pitch abruptly shifts from below 200 Hz to over 650 Hz, becoming a high squeak. The length of the rhyme is nearly 600 ms, three times the length of a regular long vowel in the language.
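To put the size of this excursion in perceptual terms, the frequency ratio can be converted to semitones. The following is a minimal sketch and not part of the original analysis: the function name `semitone_interval` is hypothetical, and the two frequencies are the approximate bounds reported above (below 200 Hz, over 650 Hz), so the result is best read as a lower bound.

```python
import math


def semitone_interval(f_low_hz: float, f_high_hz: float) -> float:
    """Return the interval between two frequencies in semitones.

    Uses the standard equal-tempered relation 12 * log2(f_high / f_low).
    """
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Approximate bounds reported for example (10); treated as exact values
# here purely for illustration (an assumption, not measured data).
interval = semitone_interval(200.0, 650.0)
print(f"{interval:.1f} semitones ({interval / 12:.2f} octaves)")
```

On these figures the excursion spans roughly 20 semitones, that is, more than an octave and a half within a single syllable.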

In terms of their contribution, RS and PI act on two different functional levels. RS occurs at the end of an IU. As an edge marker, it is an interaction-managing device. PI has a content-managing role, narrowly affecting the item on which it occurs. It aligns with a single syllable inside the IU and, due to the predominantly monosyllabic morphology of the language, targets a single lexeme. This salience triggers a set of semantic-pragmatic interpretations relative to the meaning of the affected item. This semantic-pragmatic contribution is typically not accompanied by any interactional effects, such as response or stance-alignment. Importantly, with a limited set of exceptions (mainly quantifiers), PI occurs on specific sets of grammatical morphemes and in one morphosyntactic construction. The following sections describe the semantic classes with which PI occurs and the interpretation that it triggers with each class, as listed in Table 2. The overall function of PI is analysed in Section 4.7. Finally, Section 4.8 outlines a few cross-linguistic parallels, demonstrating that indexing-iconic prosody is a common but conventionalised language-specific phenomenon.

Table 2

PI interpretations according to semantic classes.

| Semantic class | Semantics of items | Produced effects (from minimal pitch to extra-high) | N (%) |
|---|---|---|---|
| edge degree or precision (number, time etc.) | augmentative, distributive, duration, habitual, excessive verbal and nominal suffixes; lexical quantifiers and intensifiers | • counter-expected quantity, duration or degree; contrast, novelty • speaker’s stance, emotional involvement, textual importance | 115 (67%) |
| depictive communication | ideophones | • vivid depiction • speaker’s stance, emotional involvement, textual importance | 16 (9%) |
| distal demonstrative | distal demonstrative | • gesture-directed attention shift • urgency of attention-shift • speaker’s stance, emotional involvement | 9 (5%) |
| interjections | interjections | • attention shift and salient quality • speaker’s stance, emotional involvement | 23 (13%) |
| other | wh-questions; emotive suffixes | • high interest/rhetorical question • emotional involvement | 10 (6%) |
| Total | | | 173 (100%) |

4.2 Scale reading

The first function of PI to be addressed here is the scale-related reading, found with a large group of grammatical morphemes, as well as a few lexical items and one morphosyntactic construction. This is the largest and most heterogeneous group in the data, with 115 examples (67%). It includes morphemes with scale-related meaning, indicating either an augmentative (edge) reading or precision. The augmentative markers share the semantics of an extremity on a scale, usually related to quantity, degree, or duration. They are either bound adnominal clitics or members of the large set of verb-modifying suffixes. The group consists mostly of grammatical morphemes with meanings such as augmentation, distribution, duration, excessiveness, permanence, intensity, entirety, completeness, absence and paucity. Expressions of both the upper (‘every’, ‘many’, ‘very’, ‘really’, ‘always’, ‘fully’) and the lower (‘nobody’, ‘nothing’, ‘never’, ‘at all’, ‘barely’) extremities are attested with PI. Precision is expressed by items meaning ‘exactly’ or ‘precisely’.

The usage of PI with augmentative expressions is optional, as (11) and (12) show: the two examples contrast the distributive suffix sʉ̀ as produced in the same narrative by the same speaker while describing the same incident. In (11) the suffix exhibits PI: the pitch shifts above 500 Hz (compared to 260 Hz on the neighbouring morphemes), and the vowel is more than twice as long as a regular short vowel. In (12) the same suffix is pronounced as a regular low short syllable.

    (11) mĩ́ː=te           kà-ː-má-sʉ̋ːː-nʉ́
         person\ERG=CNTR  1-INV-leave-DISTR-N.FUT
         ‘The people left me all behind.’ (f) (anm_20151123_Solhring_PO_Mithun 1’41’’) (Figure 10) (audio: Example 11)
Figure 10

Example (11), PI on distributive -sʉ̀.

    (12) i-pál-i-pál-hín-to=te             jáːm-sʉ̀-tɕà-je
         NMLZ-strong-NMLZ-strong-ABS=CNTR  scatter-DISTR-PERF-3PL
         ‘The strong people have all run away.’ (f) (anm_20151123_Solhring_PO_Mithun 1’11’’) (Figure 11) (audio: Example 12)
Figure 11

Example (12), no PI on distributive -sʉ̀.

By using PI with augmentatives, the speaker indicates that the edge property of the reported state of affairs violates regular expectations, according to which extremities and totality are unusual. It invites the interlocutor to notice the deviance of the situation. Depending on the gradual values of pitch height and duration (addressed below), this effect can trigger additional inferences regarding the speaker’s motivation for singling out the described situation/event as remarkable. For instance, higher-pitched and longer PI triggers context-specific argumentative effects such as the warning in (10) and the empathy-evoking reading in (11).

In addition to adnominal clitics and verbal modifiers, PI occurs in the augmentative morphosyntactic construction [kʰaŋ-VERBAL STEM FINITE VERB]. PI falls on the verbal stem following kʰaŋ, as (13) illustrates.

    (13) pəsʉ̃́ː=te      va-bʉ́-hín-to   kʰaŋ-dʉ̋ːː  va-dʉ̀-máŋ    turende
         rat\ERG=CNTR  3-rice-PL-ABS  AUG-gnaw   3-gnaw-PROG  however
         ‘Rats are destroying their rice a lot, but…’ (f) (anm_20160924_Th_gr_1 2’59’’) (Figure 12) (audio: Example 13)
Figure 12

Example (13), PI on a verb in a kʰaŋ-construction.

The scale reading is the only group of examples in which the contour is also attested with lexical items. These are mostly quantifiers or intensifiers indicating either abundance, such as hujara, ajáːm, orʰaŋ (‘plenty, many, a lot’), or paucity, such as tɕariri ‘very few’ and hanreːl ‘barely’, as well as numerals. Notably, PI pronunciation tends to conventionalise with the most frequent of these lexemes. In particular, it is regularly found with táŋná ‘very, really’ (10 tokens with PI vs. 1 without) and dáːráː ‘nothing’ (5 vs. 1, cf. also ‘nobody’ in (9)), and is very typical of the formulaic expression arã́ːː.tũ ‘once upon a time’, where the vowel length is iconic of the antiquity of the story (‘a ve-e-ery long time ago’).

An additional scale-oriented meaning found with PI is that of precision, together with the related but not scale-based meaning of identity and sameness. In a way parallel to the augmentative reading above, the usage of PI in this context indicates that, while general resemblance and approximate values are the norm, precision and exact sameness are exceptional, and the indication of their exceptionality calls for non-trivial contextual effects. In (14) PI is found with the adnumeral marker tɕá ‘exactly’. The precision in this case evokes the coincidence of the reported event with the completion of the 10-year cycle of shifting cultivation, which has a prominent role in the community.

    (14) kùm   sòːm-tɕa̋-he      tʉ̀-nà-ká
         year  ten-EXACT-1PROX  accomplish-FUT-AFF
         ‘It will be exactly 10 years (since they cultivated that plot).’ (f) (anm_20160924_Th_gr_1 1’28’’) (Figure 13) (audio: Example 14)
Figure 13

Example (14), PI on the adnumeral precision marker tɕá.

Related to the notion of precision is the particle ráː, often reduplicated to ráráː. It is used for an effect comparable to that of “emphatic identification” (König & Gast 2006), singling out the relevant referent, often against other expectations. As such, it is used in contexts typically conceived of as narrow focus, such as contrast or reference to the least expected candidate, as in (15).

    (15) ama.tũ  ní  ka-lèːn=rára̋      ata.tṹː    i-vàŋ-tʰúː-lèːla-vá-níŋ
         so      1   1-upon=IDENT.RDP  like.that  NMLZ-go-become-IDEO.openly-COP-1
         ‘So [even?] I myself became like that.’ (f) (anm_20160924_Th_gr_1 08’19’’) (Figure 14) (audio: Example 15)
Figure 14

Example (15), identificational suffix ráː (reduplicated).

In summary, PI is used with the scale-related readings of edge or precision, as well as with the related meaning of identity and sameness, indicating that the state of affairs violates regular norms and expectations. The indication of this fact has non-trivial, contextually relevant consequences. As is further discussed in Section 4.7, the precise effect depends on the pitch and length values of PI. This is the only group of meanings in which, in addition to bound grammatical markers, PI also occurs on a limited set of lexical items and in a morphosyntactic construction.

4.3 Depictive communication

Depiction is regarded in the literature as a distinct way of representation and in particular as different from the arbitrary mode of description (Clark & Gerrig 1990). In the depicting mode, the meaning is conveyed by a “representation of an image” and as a result, unlike discrete arbitrary signs of the descriptive mode, the forms are gradual and indexical/iconic, as the listener is requested to understand through imagining. Although mimetic or iconic representation constitutes the most immediate example of depiction, cross-linguistic findings often require more sophisticated models of non-arbitrary meaning construal (Emmorey 2014).

There are three salient classes of morphemes in Anal Naga that can be regarded as having depictive properties: intersubjective verbal modifiers, ideophonic verbal modifiers and free-standing ideophones (cf. Peterson 2013 for a detailed description of a parallel system in a related language). Of these, ideophonic verbal modifiers occur most frequently with PI. These are reduplicated morphemes with vowel alternation that convey fine, mimicking aspects of the event/activity, such as rʰeŋrʰoŋ ‘in an arrogant manner’, jiːljul ‘in a sly and cunning way’ and many others. PI on ideophonic verbal modifiers, illustrated in (16), accounts for 9% (16 tokens) of PI in the data.

    (16) náŋ=te  i-tɕə́ː-jűːːmjum  vá-tí-nʉ̀
         2=CNTR                  COP-2-N.FUT
         ‘You used to go all around restlessly.’ (f) (anm_20160924_Th_gr_1 2’18’’) (Figure 15) (audio: Example 16)
Figure 15

Example (16), PI with an ideophonic verbal suffix.

Although PI is often mentioned cross-linguistically in the context of ideophones (e.g. Dingemanse & Akita 2017), the vast majority of ideophones in the examined corpus are not characterised by PI (cf. leːla ‘without being shy, but in a socially acceptable way’ with no PI in the lower part of Figure 14).

With depicting morphemes PI triggers the interpretation of “imaginative apprehension” (Nuckolls 1996: 96): it requests the listener to engage in a direct imagination of the event depicted by the ideophone, resulting in the effect of involved narration. In a few cases in the data, depicting morphemes with PI are accompanied by a neatly aligned mimetic gesture (Dingemanse & Akita 2017). This active request for an imaginative involvement is to be contrasted with the mere addition of depicting characteristics of an event by a prosodically unremarkable ideophonic modifier, as seen in the lower part of Figure 14 above.

4.4 Distal locative ‘yonder’

A special case of PI is the distal locative le ‘over there, away in the distance, yonder’. In this case both PI and an accompanying gesture in the indicated direction are nearly conventionalised components of the multi-modal marker. Both PI and the gesture direct the interlocutor to shift their attention to a new referent or area outside of the currently shared speech perimeter. There are 9 tokens of this demonstrative in the examined corpus, all of which are characterised by PI (5% of PI) and a pointing gesture. An examination of 30 examples in a broader corpus of spontaneous speech reveals 7 examples (23%) that have no accompanying gesture, and 10 examples (33%) that are accented but exhibit no PI. Remarkably, the pointing gesture often accompanies le even if the direction is fictional, invisible or irrelevant, as in (17), taken from the climax of a story. The speaker of (17) initially faces the listener (Figure 16, left), who sits next to the camera. At the onset of le̋ she shifts her gaze to the imaginary location of the narrated event, to her right and above her, and extends her hand in a pointing gesture in that direction (right). She re-establishes eye contact with the listener ca. 1,300 ms later, but keeps her hand pointed at the imaginary location until the end of the sentence (bottom). The pitch excursion in this case is extreme.

Figure 16

Example (17), pointing gesture and PI accompanying le ‘over there’.

    (17) va-tɕálnʉ́-hín-to(pic.1)  le̋(pic.2)  ràːl-kʰéː          va-hè-sʉ́ː-náː=te(pic.3)
         3-sister-PL-ABS          DIST       opposite.side-one  3-HORZ-look-when=CNTR
         ‘And as they saw their sister over there on the opposite hill…’
         (Context: The protagonist is running away from a tiger. Her brothers are looking for her.) (f) (anm_20160220_Thum_PO_1 10’28’’) (Figure 16) (audio: Example 17)

4.5 Interjections

Similarly to the demonstrative le, interjections index an external referent, shifting the interlocutors’ attention to it. They also convey a salient property of this referent and/or the speaker’s attitude towards it (Kockelman 2010: Ch. 6). Thus, they are almost inherently marked by PI (23 with vs. 2 without PI in the data), due both to their attention-shifting nature (similarly to le̋) and to the request to evaluate a salient feature, as with quantifiers. The two most common interjections in Anal Naga are aro and as. The former, seen in (18), can be preliminarily characterised as indexing an extensive degree/number.

    (18) atùŋ=te   arőːː  dáːtuŋ  nuŋé-tɕá-dóː-vá-nîŋǃ
         now=CNTR  INTJ   how     nice-POL-AWAY-COP-1
         ‘But now, ohhh how nicely I live, eh?’ (f) (anm_20160924_Th_gr_1 08’00’’) (Context: the speaker contrasts her hardships in the past with the current life). (Figure 17) (audio: Example 18)
Figure 17

Example (18), PI with interjection.

4.6 Other functions

There are three more classes of morphemes attested with PI, yet the insufficient number of examples (10 in total, 6%) in the data requires leaving their analysis for further research.

  1. PI occurs on content (“wh-”) question words. In this case it either indicates an extreme interest of the speaker in information or produces a rhetorical exclamatory reading.

  2. PI is found with the stance-expressing verbal modifier tɕənáː ‘to the speaker’s pity’.

  3. PI is found with the agentive suffix -pá, with as yet unclear effects.
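The token counts reported across Sections 4.2–4.6 can be cross-checked against the total given in Table 2. The sketch below is only an illustrative tally: the dictionary keys are informal labels chosen here, not categories defined by the analysis, and the counts are those stated in the running text (115, 16, 9, 23 and 10 tokens).

```python
# Token counts of PI per semantic class, as stated in Sections 4.2-4.6.
# The labels are informal shorthand introduced for this illustration.
pi_counts = {
    "scale reading": 115,
    "ideophonic modifiers": 16,
    "distal locative": 9,
    "interjections": 23,
    "other": 10,
}

total = sum(pi_counts.values())

# Share of each class in percent, kept unrounded so that the shares
# add up to exactly 100 (rounding is applied only when printing).
shares = {label: 100 * n / total for label, n in pi_counts.items()}

for label, n in pi_counts.items():
    print(f"{label:<22}{n:>4}  {shares[label]:5.1f}%")
print(f"{'total':<22}{total:>4}")
```

The five counts sum to 173, matching the total in Table 2.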

4.7 Discussion: form and function

Although the function of PI can impressionistically be described as emotive or emphatic, a closer look reveals systematically distinct interpretations based on the class of the morpheme on which it occurs. The mere usage of PI in its minimal form with scale-related items invites the listener to infer the deviance of the reported situation from the expected norm. With ideophonic verbal modifiers it triggers an imaginative interpretation of the feature expressed by the item. With the distal locative it seeks an attention shift towards a referent outside of the current setting of the interaction, while with interjections the referring effect is combined with a request to evaluate a salient feature associated with the indexed referent.

These functions can be generalised along the lines proposed for RS above: the violation of the expected prosodic trajectory neatly aligned with a single constituent, the ensuing prosodic non-integration of this constituent and the extra effort involved in its production naturally draw attention towards this item. This array of factors signals the requirement to invest a corresponding effort in its interpretation: shifting attention in a new direction, evaluating a salient and abnormal property, or actively engaging in imagining the reported event. Faced with a prosodic form in whose production the speaker has evidently invested additional effort, the interlocutor is guided to trace the motivation for this deviance in the semantics of the item. Hence, the non-arbitrary, indexical nature of the form is crucial for its function. An additional compelling outcome of this analysis is the direct relationship between non-arbitrariness and the alignment of the interlocutors’ attention states: PI constitutes an “engagement” marker (Evans et al. 2018) by its very nature.

This analysis considers PI in its minimal form. This marking is non-arbitrary, but it is discrete, and the form has a systematic functional correspondence. An additional account is required for the gradual nature of the pitch and the length of the rhyme. Figure 18 shows the pitch (Y-axis) and the rhyme length (X-axis) of PI separately for four female speakers.

Figure 18

Pitch (Y, Hz) and length (X, ms) of PI for 4 female speakers.

It is possible to identify some general trends shared at least partially between the speakers. However, the overall patterns are largely idiosyncratic. All figures show that, certain tendencies notwithstanding, the length and the pitch height are separately functioning factors.

The gradual pitch height added to the minimal PI produces interpretive effects in addition to the basic function of PI. These can superficially be described as “intensifying”, yet again a closer look at minimal sets of comparable examples reveals more specific interpretations associated with the gradual pitch. An examination of the highest pitch outliers in the narratives reveals that they are related not to the semantics of the described situation, but to properties of the context associated with the speaker’s effort investment. Ultra-high pitch or falsetto voice is found at culminating moments, reflecting speakers’ involvement (Nuckolls 1996: 80). These contexts are characterised by extensive depiction, gesticulation and direct speech.

As for the rhyme length, Figure 18 shows numerous examples where PI-syllables have a similar pitch height but vary in the length of the rhyme. An examination of the extremes and of their semantic properties reveals a correspondence between the upper outliers and an iconic function: by lengthening the rhyme of the syllable, the speaker superimposes an iconic visualisation, a depiction of a salient quality, on top of its lexical description. A notable class of examples is the augmentative suffixes expressing the duration of an event. As opposed to the average of 340 ms (σ = 175) for the 161 examined examples, the average rhyme length for the 17 examples of augmentative modifiers expressing event duration is 425 ms (σ = 115).
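One rough way to gauge the size of this difference is to express it in units of the overall standard deviation. The sketch below is an illustrative calculation only, not the author's analysis: the function name `standardised_difference` is hypothetical, and because the 17 duration-expressing tokens may themselves be included among the 161 examined examples, no significance test is attempted.

```python
def standardised_difference(subgroup_mean: float, overall_mean: float,
                            overall_sd: float) -> float:
    """Difference between two means in units of the overall standard deviation."""
    if overall_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (subgroup_mean - overall_mean) / overall_sd


# Rhyme-length figures (ms) as reported in the text.
overall_mean_ms, overall_sd_ms = 340.0, 175.0
duration_mean_ms = 425.0

diff_ms = duration_mean_ms - overall_mean_ms
z = standardised_difference(duration_mean_ms, overall_mean_ms, overall_sd_ms)
print(f"difference: {diff_ms:.0f} ms, i.e. {z:.2f} overall standard deviations")
```

On the reported figures, the duration-expressing modifiers are on average 85 ms longer than the overall mean, which is just under half of the overall standard deviation.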

To summarise, PI encompasses three aspects of form, which produce corresponding interpretive effects:

  1. The basic form of PI is a discrete salient rise-fall with accent-like properties. The marking of an item by this abrupt pitch upshift, and the resulting shift of attention to that item, produce a range of interpretive effects, subject to the semantics of the constituent.

  2. The pitch height is a gradual marking that indexes the degree of the speaker’s effort investment, triggering additional interpretive effects, such as the speaker’s involvement in the message, dramatization or the immediacy of the attention-shift.

  3. The rhyme length is a gradual iconic sign producing a visualisation of a salient property.

However, similarly to RS, it would be an over-generalisation to regard the function of the basic PI contour merely as contextually interpreted effort-indexing. This broad view would fail to explain the specific and consistent interpretations that it obtains with each group of markers. Moreover, it would fail to predict its distribution, limited to a set of primarily grammatical markers. In line with the analysis proposed for RS, PI can be regarded as a symbolic vocal gesture, which combines the indexical aspects in (1) and (2) with the iconic depiction in (3). This becomes particularly evident as we consider a few cross-linguistic parallels of indexical intensification and iconic lengthening.

4.8 Cross-linguistic parallels: non-arbitrary prosody as language-specific signs

High pitch and lengthening (both usually regarded as “para-linguistic”) are cross-linguistically well-known devices that produce an effect roughly dubbed “intensification”. Remarkably, typologically and genetically diverse languages have developed means of prosodic detachment achieved by the speaker’s effort investment across comparable contexts. The typological parallels of the marking and its interpretations reflect the commonalities of interactional and pragmatic motivations for the usage of indexing and iconic signs, as well as the universality of the physiological aspects of their production and of the human experience associated with them (Stross 2013). Nonetheless, the details of the form in each case, its distribution and frequency, and a close analysis of the meaning reveal that, similarities notwithstanding, the apparently common phenomenon of prosodic intensification constitutes a collection of conventionalised language-specific phenomena. In each case, the actual phonetic properties of the marker are not an ad hoc phonetic exaggeration but a conventionalised language-specific marking. And although their meaning can typically be described as “emphatic”, “highlighting” or “intensifying”, the precise interpretive effects hidden behind these cover terms are different for each language. Finally, the distribution rules likewise differ in each case. In fact, even closely related languages such as English and Dutch demonstrate formal and functional differences and distinct conventionalised interpretations of non-arbitrary intonation patterns (Chen et al. 2004). Consequently, the salient cross-linguistic similarities suggest general tendencies for the usage of non-arbitrarily salient prosody, while the dissimilarities demonstrate the different conventionalisation paths for each language-specific device.

For example, Ogden (2012) analysed the phenomenon of Intensifying Emphasis (IE) in English, where a special prosodic marking highlights aspects of meaning in such examples as HːOːːRrible (‘really horrible’), TRːAːːFfic (‘very terrible traffic’), PhːAːːrty (‘a very good party’). Upon superficial examination, it shows parallels with PI in Anal Naga, yet the differences between the two language-specific devices are in fact substantial. The form of IE in English is characterised by a very long closure or a long VOT for plosive onset consonants. This modification of the onset consonant is not attested with PI in Anal Naga (although aspirated consonants are found in both languages). Conversely, the extreme pitch excursion that is central to PI is not found with English IE. As for the meaning, the distribution of English IE superficially appears to resemble that of its Anal Naga counterpart: “IE modifies the meaning of the words it occurs on in a more extreme direction: (1) numbers, quantifiers and extreme case formulations; (2) in the context of an upgraded assessment; (3) the intensification of nouns” (Ogden 2012: 55). There are obvious parallels between this description and many Anal Naga examples above: the occurrence with an edge of a scale, numerals, quantifiers, intensifiers and the counter-expectational effect. However, in English the prosody can occur on any lexical item, while in Anal Naga such usage is impossible, as the marker is restricted to grammatical constituents and quantifiers. In English, IE invites the listener to infer some locally derived contextual aspects of the meaning, as well as the speaker’s stance in this regard. In Anal Naga, the contextual effects of the basic PI form are regular with specific classes of items. Such functions as attention-shift with distal locatives appear to be absent from English, but are nearly conventionalised in Anal Naga. The nearly conventionalised PI pronunciation attested for ‘nothing’, ‘nobody’ and other lexemes in Anal Naga is also not common in English.

Moreover, English IE and Anal Naga PI differ in the interactional aspects of their functions. Ogden finds that in English, IE is used to actively seek an explicit stance-alignment, so that the speakers elaborate on the issue, presenting evidence for their claims and reasoning for their position until they achieve this goal. As a result, stance-neutral acknowledgements such as mhm constitute unusual responses in English; instead “[t]ypical – normal – responses include wow, clicks, or mild expletives” (Ogden 2012: 57). This differs radically from the case in Anal Naga, where PI usually elicits no response from the listener at all. PI in Anal Naga represents a more routinised interpretation-guiding and attention-drawing device, and is not a means of stance-expression.

In Australian languages an extreme lengthening of a lexical constituent is used to convey long duration or a magnitude of distance or size (see Simard 2013 for Jaminjung and the references therein). Again, the set of the affected meanings and their precise effects overlap only partially with PI in Anal Naga, while the distribution and location rules are entirely different. In Ambel, an Austronesian language of West Papua, an extremely elongated vowel with a high falling pitch marks climax in narratives, while an elongated vowel eːː with a similar pitch contour is a marker of excessive quantity or distance (Arnold 2018: 619–621). Further comparison to additional descriptions of language-specific non-arbitrary prosodic markers and the “intensifying” interpretive effects they trigger (e.g. Yakpo 2018: 55–57; Perez & Zipp 2019) presents a similar picture.

Despite evident parallels of the functions of indexing-iconic prosody, the form and the specific functions in each language are conventionalised and exhibit only partial similarities. There is a typologically striking tendency for a recurrent set of meanings to be marked by prosodic devices with a non-arbitrary component crucial for their functionality. However, to view them as locally derived effects of para-linguistic communication would largely overgeneralise the functions, position rules and distribution for each language-specific marker. Each form can be regarded as a language-specific grammatical marker, as it contributes conventionalised ancillary meaning (Boye & Harder 2012).

5 Conclusion

This study analysed two related non-arbitrary prosodic contours in Anal Naga. One contour (RS) is an edge device, occurring at Transition-Relevance Places. It has a minimally deviant prosodic contour and serves as an interaction-managing, phatic marker that appeals to the interlocutor, requesting their cooperative response. The second contour (PI) is a content-managing device that occurs on a specific set of content morphemes, which can be classified into a few identifiable types. The meaningful contribution of this contour depends on the type of the morpheme, varying between a deviance in the state of affairs (with quantifiers and augmentatives), vivid depiction (with ideophones), attention-shift towards a new referent (with the distal demonstrative le̋) and a reference-shift combined with an indexing of a salient feature (with interjections). It is unacceptable with lexical morphemes and with items that do not belong to one of the types above. This basic form of PI can be augmented by a deviant pitch height, indexing additional effort-investment and conveying involvement, urgency or high drama, and by an extreme lengthening of the rhyme, which is iconic of a salient feature. Thus, the basic non-arbitrary features of the conventionalised sign are additionally exploited for indexical and iconic augmentation with a more context-dependent and less conventionalised contribution. Hence, the marking has a defined non-arbitrary form and a systematic contribution, while its augmentation is non-discrete and context-dependent.

Both RS and PI closely correspond to the concept of effort code: speakers produce an unusual, physiologically demanding contour in order to signal either a special informational significance of their message or their affected attitude towards it (Gussenhoven 2004: 85–89; Chen et al. 2004). As a non-arbitrary marking, it would traditionally be classified as a para-linguistic phenomenon. Indeed, non-arbitrariness is an indispensable property of the contours and crucial for their synchronic function. Nonetheless, the paralinguistic view of the markers as a device used with the broad goal of drawing attention whenever it is required would miss their conventionalised, language-specific form, distribution rules and their defined sets of functions. Conveying a conventional ancillary meaning, these forms can be analysed as part of the language grammar.

Rather than approaching their status from the binary linguistic vs. paralinguistic perspective, they can be described as highly conventionalised non-arbitrary signs in the multimodal process of communication (Okrent 2002; Enfield 2009). The basic form shared by RS and PI is a symbolic indexical, which, similarly to a conventional pointing gesture, relies both on its conventional properties and on contextual resolution. The augmented form of PI is an additional conventional iconic-indexical device. As such, these are conventionalised markers that nonetheless achieve their function due to a natural form–function relation, parallel to other attested phenomena in signed and spoken languages (Cormier et al. 2013; Floyd 2016). The availability of robust parallels of these phenomena additionally demonstrates the cross-linguistic existence of signs that derive their synchronic function from their non-arbitrary form (Dingemanse et al. 2015).

An additional interesting outcome of this study is the broad array of effects achieved by the immediate manipulation of the interlocutors’ attention. Deviant prosody is a natural engagement-marker (in the sense of Evans et al. 2018) that produces an alignment of the interlocutors’ attention-states. There is a close relationship between the alignment of attention-states on the one hand (PI) and an appeal to the interlocutor on the other (RS). The rudimentary means of indexical prosody, which appeal to the addressee by naturally drawing their attention, unify apparently disparate areas of linguistic inquiry. Engaging the addressee is central to interaction-management, as it is used for establishing the communicative channel and negotiating turn-taking (Section 3.1). Response mobilisation (Stivers & Rossano 2010), either by requesting information or in directive acts (offers, suggestions), is achieved by the same means of a naturally attention-drawing appeal. The same basic principle of natural attention-drawing is also used for a task typically regarded as representing a very different linguistic phenomenon: indicating a counter-expected status of information, such as an edge-degree or identification. These properties are typically associated with focality. Focus is indeed traditionally linked to attention-drawing (Erteschik-Shir 2007: 38), and therefore a close relationship between appeals and counter-expected information is possible. Yet these links are not frequently discussed in the literature (cf. Matić 2015 for the grammaticalisation of tag-questions into focus-like markers). Finally, the same basic form is employed for discourse-structuring purposes, negotiating the discourse status of information, such as indicating culminations and producing depicting effects.

Hence, the study of non-arbitrary signs reveals how a natural manipulation of interlocutors’ attention constitutes a key factor in diverse linguistic phenomena. It links the study of interaction-management, information-structuring, and discourse-organisation, and thus has broad cross-disciplinary implications for research on human communication.

Additional files

The additional files for this article can be found as follows:


Example 1. DOI: https://doi.org/10.5334/gjgl.967.s1


Example 2. DOI: https://doi.org/10.5334/gjgl.967.s2


Example 3. DOI: https://doi.org/10.5334/gjgl.967.s3


Example 4a. DOI: https://doi.org/10.5334/gjgl.967.s4


Example 4b. DOI: https://doi.org/10.5334/gjgl.967.s5


Example 5. DOI: https://doi.org/10.5334/gjgl.967.s6


Example 6. DOI: https://doi.org/10.5334/gjgl.967.s7


Examples 7–8. DOI: https://doi.org/10.5334/gjgl.967.s8


Example 9. DOI: https://doi.org/10.5334/gjgl.967.s9


Example 10. DOI: https://doi.org/10.5334/gjgl.967.s10


Example 11. DOI: https://doi.org/10.5334/gjgl.967.s11


Example 12. DOI: https://doi.org/10.5334/gjgl.967.s12


Example 13. DOI: https://doi.org/10.5334/gjgl.967.s13


Example 14. DOI: https://doi.org/10.5334/gjgl.967.s14


Example 15. DOI: https://doi.org/10.5334/gjgl.967.s15


Example 16. DOI: https://doi.org/10.5334/gjgl.967.s16


Example 17. DOI: https://doi.org/10.5334/gjgl.967.s17


Example 18. DOI: https://doi.org/10.5334/gjgl.967.s18


Abbreviations

ACKNW = acknowledgement-seeking, AFF = affirmative, BCKGR = backgrounding particle, CNTR = contrastive, DIST = distal demonstrative, DISTR = distributive, HORZ = horizontally, IDENT = identification marker, IDEO = ideophone, INV = inverse, NF = non-final, NFUT = non-future, POL = politeness marker, 1PROX = 1-person proximate (close to speaker), RDP = reduplication, SEQ = sequentialiser.


Acknowledgements

I am deeply grateful to the many members of the Anal Naga community who contributed their knowledge and time to this study. I thank the community leaders and organisations ANTA and ALS for facilitating my research on the Anal Naga language. This study was initially conducted during a fellowship at the Martin Buber Society of Fellows, and I am thankful for all the support I received there. I thank six anonymous reviewers for their helpful comments on previous drafts of this paper.

Funding information

Data collection and field research for this study were made possible thanks to generous funding from the Firebird Foundation for Anthropological Research (USA) and from ELDP (grant SG0428).

Competing interests

The author has no competing interests to declare.


References

Alderete, John & Alexei Kochetov. 2017. Integrating sound symbolism with core grammar: The case of expressive palatalization. Language 93(4). 731–766. DOI:  http://doi.org/10.1353/lan.2017.0056

Arnold, Laura. 2018. Grammar of Ambel, an Austronesian language of Raja Ampat, West New Guinea. PhD Thesis, Edinburgh: The University of Edinburgh.

Beckman, Mary E. & Janet B. Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology 3(1). 255–309. DOI:  http://doi.org/10.1017/S095267570000066X

Boersma, Paul & David Weenink. 2017. Praat (version 6.0.28). Amsterdam: Phonetic Sciences, University of Amsterdam. http://www.praat.org/.

Bolinger, Dwight. 1964. Around the edge of language: Intonation. Harvard Educational Review 34(2). 282–296. DOI:  http://doi.org/10.17763/haer.34.2.4474051q78442216

Bolinger, Dwight. 1989. Intonation and its uses: melody in grammar and discourse. Stanford: Stanford University Press.

Boye, Kasper & Peter Harder. 2012. A usage-based theory of grammatical status and grammaticalization. Language 88(1). 1–44. DOI:  http://doi.org/10.1353/lan.2012.0020

Chen, Aoju, Carlos Gussenhoven & Toni Rietveld. 2004. Language-specificity in the perception of paralinguistic intonational meaning. Language and Speech 47(4). 311–349. DOI:  http://doi.org/10.1177/00238309040470040101

Clark, Herbert H. 2016. Depicting as a method of communication. Psychological Review 123(3). 324–347. DOI:  http://doi.org/10.1037/rev0000026

Clark, Herbert H. & Richard J. Gerrig. 1990. Quotations as demonstrations. Language 66(4). 764–805. DOI:  http://doi.org/10.2307/414729

Cormier, Kearsy, Adam Schembri & Bencie Woll. 2013. Pronouns and pointing in sign languages. Lingua 137. 230–247. DOI:  http://doi.org/10.1016/j.lingua.2013.09.010

Couper-Kuhlen, Elizabeth. 2001. Intonation and discourse: Current views from within. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.), The handbook of discourse analysis, 13–34. Malden, MA/Oxford: Blackwell Publishers. DOI:  http://doi.org/10.1111/b.9780631205968.2003.00002.x

Couper-Kuhlen, Elizabeth. 2012. Some truths and untruths about final intonation in conversational questions. In Jan P. De Ruiter (ed.), Questions: formal, functional and interactional perspectives, 123–145. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9781139045414.009

Couper-Kuhlen, Elizabeth & Margret Selting. 2018. Interactional Linguistics. Studying language in social interaction. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/9781139507318

Daudey, Henriette. 2014. A grammar of Wadu Pumi. PhD Thesis, Bundoora: La Trobe University.

Dingemanse, Mark. 2018. Redrawing the margins of language: Lessons from research on ideophones. Glossa: A Journal of General Linguistics 3(1). 1–30. DOI:  http://doi.org/10.5334/gjgl.444

Dingemanse, Mark, Damián E. Blasi, Gary Lupyan, Morten H. Christiansen & Padraic Monaghan. 2015. Arbitrariness, iconicity, and systematicity in language. Trends in Cognitive Sciences 19(10). 603–615. DOI:  http://doi.org/10.1016/j.tics.2015.07.013

Dingemanse, Mark & Kimi Akita. 2017. An inverse relation between expressiveness and grammatical integration: On the morphosyntactic typology of ideophones, with special reference to Japanese. Journal of Linguistics 53(3). 501–532. DOI:  http://doi.org/10.1017/S002222671600030X

Elvira-García, Wendy. 2017. Create Pictures with Tiers v.4.4. Praat Script. http://stel.ub.edu/labfon/en/praat-scripts.

Emmorey, Karen. 2014. Iconicity as structure mapping. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1651). 20130301. DOI:  http://doi.org/10.1098/rstb.2013.0301

Enfield, N. J. 2009. The anatomy of meaning: Speech, gesture, and composite utterances. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511576737

Erteschik-Shir, Nomi. 2007. Information Structure: The syntax-discourse interface. Oxford: Oxford University Press.

Evans, Nicholas, Henrik Bergqvist & Lila San Roque. 2018. The grammar of engagement I: Framework and initial exemplification. Language and Cognition 10(1). 110–140. DOI:  http://doi.org/10.1017/langcog.2017.21

Ferrara, Lindsay & Gabrielle Hodge. 2018. Language as description, indication, and depiction. Frontiers in Psychology 9. 716. DOI:  http://doi.org/10.3389/fpsyg.2018.00716

Floyd, Simeon. 2016. Modally hybrid grammar? Celestial pointing for time-of-day reference in Nheengatú. Language 92(1). 31–64. DOI:  http://doi.org/10.1353/lan.2016.0013

Goldin-Meadow, Susan & Diane Brentari. 2017. Gesture, sign, and language: The coming of age of sign language and gesture studies. Behavioral and Brain Sciences 40(E46). DOI:  http://doi.org/10.1017/S0140525X15001247

Gumperz, John J. 1992. Contextualization revisited. In Peter Auer & Aldo Di Luzio (eds.), The contextualization of language, 39–53. Pragmatics & Beyond New Series 22. Amsterdam/Philadelphia: John Benjamins. DOI:  http://doi.org/10.1075/pbns.22.04gum

Gussenhoven, Carlos. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511616983

Hoey, Elliott M. 2014. Sighing in interaction: Somatic, semiotic, and social. Research on Language and Social Interaction 47(2). 175–200. DOI:  http://doi.org/10.1080/08351813.2014.900229

Kockelman, Paul. 2010. Language, culture, and mind: Natural constructions and social kinds. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511711893

König, Ekkehard & Volker Gast. 2006. Focused assertion of identity: A typology of intensifiers. Linguistic Typology 10(2). 223–276. DOI:  http://doi.org/10.1515/LINGTY.2006.008

Ladd, D. Robert. 2008. Intonational phonology. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511808814

Levinson, Stephen C. 2003. Contextualizing ‘contextualization cues.’ In Susan L. Eerdmans, Carlo L. Prevignano & Paul J. Thibault (eds.), Language and interaction: Discussions with John J. Gumperz, 31–39. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/z.117.04lev

Matić, Dejan. 2015. Tag questions and focus markers: Evidence from the Tompo dialect of Even. In M. M. Jocelyne Fernandez-Vest & Robert D. van Valin Jr. (eds.), Information Structuring of spoken language from a cross-linguistic perspective, 167–190. Berlin, Boston: De Gruyter Mouton. DOI:  http://doi.org/10.1515/9783110368758

Matisoff, James A. 1994. Tone, intonation, and sound symbolism in Lahu: Loading the syllable canon. In John Ohala, Leanne Hinton & Johanna Nichols (eds.), Sound symbolism, 115–129. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511751806.009

Michaud, Alexis. 2006. Tonal reassociation and rising tonal contours in Naxi. Linguistics of the Tibeto-Burman Area 29(1). 61–94.

Michaud, Alexis. 2017. Tone in Yongning Na: Lexical tones and morphotonology. Studies in Diversity Linguistics 13. Berlin: Language Science Press.

Morita, Emi. 2015. Japanese interactional particles as a resource for stance building. Journal of Pragmatics 83. 91–103. DOI:  http://doi.org/10.1016/j.pragma.2014.12.008

Niebuhr, Oliver. 2010. On the phonetics of intensifying emphasis in German. Phonetica 67(3). 170–198. DOI:  http://doi.org/10.1159/000321054

Noel Aziz Hanna, Patrizia & Barbara Sonnenhauser. 2013. Vocatives as functional performance structures. In Barbara Sonnenhauser & Patrizia Noel Aziz Hanna (eds.), Vocative! Addressing between system and performance, 283–303. Berlin, Boston: De Gruyter Mouton. DOI:  http://doi.org/10.1515/9783110304176

Nuckolls, Janis B. 1996. Sounds like life: Sound-symbolic grammar, performance, and cognition in Pastaza Quechua. Oxford Studies in Anthropological Linguistics 2. New York: Oxford University Press.

Ogden, Richard. 2012. Making sense of outliers. Phonetica 69(1–2). 48–67. DOI:  http://doi.org/10.1159/000343197

Okrent, Arika. 2002. A modality-free notion of gesture and how it can help us with the morpheme vs. gesture question in sign language linguistics (or at least give us some criteria to work with). In Richard P. Meier, Kearsy Cormier & David Quinto-Pozos (eds.), Modality and structure in signed and spoken languages, 175–198. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511486777.009

Ozerov, Pavel. 2018. Tone assignment and grammatical tone in Anal (Tibeto-Burman). Studies in Language 42(3). 708–733. DOI:  http://doi.org/10.1075/sl.17030.oze

Ozerov, Pavel. 2019. This is not an interrogative: The prosody of ‘wh-questions’ in Hebrew and the sources of their questioning and rhetorical interpretations. Language Sciences 72. 13–35. DOI:  http://doi.org/10.1016/j.langsci.2018.12.004

Perez, Danae & Lena Zipp. 2019. On the relevance of voice quality in contact varieties. Language Ecology 3(1). 3–27. DOI:  http://doi.org/10.1075/le.18010.per

Perniss, Pamela, Robin L. Thompson & Gabriella Vigliocco. 2010. Iconicity as a general property of language: Evidence from spoken and signed languages. Frontiers in Psychology 1. 227. DOI:  http://doi.org/10.3389/fpsyg.2010.00227

Peterson, David A. 2013. Aesthetic aspects of Khumi grammar. In Jeffrey P. Williams (ed.), The aesthetics of grammar: Sound and meaning in the languages of Mainland Southeast Asia, 219–236. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9781139030489.019

Schembri, Adam, Kearsy Cormier & Jordan Fenlon. 2018. Indicating verbs as typologically unique constructions: Reconsidering verb ‘agreement’ in sign languages. Glossa: A Journal of General Linguistics 3(1). 89. DOI:  http://doi.org/10.5334/gjgl.468

Sicoli, Mark A. 2010. Shifting voices with participant roles: Voice qualities and speech registers in Mesoamerica. Language in Society 39. 521–553. DOI:  http://doi.org/10.1017/S0047404510000436

Simard, Candide. 2013. Prosody and function of ‘iconic lengthening’ in Jaminjung. SOAS Working Papers in Linguistics 16. 65–77.

Smith, Anja. 2010. Phatic expressions in French and German telephone conversations. In Sanna-Kaisa Tanskanen, Marja-Liisa Helasvuo, Marjut Johansson & Mia Raitaniemi (eds.), Discourses in interaction, 291–311. Amsterdam/Philadelphia: John Benjamins.

Stivers, Tanya & Federico Rossano. 2010. Mobilizing Response. Research on Language and Social Interaction 43(1). 3–31. DOI:  http://doi.org/10.1080/08351810903471258

Stross, Brian. 2013. Falsetto voice and observational logic: motivated meanings. Language in Society 42(2). 139–162. DOI:  http://doi.org/10.1017/S004740451300002X

Truckenbrodt, Hubert. 2012. Semantics of Intonation. In Claudia Maienborn, Klaus von Heusinger & Paul Portner (eds.), Semantics: An international handbook of natural language meaning 3. 2039–2069. Berlin: De Gruyter Mouton.

Verhagen, Arie. 2008. Intersubjectivity and the architecture of the language system. In Jordan Zlatev, Timothy P. Racine, Chris Sinha & Esa Itkonen (eds.), The shared mind: Perspectives on intersubjectivity, 307–331. Amsterdam/Philadelphia: John Benjamins. DOI:  http://doi.org/10.1075/celcr.12.17ver

Yakpo, Kofi. 2018. A grammar of Pichi. Studies in Diversity Linguistics 23. Berlin: Language Science Press. http://langsci-press.org/catalog/book/85.