1 Introduction

Acquiring the phonetic and phonological aspects of a second language (L2) poses a variety of challenges to learners. Many of these challenges have been attributed to cross-linguistic differences between a speaker’s native language (L1) and the L2, and many aspects of L2 pronunciation have been attributed to transfer (Steele 2002; Flege & MacKay 2004; Mennen 2004; and many more). However, transfer is not the only driving force in L2 pronunciation: universal tendencies have been discovered as well (Flege 1987; Carlisle 1998; Munro & Derwing 2008). An interesting question in the field of L2 phonology is which developmental aspects are driven by transfer and which aspects are driven by universal principles. Much work on this question, and many influential models in the field, originally focused largely on the production or perception of L2 segments (e.g., Best 1994; 1995; Flege 1995; 2003; Escudero 2005; 2009). However, the importance of including prosodic phenomena in theories of L2 phonological acquisition has not gone unnoticed, and in recent years the work on L2 prosody has grown considerably (e.g. Mennen 2004; 2015; Hirsch & Wagner 2011; Zubizarreta & Nava 2011; Albin 2015; Calhoun & La Cruz & Olssen 2018; among others). There is as of yet no model that fully predicts all aspects of L2 prosodic development, due in part to the complexity of the structures under study: in addition to learning the phonetic and phonological form of a prosodic structure, learners must also understand its function. Additionally, prosody has a more hierarchical organization than segmental phonology, requiring acquisition at different levels. For example, in English, learners must acquire word stress, rhythm, and utterance-level prominence, which reflect prosody at the word level, phrase level and utterance level, respectively.

Studies on L2 prosody also differ from each other in various ways, making it difficult to compare across studies. Studies on the production of prosody generally focus on one domain of prosody, for instance word stress (e.g., Dupoux & Pallier & Sebastián Gallés & Mehler 1997; McGory 1997; Nguyen & Ingram & Pensalfini 2008), or prominence at the level of the phrase or utterance (Rasier & Hiligsmann 2009; Zubizarreta & Nava 2011; van Maastricht & Krahmer & Swerts 2016). A number of studies are based in the Autosegmental-Metrical (AM) framework. This approach places importance on the separation of form versus function. Because of this, studies have often concentrated on form (e.g. Mennen 2004; Albin 2015), or function (e.g. Zubizarreta & Nava 2011; van Maastricht et al. 2016), with a recognized need for studies examining both (e.g. Mennen 2015; van Maastricht et al. 2016). Finally, methods for measuring prosodic units have also varied, as measuring prosodic phenomena is not always straightforward; a variety of methodological approaches have been used across studies to provide evidence for the presence of nuclear accent and other prosodic prominence. This has further complicated the task of comparing results across studies. Because of this, there is still much to learn about L2 acquisition of prosody.

At the same time, various studies have provided evidence of learners transferring intonational patterns from the L1 into the L2 (Mennen 2004; 2015; Albin 2015; van Maastricht et al. 2016). The role of universal factors in the L2 acquisition of prosody is at present less established; frameworks for universal factors in this domain include Markedness Theory (Eckman 1977; Rasier & Hiligsmann 2009; van Maastricht & Krahmer & Swerts & Prieto 2020) and the competing algorithms hypothesis (Zubizarreta & Nava 2011), adapted from Yang’s (2002) hypothesis about first language acquisition, which envisions grammar as a set of competing algorithms or rules. According to Zubizarreta & Nava, an L2 rule with a direct competitor in a speaker’s L1 will be more difficult to acquire than an L2 rule that faces no such competition because the corresponding rule is simply absent from the L1.

One aspect of English phrasal prosody that has been included in L2 studies investigating both transfer and universal factors is the nuclear accent (NA). The NA is generally defined as the main prominence of an utterance and has also been referred to as the Nuclear Stress (e.g. Chomsky & Halle 1968). In English, it is used to mark information structure and therefore has an important communicative function (Ladd 2008). Moreover, in English, NA placement exhibits a good deal of flexibility, as will be discussed in detail in Section 2.

The present study examines L2 acquisition of prosody from a functional point of view by looking at how both L1 and L2 speakers place the NA in simple English intransitive sentences. NA placement in broad focus intransitive sentences is highly variable (e.g. Schmerling 1976; Allerton & Cruttenden 1979; Ladd 2008), occurring on either the subject or the verb depending on the context. Both structural factors, such as verb type, and pragmatic factors, such as topicality, predictability, or expectedness, have been invoked as predictors for this variation. Precisely which aspect of the context triggers NA placement on the subject versus the verb remains somewhat elusive, even for native speakers. Our study examines the effects of both verb type and expectedness on NA placement for both L1 English speakers and L1 Spanish L2 English learners. Spanish has less variability than English in NA placement, and by examining contexts where English and Spanish differ in NA placement, we are able to examine whether transfer occurs. By systematically varying both structural and pragmatic factors, we can furthermore examine whether learners are more sensitive to one factor than another in their acquisition of NA placement, which could point towards universal properties of L2 acquisition.

2 Background

A typological difference in prosody has been proposed between two types of languages: prosodically plastic and non-plastic languages (Vallduví 1991). Germanic languages, such as Dutch and English, are generally considered to be prosodically plastic languages, which use prosody to mark information status. In these languages, in broad focus utterances, that is, when all the information is new, the NA occurs on the final content word. However, NA placement can easily shift leftwards, often to mark a word in narrow focus. In contrast, prosodically non-plastic languages, such as Spanish and other Romance languages, are said to have more invariant NA placement. These languages use word order inversions instead of prosody to mark focus. In other words, they display more syntactic flexibility than a language like English, whereas English displays more prosodic flexibility.

The examples often cited in the literature are specifically about informational focus, defined as the new piece of information, e.g., in an answer to a wh-question (e.g. Zubizarreta & Vergnaud 2006). In English, the NA occurs utterance-finally under broad focus, but non-finally when the subject is in focus (e.g., Who broke the window? – A GIRL broke the window; capital letters are used to indicate NA placement). In Spanish, on the other hand, narrow focus on the subject can be expressed by postposing the subject after the verb.

The plastic versus non-plastic divide has largely been used to distinguish between the prosodic versus syntactic treatment of narrow focus (e.g. Vallduví 1991; van Maastricht et al. 2016). However, there are also broad focus contexts where languages differ in their reliance on prosodic versus syntactic variability; in this paper, our focus is on cross-linguistic differences in the prosodic realization of broad focus intransitives. In English, two types of prosodic patterns are possible for simple intransitives (e.g. Schmerling 1976; Gussenhoven 1983; Selkirk 1995; Ladd 2008): the NA can either be placed non-finally on the subject, as in (1), or fall on the final element, the verb, as in (2).

    (1) What happened? - An ACTRESS arrived.
    (2) What happened? - An actress BURPED.

In Spanish, on the other hand, broad focus intransitives are often marked by variation in SV versus VS word order, as in (3)–(4). NA placement remains invariant, occurring on the rightmost content word (Calhoun et al. 2018; Landblom 2020).

    (3) broad focus intransitive, VS order:
        ¿Qué pasó? -      Llegó    una  ACTRIZ.
        What happened? -  arrived  an   actress
        ‘What happened? - An actress arrived’
    (4) broad focus intransitive, SV order:
        ¿Qué pasó? -      Una  actriz   ERUCTÓ.
        What happened? -  An   actress  burped.
        ‘What happened? – An actress burped’

In the rest of this section, we discuss theoretical accounts of NA placement in English and Spanish intransitives, before moving on to prior findings about NA placement in L2 acquisition.

2.1 NA placement in English intransitives

While there are various reasons why the NA can occur on a non-final element in English (e.g. informational focus, contrastive focus, presence of an utterance-final indefinite pronoun; see Ladd 2008), this study investigates NA placement in broad focus intransitives.

Semantic/syntactic factors, such as verb type, have often been held responsible for at least part of the prosodic variability in English intransitives (e.g. Zubizarreta & Vergnaud 2006; Zubizarreta & Nava 2011; Irwin 2012). Some researchers have taken a syntactic approach in which NA placement is affected by the status of intransitive verbs as unaccusative versus unergative (e.g. Irwin 2012). Unaccusative verbs are analyzed as having an internal argument whereas unergative verbs have an external one (Levin & Hovav & Keyser 1995). Unaccusative verbs are non-agentive, such as verbs that denote appearance (e.g. appear, arrive) or change of state (e.g. melt, freeze). Unergative verbs, on the other hand, denote more agentive actions, such as walk or yell. While the specifics of the relationship between verb type, syntax and prosody may vary from model to model, it is generally assumed that NA assignment in English is closely intertwined with the syntax. Researchers have claimed that, due to the argument structure, sentences with unaccusative verbs trigger NA placement on the subject, whereas unergative verbs are said either to trigger NA placement on the verb (e.g. Kahnemuyipour 2009) or to exhibit variability, with the NA on the verb or the subject.

Others, however, have rejected the claim that NA placement can be based on verb type, even partially. For example, Bolinger (1954) proposed that NA assignment rests on the semantic or informational importance of words in an utterance, with words judged by the speaker to carry the most importance being more likely to receive the NA. Hirsch & Wagner (2011) found that once information structure or pragmatic factors, such as topicality, are controlled for, syntactic verb type does not matter for NA placement with intransitives; only discourse and pragmatic factors do. For example, Hirsch & Wagner examined verbs of appearance and disappearance, which are both unaccusative. They found that verbs of disappearance, where the subject could be construed as topical (something must already be introduced into the discourse in order to disappear), were more likely to elicit the NA on the verb, whereas verbs of appearance occurred more frequently with an accented subject.

Zubizarreta & Nava (2011) make a similar proposal that pragmatic factors influence NA placement in intransitives, but also tie pragmatic factors to syntactic verb type. In this approach, the NA is assigned to the subject in thetic (eventive) statements but to the verb in categorical statements (also known as topic-comment structures). Thetic statements are statements that introduce an event as a whole, whereas topic-comment statements contain two elements: a topic already established in discourse (the subject) and a comment about the topic. Zubizarreta & Nava argue that statements containing unaccusative verbs, due to the inherent lexical semantics of unaccusatives, are construed as thetic, and therefore are more likely to have an accented subject. Unergative verbs, on the other hand, can be construed as occurring in either thetic or categorical statements, depending on the context, which then leads to more variable NA placement on either the verb or subject. (5) provides an example of a thetic statement with an unaccusative verb and (6) provides an example of a categorical statement with an unergative verb.

    (5) What happened? My FRIEND arrived.
    (6) Why did the teacher stop talking? A child was YELLING.

The distinction between thetic and categorical statements with an unergative verb is said to be driven by pragmatic factors, such as noteworthiness, predictability, or expectedness. If a verb is less predictable (denoting an unexpected event), it is more likely to be accented, whereas if it is more predictable, there is a higher chance the subject will be accented. For example, the verbs in both (7) and (8) are unergative. However, in (7), the subject is more likely to be accented since birds are expected to chirp, whereas the verb is more likely to be accented in (8), as a bird burping is very unexpected, and therefore a noteworthy event:

    (7) A BIRD is chirping.
    (8) A bird is BURPING.

Zubizarreta & Nava provide a limited amount of experimental evidence to support the above claim. They conducted an experiment examining the prosodic patterns of both native English speakers and L1 Spanish L2 English learners. The participants were asked to read simple intransitive sentences aloud; the sentences were contextualized with a preceding question. NA placement was determined through perceptual annotation. In this study, native English speakers placed the NA on the subject 97% of the time for unaccusative sentences, but only 42% of the time for unergatives; the remaining 58% of unergative utterances were produced with the NA on the verb. Zubizarreta & Nava used these findings to support their claims, noting that there is little variability in unaccusatives and high variability in unergatives. However, they did not explicitly test whether the factors they proposed to be responsible for NA placement variability in unergatives, such as predictability or expectedness, were in fact behind the variation found in unergatives.

2.2 NA placement in Spanish

NA placement in Spanish generally occurs at the end of an utterance, regardless of any information structure factors (Zubizarreta 1998; Hualde 2005; Zubizarreta & Nava 2011). Spanish has been categorized as a non-plastic language (Vallduví 1991), indicating that syntactic rather than prosodic manipulations are used to mark words in narrow focus. Zubizarreta (1998) has provided what is perhaps the most prominent model in the field, which accounts for the relative prosodic inflexibility of Spanish, while predicting prosodically-motivated syntactic movement to mark narrow focus. Non-final NA placement in non-plastic languages like Spanish is possible for marking contrastive focus as is seen in (10), which Zubizarreta’s model accounts for. In other cases, such as informational focus, focus is expressed through word order inversions, with the focused element occurring at the end of a sentence, as is shown in (9). This stands in direct contrast to a language like English, which has rigid word order, but greater flexibility in NA placement.

    (9) Syntactic marking of informational focus, VS order:
        ¿Quién estornudó? -  Estornudó  JUAN.
        Who sneezed? -       sneezed    Juan
        ‘Who sneezed? - Juan sneezed’
    (10) Prosodic marking of contrastive focus, SV order:
        ¿Estornudó Pedro? -  JUAN  estornudó.
        Pedro sneezed? -     Juan  sneezed.
        ‘Pedro sneezed? – Juan sneezed.’

The lack of prosodic plasticity in Spanish, however, cannot be entirely taken for granted. Recent experimental studies have indicated that Spanish may have more prosodic flexibility than previously thought. With regard to narrow focus, written production tasks have found that native Spanish speakers do not always put the focused element at the end of a sentence, instead preferring to mark narrow focus in-situ (Hertel 2003; Hoot & Leal 2020). Oral production studies have corroborated these findings, showing that speakers are able to prosodically augment an element in narrow focus even when it occurs non-finally. Such results have been reported across various dialects, including Mexican Spanish (Kim 2016), Venezuelan Spanish (Calhoun et al. 2018), Castilian Spanish (Vanrell Bosch & Soriano 2013) and Argentinean Spanish (Gabriel 2010).

While much of the experimental research on narrow focus in Spanish has contradicted previous theoretical approaches about lack of prosodic plasticity in Spanish, less experimental work has been conducted to examine NA placement in broad focus contexts. One exception is Calhoun et al. (2018), who conducted an oral production task to test the effects of verb type (unaccusative versus unergative) and information status on word order and NA placement in Venezuelan Spanish intransitives. This study found that SV order occurred more frequently in Spanish intransitives than predicted. However, regardless of word order, the NA largely occurred on the final word. Landblom (2020) found similar results. This study investigated how both verb type and subject (in)definiteness affect word order inversions in a word order preference task and prosody in broad focus intransitives in a production task. While both factors affected the preference for SV versus VS word order, L1 speakers of Spanish were found to most commonly produce sentence-final NA placement regardless of word order. These studies suggest a preference for utterance-final NA placement in Spanish broad focus intransitives. However, these studies are limited in scope, and it remains to be established whether prosodic flexibility in Spanish broad focus intransitives is affected by dialect or by pragmatic factors.

2.3 L2 acquisition of nuclear accent

Given that Romance and Germanic languages vary predictably with regard to NA placement, they have often been contrasted in studies of L2 acquisition of plastic versus non-plastic prosody. Studies have examined whether native speakers of a non-plastic language can acquire the prosodic flexibility of a plastic language and vice versa (e.g. Krahmer & Swerts 2001; Rasier & Hiligsmann 2009; van Maastricht et al. 2016). Most studies have focused exclusively on how NA placement in narrow focus constructions is acquired. For example, Rasier & Hiligsmann (2009), a bi-directional production study, found that French speakers tended to transfer L1 accentual patterns into L2 Dutch, and as a result, were less likely than native Dutch speakers to deaccent given/old information in their Dutch productions. L1 Dutch/L2 French speakers, on the other hand, were more likely to imitate native French patterns, exhibiting less evidence of transfer. Based on these findings, the researchers proposed that acquiring a plastic prosodic system is more difficult than acquiring invariant prosody in a non-plastic system. They used the Markedness Differential Hypothesis (Eckman 1977) to explain their findings, arguing that NA placement rules in French are only structurally driven, whereas those in Dutch are driven by both structural and pragmatic rules, making the Dutch system the more marked of the two, and hence more difficult to acquire.

Van Maastricht et al. (2016) also conducted a bi-directional experiment examining L2 acquisition of narrow, contrastive focus in L1 Dutch/L2 Spanish and L1 Spanish/L2 Dutch speakers. The results of this study provided evidence of transfer of L1 prosodic patterns in both directions. Additionally, speakers at higher levels of proficiency (for both speaker groups) had relative maximum pitch values (specifically the maximum pitch of the stressed syllable of the first word minus that of the second word) closer to those of the native speakers, indicating that transfer of the L1 patterns was decreasing with proficiency, and the learners were converging on the target L2 prosodic patterns.

While most L2 studies of prosody have examined narrow focus, English and Spanish also differ regarding the prosodic structure of broad focus intransitives, as discussed above. To examine whether speakers can acquire a non-final NA when it does not exist in the L1, and to learn more about the trajectory of acquiring flexible NA placement in an L2, it is important to examine broad focus as well as narrow focus contexts. Zubizarreta & Nava (2011), described in Section 2.1, is one of the few studies to include broad-focus intransitive stimuli in their investigation of NA placement in the L2 English of Spanish speakers. They suggest that L2 learners only acquire non-final NA placement at high levels of proficiency. In unaccusative sentences, the high proficiency learners produced an accented subject at a rate of 36% (compared to 97% for the native English speakers). In comparison, the intermediate L2 learner group accented the subject only 4% of the time. For unergatives, the rate of NA placement on the subject was 42% for native speakers, 39% for high-proficiency learners, but only 16% for intermediate learners.

Zubizarreta & Nava’s findings illustrate that it is difficult for Spanish speakers to acquire variable English prosody in broad focus intransitives. However, further investigation is required. As discussed in section 2.1, according to Zubizarreta & Nava (among others), NA placement in intransitives is driven by two factors: verb type and pragmatic factors, such as expectedness. Therefore, L2 English learners need to learn two distinct properties. First, they need to relate verb type to NA placement: on Zubizarreta & Nava’s proposal, unaccusatives occur in thetic statements, which have non-final NA placement on the subject, while unergatives can occur either in thetic statements or in categorical statements, and hence either the subject or the verb may be accented. Second, L2 English learners need to learn that pragmatic factors such as expectedness/predictability affect whether an unergative verb occurs in a thetic or a topic-comment statement, and hence whether the NA is assigned to the subject or the verb. The exact contributions of verb type versus pragmatic factors to NA placement have not yet been fully teased apart in studies with either native speakers or L2 learners of English. Our study aims to do just that, by independently manipulating verb type and (un)expectedness. Additionally, by recruiting speakers of differing proficiency levels, we hope to start examining the learning trajectory in more detail, by exploring whether verb types or pragmatic factors that affect English NA placement are acquired earlier.

2.4 Research questions

Our study is a partial replication and extension of Zubizarreta & Nava (2011). The goal of our study is two-fold. First, we examine the effects of both verb type and expectedness on NA placement in English broad-focus intransitives. As discussed above, Zubizarreta & Nava proposed that unaccusative verbs favor NA placement on the subject due to their inherent lexical and syntactic properties; in contrast, unergative verbs exhibit more variability as they can occur in either thetic or categorical statements, depending on the discourse. However, while Zubizarreta & Nava demonstrated that unergatives show more variability than unaccusatives, they did not explicitly manipulate their stimuli for expectedness, only for verb type. In this study, we manipulate the two factors separately, asking the following research question:

RQ1: How is NA placement in English intransitive sentences affected by verb type and by the pragmatic factor of (un)expectedness for L1 speakers?

The second goal of this study is to examine the L2 acquisition of intransitives more closely, by testing L1 Spanish L2 English learners and comparing their performance to that of native English speakers. Following the prior literature discussed above, we assume that English exhibits more flexibility in NA placement in broad-focus intransitives than Spanish, which nearly always places the NA sentence-finally. Therefore, L1 Spanish L2 English learners need to acquire the fact that English uses variable NA placement to distinguish between thetic and categorical statements; as discussed above, it is an open question as to whether the relevant factors behind variable NA placement are verb type, pragmatic expectedness, or both. Regarding L2 acquisition, we therefore ask the following research question:

RQ2: Do Spanish speakers transfer strategies of NA placement in broad-focus intransitives into their L2 English? Do they acquire variable NA placement based on verb type and/or expectedness?

As discussed in Section 2.2, while there is more prosodic flexibility in NA placement in Spanish than had been originally thought, this flexibility has been noted primarily in narrow-focus utterances. The studies that have looked at broad-focus intransitive utterances find that NA placement is usually utterance-final, regardless of word order. This is in sharp contrast with English, where NA placement in broad-focus SV intransitives is affected by both verb type and pragmatic factors. Following the prior studies on broad-focus intransitives in Spanish, we assume that transfer from Spanish should lead L1 Spanish L2 English learners to prefer NA placement on the verb in intransitive SV utterances.

Examining RQ2 with learners of different proficiency levels can provide insight into the trajectory of L2 acquisition of NA placement with broad focus intransitives, informing us as to whether it is easier to acquire non-final NA placement based on structural factors such as verb type, or on pragmatic factors like expectedness.

3 Methodology

We examined both L1 and L2 English speakers’ oral production of intransitives, measuring effects of verb type and expectedness. While verb type was relatively easy to establish, expectedness was more difficult to define, as it largely relies on the surrounding discourse context. Therefore, a sentence norming task was first conducted on a larger set of intransitive stimuli to select stimuli for the main task.

3.1 Sentence norming task

The norming task1 had a 2×2 design in which verb type (unaccusative versus unergative) was crossed with predicted expectedness (expected versus unexpected); 32 token sets were built. Verbs were drawn from Levin et al. (1995), who listed and categorized a large number of intransitive verbs. After the verbs were chosen, target sentences were created, each with a contextual question designed to elicit broad focus and to create a situation in which half of the target sentences would be expected as answers to the question, and the other half unexpected. We attempted to keep the contextual question the same for all conditions in each token set. However, due to the differences in meaning between unergative and unaccusative verbs, and the difficulty in setting up expected and unexpected contexts for each verb type, some token sets had two different contextual questions, one for both unaccusative verbs and one for both unergative verbs. Most token sets (n = 21) had only one question for all four verbs, as in Table 1, while the remaining 11 had two questions, as in Table 2; in the latter case, we tried to keep the contextual questions as similar as possible.

Table 1

Example token set with one contextual question: “What happened after you hung up your birdfeeder?”.

| | Expected answer | Unexpected answer |
|---|---|---|
| Unaccusative | A robin appeared. | A robin ripped. |
| Unergative | A robin chirped. | A robin burped. |

Table 2

Example token set with two contextual questions.

| | Question | Expected answer | Unexpected answer |
|---|---|---|---|
| Unaccusative | What happened while you were walking around the cornfield? | A farmer arrived. | A farmer vanished. |
| Unergative | What happened after you trespassed on the cornfield? | A farmer bellowed. | A farmer limped. |

The norming task was administered to 42 participants via Amazon Mechanical Turk. Two participants were excluded because they were native speakers of a language other than English; three participants who reported being bilingual in English and another language (one speaker each for French, Spanish and Mandarin) were kept in the analysis, as they met the criterion of being a native speaker of English and reported that English was spoken in the home while they were growing up. Following an example, participants saw question-answer pairs and were asked to rate how expected the response was as an answer to the question provided, on a scale from 1 (very unexpected) to 5 (very expected). The norming task consisted of one experimental list with the full 128 items (32 token sets × 4 tokens per set).2

To determine which of the token sets would be used for the production experiment, the ratings were averaged across participants. Then, for each token set, a difference score was calculated between the predicted expected and unexpected tokens, for both unaccusative and unergative verbs. The 12 token sets with the highest overall expectedness difference scores for both verb types were used in the oral production task. Table 3 shows the average ratings by condition for the 12 token sets selected.
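The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the ratings are invented, the function name `select_token_sets` is hypothetical, and summing the two per-verb-type difference scores into one overall score is an assumption, since the text does not state how the two scores were combined.

```python
from statistics import mean

VERB_TYPES = ("unaccusative", "unergative")


def select_token_sets(ratings, n_select):
    """Rank token sets by their expected-minus-unexpected rating difference.

    ratings maps token_set_id -> {(verb_type, expectedness): [ratings]}.
    Returns the n_select best token sets and all overall scores.
    """
    scores = {}
    for set_id, cells in ratings.items():
        diffs = []
        for verb_type in VERB_TYPES:
            # Average ratings across participants, then take the difference.
            expected = mean(cells[(verb_type, "expected")])
            unexpected = mean(cells[(verb_type, "unexpected")])
            diffs.append(expected - unexpected)
        # ASSUMPTION: the overall score is the sum over both verb types.
        scores[set_id] = sum(diffs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_select], scores


# Hypothetical ratings on the 1-5 scale (three raters, three token sets).
ratings = {
    "A": {("unaccusative", "expected"): [5, 5, 4],
          ("unaccusative", "unexpected"): [2, 1, 2],
          ("unergative", "expected"): [4, 5, 4],
          ("unergative", "unexpected"): [2, 2, 1]},
    "B": {("unaccusative", "expected"): [4, 3, 4],
          ("unaccusative", "unexpected"): [3, 3, 4],
          ("unergative", "expected"): [3, 4, 3],
          ("unergative", "unexpected"): [3, 2, 3]},
    "C": {("unaccusative", "expected"): [5, 4, 5],
          ("unaccusative", "unexpected"): [1, 2, 1],
          ("unergative", "expected"): [5, 5, 5],
          ("unergative", "unexpected"): [1, 1, 2]},
}

selected, scores = select_token_sets(ratings, n_select=2)
print(selected)
```

In the study itself, the same kind of ranking would be applied to the 32 normed token sets with `n_select=12`. A different combination rule (for example, ranking on the smaller of the two differences) would also be compatible with the description and could select different sets.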

Table 3

Expectedness norming task results for the 12 selected token sets: mean (sd).

| | Expected | Unexpected |
|---|---|---|
| Unaccusative | 4.56 (.84) | 2.25 (1.18) |
| Unergative | 4.35 (.96) | 2.03 (1.15) |

To gauge whether the unergative and unaccusative stimuli differed in perceived expectedness, a post-hoc analysis was conducted: a linear mixed effects regression on the expectedness ratings of the 12 selected token sets, run using the lmer function from the lme4 package in R (Bates et al. 2015). Verb type (2 levels) and expectedness (2 levels) were entered as fixed effects, along with their interaction. Both token and participant were included as random effects. Table 4 shows the output. Participants gave significantly lower ratings to unexpected than to expected verbs, but verb type did not have an effect, and did not interact with expectedness.

Table 4

Linear regression output for expectedness ratings, by verb type and expectedness.

Random effects variance: .07 for token, .05 for participant.

| | Estimate | Std. Error | Df | T-value | P value |
|---|---|---|---|---|---|
| (Intercept) | 4.56 | .09 | 56.33 | 49.22 | <.05* |
| Verb Type Unergative | –0.2 | .12 | 44 | –1.67 | .1 |
| Expectedness Unexpected | –2.3 | .12 | 44 | –18.94 | <.05* |
| Verb Type Unergative : Expectedness Unexpected | –0.02 | .17 | 44 | –0.13 | .9 |
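
As a reading aid for Table 4, the short sketch below shows how the fixed-effect estimates combine into predicted mean ratings per condition and compares them with the observed means in Table 3. It assumes treatment (dummy) coding with unaccusative and expected as the reference levels. The text does not state the coding scheme, but the intercept equals the unaccusative expected mean, which is consistent with that assumption. The variable and function names are illustrative only.

```python
# Fixed-effect estimates copied from Table 4.
INTERCEPT = 4.56        # reference cell: unaccusative, expected
B_UNERGATIVE = -0.2     # shift for unergative verbs
B_UNEXPECTED = -2.3     # shift for unexpected answers
B_INTERACTION = -0.02   # extra shift when both apply

# Observed condition means copied from Table 3.
OBSERVED = {
    ("unaccusative", "expected"): 4.56,
    ("unaccusative", "unexpected"): 2.25,
    ("unergative", "expected"): 4.35,
    ("unergative", "unexpected"): 2.03,
}


def predicted_mean(verb_type, expectedness):
    """Predicted rating under the ASSUMED treatment coding."""
    unergative = 1 if verb_type == "unergative" else 0
    unexpected = 1 if expectedness == "unexpected" else 0
    return (INTERCEPT
            + B_UNERGATIVE * unergative
            + B_UNEXPECTED * unexpected
            + B_INTERACTION * unergative * unexpected)


# Gap between model prediction and observed mean, per condition.
gaps = {cell: abs(predicted_mean(*cell) - observed)
        for cell, observed in OBSERVED.items()}

for cell, gap in gaps.items():
    print(cell, round(predicted_mean(*cell), 2), round(gap, 2))
```

Under this coding, the predicted means stay within rounding distance of the Table 3 means, which is what one would expect if the estimates in Table 4 were reported to one or two decimal places.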

The 12 token sets selected for the experiment corresponded to 48 distinct sentences (12 per condition). A post-hoc frequency analysis was conducted on the verbs from these sentences to ensure any differences between conditions in the main experiment could not be attributed to word frequency. We examined the frequency of occurrence of each verb in the Corpus of Contemporary American English (COCA). Table 5 shows the average number of occurrences by condition. An ANOVA was conducted on the frequency scores, with verb type and expectedness as the main effects. The ANOVA results yielded a significant effect of expectedness (F(1,44) = 7.48, p < .05). Neither Verb Type (F(1,44) = 2.59, p = .11) nor the interaction between the two factors (F(1,44) = 1.39, p = .24) turned out to be significant, despite unaccusatives having a much greater frequency than the unergatives.

Table 5

Average frequency of selected verbs in COCA.

| | Unaccusatives | Unergatives |
|---|---|---|
| Expected | 463,896 | 176,160 |
| Unexpected | 60,067 | 15,575 |

These results indicate that frequency is confounded with expectedness, as the expected verbs are much more frequent than the unexpected ones; thus, any effects of expectedness in the oral production task can just as easily be attributed to frequency. Crucially, however, the relationship between expectedness and frequency holds for both unergative and unaccusative verbs.

3.2 Oral production task

3.2.1 Participants and procedure

Twenty native speakers of American English (12 female) completed the oral production task. All reported being born and raised in the U.S. and were residing there at the time of the study. Sixteen of the 20 L1 speakers identified only English as their L1. Four participants indicated being bilingual in English and one other language (Spanish, Greek, Polish and Cantonese); they were kept in the analysis, since they spoke English as one of their native languages, had grown up in the U.S. and reported being dominant in English.

Additionally, 23 L1 Spanish/L2 English learners participated in this study. Thirteen (seven female) were recruited and recorded at a university in Costa Rica. The remaining ten (seven female) were recruited and recorded at a university in the U.S. This was done to recruit L2 speakers of varying proficiency levels. The participants from Costa Rica all reported Spanish and only Spanish as their native language. None reported time spent abroad in an English-speaking country. The Spanish-speaking participants recruited in the U.S. came from a diverse range of countries including Colombia, Mexico, Puerto Rico and Spain. All were residing in the U.S. at the time of recording, and had moved there as adults, no more than six years previously; all but two had resided in the U.S. for less than one year. All participants reported only Spanish as their native language, except for one who reported both Spanish and Nahuatl as native languages. However, she also reported that she stopped speaking Nahuatl at the age of four. Most participants reported studying other languages besides their native language and English. Four of the L2 participants reported having studied German, a prosodically plastic language like English. However, all four participants reported relatively low proficiency in German (no more than 30 of 100), making it unlikely that German prosody would have influenced their performance in English.

All participants completed the experimental tasks in person, in the presence of an experimenter. All participants tested in the U.S. (both native speakers and learners) completed the experiment in a sound-attenuated booth in a research laboratory, while wearing a head-mounted microphone. Lab space and equipment were not available for the L2 speaker group recruited in Costa Rica, so instead, the procedure was completed in a quiet room, using the researcher’s computer, and participants were recorded with a Sony ICD PX312 handheld recorder that was placed to the right with the microphone as close to the speaker as possible.

Participants completed multiple tasks, in the following order: a vocabulary task (for learners only); the oral production task; an oral fluency test; an intuition task (not discussed here); a proficiency test; and a language background questionnaire. All tasks were completed in one sitting and generally took about 30–45 minutes for L1 speakers and 30–60 minutes for the L2 speakers.

The vocabulary test was designed to measure learners’ knowledge of the words included in the experiment; it consisted of multiple-choice questions in which the participants were asked to select the most fitting Spanish translation for each word. The results for this test can be found in Table 6. The oral fluency test was included to provide another way to estimate learners’ development in case there was no correlation between proficiency scores and NA placement in learners. Fluency was operationalized as speech rate, measured by the number of syllables per minute (Riggenbach 1991). However, since there were significant findings based on proficiency scores, only the proficiency scores (and not the fluency scores) are included in the statistical models reported here. An advantage of including the proficiency score instead of the fluency score in the analysis is that the former provides a measure gauging proficiency that is independent of speaking ability (cf. Tremblay 2011).

Table 6

Background information for L1-English and L1-Spanish/L2-English speaking participants.

| Group | Statistic | Age | Proficiency Score (of 40) | Fluency Score | Self-Reported English Proficiency (of 100) | Vocab Score (% correct) |
| --- | --- | --- | --- | --- | --- | --- |
| L1 Speakers (n = 20) | Mean (SD) | 20 (1) | 38.2 (1.11) | 175.42 (41.75) | 97.74 (5.28) | n/a |
| | Range | 18–22 | 35–40 | 65–238 | 82–100 | n/a |
| L2 Speakers: US Group (n = 10) | Mean (SD) | 27 (3) | 35.2 (2.82) | 136.62 (29.07) | 72.2 (17.77) | 79.36 (14.57) |
| | Range | 24–32 | 31–38 | 103–193 | 40–100 | 48–96 |
| L2 Speakers: Costa Rica Group (n = 13) | Mean (SD) | 21 (2) | 32.38 (2.81) | 148.06 (26.68) | 66.69 (19.44) | 81.42 (7.64) |
| | Range | 18–27 | 26–38 | 111–199 | 29–90 | 65–92 |

The proficiency task was a forced-choice cloze test used in Ionin & Montrul (2010); it consisted of a text passage (from O’Neill et al. 1981) which was missing every seventh word, for a total of 40 missing words. Participants selected from three options for each missing word, only one of which was appropriate for the context. The language background questionnaire asked participants about their language background and asked them to self-report their English proficiency on a scale from 0 to 100.

Table 6 reports the participants’ age, along with their proficiency, fluency, and vocabulary test scores. As the table shows, learners tested in the U.S. had higher mean proficiency scores than the Costa Rica group, while the opposite was the case for fluency scores. T-tests yielded a significant difference between the two L2 groups in proficiency test scores (t(21) = 2.38, p < .05), but not in fluency (t(21) = –0.98, p = .34) or self-reported proficiency (t(21) = –0.7, p = .49).
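As an illustration of how such a group comparison works, the following minimal Python sketch recomputes the t statistics for proficiency and fluency from the summary statistics in Table 6. It assumes a pooled-variance (Student's) two-sample t-test, which is consistent with the reported 21 degrees of freedom but is not stated explicitly in the text; the function name pooled_t is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics.

    Assumption: equal variances (Student's t), so df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Means, SDs and group sizes copied from Table 6
# (U.S. group first, Costa Rica group second).
t_prof, df_prof = pooled_t(35.2, 2.82, 10, 32.38, 2.81, 13)
t_flu, df_flu = pooled_t(136.62, 29.07, 10, 148.06, 26.68, 13)

print(f"proficiency: t({df_prof}) = {t_prof:.2f}")
print(f"fluency:     t({df_flu}) = {t_flu:.2f}")
```

Because the inputs are rounded table values, the output of this sketch should be read as an approximation of the reported statistics rather than a replication of the original analysis.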

3.2.2 Materials

Each item in the oral production task consisted of a question-answer pair. The 2×2 design crossed verb type and expectedness, resulting in four conditions, illustrated by the sample token set in Table 1. Of the 12 token sets, four had two different contextual questions, while eight shared the same question across all conditions. All target sentences consisted of a simple indefinite subject NP and a verb in the simple past.

Because verbs were drawn from a limited set, and because the expectedness ratings further constrained the selection of sentences, there were some repetitions of verbs (though all target sentences were distinct). Repetition was mostly found in the unaccusative expected category, which consisted of six distinct verbs: appear (×3), arrive (×3), come (×2), return (×2), flee, leave. The unexpected unaccusative category had 10 distinct verbs, two of which (die and vanish) each occurred twice.3 The unergative sentences used twelve distinct verbs in each expectedness condition.

In addition to the intransitive stimuli, the oral production task also contained 84 items from four other experiments, which manipulated predicted NA placement to occur on either the penultimate or final element of an utterance. These four experiments tested four different phenomena, including contrastive focus, indefinite pronouns, and various properties of compounds. The experiment on intransitives and the four other experiments served as fillers for each other, as they tested different syntactic environments as well as different types of focus (broad, informational, and contrastive).

The stimuli were divided into two lists, each containing 24 items with intransitives (six per condition) and 42 items for the other four experiments. For the intransitives, each list contained two tokens from each token set that differed in both verb type and expectedness (e.g., unaccusative expected and unergative unexpected). The lists were blocked and randomized for order of presentation.4
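The list construction described above can be sketched as follows. This is a minimal illustration only: the text does not state which list received which pair from a given token set, so the alternation by token-set number below is an assumption, as are the function and variable names.

```python
from collections import Counter

def build_lists(n_token_sets=12):
    """Split each 2x2 token set across two lists.

    Each list receives two items per token set, and those two items
    differ in both verb type and expectedness.
    """
    list_a, list_b = [], []
    for token_set in range(1, n_token_sets + 1):
        diagonal = [
            (token_set, "unaccusative", "expected"),
            (token_set, "unergative", "unexpected"),
        ]
        anti_diagonal = [
            (token_set, "unaccusative", "unexpected"),
            (token_set, "unergative", "expected"),
        ]
        # Assumption: alternate the pairing across token sets so that
        # every condition ends up equally represented in both lists.
        if token_set % 2 == 0:
            list_a.extend(diagonal)
            list_b.extend(anti_diagonal)
        else:
            list_a.extend(anti_diagonal)
            list_b.extend(diagonal)
    return list_a, list_b

list_a, list_b = build_lists()
counts_a = Counter((verb, exp) for _, verb, exp in list_a)
counts_b = Counter((verb, exp) for _, verb, exp in list_b)
print(len(list_a), dict(counts_a))
print(len(list_b), dict(counts_b))
```

Under this scheme each list contains 24 intransitive items, six per condition, matching the counts given in the text.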

The oral production task was administered to participants on a computer using PsychoPy 1.84.2 (Peirce 2007; 2009). For each target sentence, participants first both heard and read the question, which was presented in the upper half of the screen and played automatically as soon as the participant advanced to a new screen. After the written question appeared and the corresponding audio-file played, the answer appeared in the lower half of the screen, and participants were asked to read it out loud, as if they were engaged in a dialogue and were responding to the preceding question. All answers were declarative sentences which would, predictably, elicit the same intonation across items and across participants.5 After reading the response out loud, participants advanced to the next item by pressing a key on the keyboard. Because participants were in control of moving to the next sentence, they could read the sentence again if they felt the need, in which case the second version was kept for analysis. The test was preceded by oral and written instructions, and a sample question-answer pair.

4 Data analysis and results

4.1 Measuring nuclear accent

Prosodic units, such as nuclear accent, have generally been challenging to measure. Acoustic cues marking NA placement can vary, even within the same language. For example, the NA may be marked with an obvious pitch excursion at times, whereas other times the accented word is made prominent through other acoustic means, such as duration or intensity (e.g. Ladd 2008). There is also variation across languages, which has important implications for any study on the L2 acquisition of prosody. Because the NA can be difficult to measure, many studies have relied on perceptual annotation to establish NA placement (e.g. Zubizarreta & Nava 2011). However, this is subject to its own shortcomings, as this method relies on trained annotators to complete the task, which requires time and resources. Even though annotators may be trained, the nature of the task allows for the introduction of human error or bias. Because of this, it can be helpful to use a combination of acoustic measurements in determining NA placement, which can be collected automatically and are not reliant on the availability of hired annotators. This approach has been taken in numerous studies (e.g. Irwin 2011; Kim 2016). It is also possible to use acoustic measures in conjunction with perceptual annotation, to provide more robust, converging evidence of NA placement (e.g. Calhoun et al. 2018; Landblom 2020).

Many studies examining prominence at the phrasal level have investigated the acoustic correlates of narrow focus (e.g. Cooper & Eady & Mueller 1985; Xu & Xu 2005; Burdin et al. 2015). However, there are studies which have included broad focus constructions as well (e.g. Nguyen et al. 2008; Morrill 2011). There is no one unified way of measuring prominence, and methodologies vary. One approach which has been employed in various studies is to examine the relative difference in various acoustic measures of the stressed syllables between two adjacent words (e.g. Kim 2016; van Maastricht et al. 2016). In this approach, a variety of acoustic measurements are taken from the stressed syllables of both words, and then the measurements from word 2 are subtracted from those of word 1 for a difference measurement. The assumption behind this approach is that prominent words, those assigned the NA, will be acoustically augmented (e.g. higher intensity, higher maximum pitch, or pitch range), while unaccented words will be more acoustically compressed. Studies that have used this method with L1 and/or L2 speakers have included Nguyen et al. (2008), Hirsch & Wagner (2011) and Irwin (2011) for English, as well as Kim (2016) and van Maastricht et al. (2016) for Spanish. The present study adopts this methodology and uses the relative values of acoustic measurements between the subject and the verb to determine NA placement.

4.2 Data analysis

NA placement was analyzed through both quantitative acoustic measurements and perceptual annotation.6

4.2.1 Quantitative acoustic measurements

The quantitative acoustic measurements included measures of the pitch (maximum pitch, mean pitch, and pitch range) and the mean intensity of the vowel of the stressed syllables. For all four measures, the measure obtained from the verb was subtracted from that of the subject, providing a relative value. As discussed in Section 4.1, prosodic prominence correlates with higher pitch and intensity measurements. If the NA occurs on the final word, in this case the verb, we expect lower relative values than if it occurs on the subject. In the case of NA placement on the subject, the verb is expected to have lower measures than the subject, so the relative value should be larger. Since the relative values will vary depending on which word is accented (the verb vs. the subject), they can be compared across conditions to help determine NA placement.
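The subtraction described above can be illustrated with a short Python sketch. The measurement values are invented for illustration and the dictionary keys are hypothetical names; only the direction of the subtraction (subject minus verb) is taken from the text.

```python
MEASURES = ("max_pitch", "mean_pitch", "pitch_range", "mean_intensity")

def relative_values(subject, verb):
    """Subtract each verb measure from the matching subject measure."""
    return {m: subject[m] - verb[m] for m in MEASURES}

# Hypothetical stressed-vowel measurements (pitch in Hz, intensity in dB).
subject = {"max_pitch": 220.0, "mean_pitch": 200.0,
           "pitch_range": 40.0, "mean_intensity": 68.0}
verb = {"max_pitch": 180.0, "mean_pitch": 170.0,
        "pitch_range": 25.0, "mean_intensity": 64.0}

rel = relative_values(subject, verb)
print(rel)
```

In this invented token all four relative values are positive, the pattern expected when the subject is more prominent than the verb; values near zero or below would instead point toward an accented verb.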

Tokens were eliminated from both quantitative and perceptual annotation analysis for a variety of reasons, including a disfluency on the subject or the verb, an intonational boundary between these two words, misplaced word stress, skipped items, and background noise in the acoustic analysis. After these eliminations, 97.08% of the native speakers’ utterances (n = 466) and 89.13% of the L2 learners’ utterances (n = 494) were retained for analysis. For these retained tokens, measurements for F0 and mean intensity were extracted using ProsodyPro 6 (Xu 2013). For F0 values, ProsodyPro provides the mean F0 for each labeled interval, as well as the maximum and minimum measurements for each interval. All three pitch measures were collected for each labeled segment. The last two measurements were used to calculate the pitch range of each syllable (max – min = pitch range). Because there were multiple acoustic dimensions that were measured to find NA placement, a Principal Component Analysis (PCA) was run as a pre-processing step to reduce dimensionality. This is a fairly new approach for such a study. Similar studies measuring acoustic dimensions of prosody have often run multiple statistical analyses on the various measurements (e.g. Irwin 2011; Kim 2016) or have settled on one single measurement (e.g. van Maastricht et al. 2016). As mentioned in Section 4.1, prosodic prominence can be expressed through multiple acoustic dimensions, so including only one measurement runs the risk of missing other important cues and requires the researcher to decide a priori which cue may be the most important in realizing NA placement. Including multiple measurements allows the researcher to compensate for this. However, it often means running multiple statistical analyses, which raises the risk of a Type I error. 
To avoid both issues, some researchers have pre-processed the data with a PCA, which combines correlated variables into new linear combinations known as Principal Components (PCs). The PCs are orthogonal to each other and are ordered by the amount of variance in the data that they account for. This reduces the dimensionality of the measurements to a smaller set of variables (e.g. Li & Chen & Yang 2011; Huang & Guo & Kasakoff & Grieve 2016).

Because a PCA requires that all included components are measured in the same units, all raw measurements were z-scored within participant. This also helped control for differences in speech rate, pitch, and intensity values (Kim 2016), which can cause variation in the data unrelated to the research questions. The output of the PCA was examined to determine how many PCs should be retained for analysis, and these components were then used as the outcome variables in a linear mixed effects regression model to evaluate the predicted relevant variables on production.
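A minimal sketch of within-participant z-scoring is given below. It assumes that each measure is standardized separately for each speaker using the sample standard deviation; the text does not specify these details, and the data, field names and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscore_within_participant(rows, measure):
    """Return copies of rows with a '<measure>_z' field added.

    Each value is standardized against the mean and sample standard
    deviation of the same speaker's values for that measure.
    """
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row[measure])
    stats = {spk: (mean(vals), stdev(vals)) for spk, vals in by_speaker.items()}

    scored = []
    for row in rows:
        mu, sd = stats[row["speaker"]]
        scored.append({**row, measure + "_z": (row[measure] - mu) / sd})
    return scored

# Hypothetical data: two speakers with very different pitch levels.
rows = [
    {"speaker": "s1", "max_pitch": 200.0},
    {"speaker": "s1", "max_pitch": 220.0},
    {"speaker": "s1", "max_pitch": 240.0},
    {"speaker": "s2", "max_pitch": 100.0},
    {"speaker": "s2", "max_pitch": 110.0},
    {"speaker": "s2", "max_pitch": 120.0},
]
scored = zscore_within_participant(rows, "max_pitch")
for row in scored:
    print(row["speaker"], row["max_pitch"], round(row["max_pitch_z"], 2))
```

After standardization the two hypothetical speakers receive the same z-scores despite their different raw pitch levels, which is the kind of speaker-specific variation the procedure is meant to remove.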

All measured tokens from both speaker groups were included in the PCA. Missing data was imputed using the imputePCA function from the missMDA package in R (Josse & Husson 2016). The PCA was run using the PCA function from the FactoMineR package in R (Lê et al. 2008). The percent of variance that each PC accounts for is given in Table 7 along with the eigenvalue.

Table 7

Variances and Eigenvalues for each Principal Component.

| | PC1 | PC2 | PC3 | PC4 |
| --- | --- | --- | --- | --- |
| Percent of variance | 50.82 | 28.76 | 18.81 | 1.61 |
| Eigenvalue | 2.03 | 1.15 | .75 | .06 |

Only the PCs that account for most of the variance in the data are generally kept for further analysis (e.g. Aston & Chiou & Evans 2010; Bro & Smilde 2014; Huang et al. 2016). One way to determine if a PC explains enough variance to be retained is to examine the eigenvalue. PCs with an eigenvalue greater than 1 are often retained, whereas those with smaller eigenvalues are not (Bro & Smilde 2014). Both PC1 and PC2 had eigenvalues of over 1 and were therefore retained for further analysis.
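The retention rule can be illustrated with the eigenvalues from Table 7. The sketch below is only a worked example of the eigenvalue-greater-than-1 criterion; because it starts from rounded eigenvalues, the variance shares it computes differ slightly from the percentages reported in the table.

```python
# Eigenvalues copied from Table 7.
eigenvalues = {"PC1": 2.03, "PC2": 1.15, "PC3": 0.75, "PC4": 0.06}

# Share of variance per component: eigenvalue divided by the eigenvalue sum.
total = sum(eigenvalues.values())
shares = {pc: 100 * ev / total for pc, ev in eigenvalues.items()}

# Retention rule: keep components whose eigenvalue exceeds 1.
retained = [pc for pc, ev in eigenvalues.items() if ev > 1]

for pc, ev in eigenvalues.items():
    status = "retained" if pc in retained else "dropped"
    print(f"{pc}: eigenvalue {ev:.2f}, about {shares[pc]:.1f}% of variance, {status}")
```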

Maximum pitch measures made the largest contribution to PC1 (46%) followed by mean pitch (39%). Pitch range (9%) and intensity (6%) were the lowest contributors. Pitch range (49%) and mean intensity (44%) were the largest contributors to PC2, followed by mean pitch (5%) and maximum pitch (2%). The PC coordinates were used as the dependent variable in the linear mixed effects regression models.

4.2.2 Perceptual annotation

Annotators listened to each recorded production and indicated whether they perceived the NA on the verb or the subject. Because the production data were collected over a long period of time, the entire data set was annotated by two separate groups of annotators over two rounds of annotation. In the first round, the native speaker data and the L2 data from Costa Rica were annotated by two annotators. The L2 data from the U.S. were collected at a later date, so this dataset was annotated by a separate group of three different annotators. For the first round of annotations, there was an overall agreement rate of 85.5% (kappa = .511, z = 20.3, p < .05). Tokens with disagreement between the two annotators from the first round of ratings were given to one of the annotators from the second round to resolve.

All the L2 data from the U.S. were annotated by all three annotators in the second round; any disagreements were resolved by taking the annotation agreed on by the majority of the three. The agreement rate among all three annotators in the second round was 65.01% (kappa = .257, z = 4.08, p < .05). It is unclear why agreement was this low in the second round; the risk of low inter-annotator agreement was one reason for supplementing the perceptual analysis with acoustic measures.
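To show how an agreement rate and a kappa value relate, the following sketch computes both for two annotators. It assumes Cohen's kappa, which applies to the two-annotator case; the text does not name the kappa variant used, and the three-annotator round would call for a multi-rater statistic instead. The labels (S for subject, V for verb) are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token set")
    n = len(labels_a)
    # Observed agreement: proportion of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    return observed, (observed - chance) / (1 - chance)

# Hypothetical annotations of NA placement for ten tokens.
annotator_1 = ["S", "S", "V", "V", "S", "V", "V", "V", "S", "V"]
annotator_2 = ["S", "V", "V", "V", "S", "V", "S", "V", "S", "V"]

agreement, kappa = cohen_kappa(annotator_1, annotator_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance, which is why a seemingly high agreement rate can correspond to a moderate kappa.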

All annotators were native, monolingual speakers of American English and had at least an introductory background in linguistics. They completed the annotation work for course credit. Annotators were explicitly trained to listen for NA placement and provided with an overview of the phonetic correlates of prominence. After this introduction, they were given examples and training materials, both written and auditory, of utterances with differing NA placement. Even though they were trained explicitly how to identify the NA, they were blind to the research questions of the study and not given information on which constructions would be predicted to have final versus non-final NA placement.

4.3 Predictions

4.3.1 L1 speakers

If verb type has an effect on NA placement in English intransitives, then unaccusatives should be realized with NA placement on the subject, regardless of expectedness (per Zubizarreta & Nava 2011). In this case, there will only be an effect of expectedness with unergative verbs. Constructions with unexpected unergative verbs would elicit the NA on the verb, whereas expected unergative constructions would elicit the NA on the subject, resulting in lower measurement values in the unexpected unergative condition compared to the other three conditions. In the perceptual annotation, there would correspondingly be a higher percentage of an identified NA on the verb in unexpected unergatives, and higher percentages of an identified NA on the subject in the other conditions.

If, however, verb type does not matter (according to, e.g., Hirsch & Wagner 2011), then expectedness would affect both verb types equally. In this case, lower measurement values in the unexpected conditions relative to the expected would be predicted. This trend would also be supported by the perceptual annotation results.

4.3.2 L2 speakers

It is predicted that the NA placement alternation in intransitive verbs is a difficult distinction for L2 learners to acquire, and that L2 speakers transfer their native pattern of utterance-final NA placement to L2 English. We assume based on prior findings (discussed in section 2.2) that broad-focus intransitives in Spanish tend to have utterance-final NA placement (unlike narrow-focus intransitives, which exhibit more variability). The predictions about L1 transfer are based on Zubizarreta & Nava (2011), who also found L1 transfer from Spanish to English in the domain of NA placement. Under L1 transfer, we would not expect to see a differentiation between unaccusative and unergative verbs, or between the expectedness conditions. In the acoustic results, all measurements would be similar, and the perceptual annotation results would indicate that the NA is placed exclusively on the verb, regardless of condition. However, as proficiency has been shown to have an effect (Zubizarreta & Nava 2011; van Maastricht et al. 2016), it is predicted that learning will take place and that L2 speakers of higher proficiency levels will produce more variable NA placement, similarly to native speakers.

4.4 Results: acoustic measurements

As discussed in Section 4.2.1, the analysis includes the PC coordinates as opposed to each acoustic measurement, so for space reasons we include only those in this section. Figure 1 shows the overall values of each observation on the new PC1 coordinates and Figure 2 shows those for PC2. Based on these boxplots, it appears that for the L1 speaker group, the distributions for the expected versus unexpected tokens are separating for unaccusative verbs, but less so for unergative verbs. The PC values for the L2 speaker group, on the other hand, appear very similar across verb types and expectedness conditions with an exception for PC2 values, which show some separation between unaccusative verbs.

Figure 1

PC1 values by verb type and expectedness (L1 versus L2).

Figure 2

PC2 values by verb type and expectedness (L1 versus L2).

The lmer function from the lme4 package in R (Bates et al. 2015) was used for the regression models. Two different linear mixed effects models were run on both PC coordinates. The fixed effects included in both models were verb type (unaccusative versus unergative), expectedness (expected versus unexpected) and sex (female versus male). The reason for including sex in the models is that the PCs include pitch measurements. Model1 included speaker group (L1 versus L2). Model2 was built to examine the effect of proficiency for the L2 speaker group; this model included only the data from the L2 speakers, with proficiency test scores as a covariate. For all models, interactions between the fixed effects were evaluated as well. Speaker (n = 43) and Token (n = 48) were included as random effects. The final models were selected using the drop1 function in R. In this approach, all fixed effects and interactions are included in the original model. The drop1 function then evaluates the model and indicates whether a fixed effect or interaction can be dropped because it does not explain the variation in the data and does not contribute to the model. Interactions and fixed effects not contributing to the model are dropped one by one until there are no longer any predictors that can be dropped. Thus, everything included in the final model contributed significantly to the outcomes. P-values were computed from the final model using the lmerTest package in R (Kuznetsova et al. 2015). Follow-up pairwise comparisons were calculated using emmeans in R (Lenth et al. 2018) to explore significant interactions.

For PC1, Model1, the final regression model included verb type, expectedness, speaker group and two two-way interactions, between verb type and speaker group, and between expectedness and speaker group. Verb type was significant as a main effect. Both interactions were also significant (Table 8).

Table 8

Linear mixed effects output for PC1 by verb type and expectedness (Model 1, L1 versus L2).

Random effects variance: .17 for token, .15 for speaker.

| | Estimate | Std. Error | Df | T-value | P value |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | .26 | .15 | 92.59 | 1.76 | .08 |
| Verb Type Unergative | –.4 | .14 | 64.25 | –2.84 | <.05* |
| Expectedness Unexpected | –.16 | .14 | 64.27 | –1.11 | .27 |
| Speaker Group L2 | –.25 | .15 | 72.35 | –1.66 | .1 |
| Verb Type Unergative : Speaker Group L2 | .25 | .1 | 872.04 | 2.38 | <.05* |
| Expectedness Unexpected : Speaker Group L2 | .24 | .11 | 872.57 | 2.26 | <.05* |

Further pairwise comparisons were conducted to better explore the two-way interactions. When expectedness was held constant, unaccusative and unergative verbs differed significantly for L1 speakers (t = 2.78, p < .05) but not for L2 speakers (t = 1.06, p = .29). Holding verb type constant, there is no significant effect of expectedness for either group (L1: t = 1.09, p = .28; L2: t = –.55, p = .58).

For PC2, Model1, the final regression model included verb type, expectedness and speaker group as main effects, as well as an interaction between verb type and expectedness, all of which were significant (Table 9).

Table 9

Linear mixed effects output for PC2 by verb type and expectedness (Model 1, L1 versus L2).

Random effects variance: .17 for token, .01 for speaker.

| | Estimate | Std. Error | Df | T-value | P value |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | .64 | .13 | 52.96 | 4.77 | <.05* |
| Verb Type Unergative | –.7 | .18 | 47.7 | –3.83 | <.05* |
| Expectedness Unexpected | –.75 | .18 | 47.83 | –4.07 | <.05* |
| Speaker Group L2 | –.16 | .06 | 36.61 | –2.7 | <.05* |
| Verb Type Unergative : Expectedness Unexpected | .61 | .26 | 48.22 | 2.32 | <.05* |

Pairwise comparisons indicate that there is a significant difference between the two verb types in the expected condition (t = 3.9, p < .05), but not the unexpected condition (t = .74, p = .46). Sentences with unaccusative verbs have higher PC2 values than all other conditions.

Thus, as both PC1 and PC2 indicate, both verb type and expectedness have an effect on L1 speaker productions. PC1 indicates a significant effect of verb type, with higher values for unaccusatives than unergatives. PC2 indicates an interaction between verb type and expectedness, with the unaccusative, expected condition having higher values overall – a trend that is seen in the boxplots for both PC1 and PC2.

In comparison, the L2 speaker group, as a whole, appears not to be consistently distinguishing among conditions. It does appear, according to the results for PC1, that values for unaccusatives are higher than for unergatives, but this does not reach significance as it does for the L1 speaker group. Moreover, there is also a significant effect of speaker group, indicating that L1 speakers may have higher values than L2 speakers, which could signal a higher rate of sentence-final NA placement for the L2 speaker group. Such invariable NA placement would be consistent with L1 transfer.

Model2 for PC1, the regression model including only the L2 speakers and their proficiency scores, indicates that only higher proficiency speakers are beginning to produce the NA on the subject. The final regression model evaluating proficiency effects included only the main effect of proficiency score, which was significant, as is seen in Table 10.

Table 10

Linear mixed effects output for PC1 (Model2, L2 only).

Random effects variance: .12 for token, .2 for speaker.

| | Estimate | Std. Error | Df | T-value | P value |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | –.01 | .11 | 33.27 | –.09 | .93 |
| Proficiency Score | .21 | .1 | 25.12 | 2.17 | <.05* |

For PC2, Model2 included the main effects of expectedness and proficiency, both of which were significant (Table 11).

Table 11

Linear mixed effects output for PC2 (Model2, L2 only).

Random effects variance: .22 for token, .00 for speaker.

| | Estimate | Std. Error | Df | T value | P value |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | .2 | .11 | 48.11 | 1.79 | .08 |
| Expectedness Unexpected | –.43 | .16 | 49.24 | –2.75 | <.05* |
| Proficiency Score | .13 | .04 | 480.41 | 3.27 | <.05* |

Both the PC1 and PC2 results suggest that as L2 speakers’ proficiency scores increase, so do the PC values, which could indicate that higher proficiency speakers are more likely to start producing a non-final NA on the subject.

4.5 Results: perceptual annotation

After the annotations were completed, two binomial mixed effects regression models were run to test the effects of verb type, expectedness and speaker group on identified NA placement. Model3 tested the effect of speaker group (L1 versus L2) as well as the other relevant factors, while Model4 included the L2 data only and tested the effect of proficiency. The dependent measure was the likelihood of a non-final NA. Item and speaker were included as random effects in both models barring any problems with convergence. Each model was selected using a forward stepwise regression selection method in which first a minimal model (including only random effects) is evaluated. Predictors are then added until the Akaike Information Criterion (AIC) number becomes larger instead of smaller (Akaike 1974). Models were run using the glmer function from the lme4 package in R (Bates et al. 2015).

Overall, the perceptual annotation results support the findings reported in Section 4.4 for the acoustic results. Figure 3 shows the mean percentage of NA placement on the subject as identified by the annotators. There is a preference for an accented subject in sentences with expected, unaccusative verbs for L1 speakers only. A final, accented verb is preferred across all other conditions in the L1 group and for the expected verb in the L2 group. For the unexpected verbs, there is about a 50/50 split for the NA identified on the subject versus on the verb in the L2 group.

Figure 3

Average percentage of identified NA placement on subject divided by verb type, expectedness and speaker group. Error bars represent 1 SD from the mean.

The final binomial regression model for Model3 included verb type, expectedness and speaker group as fixed effects as well as an interaction between speaker group and expectedness. All main effects and the interaction were found to be significant (Table 12).

Table 12

Binomial mixed effects regression output for perceptual annotation of intransitives including effects of speaker group (L1 versus L2).

Random effects variance: .87 for token, 3.12 for speaker.

| | Estimate | Std. Error | Z value | P value |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.69 | 0.5 | 1.39 | 0.17 |
| Verb Type Unergatives | –1.16 | 0.34 | –3.43 | <.05* |
| Expectedness Unexpected | –2.52 | 0.4 | –6.29 | <.05* |
| Speaker group L2 | –1.87 | 0.61 | –3.03 | <.05* |
| Expectedness Unexpected : Speaker Group L2 | 2.23 | 0.4 | 5.56 | <.05* |

Pairwise comparisons using emmeans showed that there was a significant difference between expected and unexpected tokens for L1 speakers (t = 6.29, p < .05), but not for L2 speakers. Additionally, L1 speakers differed from L2 speakers significantly only in the expected conditions (t = 3.03, p < .05), and not in the unexpected conditions. Overall, L1 speakers are more likely to have the NA identified on the subject when the verb is expected versus when it is unexpected, whereas the L2 speaker group is more likely to have the NA identified on the verb regardless of expectedness.
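To make the direction and size of these effects more concrete, the sketch below converts the fixed-effect estimates in Table 12 from the logit scale to probabilities of an NA identified on the subject. It sets the random effects for token and speaker to zero, so the resulting values describe a typical speaker and item rather than population averages, and they will not match the raw percentages in Figure 3 exactly. The dictionary keys and function name are hypothetical.

```python
import math

# Fixed-effect estimates copied from Table 12 (logit scale).
# Reference levels: unaccusative verb, expected verb, L1 speaker.
COEF = {
    "intercept": 0.69,
    "unergative": -1.16,
    "unexpected": -2.52,
    "l2": -1.87,
    "unexpected:l2": 2.23,
}

def prob_subject_na(unergative, unexpected, l2):
    """Predicted probability of a non-final NA, random effects set to zero.

    Each argument is 0 (reference level) or 1.
    """
    eta = (COEF["intercept"]
           + COEF["unergative"] * unergative
           + COEF["unexpected"] * unexpected
           + COEF["l2"] * l2
           + COEF["unexpected:l2"] * unexpected * l2)
    return 1 / (1 + math.exp(-eta))

for l2, group in ((0, "L1"), (1, "L2")):
    for unexpected, label in ((0, "expected"), (1, "unexpected")):
        p = prob_subject_na(unergative=0, unexpected=unexpected, l2=l2)
        print(f"{group}, unaccusative, {label}: {p:.3f}")
```

The positive interaction term largely cancels the negative expectedness effect for the L2 group, which is why the predicted probabilities for L2 speakers differ little between expected and unexpected verbs.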

Figure 4 shows the correlation of the proportion of tokens with an identified NA on the subject with the proficiency scores of the L2 participants.

Figure 4

Proportion of identified NA placement on the subject divided by verb type and expectedness in correlation to proficiency score.

Model4, on the effect of proficiency score for the L2 speakers only, included verb type and proficiency score as main effects, both of which were significant. Expectedness was not significant, nor was the interaction between the two main effects (Table 13). Only speaker was included as a random effect due to issues of convergence.

Table 13

Binomial mixed effects output for perceptual annotation including effects of proficiency (L2 only).

Random effects variance: 3.13 for speaker.

| | Estimate | Std. Error | Z value | P value |
| --- | --- | --- | --- | --- |
| (Intercept) | –14.88 | 4.66 | –3.19 | <.05* |
| Verb Type Unergatives | –0.51 | 0.26 | –2 | <.05* |
| Proficiency Score | 0.4 | 0.14 | 2.93 | <.05* |

These results indicate that the higher in proficiency a speaker is, the more likely they are to place the NA on the subject. Sentences with unaccusative verbs have higher levels of non-final NA placement than those with unergative verbs. Visually, there is evidence that expectedness makes a slight difference within the unaccusative constructions, with expected verbs more likely to exhibit non-final NA placement than unexpected verbs. However, this trend is not strong enough to reach significance.
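
The size of the proficiency effect in Table 13 can be illustrated by converting the estimates into an odds ratio and into predicted probabilities. This is a sketch under stated assumptions rather than the study's own code: the outcome is assumed to be coded as 1 for NA on the subject, unaccusative is assumed to be the reference level, the speaker random effect is set to zero, and the example scores in the loop are hypothetical values chosen only for illustration.

```python
import math

# Fixed-effect estimates (log-odds) copied from Table 13.
INTERCEPT = -14.88
UNERGATIVE = -0.51
PROFICIENCY = 0.40


def inv_logit(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_subject_na(score: float, unergative: bool = False) -> float:
    """Predicted probability of subject NA, speaker random effect set to zero."""
    return inv_logit(INTERCEPT + UNERGATIVE * unergative + PROFICIENCY * score)


def odds_ratio_per_point() -> float:
    """Multiplicative change in the odds for a one-point increase in score."""
    return math.exp(PROFICIENCY)


def midpoint_score(unergative: bool = False) -> float:
    """Score at which the predicted probability of subject NA reaches 0.5."""
    return -(INTERCEPT + UNERGATIVE * unergative) / PROFICIENCY


if __name__ == "__main__":
    print(f"Odds ratio per proficiency point: {odds_ratio_per_point():.2f}")
    print(f"Midpoint score (unaccusative): {midpoint_score():.1f}")
    for score in (30, 35, 40):  # hypothetical scores, for illustration only
        print(score,
              f"unaccusative={p_subject_na(score):.2f}",
              f"unergative={p_subject_na(score, unergative=True):.2f}")
```

On these assumptions, each additional proficiency point multiplies the odds of subject NA by roughly 1.5, and at any given score the predicted probability is lower for unergatives than for unaccusatives.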

5 Discussion

Overall, both the quantitative acoustic measurements and the perceptual annotation results suggest that L1 speakers are more likely than L2 speakers to produce non-final NA placement on the subject. Moreover, they are most likely to produce this pattern with expected unaccusative verbs. L2 speakers do show evidence of acquiring non-final NA placement. This is seen largely in sentences with unaccusative verbs, where speakers are more likely to produce a non-final NA on the subject at higher proficiency levels.

5.1 Native speaker results

Given the results, RQ1, repeated below, can now be addressed.

RQ1: How is NA placement in English intransitive sentences affected by verb type and by the pragmatic factor of (un)expectedness for L1 speakers?

Findings indicate that both verb type and expectedness play a role in NA assignment for L1 English speakers. The acoustic analysis provides evidence of an effect of both verb type and expectedness, with unaccusatives and expected conditions having higher values than other conditions. The perceptual annotation analysis supports these results, as it found that NA placement on the subject was most likely for L1 speakers for the expected, unaccusative condition. More variability is seen for unergatives, which is especially apparent from the perceptual annotation results.

These results partially replicate those from Zubizarreta & Nava (2011), who found that L1 speakers of English were more likely to produce the NA on the subject for unaccusatives, and that NA placement in unergatives was highly variable. However, there are two main differences. First, we found an effect of expectedness in unaccusative verbs, which was not predicted by the theoretical model proposed in Zubizarreta & Nava. Secondly, it was predicted that expectedness would affect NA placement in unergatives, but this was not found to be the case.

It is possible that the results found here are in part a reflection of frequency effects (see section 3.1). Frequency has been found to correlate with accentuation in that more frequently occurring words are less likely to be assigned prominence, as frequency correlates with predictability, which in turn contributes to the prominence structure of an utterance (e.g. Watson & Arnold & Tanenhaus 2008; Bell & Brenier & Gregory & Girand & Jurafsky 2009; Cole & Hualde & Smith & Eager & Mahrt & de Souza 2019). However, frequency cannot be the only explanatory factor: if it were, we would expect to see a greater degree of non-final NA placement in the expected, unergative verbs as well, as this condition had the second most frequently occurring verbs.

It is surprising that utterances with unergative verbs were produced overwhelmingly with final NA placement. One question that could be investigated further is whether this could be an effect of the read-aloud task. It would be good to compare the results of a more spontaneous speaking task to the findings in this study to see if sentences with unergative verbs still show a preference for utterance-final NA placement. Additionally, when speakers have only limited context (in this case, a single preceding question), it may be difficult for them to construe a statement as thetic. Schmerling’s (1976) examples show that thetic sentences are often uttered when there is ample context – more than what may be provided in one question. Moreover, there were no checks to see if participants were actually listening to or reading the contextual question. It may be helpful in a future study to find a way to ensure that participants are engaged in the dialogue. For unaccusative expected verbs, their inherent properties may lead to their being interpreted as thetic, in which case context would matter less. This would lend support to Zubizarreta & Nava’s proposal of inherent lexical properties of unaccusative verbs leading to a thetic construal which, in turn, helps with the assignment of NA placement on the subject.

A limiting factor of this study, as well as of Zubizarreta & Nava (2011), is the fact that the differences between unaccusative verbs (cf. Sorace 2000) are not taken into account. Zubizarreta & Nava used a small set of unaccusative verbs, including many verbs of appearance. In contrast, a sub-experiment on unaccusatives from Hirsch & Wagner (2011) manipulated the type of unaccusative, using verbs of disappearance and of appearance only. The assumption was that verbs of disappearance lead to inherently more topical sentences than verbs of appearance, as the subject must already be implied in the surrounding discourse in order to disappear. This factor was not controlled for in our study, and coincidentally, there were four verbs of disappearance (six tokens) in the unexpected unaccusative condition (die(x2), expire, perish, vanish(x2)). Furthermore, many verbs used in the expected unaccusative condition were verbs of appearance (appear(x3), arrive(x3), come(x2)). Zubizarreta & Nava (2011) proposed that unaccusative verbs are construed as thetic due to their lexical semantics (p. 654). This seems to be only partially true, as a difference is seen between the two unaccusative verb types in NA placement.

Our findings are consistent with Hirsch & Wagner’s claim that topicality can be an important factor for NA placement in unaccusatives. While Hirsch & Wagner further claim that verb type plays no role in NA assignment, there may still be room for this factor, as the inherent function of such verbs as appear, arrive or come in a broad focus context is to introduce a referent into the discourse. If the referent has already been introduced, then it becomes old information and is prosodically treated as such (i.e. either reduced to a pronoun or prosodically reduced). It may be that the verb semantics interacts with topicality, thus driving NA placement. It would be fruitful to control for verb type in future studies. If there were more verbs of disappearance in the expected conditions, and more verbs of appearance in the unexpected conditions, would the patterns of NA placement change? Additionally, our findings suggest that topicality does play a role in NA placement assignment, which can extend across verb types, and should not be ignored in future studies.

5.2 L2 speaker findings

We now address RQ2, repeated below:

RQ2: Do Spanish speakers transfer strategies of NA placement in broad-focus intransitives into their L2 English? Do they acquire variable NA placement based on verb type and/or expectedness?

Overall, it was found that L2 speakers as a group were less likely than the L1 group to make a distinction among conditions. The perceptual annotation results indicate a higher proportion of utterance-final NA placement. This is similar to prior findings for Spanish as an L1, suggesting a role for L1-transfer, especially considering the finding that the pattern of final-NA placement is more likely at lower levels of proficiency (when transfer is likely more pervasive).

For the second part of the question, there is evidence that L2 speakers are acquiring non-final NA placement as proficiency increases, as was shown in both the acoustic and the perceptual analyses. The perceptual analysis found an effect only of verb type, whereas the acoustic analysis found evidence of an effect of expectedness only. Taken together, these findings suggest that L2 speakers are learning patterns of non-final NA placement on the subject. It is still unclear what aspects of NA placement – pragmatic or syntactic – learners are acquiring. The expectedness effect in the acoustic analysis is open to another interpretation: since expectedness correlates with frequency (see section 3.1), it could also be a frequency effect, with learners struggling to pronounce words that are less frequent, and therefore quite possibly less familiar to them. The perceptual annotation analysis indicates that speakers may be acquiring non-final NA placement based on verb type, which would indicate that speakers are acquiring structural factors before pragmatic ones. However, since this was only found in the perceptual annotation analysis, it would be important to confirm this with more acoustic data. It may be that the relatively small sample size (23 L2 participants) made it difficult to see trends clearly. It would be beneficial to collect data from a larger group of L2 speakers who vary in proficiency, in order to determine whether NA placement based on verb type or pragmatics is acquired first.

The fact that there are no clear patterns of non-final NA placement replicates Zubizarreta & Nava’s (2011) findings that acquiring this prosodic pattern in English is difficult for speakers of Spanish. While there is an effect of proficiency, with higher proficiency speakers becoming more target-like, the effects found for the L1 speaker group, such as higher rates of non-final NA placement in sentences with expected, unaccusative verbs, are harder to see for the L2 speaker group, even for speakers of higher proficiency levels.

It would also be fruitful to conduct a study where participants record sentences in both their L1 Spanish and their L2 English. As discussed in Section 2.2, this study assumes utterance-final NA placement for broad-focus unaccusatives in native Spanish, which is an assumption based on the results of prior studies (Calhoun et al. 2018; Landblom 2020). However, neither of these studies examined the effects of pragmatic factors such as expectedness in native Spanish. Collecting both Spanish and English data from the same set of participants would strengthen our conclusions about the role of L1 transfer.

6 Conclusion

To conclude, this study examined NA placement in English simple intransitives in both L1 and L2 speakers. Factors leading to final NA placement on the verb versus non-final placement on the subject were examined and the effects of both verb type and expectedness were tested. Both factors were found to be significant for L1 speakers, who were most likely to produce NA placement on the subject with expected unaccusatives. The question as to how much this was driven by verb type versus topicality presents an avenue for future research in which sub-types of unaccusative verbs are compared.

The findings from the L2 speaker group contribute to our understanding of the acquisition of metrical prominence, specifically when NA placement is more flexible in the L2 than the L1. Overall, the patterns found in L2 productions suggest the possibility of transfer from the L1 into the L2. We also see that, as proficiency increases, so do the rates of non-final NA placement.

There is still much to understand with regard to L2 acquisition of NA placement. Zubizarreta & Nava (2011) point to the need to understand the sequence of development of variable NA placement with narrow focus versus compounds versus intransitives. This study presents one part of the picture and shows how L2 speakers begin to acquire variable NA placement when the rules are quite complex and driven by both pragmatic and (possibly) lexico-semantic or syntactic factors. It would be interesting to look at this in a broader context to understand which algorithms of NA placement are easier or more difficult to acquire and which linguistic factors contribute to the level of difficulty.

Data accessibility statement

Data from the sentence norming task can be found here: https://osf.io/jyqz3/?view_only=4b6636ee10b14e2693346314cac952f2

For the list of verbs used, see the Verb_List.csv file here: https://osf.io/bm3dp/?view_only=c7170d245e06430aa757993ef0c07bbc

Data from the experiment can be found here: https://osf.io/bm3dp/?view_only=c7170d245e06430aa757993ef0c07bbc

Ethics and consent

This research was approved by the Institutional Review Board at the University of Illinois Urbana-Champaign (IRB Protocol #: 17717). All participants were 18 years of age or older and consented to participate in the study.

Notes

  1. For materials for this study, please see here: https://osf.io/jyqz3/?view_only=4b6636ee10b14e2693346314cac952f2.
  2. We also ran a second version of the norming task, in which the target sentences were presented in isolation (with no contextual questions) and rated for naturalness, since we recognize that there is a difference between pragmatic incongruence in context and unnaturalness or oddness independent of the context. For example, a sentence such as “A robin ripped” is odd regardless of the context, in contrast to “A farmer vanished”, which isn’t odd or unnatural by itself, but may be unexpected in a given context. Despite the conceptual differences between (un)expectedness in context and (un)naturalness, results for the two norming tasks were very similar. Selection of items for the main study was ultimately based on the expectedness norming task reported here.
  3. The full list of verbs is available here: https://osf.io/bm3dp/?view_only=c7170d245e06430aa757993ef0c07bbc.
  4. Following the presentation of the 66 test items, participants saw 24 additional items corresponding to intransitive sentences with the verb in narrow focus; these will not be discussed here. Since these narrow-focus items were presented after the target broad-focus intransitive items, performance on the broad-focus items (of interest to us here) cannot be affected by performance on the narrow-focus items.
  5. While intonation was not explicitly measured, the form of the sentence was kept consistent across items. The first author (a native English speaker) listened to all items and noted nothing unusual, except for one participant who produced notably strange intonation and was therefore excluded from the analysis.
  6. Materials for each analysis can be found here: https://osf.io/bm3dp/?view_only=c7170d245e06430aa757993ef0c07bbc.

Acknowledgements

The study reported in this paper formed part of a larger, bidirectional study (L1 English L2 Spanish as well as L1 Spanish L2 English) which is reported in the first author’s unpublished dissertation, Landblom (2020). We would like to thank Dr. José Ignacio Hualde (the first author’s dissertation advisor) as well as Dr. Chilin Shih and Dr. Silvina Montrul (the first author’s dissertation committee members), for providing important and valuable feedback on the dissertation. We would like to thank Dr. Marissa Barlaz for her help with statistical consulting. A special thanks goes to our annotators: Daniel Martin Bargon, Elias Decker, Nick Helms, Dominique Jefferson and Chris Kovac. Thanks also to Hannah Epstein, who helped with the annotation and segmentation of the data. We would like to thank Lizbeth Muñoz Cortes who helped tremendously with the stimuli recordings. A special thanks goes to Dr. Viviana Núñez, who very kindly opened her space at the University of Costa Rica, allowing the first author to collect data there. Finally, thanks to all our research participants, and all the teachers and other students both in Costa Rica and in the U.S. who helped in the efforts to recruit participants.

Funding information

Travel to Costa Rica was funded by the University of Illinois Urbana-Champaign Graduate College through a Graduate College Dissertation Travel Grant to the first author.

Competing interests

The authors have no competing interests to declare.

References

Akaike, Hirotugu. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6). 716–723. DOI:  http://doi.org/10.1109/TAC.1974.1100705

Albin, Aaron Lee. 2015. Typologizing native language influence on intonation in a second language: Three transfer phenomena in Japanese EFL learners. Bloomington, IN: Indiana University dissertation.

Allerton, David John & Cruttenden, Allen. 1979. Three reasons for accenting a definite subject. Journal of Linguistics 15(1). 49–53. DOI:  http://doi.org/10.1017/S0022226700013104

Aston, John A.D. & Chiou, Jeng-Min & Evans, Jonathan P. 2010. Linguistic pitch analysis using functional principal component mixed effect models. Journal of the Royal Statistical Society: Series C (Applied Statistics) 59(2). 297–317. DOI:  http://doi.org/10.1111/j.1467-9876.2009.00689.x

Bates, Douglas & Mächler, Martin & Bolker, Benjamin M. & Walker, Steven C. 2015. Fitting linear mixed effects models using lme4. Journal of Statistical Software 67(1). 1–48. DOI:  http://doi.org/10.18637/jss.v067.i01

Bell, Alan & Brenier, Jason M. & Gregory, Michelle & Girand, Cynthia & Jurafsky, Dan. 2009. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language 60(1). 92–111. DOI:  http://doi.org/10.1016/j.jml.2008.06.003

Best, Catherine T. 1994. The emergence of native-language phonological influences in infants: A perceptual assimilation model. In Nussbaum, Howard C. (ed.), The development of speech perception: The transition from speech sounds to spoken words, 167–244. Cambridge, MA: MIT Press.

Best, Catherine T. 1995. A direct realist view of cross-language speech perception. In Strange, Winifred (ed.), Speech perception and linguistic experience: Theoretical and methodological issues in cross-language speech research, 171–204. Timonium, MD: York Press.

Bolinger, Dwight L. 1954. English prosodic stress and Spanish sentence order. Hispania 37(2). 152–156. DOI:  http://doi.org/10.2307/335628

Bro, Rasmus & Smilde, Age K. 2014. Principal component analysis. Analytical Methods 6(9). 2812–2831. DOI:  http://doi.org/10.1039/C3AY41907J

Burdin, Rachel Steindel & Phillips-Bourass, Sara & Turnbull, Rory & Yasavul, Murat & Clopper, Cynthia G. & Tonhauser, Judith. 2015. Variation in the prosody of focus in head- and head/edge-prominence languages. Lingua 165. 254–276. DOI:  http://doi.org/10.1016/j.lingua.2014.10.001

Calhoun, Sasha & La Cruz, Erwin & Olssen, Ana. 2018. The interplay of information structure, semantics, prosody, and word ordering in Spanish intransitives. Laboratory Phonology: Journal of the Association for Laboratory Phonology 9(1). 1–30. DOI:  http://doi.org/10.5334/labphon.65

Carlisle, Robert S. 1998. The acquisition of onsets in a markedness relationship: A longitudinal study. Studies in Second Language Acquisition 20. 245–260. DOI:  http://doi.org/10.1017/S027226319800206X

Chomsky, Noam & Halle, Morris. 1968. The sound pattern of English. New York: Harper & Row.

Cole, Jennifer & Hualde, José I. & Smith, Caroline L. & Eager, Christopher & Mahrt, Timothy & Napoleão de Souza, Ricardo. 2019. Sound, structure and meaning: The bases of prominence ratings in English, French and Spanish. Journal of Phonetics 75. 113–147. DOI:  http://doi.org/10.1016/j.wocn.2019.05.002

Cooper, William E. & Eady, Stephen J. & Mueller, Pamela R. 1985. Acoustical aspects of contrastive stress in question-answer contexts. The Journal of the Acoustical Society of America 77(6). 2142–2156. DOI:  http://doi.org/10.1121/1.392372

Dupoux, Emmanuel & Pallier, Christophe & Sebastian, Nuria & Mehler, Jacques. 1997. A destressing ‘deafness’ in French? Journal of Memory and Language 36. 406–421. DOI:  http://doi.org/10.1006/jmla.1996.2500

Eckman, Fred. 1977. Markedness and the contrastive analysis hypothesis. Language Learning 27(2). 315–330. DOI:  http://doi.org/10.1111/j.1467-1770.1977.tb00124.x

Escudero, Paola. 2005. Linguistic perception and second language acquisition: Explaining the attainment of optimal phonological categorization. Utrecht, Holland: Utrecht University dissertation.

Escudero, Paola. 2009. Linguistic perception of “similar” L2 sounds. Phonology in Perception 15. 152–190.

Flege, James Emil. 1987. The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics 15(1). 47–65. DOI:  http://doi.org/10.1016/S0095-4470(19)30537-6

Flege, James Emil. 1995. Second language speech learning: Theory, findings, and problems. Speech Perception and Linguistic Experience: Issues in Cross-Language Research 92. 233–277.

Flege, James Emil. 2003. Assessing constraints on second-language segmental production and perception. Phonetics and phonology in language comprehension and production: Differences and Similarities 6. 319–355.

Flege, James Emil & MacKay, Ian R.A. 2004. Perceiving vowels in a second language. Studies in Second Language Acquisition 26. 1–34. DOI:  http://doi.org/10.1017/S0272263104261010

Gabriel, Christoph. 2010. On focus, prosody, and word order in Argentinean Spanish: A minimalist OT account. Revista Virtual de Estudos da Linguagem 4. 183–222.

Gussenhoven, Carlos. 1983. Focus, mode and the nucleus. Journal of Linguistics 19(2). 377–417. DOI:  http://doi.org/10.1017/S0022226700007799

Hertel, Tammy Jandrey. 2003. Lexical and discourse factors in the second language acquisition of Spanish word order. Second Language Research 19(4). 273–304. DOI:  http://doi.org/10.1191/0267658303sr224oa

Hirsch, Aron & Wagner, Michael. 2011. Patterns of prosodic prominence in English intransitive sentences. GLOW (34).

Hoot, Bradley & Leal, Tania. 2020. Processing subject focus across two Spanish varieties. Probus 32(1). 93–127. DOI:  http://doi.org/10.1515/probus-2019-0004

Hualde, José Ignacio. 2005. The sounds of Spanish with audio CD. Cambridge University Press.

Huang, Yuan & Guo, Diansheng & Kasakoff, Alice & Grieve, Jack. 2016. Understanding US regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems 59. 244–255. DOI:  http://doi.org/10.1016/j.compenvurbsys.2015.12.003

Ionin, Tania & Montrul, Silvina. 2010. The role of L1 transfer in the interpretation of articles with definite plurals in L2 English. Language Learning 60(4). 877–925. DOI:  http://doi.org/10.1111/j.1467-9922.2010.00577.x

Irwin, Patricia. 2011. Intransitive sentences, argument structure, and the syntax-prosody interface. In Washburn, Mary Byram & McKinney-Bock, Katherine & Varis, Erika & Sawyer, Ann & Tomaszewicz, Barbara (eds.), Proceedings of the 28th west coast conference on formal Linguistics, 275–284. Somerville, MA: Cascadilla Proceedings Project.

Irwin, Patricia. 2012. Unaccusativity at the interfaces. New York University dissertation.

Josse, Julie & Husson, François. 2016. MissMDA: A package for handling missing values in multivariate data analysis. Journal of Statistical Software 70(1). 1–31. DOI:  http://doi.org/10.18637/jss.v070.i01

Kahnemuyipour, Arsalan. 2009. The syntax of sentential stress. Oxford University Press. DOI:  http://doi.org/10.1093/acprof:oso/9780199219230.001.0001

Kim, Ji Young. 2016. The perception and production of prominence in Spanish by heritage speakers and L2 learners. Urbana, IL: University of Illinois dissertation.

Krahmer, Emiel & Swerts, Marc. 2001. On the alleged existence of contrastive accents. Speech Communication 34. 391–405. DOI:  http://doi.org/10.1016/S0167-6393(00)00058-3

Kuznetsova, Alexandra & Brockhoff, Per Bruun & Christensen, Rune Haubo Bojesen. 2015. Package ‘lmerTest’. R Package Version 2(0).

Ladd, D. Robert. 2008. Intonational phonology. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511808814

Landblom, Stephanie. 2020. L2 acquisition of nuclear accent. Urbana, IL: University of Illinois dissertation. https://osf.io/549sb/?view_only=868fff7c08a04485836c5b9a37a66825

Lê, Sébastien & Josse, Julie & Husson, François. 2008. FactoMineR: An R package for multivariate analysis. Journal of Statistical Software 25(1). 1–18. DOI:  http://doi.org/10.18637/jss.v025.i01

Lenth, Russell & Singmann, Henrik & Love, Jonathon & Buerkner, Paul & Herve, Maxime. 2018. Emmeans: Estimated marginal means, aka least-squares means. R Package Version 1(1).

Levin, Beth & Hovav, Malka Rappaport & Keyser, Samuel Jay. 1995. Unaccusativity: At the syntax-lexical semantics interface, vol. 26. Cambridge, MA: MIT Press.

Li, Xiaoqing & Chen, Yiya & Yang, Yufang. 2011. Immediate integration of different types of prosodic information during on-line spoken language comprehension: An ERP study. Brain Research 1386. 139–152. DOI:  http://doi.org/10.1016/j.brainres.2011.02.051

McGory, Julia Tevis. 1997. Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese speakers. The Ohio State University dissertation.

Mennen, Ineke. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32(4). 543–563. DOI:  http://doi.org/10.1016/j.wocn.2004.02.002

Mennen, Ineke. 2015. Beyond segments: Towards a L2 intonation learning theory. In Delais-Roussarie, Elisabeth & Avanzi, Mathieu & Herment, Sophie (eds.), Prosody and Language in Contact, 171–188. Berlin and Heidelberg: Springer. DOI:  http://doi.org/10.1007/978-3-662-45168-7_9

Morrill, Tuuli. 2011. Acoustic correlates of stress in English adjective-noun compounds. Language and Speech 55(2). 167–201. DOI:  http://doi.org/10.1177/0023830911417251

Munro, Murray J. & Derwing, Tracey M. 2008. Segmental acquisition in adult ESL learners: A longitudinal study of vowel production. Language Learning 58(3). 479–502. DOI:  http://doi.org/10.1111/j.1467-9922.2008.00448.x

Nguyễn, T. Anh-Thư & Ingram, John C. L. & Pensalfini, J. Rob. 2008. Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns. Journal of Phonetics 36(1). 158–190. DOI:  http://doi.org/10.1016/j.wocn.2007.09.001

O’Neill, Robert & Cornelius, Edward T. & Washburn, Gay N. 1981. American kernel lessons: Advanced: Student’s book. United Kingdom: Longman Publishing Group.

Peirce, Jonathan W. 2007. PsychoPy—psychophysics software in Python. Journal of Neuroscience Methods 162(1–2). 8–13. DOI:  http://doi.org/10.1016/j.jneumeth.2006.11.017

Peirce, Jonathan W. 2009. Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics 2. DOI:  http://doi.org/10.3389/neuro.11.010.2008

Rasier, Laurent & Hiligsmann, Philippe. 2009. Exploring the L1-L2 relationship in the L2 acquisition of prosody. In Online Proceedings of First and Second Languages: Exploring the Relationship in Pedagogy-Related Contexts. Oxford, UK.

Riggenbach, H. 1991. Toward an understanding of fluency: A microanalysis of nonnative speaker conversations. Discourse Processes 14(4). 423–441. DOI:  http://doi.org/10.1080/01638539109544795

Schmerling, Susan F. 1976. Aspects of English sentence stress. University of Texas Press. DOI:  http://doi.org/10.7560/703124

Selkirk, Elisabeth. 1995. Sentence prosody: Intonation, stress, and phrasing. In Goldsmith, John A. (ed.), The handbook of phonological theory 1, 550–569. DOI:  http://doi.org/10.1111/b.9780631201267.1996.00018.x

Sorace, A. 2000. Gradients in auxiliary selection with intransitive verbs. Language. 859–890. DOI:  http://doi.org/10.2307/417202

Steele, Jeffrey. 2002. Representation and phonological licensing in the L2 acquisition of prosodic structure. Montreal, Quebec: McGill University dissertation.

Tremblay, Annie. 2011. Proficiency assessment standards in second language acquisition research: “Clozing” the gap. Studies in Second Language Acquisition. 339–372. DOI:  http://doi.org/10.1017/S0272263111000015

Vallduví, Enric. 1991. The role of plasticity in the association of focus and prominence. In No, Yongkyoon & Libucha, Mark (eds.), Proceedings of the Eastern States conference on Linguistics, 295–306. Columbus, OH: Ohio State University Press.

van Maastricht, Lieke & Krahmer, Emiel & Swerts, Marc. 2016. Prominence patterns in a second language: Intonational transfer from Dutch to Spanish and vice versa. Language Learning 66(1). 124–158. DOI:  http://doi.org/10.1111/lang.12141

van Maastricht, Lieke & Zee, Tim & Krahmer, Emiel & Swerts, Marc. 2020. The interplay of prosodic cues in the L2: how intonation, rhythm, and speech rate in speech by Spanish learners of Dutch contribute to L1 Dutch perceptions of accentedness and comprehensibility. Speech Communication 133. 81–90. DOI:  http://doi.org/10.1016/j.specom.2020.04.003

Vanrell Bosch, Maria del Mar & Fernández Soriano, Olga. 2013. Variation at the interfaces in Ibero Romance. Catalan and Spanish prosody and word order. Catalan Journal of Linguistics 12. 253–282. DOI:  http://doi.org/10.5565/rev/catjl.63

Watson, Duane G. & Arnold, Jennifer E. & Tanenhaus, Michael K. 2008. Tic Tac TOE: Effects of predictability and importance on acoustic prominence in language production. Cognition 106(3). 1548–1557. DOI:  http://doi.org/10.1016/j.cognition.2007.06.009

Xu, Yi. 2013. ProsodyPro—A tool for large-scale systematic prosody analysis. Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), Aix-en-Provence. 7–10.

Xu, Yi & Xu, Ching X. 2005. Phonetic realization of focus in English declarative intonation. Journal of Phonetics 33(2). 159–197. DOI:  http://doi.org/10.1016/j.wocn.2004.11.001

Yang, Charles D. 2002. Knowledge and learning in natural language. Oxford University Press on Demand.

Zubizarreta, Maria Luisa. 1998. Prosody, focus, and word order. Cambridge, MA: MIT Press.

Zubizarreta, Maria Luisa & Nava, Emily. 2011. Encoding discourse-based meaning: Prosody vs. syntax. Implications for second language acquisition. Lingua 121(4). 652–669. DOI:  http://doi.org/10.1016/j.lingua.2010.06.013

Zubizarreta, Maria Luisa & Vergnaud, Jean-Roger. 2006. Phrasal stress and syntax. In Everaert, Martin & van Riemsdijk, Henk (eds.), The Blackwell companion to syntax, vol. 1, 522–568. Blackwell Publishing Ltd. DOI:  http://doi.org/10.1002/9780470996591.ch49