1 Introduction

Iconicity — the resemblance between aspects of form and aspects of meaning — is an important property of all human languages, spoken and signed (Perniss et al. 2010; Dingemanse et al. 2015; Ferrara & Hodge 2018). A recent upsurge of work on iconicity has shown it to play a role in language processing (Bosworth & Emmorey 2010; Vinson et al. 2015; Sidhu et al. 2020), speech production (Shintel et al. 2006; Perlman et al. 2015a), language acquisition (Imai et al. 2008; Imai & Kita 2014; Asano et al. 2015; Perry et al. 2015, 2017), language evolution (Fay et al. 2013; Imai & Kita 2014; Perlman et al. 2015b; Macuch Silva et al. 2020; Ćwiek et al. 2021), playful language (Dingemanse & Thompson 2020), direct quotations (Clark & Gerrig 1990), as well as the communication of sensory impressions and perceptual detail (Kita 1997; Winter et al. 2017; Lu & Goldin-Meadow 2018).

There are many different communicative phenomena that reflect iconicity. One phenomenon that has received a lot of attention, often called “size sound symbolism,” involves the association of speech sounds with large/small size. For example, speech sounds with more high-frequency components, such as the vowel /i/, are associated with smallness (Tarte 1982; Knoeferle et al. 2017). This is thought to derive from the fact that smaller things (e.g., animals, objects) typically produce higher-frequency sounds (Ohala 1983). This makes the association of particular vowels with size concepts a form of iconicity, because the acoustic qualities of ‘small’ vowels such as /i/ and /ɪ/ resemble the sounds produced by small things in the world, and vice versa for ‘large’ vowels such as /a/, /ɑ/, /ɔ/, and /o/.

The association between size and speech sounds has been noted for a long time (Wedgwood 1845; Jespersen 1922), with Sapir (1929) being the first to demonstrate experimentally that English-speaking participants match pseudowords such as mil with small objects, and pseudowords such as mal with large objects. A diverse set of experiments has directly or conceptually replicated this basic pattern (Newman 1933; Birch & Erickson 1958; Greenberg & Jenkins 1966; Johnson 1967; Tarte & Barritt 1971; Tarte 1982; Berlin 2006; Parise & Spence 2009; Baxter & Lowrey 2011; P. D. Thompson & Estes 2011; Parise & Spence 2012; Baxter et al. 2015; Auracher 2017; Knoeferle et al. 2017). In addition, typological research has consistently found that high front vowels are associated with the concept ‘small’ in a large number of the world’s languages (Ultan 1978; Fitch 1994; Haynie et al. 2014; Blasi et al. 2016; Johansson et al. 2019). Yet, these cross-linguistic generalizations are often based on a single word pair (e.g., small/large). Does size sound symbolism also characterize larger swaths of vocabulary within a given language? From the very first experimental studies with nonce words such as Sapir’s mil/mal, researchers voiced skepticism about whether these studies on pseudowords had anything to do with the actual vocabularies of natural languages (Bentley & Varon 1933).

When speaking of size sound symbolism being a property of the English lexicon, we are interested in demonstrating its “systematicity”, defined by Dingemanse et al. (2015: 604) as “a statistical relationship between the patterns of sound for a group of words and their usage.” Such systematicity is, in theory, orthogonal to iconicity (Nielsen 2016; Nielsen & Dingemanse 2020): a systematic pattern in the lexicon can be motivated by iconicity or it can be non-iconic, such as when resulting from an accident of language history (cf. discussion in Cuskley & Kirby 2013). As an example of a non-iconic systematic pattern, consider the fact that in English, the voiced dental fricative /ð/ occurs word-initially only in function words (the, that, this, there) (Bloomfield 1933: 244). This association between sound and function is not obviously rooted in any form of iconicity, as it is not clear how the phoneme /ð/ could be taken to resemble the corresponding meanings. Other patterns are systematic and iconic, such as the fact that in Chaoyang, ideophones denoting abrupt sounds are more likely to end in stops, such as piak and pak ‘sound of a gunfire’, tiak ‘clacking sound of an abacus’, and kok ‘sound of hen clucking’ (Thompson & Do 2019). The systematicity of this pattern is evidenced by the fact that this form-meaning pairing recurs across several different Chaoyang ideophones. The iconicity of these ideophones is presumably grounded in the fact that words ending in stops involve a more abrupt articulatory closure as well as a more abrupt acoustic offset (Rhodes 1994). Evidence that this is a case of iconicity rather than mere systematicity comes from the fact that the same pattern (stop-final words for abrupt offsets) has been reported numerous times for a diverse set of languages (Stoddart 1858; Sommer 1933; Wissemann 1954; Hamano 1998: 86; Assaneo et al. 2011; Blake 2017; Friberg et al. 2018). This suggests that this systematic pattern is rooted in a universally accessible sense of resemblance between sound and meaning that is used by speakers of different languages.

In the case of English, researchers have been keen to point out that there are salient counterexamples to size sound symbolism, most notably the adjectives big and small, which have a high and a low vowel respectively (Jespersen 1922: 406; Wescott 1971: 421). Empirical analyses of the English lexicon have produced mixed results. Using a thesaurus to assemble a list of words for small and large concepts, Newman (1933) failed to find an association between vowels and size. But as noted by Brown (1958: 119), his list included “many words whose association with size is remote, e.g., decimate, descend, wretched, stalwart.” In another analysis of 181 highly frequent English monosyllables, Thorndike (1945: 10) found that /i/ and /ɪ/ are associated with smallness, and that /ɔ/ and /oʊ/ are associated with largeness. However, Thorndike’s study was also problematic in that it relied on his own subjective introspection to judge a word’s semantic size.

In later work, Johnson (1967) asked American English speakers to generate as many words as they could that suggest smallness or largeness, yielding 324 unique word types. This set exhibited evidence for size sound symbolism, with /i/ being much more frequent in words for small as opposed to large concepts, in contrast to /a/, /u/, and /o/, which showed the reverse pattern. Yet, this procedure of generating words is problematic because phonological and semantic priming effects have the potential to inflate sound symbolic properties. In addition, Johnson’s (1967) word list is unpublished, leaving the possibility that it contains many etymologically related forms, which would artificially inflate the sample size by adding non-independent cases to a statistical analysis. Finally, Katz (1986) investigated a set of 60 words with size ratings from Paivio (1975), failing to replicate the association between vowels and size previously established by Thorndike (1945) and Johnson (1967). However, Katz’s word list included ‘small’ nouns such as butterfly, toe, and grape, and his ‘large’ words included nouns such as tractor, iceberg, and elephant. While these words arguably denote small and large objects respectively, they are not words that relate specifically to size.

In summary, despite decades of interest in the phenomenon of size sound symbolism, there is no conclusive evidence that speech sounds are associated with semantic size across distinct word types in the English lexicon. Besides controversy about whether the word lists were suitable (Brown 1958: 119; Taylor 1963: 201), a persistent methodological problem of this line of research is the sole focus on vowels, even though voicing (Taylor 1963; Klink 2000; Haryu & Zhao 2007; Shinohara & Kawahara 2010; P. D. Thompson & Estes 2011; Johansson et al. 2019; Kawahara & Kumagai 2019) and place of articulation (Klink 2000; Preziosi & Coane 2017) have also been implicated in size symbolism. For example, Taylor (1963) already suggested that Newman (1933) may have missed that his list of words for large concepts features an overabundance of the velar stops /g/ and /k/ (gargantuan, glaring, great, gross, colossus, cargo, comprehensive, corporation, corpulence). Thus, what is needed is a statistical approach that takes all potentially relevant phonemes into account. Here, we use a machine-learning algorithm, random forests (Breiman 2001), that is able to incorporate a large number of predictor variables, allowing us to investigate the simultaneous impact of all English phonemes.

2 Methods

All data and code are available under the following OSF repository: https://osf.io/9q4nc/.

2.1 Creation of the word list

2.1.1 Size adjective list

Our goal was to compile a word list that was unbiased, as exhaustive as possible, not generated by participants, more homogeneous than previous lists, and included only words directly about size without mixing different parts of speech. To achieve this, we used https://www.thesaurus.com/ to extract 223 words which were listed as synonyms of tiny, small, large, big, and huge.1 In our analysis, we focus on words that have only one stem (including those with reduplicated stems), which excludes pint-sized, pocket-size, pocket-sized, yea big, barn door, small-scale, a whale of a, super colossal, oversize, undersize, and sizable. The word pocket (from pocket-sized) was excluded because its primary sense is nominal. We further excluded words whose first synonym in the thesaurus was not an adjective and/or not obviously size-related. This led to the exclusion of the following 13 forms: populous, prodigious, spacious, copious, ample, substantial, monster, magnificent, generous, immeasurable, insignificant, trifling, minimum.

The complete list of 36 large adjectives (synonyms of big/large/huge) included:

behemontic, big, bulky, colossal, considerable, cyclopean, elephantine, enormous, gargantuan, giant, gigantic, goodly, great, gross, hefty, huge, hulking, humongous, immense, jumbo, large, leviathan, mammoth, massive, mondo, mountainous, prodigious, strapping, substantial, titanic, towering, tremendous, vast, voluminous, walloping, whopping

The complete list of 30 small adjectives (synonyms of tiny/small) included:

baby, bantam, bitsy, bitty, diminutive, infinitesimal, itsy-bitsy, itty-bitty, Lilliputian, little, meager, microscopic, midget, mini, miniature, minikin, minuscular, miniscule, minute, pee-wee, petite, puny, runty, slight, small, teensy, teensy-weensy, teeny, tiny, wee

Several of the words in these lists are etymologically related or suspected to be related to at least one other word within the set, as established via the Oxford English Dictionary (http://oed.com/).2 Here, we only analyze a dataset of 52 adjectives containing one of each of the related forms so as to not artificially inflate our sample size with non-independent cases. The online scripts show that the same result is obtained with all adjectives (N = 66).

We stripped all derivational morphology away from the word’s root as we do not want grammatical information (e.g., the -ing suffix) to be counted. This also involves excluding the diminutive suffix -y, as in baby (the diminutive of babe), bitsy, bitty, itsy-bitsy, itty-bitty, puny, runty, teensy, teensy-weensy, teeny, and tiny (which comes from tine + -y). Although -y may be a grammaticized version of size sound symbolism (Waugh 1994; Shih & Rudin 2020: 3), including this morpheme would unduly bias our results towards /i/ being associated with small size. Our stemming procedure also excluded Latin and Greek morphology that may not be productive in English anymore (e.g., -al in colossal, or -ic in titanic). Thus, the word humongous is represented as humong- in our data, and the word colossal as coloss-.

2.1.2 Processing Glasgow norms

We additionally analyzed a set of 5,553 words rated for size by 829 native English speakers on a 7-point scale (Scott et al. 2019). These ‘Glasgow norms’ were collected on word senses, not word forms (e.g., arm ‘limb’ versus arm ‘weapon’). We averaged ratings across word senses, yielding a reduced set of 4,683 words. From this, we extracted all monomorphemic words (based on the English Lexicon Project, Balota et al. 2007) that were either nouns, verbs, or adjectives (based on SUBTLEX, Brysbaert et al. 2012), yielding a total of 2,667 words (330 adjectives, 472 verbs, 1,865 nouns).

2.1.3 Carnegie Mellon University Pronunciation Dictionary

We used phonological transcriptions from the open-source Carnegie Mellon University Pronunciation Dictionary. This dictionary is based on North American English (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) and contains over 134,000 English words. For the adjective list, we hand-supplied transcriptions where CMU data was unavailable.
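To illustrate what such transcriptions look like, the following Python sketch parses two entries written in the dictionary's plain-text layout (a word followed by ARPAbet phones, with stress digits on the vowels). The two entries are typed in by hand for illustration rather than read from the distributed file, the function name parse_cmudict is our own, and the actual analysis was run in R, so this is only a minimal sketch of the lookup step.

```python
import re

# Two entries in the CMU dictionary's plain-text layout, typed in by hand
# for illustration (not read from the real dictionary file).
SAMPLE = """\
HUGE  HH Y UW1 JH
TINY  T AY1 N IY0
"""

def parse_cmudict(text):
    """Map each lower-cased word to its list of phones without stress digits."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        entries[word.lower()] = [re.sub(r"\d", "", p) for p in phones]
    return entries

lexicon = parse_cmudict(SAMPLE)
print(lexicon["huge"])
print(lexicon["tiny"])
```

Removing the stress digits collapses stressed and unstressed variants of a vowel into one symbol, which is one possible way to arrive at a single predictor per phoneme.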

2.2 Statistical analysis

All analyses were conducted with R 4.0.2 (R Core Team 2019) and the ‘tidyverse’ package 1.3.0 (Wickham et al. 2019). We used ‘ranger’ 0.12.1 (Wright & Ziegler 2017) for random forests, and ‘effsize’ 0.8.0 (Torchiano 2019) and ‘lsr’ 0.5 (Navarro 2015) for effect sizes.

The core of our analysis uses random forests, a machine-learning algorithm that builds an ensemble of decision trees, in our case classification trees because our main response variable is categorical (‘large’ versus ‘small’ adjectives). Decision trees work by recursively splitting the data based on a split criterion, such as trying to minimize the variance at each node of the tree. For example, a decision tree could partition the data into those words that have a /t/ as opposed to those that do not have a /t/ based on the fact that many words with this sound are small adjectives, and most words without it are large adjectives (as demonstrated below). Then, within the /t/-featuring words, another branch of the tree could be added by further noticing that those words with /t/ and /i/ are even more likely to be small adjectives, and so on.
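The logic of a single split can be sketched in a few lines of Python. The example scores a candidate split on the presence of /t/ by the decrease in Gini impurity, which is one common criterion for classification trees and not necessarily the rule used in the reported models. The data are invented for illustration, and the function names are our own.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(has_phoneme, labels):
    """Impurity decrease from splitting on presence/absence of one phoneme."""
    left = [y for x, y in zip(has_phoneme, labels) if x]
    right = [y for x, y in zip(has_phoneme, labels) if not x]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy data, invented for illustration: 1 = the word contains /t/.
has_t = [1, 1, 1, 0, 0, 0, 0, 1]
size = ["small", "small", "small", "large",
        "large", "large", "large", "large"]

print(round(split_gain(has_t, size), 3))
```

A tree-growing procedure would compute such a gain for every candidate phoneme at a node, split on the best one, and then repeat the search within each resulting branch.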

Random forests grow such trees using only a random subset of predictor variables that is different for each tree, e.g., one tree may be grown with the predictors /t/, /b/, /o/, another tree with /ð/, /e/, /m/, and so on. The intuition behind this approach is that certain predictors may mask the effects of other predictors, and only by accumulating insight from a number of trees with different combinations of predictor variables do we get a more stable estimate of how much each variable contributes to the overall prediction. In addition to only considering a random subset of variables, each tree is grown on a random bootstrap sample of the data. The performance of each tree is evaluated on the data that the tree has not witnessed, the so-called “out-of-bag sample.” Here, we focus on the out-of-bag error rate (OOB error), an estimate of how well the random forest generalizes to unseen data. To the extent that the OOB error is low, it is possible to predict whether size adjectives are large or small from phonological structure alone. That is, a low OOB error indicates systematicity in the lexicon, as it indicates that phonology predicts semantics.
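The out-of-bag idea can be made concrete with a short Python sketch. It draws bootstrap samples of 52 word indices and records which words each sample leaves out; those left-out words are the ones on which the corresponding tree would be evaluated. The seed, the helper name bootstrap_split, and the use of plain index lists are illustrative assumptions, and no trees are actually fitted here.

```python
import random

def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    seen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in seen]
    return in_bag, out_of_bag

rng = random.Random(1)   # fixed seed so the sketch is repeatable
n_words = 52             # size of the adjective set described in the text
n_trees = 1000           # one bootstrap sample per tree

oob_sizes = []
for _ in range(n_trees):
    _, oob = bootstrap_split(n_words, rng)
    oob_sizes.append(len(oob))

mean_share = sum(oob_sizes) / (n_trees * n_words)
print(f"average out-of-bag share: {mean_share:.2f}")
```

With sampling with replacement, roughly a third of the words are expected to be out of bag for any given tree, so every word is held out by many trees and can be predicted by trees that never saw it.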

We used random forests because they have been argued to be appropriate for “small n, high p” situations that involve many different features that are potentially collinear (Strobl et al. 2009). Collinearity can be expected in analyses of the phonological structure of words because, owing to phonotactic constraints, particular phonemes tend to occur together. Moreover, we have a large set of predictors (36 phonemes) to predict large/small size semantics, as well as, in the case of the size adjectives, a very small dataset (52 adjectives). Each predictor codes for the presence/absence of a phoneme in the stem. We used this binary presence/absence measure rather than the raw count of phonemes because most phonemes only occur once per word.
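The presence/absence coding can be sketched as follows in Python. The six-phoneme inventory, the ARPAbet transcriptions of the two stems, and the function name presence_vector are hypothetical stand-ins chosen for brevity; the reported analysis used 36 phoneme predictors.

```python
# Hypothetical mini-inventory in ARPAbet; the study used 36 phonemes.
INVENTORY = ["IY", "IH", "AA", "T", "M", "N"]

def presence_vector(phones, inventory=INVENTORY):
    """Return 1 if a phoneme occurs anywhere in the stem, else 0."""
    present = set(phones)  # repeated phonemes collapse to a single 1
    return [int(p in present) for p in inventory]

# Illustrative stems with assumed transcriptions.
stems = {
    "teen": ["T", "IY", "N"],                  # teeny without the -y suffix
    "mammoth": ["M", "AE", "M", "AH", "TH"],   # /m/ occurs twice
}

for stem, phones in stems.items():
    print(stem, presence_vector(phones))
```

Because the coding ignores counts, the two occurrences of /m/ in mammoth yield the same value as a single occurrence would.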

Random forests have various parameters that need to be tuned for each dataset (Probst et al. 2019), such as the number of random variables considered for each tree. Here, we used the ‘tuneRanger’ package version 0.5 (Probst et al. 2019) for hyperparameter tuning, which uses sequential model-based optimization (Hutter et al. 2011) for finding the best parameter values. We used tuneRanger to tune hyperparameters individually for each dataset (see OSF repository for exact specifications: https://osf.io/9q4nc/). We ran all random forests with 1,000 trees. The size adjective analysis features class imbalance, with nearly twice as many large (34) as small (18) adjectives. Because ranger’s default “gini” split rule is known to be biased in the presence of class imbalance (Strobl et al. 2007), we used the “extratrees” split rule instead (Geurts et al. 2006).

3 Results

3.1 Size adjectives

For the 52 etymologically unrelated adjectives, the random forest was able to predict the large/small distinction with very high classification accuracy (training accuracy: 98.1%) and, more importantly, relatively low out-of-bag prediction error (OOB = 23.08%), indicating that this finding would generalize well to unseen data. The accuracy of this random forest is much better than what would be expected if we naïvely assigned the majority category (in this case, large adjectives) regardless of which phonemes a word contains, in which case we would be accurate only 65.38% of the time. This shows that for size adjectives, sound structure is highly predictive of semantic size. Table 1 shows the probabilities of individual words belonging to each category. Only one word, runty, was misclassified. Based on its phonological properties, the random forest assigned it to the ‘large’ rather than the ‘small’ set. Additionally, the random forest was undecided for the word small, which was neither assigned to be part of the ‘small’, nor of the ‘large’ set.
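The baseline and accuracy figures follow from simple proportions, as the Python sketch below shows. The class sizes are the numbers of small and large words listed in Table 1. The count of 12 out-of-bag errors is our own inference from the 23.08% error rate given in Table 2, not a figure stated in the text.

```python
n_large, n_small = 34, 18      # class sizes as listed in Table 1
n = n_large + n_small          # 52 adjectives in total

# Always guessing the majority class ('large') is right this often.
majority_baseline = max(n_large, n_small) / n

# One misclassified word (runty) on the training data.
training_accuracy = (n - 1) / n

# Assumption: 12 of 52 out-of-bag errors reproduces the Table 2 error rate.
oob_error = 12 / n

print(f"majority baseline: {majority_baseline:.2%}")
print(f"training accuracy: {training_accuracy:.1%}")
print(f"out-of-bag error:  {oob_error:.2%}")
```

The relevant comparison is between the out-of-bag accuracy (one minus the out-of-bag error) and the majority baseline, since both describe performance on words the model has not been fitted to.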

Table 1

Small and large adjectives with the probability of the word being assigned to the large category (analysis on etymologically unrelated forms).

Small word        p(large)    Large word        p(large)
petite            0.11        large             0.95
teensy            0.12        colossal          0.94
pee-wee           0.14        hulking           0.93
lilliputian       0.15        mondo             0.92
meager            0.15        whopping          0.90
mini              0.16        walloping         0.90
little            0.18        leviathan         0.90
diminutive        0.21        gargantuan        0.89
infinitesimal     0.21        hefty             0.89
midget            0.23        towering          0.89
bitsy             0.28        goodly            0.89
microscopic       0.45        gross             0.88
baby              0.46        prodigious        0.88
slight            0.49        vast              0.85
puny              0.49        immense           0.84
bantam            0.49        mammoth           0.83
small             0.50        tremendous        0.83
runty*            0.57        bulky             0.82
                              giant             0.82
                              massive           0.81
                              voluminous        0.80
                              considerable      0.78
                              huge              0.78
                              titanic           0.78
                              strapping         0.76
                              elephantine       0.76
                              mountainous       0.76
                              jumbo             0.74
                              big               0.72
                              great             0.72
                              enormous          0.71
                              substantial       0.66
                              cyclopean         0.62
                              behemontic        0.55

A look at variable importances (Figure 1a) suggests that four phonemes are especially important: /i/, /ɪ/, /t/, and /ɑ/, in order of predictive performance. The vowel /i/ occurred in only 3% of large adjectives as opposed to 28% of small adjectives (Figure 1b). Cramer’s V for contingency tables (df = 1) suggests that this is a medium effect size: V = 0.31. The vowel /ɪ/ occurred in 18% of large adjectives as opposed to 39% of small ones (small-to-medium effect size, V = 0.19). The vowel /ɑ/ occurred in 26% of the large adjectives and none of the small adjectives (V = 0.28). The consonant /t/ occurred in 38% of large adjectives but 61% of small adjectives (V = 0.18).
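As a worked check on these effect sizes, the Python sketch below computes Cramér's V for a 2×2 table. The cell counts are reconstructed from the reported percentages and the class sizes in Table 1 (34 large, 18 small), so they are an assumption rather than published counts. With a Yates continuity correction the reconstructed tables give the reported values for /i/ and /ɑ/, whereas the uncorrected statistic for /i/ comes out higher; this suggests, but does not confirm, that a continuity correction was applied.

```python
from math import sqrt

def cramers_v_2x2(a, b, c, d, yates=True):
    """Cramér's V for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)   # continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return sqrt(chi2 / n)               # min(rows, cols) - 1 = 1 here

# Rows: large vs. small adjectives; columns: phoneme present vs. absent.
# Counts are reconstructed from the reported percentages (an assumption).
v_i = cramers_v_2x2(1, 33, 5, 13)    # /i/: 1 of 34 large, 5 of 18 small
v_a = cramers_v_2x2(9, 25, 0, 18)    # /ɑ/: 9 of 34 large, 0 of 18 small
v_i_raw = cramers_v_2x2(1, 33, 5, 13, yates=False)

print(round(v_i, 2), round(v_a, 2), round(v_i_raw, 2))
```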

Figure 1

(a) Relative variable importance (permutation-based); the dashed line represents the absolute value of the lowest variable importance, which can be used as a heuristic cut-off threshold for predictors that contribute strongly to the random forest’s overall predictive performance (Strobl et al. 2009); (b) Proportion of size adjectives in the large/small class for the four most predictive phonemes.

It is noteworthy that three out of the four most important phonemes were vowels, which appears to be in line with the fact that most studies focusing on size sound symbolism have focused on vowels (Sapir 1929; Katz 1986; Haynie et al. 2014). To assess the extent to which predictive accuracy depends on vowels and consonants, we ran separate random-forest analyses with consonant predictors and vowel predictors only. Both analyses revealed a stark drop in predictive accuracy (OOB error for vowels only: 36.54%; for consonants only: 34.62%), which suggests that both vowels and consonants are needed to attain high predictive accuracy.

3.2 Glasgow norms

Next, we assessed to what extent the patterns observed for size adjectives extend to other types of words. The same 36 binary predictors were used to predict the Glasgow size ratings, which were discretized for comparability with the size adjective analysis (median split). The overall accuracy was 72.21%, much lower than what was observed for the size adjectives. Crucially, the out-of-bag prediction error was nearly double that of the size adjectives (OOB = 42.11%). This is not a result of the dichotomization of the continuous size ratings, as suggested by the fact that the effect sizes for the most predictive phonemes are also negligible in the continuous case (Cohen’s d = 0.19, 0.22, 0.31, for the three most predictive phonemes). Even if we take the 10th percentile most extreme large/small words, the OOB error is still nearly twice as large (41.46%) as in the size adjective analysis above; the same applies to the 20th percentile most extreme words (OOB = 37.65%). A look at the variable importances also reveals that the pattern of importances does not make sense with respect to the existing literature on size sound symbolism, with neither /i/ nor /ɪ/ being indicated to have any predictive power. Instead, for example, the most predictive phoneme for the median-split random forest turned out to be /r/, which was slightly more frequent in the ‘large’ words (31.90%) than in the ‘small’ words (24.20%). However, the effect size of even this most predictive phoneme was minimal (Cramer’s V = 0.08), and much smaller than the effect sizes we observed for size adjectives.
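The median split itself is a simple operation, sketched here in Python. The six words and their ratings are invented for illustration, and assigning ratings exactly at the median to the ‘small’ class is our own assumption about tie handling.

```python
from statistics import median

# Hypothetical size ratings on a 7-point scale, invented for illustration.
ratings = {
    "ant": 1.4, "cup": 2.6, "dog": 3.5,
    "car": 5.0, "barn": 5.9, "whale": 6.8,
}

cut = median(ratings.values())

# Assumption: ratings at or below the median count as 'small'.
labels = {word: ("large" if r > cut else "small")
          for word, r in ratings.items()}

print(cut)
print(labels)
```

Discretizing in this way yields two classes of roughly equal size, which is why the majority-class baselines for the Glasgow analyses sit close to 50%.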

Given the fact that lexical categories differ in how much they are prone to iconicity (Perry et al. 2015; Winter et al. 2017), we also ran separate random forest analyses for nouns, verbs, and adjectives. The predictive error was still nearly twice as high for the adjectives from the Glasgow ratings (OOB = 42.73%) as for the set of size adjectives, and it was similarly high for nouns (OOB = 42.90%) and verbs (OOB = 45.55%). Table 2 summarizes the OOB error for all analyses, highlighting that across the board, the out-of-bag prediction error on the Glasgow ratings is much higher than in the case of the size adjectives.

Table 2

OOB error rates are high for all analyses performed on the Glasgow size ratings (Scott et al. 2019).

Dataset / analysis                          N        Chance baseline     Out-of-bag
                                                     (majority class)    prediction error
Size adjectives (thesaurus)
  all size adjectives                       52       65.38%              23.08%
  all size adjectives (vowels only)         52       65.38%              36.54%
  all size adjectives (consonants only)     52       65.38%              34.62%
Glasgow size ratings
  all size ratings                          2,667    50.24%              42.11%
  10% most large/small words                533      50.09%              41.46%
  20% most large/small words                1,065    50.14%              37.65%
  nouns only                                1,865    52.98%              42.90%
  verbs only                                472      57.63%              45.55%
  adjectives only                           330      53.94%              42.73%

4 Discussion

Our results show that size sound symbolism is a systematic property of English words, but it specifically resides in the lexical domain of size adjectives. Our results are broadly consistent with empirical evidence from iconicity rating studies which shows that adjectives are generally rated to be high in iconicity, especially when compared to nouns (Perry et al. 2015; Winter et al. 2017). More generally, our results fit with the observation that adjectives are more focused on singling out specific semantic dimensions, especially in contrast to nouns (Lynott & Connell 2013; Winter 2019). Given that size is just one perceptual dimension among many, nouns are arguably too multidimensional to allow for strong generalizations on size sound symbolism. This may partially explain why Katz (1986) obtained a null result in his analysis: for many of the words he considered (e.g., ‘small’: grape, toe, butterfly; ‘large’: tractor, iceberg, elephant), size is only one semantic dimension among many others. We conclude that in investigations of size sound symbolism, it matters where one looks in the lexicon.

It also matters how one investigates size sound symbolism. Our study improved on past methodology by being agnostic to which phonemes should matter for size sound symbolism. Rather than performing separate hypothesis tests for individual phonemes, we used a bottom-up machine-learning algorithm that treats all phonemes equally. This method converges on the high front vowels /ɪ/ and /i/ predicting ‘small’, even though we excluded the diminutive suffix -y. In contrast, the low back vowel /ɑ/ is associated with ‘large’. The only consonant that mattered was /t/, a voiceless plosive, which was associated with ‘small’. Thus, altogether our results suggest that there are more English vowels that matter to size sound symbolism than consonants. However, the comparison between the random forest that considers all phonemes as opposed to the random forests that only used vowel predictors or consonant predictors shows that all phonemes are needed to achieve an accurate prediction. This may further explain why past research has failed to find size sound symbolism, since researchers, following the early experimental investigations of Sapir (1929), have generally focused their studies on vowels (e.g., Thorndike 1945; Johnson 1967; Katz 1986; Haynie et al. 2014).

It is interesting that specifically the voiceless alveolar stop /t/ turned out to be the most predictive consonant. Kawahara et al. (2018) observed that the names of larger fictional creatures in the Japanese video game Pokémon were more likely to contain more voiced obstruents. Several other studies have found voicing to be associated with size in various languages (Klink 2000; Haryu & Zhao 2007; Shinohara & Kawahara 2010; P. D. Thompson & Estes 2011; Johansson et al. 2019). A number of different acoustic cues distinguish voiced from voiceless stops in English (Lisker 1986). Among other cues, voiced stops induce lower pitch in surrounding vowels (Kingston & Diehl 1994). In addition, voiceless stops have higher spectral components in the release of the stop than voiced consonants (Chodroff & Wilson 2014). Interestingly, these high spectral components are highest for the alveolar place of articulation (Chodroff & Wilson 2014), which could be a factor in making /t/ particularly suitable for the depiction of small size. All of this is suggestive of John Ohala’s Frequency Code (Ohala 1983), according to which iconicity for size mimics the acoustics of small objects or animals, which tend to produce higher frequency sounds.

Our results also provide further support for the idea that sound symbolism is a probabilistic phenomenon (Kawahara et al. 2019). Given that phonemes primarily serve to distinguish meanings within the lexicon, it is futile to look for strict rule-like correspondences, such as all /i/ being always ‘small’. Past discussions of size sound symbolism have over-emphasized individual counterexamples, thereby obscuring broader patterns in the lexicon. For example, linguists have repeatedly noted that small is an exception to size sound symbolism (Jespersen 1922; Wescott 1971), and indeed, our bottom-up machine learning algorithm agrees with this intuition. However, along with runty, small turns out to be one of just two exceptions across all English size adjectives. Thus, these results show that an over-emphasis of a few individual examples can distract from generalizations that can be made across cohorts of words.

It is useful to compare our findings to the literature on the cross-linguistic study of size sound symbolism, which has reported similar findings (Ultan 1978; Haynie et al. 2014; Blasi et al. 2016; Johansson et al. 2019). These studies generally focus on fewer words in order to cover more languages, thus trading within-language depth for cross-linguistic breadth. The findings of these studies show that the pattern that we have observed here across words in a language also exists across languages. These two facts taken together suggest that the systematicity established here is rooted in a genuine crossmodal correspondence between sound and size, rather than just being a statistical fluke of the English language. Thus, the association of high/low-frequency sounds with smallness/largeness is a universal statistical tendency, and it can be found both across languages and across distinct word types within languages, as demonstrated here for English.

Returning to Bentley and Varon’s (1933) criticism that Sapir’s (1929) experiment with nonce words like mil and mal has nothing to say about the English lexicon, our analyses clearly show that, in fact, the English lexicon does harbor size sound symbolism. This finding is in line with a growing body of work showing that the general vocabulary, not just ‘specialized’ words such as onomatopoeias or ideophones, features more iconicity than has traditionally been acknowledged (Haynie et al. 2014; Blasi et al. 2016; Joo 2020; Sidhu et al. 2021). Moreover, as we have shown here by correlating phonology and semantics, this iconicity does not only characterize a few words, but larger sets of words, such as our set of English size adjectives. This demonstrates that iconicity can play a role in shaping the vocabulary of spoken languages.


  1. A reviewer pointed out that thesauri are still constructed by lexicographers, which opens up the possibility for bias. However, we are not aware of ways that this bias would favor sound symbolic patterns. Moreover, to our knowledge, our word list is a nearly exhaustive list of size terms, and so other lists would, presumably, include a very similar set of words.
  2. Humongous is suspected to combine huge and tremendous; giant and gigantic are related; behemontic and mammoth are suspected to have influenced each other; bitty, bitsy, itsy-bitsy, and itty-bitty are all related to each other, and so are teeny, tiny, teensy, and teensy-weensy; as well as wee and pee-wee, and minuscular and miniscule. Finally, the word mini is a truncation of miniature, and the min- form has subsequently been extended to other forms, such as minikin (Bolinger 1949: 60).


Bodo Winter was supported by the UKRI Future Leaders Fellowship MR/T040505/1.

Competing interests

The authors have no competing interests to declare.


Asano, Michiko, Mutsumi Imai, Sotaro Kita, Keiichi Kitajo, Hiroyuki Okada & Guillaume Thierry. 2015. Sound symbolism scaffolds language development in preverbal infants. Cortex 63. 196–205. DOI:  http://doi.org/10.1016/j.cortex.2014.08.025

Assaneo, María Florencia, Juan Ignacio Nichols & Marcos Alberto Trevisan. 2011. The anatomy of onomatopoeia. PloS One 6(12). e28317. DOI:  http://doi.org/10.1371/journal.pone.0028317

Auracher, Jan. 2017. Sound iconicity of abstract concepts: Place of articulation is implicitly associated with abstract concepts of size and social dominance. PloS One 12(11). e0187196. DOI:  http://doi.org/10.1371/journal.pone.0187196

Balota, David A., Melvin J. Yap, Keith A. Hutchison, Michael J. Cortese, Brett Kessler, Bjorn Loftis, Rebecca Treiman, et al. 2007. The English lexicon project. Behavior Research Methods 39(3). 445–459. DOI:  http://doi.org/10.3758/BF03193014

Baxter, Stacey, Jasmina Ilicic, Alicia Kulczynski & Tina Lowrey. 2015. Communicating product size using sound and shape symbolism. Journal of Product & Brand Management. DOI:  http://doi.org/10.1108/JPBM-11-2014-0748

Baxter, Stacey & Tina M. Lowrey. 2011. Phonetic symbolism and children’s brand name preferences. Journal of Consumer Marketing 28(7). 516–523. DOI:  http://doi.org/10.1108/07363761111181509

Bentley, Madison & Edith J. Varon. 1933. An accessory study of ‘phonetic symbolism’. The American Journal of Psychology 45(1). 76–86. DOI:  http://doi.org/10.2307/1414187

Berlin, Brent. 2006. The first congress of ethnozoological nomenclature. Journal of the Royal Anthropological Institute 12(s1). DOI:  http://doi.org/10.1111/j.1467-9655.2006.00271.x

Birch, David & Marlowe Erickson. 1958. Phonetic symbolism with respect to three dimensions from the semantic differential. The Journal of General Psychology 58(2). 291–297. DOI:  http://doi.org/10.1080/00221309.1958.9920401

Blake, Barry J. 2017. Sound symbolism in English: Weighing the evidence. Australian Journal of Linguistics 37(3). 286–313. DOI:  http://doi.org/10.1080/07268602.2017.1298394

Blasi, Damián E., Søren Wichmann, Harald Hammarström, Peter F. Stadler & Morten H. Christiansen. 2016. Sound–meaning association biases evidenced across thousands of languages. Proceedings of the National Academy of Sciences 113(39). 10818–10823. DOI:  http://doi.org/10.1073/pnas.1605782113

Bloomfield, Leonard. 1933. Language. Chicago, IL: Chicago University Press.

Bolinger, Dwight. 1949. The sign is not arbitrary. Thesaurus 1(1). 52–62.

Bosworth, Rain G. & Karen Emmorey. 2010. Effects of iconicity and semantic relatedness on lexical access in American Sign Language. Journal of Experimental Psychology: Learning, Memory, and Cognition 36(6). 1573. DOI:  http://doi.org/10.1037/a0020934

Breiman, Leo. 2001. Statistical modeling: The two cultures. Statistical Science 16(3). 199–231. DOI:  http://doi.org/10.1214/ss/1009213726

Brown, Roger. 1958. Words and things. Glencoe, IL: The Free Press.

Brysbaert, Marc, Boris New & Emmanuel Keuleers. 2012. Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods 44(4). 991–997. DOI:  http://doi.org/10.3758/s13428-012-0190-4

Chodroff, Eleanor & Colin Wilson. 2014. Burst spectrum as a cue for the stop voicing contrast in American English. The Journal of the Acoustical Society of America 136(5). 2762–2772. DOI:  http://doi.org/10.1121/1.4896470

Clark, Herbert H. & Richard J. Gerrig. 1990. Quotations as demonstrations. Language 764–805. DOI:  http://doi.org/10.2307/414729

Cuskley, Christine & Simon Kirby. 2013. Synesthesia, cross-modality, and language evolution. In Julia Simner & Edward M. Hubbard (eds.), Oxford Handbook of Synesthesia, 869–907. Oxford, UK: Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780199603329.013.0043

Ćwiek, Aleksandra, Susanne Fuchs, Christoph Draxler, Eva Liina Asu, Dan Dediu, Katri Hiovain, Marcus Perlman, et al. 2021. Novel vocalizations are understood across cultures. Scientific Reports 11(1). 10108. DOI:  http://doi.org/10.1038/s41598-021-89445-4

Dingemanse, Mark & Bill Thompson. 2020. Playful iconicity: structural markedness underlies the relation between funniness and iconicity. Language and Cognition, 1–22. DOI:  http://doi.org/10.1017/langcog.2019.49

Dingemanse, Mark, Damián E. Blasi, Gary Lupyan, Morten H. Christiansen & Padraic Monaghan. 2015. Arbitrariness, iconicity, and systematicity in language. Trends in Cognitive Sciences 19(10). 603–615. DOI:  http://doi.org/10.1016/j.tics.2015.07.013

Fay, Nicolas, Michael Arbib & Simon Garrod. 2013. How to bootstrap a human communication system. Cognitive Science 37(7). 1356–1367. DOI:  http://doi.org/10.1111/cogs.12048

Ferrara, Lindsay & Gabrielle Hodge. 2018. Language as description, indication, and depiction. Frontiers in Psychology 9. DOI:  http://doi.org/10.3389/fpsyg.2018.00716

Fitch, William Tecumseh. 1994. Vocal tract length perception and the evolution of language. Providence: Brown University dissertation.

Friberg, Anders, Tony Lindeberg, Martin Hellwagner, Pétur Helgason, Gláucia Laís Salomão, Anders Elowsson, Sten Ternström, et al. 2018. Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields. The Journal of the Acoustical Society of America 144(3). 1467–1483. DOI:  http://doi.org/10.1121/1.5052438

Geurts, Pierre, Damien Ernst & Louis Wehenkel. 2006. Extremely randomized trees. Machine Learning 63(1). 3–42. DOI:  http://doi.org/10.1007/s10994-006-6226-1

Greenberg, Joseph H. & James J. Jenkins. 1966. Studies in the psychological correlates of the sound system of American English. Word 22(1–3). 207–242. DOI:  http://doi.org/10.1080/00437956.1966.11435451

Hamano, Shoko Saito. 1998. The sound-symbolic system of Japanese. Stanford, CA: CSLI.

Haryu, Etsuko & Lihua Zhao. 2007. Understanding the symbolic values of Japanese onomatopoeia: Comparison of Japanese and Chinese speakers. Shinrigaku Kenkyu: The Japanese Journal of Psychology 78(4). 424–432. DOI:  http://doi.org/10.4992/jjpsy.78.424

Haynie, Hannah, Claire Bowern & Hannah LaPalombara. 2014. Sound symbolism in the languages of Australia. PloS One 9(4). DOI:  http://doi.org/10.1371/journal.pone.0092852

Hutter, Frank, Holger H. Hoos & Kevin Leyton-Brown. 2011. Sequential model-based optimization for general algorithm configuration. In Carlos A. Coello (ed.), International conference on learning and intelligent optimization, 507–523. Berlin: Springer. DOI:  http://doi.org/10.1007/978-3-642-25566-3_40

Imai, Mutsumi & Sotaro Kita. 2014. The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1651). 20130298. DOI:  http://doi.org/10.1098/rstb.2013.0298

Imai, Mutsumi, Sotaro Kita, Miho Nagumo & Hiroyuki Okada. 2008. Sound symbolism facilitates early verb learning. Cognition 109(1). 54–65. DOI:  http://doi.org/10.1016/j.cognition.2008.07.015

Jespersen, Otto. 1922. Language: its nature and development. New York: Henry Holt and Company.

Johansson, Niklas, Andrey Anikin, Gerd Carling & Arthur Holmer. 2019. The typology of sound symbolism: Defining macro-concepts via their semantic and phonetic features. Linguistic Typology.

Johnson, Ronald C. 1967. Magnitude symbolism of English words. Journal of Verbal Learning and Verbal Behavior 6(4). 508–511. DOI:  http://doi.org/10.1016/S0022-5371(67)80008-2

Joo, Ian. 2020. Phonosemantic biases found in Leipzig-Jakarta lists of 66 languages. Linguistic Typology 24(1). 1–12. DOI:  http://doi.org/10.1515/lingty-2019-0030

Katz, Albert N. 1986. Meaning conveyed by vowels: Some reanalyses of word norm data. Bulletin of the Psychonomic Society 24(1). 15–17. DOI:  http://doi.org/10.3758/BF03330490

Kawahara, Shigeto, Atsushi Noto & Gakuji Kumagai. 2018. Sound symbolic patterns in Pokémon names. Phonetica 75(3). 219–244. DOI:  http://doi.org/10.1159/000484938

Kawahara, Shigeto & Gakuji Kumagai. 2019. Expressing evolution in Pokémon names: Experimental explorations. Journal of Japanese Linguistics 35(1). 3–38. DOI:  http://doi.org/10.1515/jjl-2019-2002

Kawahara, Shigeto, Hironori Katsuda & Gakuji Kumagai. 2019. Accounting for the stochastic nature of sound symbolism using Maximum Entropy model. Open Linguistics 5(1). 109–120. DOI:  http://doi.org/10.1515/opli-2019-0007

Kingston, John & Randy L. Diehl. 1994. Phonetic knowledge. Language 70(3). 419–454. DOI:  http://doi.org/10.1353/lan.1994.0023

Kita, Sotaro. 1997. Two-dimensional semantic analysis of Japanese mimetics. Linguistics 35. 379–415. DOI:  http://doi.org/10.1515/ling.1997.35.2.379

Klink, Richard R. 2000. Creating brand names with meaning: The use of sound symbolism. Marketing Letters 11(1). 5–20.

Knoeferle, Klemens, Jixing Li, Emanuela Maggioni & Charles Spence. 2017. What drives sound symbolism? Different acoustic cues underlie sound-size and sound-shape mappings. Scientific Reports 7(1). 5562. DOI:  http://doi.org/10.1038/s41598-017-05965-y

Lisker, Leigh. 1986. “Voicing” in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language and Speech 29(1). 3–11. DOI:  http://doi.org/10.1177/002383098602900102

Lu, Jenny C. & Susan Goldin-Meadow. 2018. Creating images with the stroke of a hand: Depiction of size and shape in sign language. Frontiers in Psychology 9. 1276. DOI:  http://doi.org/10.3389/fpsyg.2018.01276

Lynott, Dermot & Louise Connell. 2013. Modality exclusivity norms for 400 nouns: The relationship between perceptual experience and surface word form. Behavior Research Methods 45(2). 516–526. DOI:  http://doi.org/10.3758/s13428-012-0267-0

Macuch Silva, Vinicius, Judith Holler, Asli Ozyurek & Seán G. Roberts. 2020. Multimodality and the origin of a novel communication system in face-to-face interaction. Royal Society Open Science 7(1). 182056. DOI:  http://doi.org/10.1098/rsos.182056

Navarro, Danielle. 2015. Learning statistics with R: A tutorial for psychology students and other beginners. http://ua.edu.au/ccs/teaching/lsr

Newman, Stanley S. 1933. Further experiments in phonetic symbolism. The American Journal of Psychology 45(1). 53–75. DOI:  http://doi.org/10.2307/1414186

Nielsen, Alan. 2016. Systematicity, motivatedness, and the structure of the lexicon. Edinburgh: University of Edinburgh dissertation.

Nielsen, Alan & Mark Dingemanse. 2020. Iconicity in word learning and beyond: A critical review. Language and Speech 1–21. DOI:  http://doi.org/10.1177/0023830920914339

Ohala, John J. 1983. Cross-language use of pitch: an ethological view. Phonetica 40(1). 1–18. DOI:  http://doi.org/10.1159/000261678

Paivio, Allan. 1975. Perceptual comparisons through the mind’s eye. Memory & Cognition 3(6). 635–647. DOI:  http://doi.org/10.3758/BF03198229

Parise, Cesare & Charles Spence. 2009. ‘When birds of a feather flock together’: synesthetic correspondences modulate audiovisual integration in non-synesthetes. PloS One 4(5). e5664. DOI:  http://doi.org/10.1371/journal.pone.0005664

Parise, Cesare & Charles Spence. 2012. Audiovisual crossmodal correspondences and sound symbolism: a study using the implicit association test. Experimental Brain Research 220(3–4). 319–333. DOI:  http://doi.org/10.1007/s00221-012-3140-6

Perlman, Marcus, Nathaniel Clark & Marlene Johansson Falck. 2015a. Iconic prosody in story reading. Cognitive Science 39(6). 1348–1368. DOI:  http://doi.org/10.1111/cogs.12190

Perlman, Marcus, Rick Dale & Gary Lupyan. 2015b. Iconicity can ground the creation of vocal symbols. Royal Society Open Science 2(8). 150152. DOI:  http://doi.org/10.1098/rsos.150152

Perniss, Pamela, Robin L. Thompson & Gabriella Vigliocco. 2010. Iconicity as a general property of language: evidence from spoken and signed languages. Frontiers in Psychology 1. DOI:  http://doi.org/10.3389/fpsyg.2010.00227

Perry, Lynn K., Marcus Perlman, Bodo Winter, Dominic W. Massaro & Gary Lupyan. 2017. Iconicity in the speech of children and adults. Developmental Science, e12572. DOI:  http://doi.org/10.1111/desc.12572

Perry, Lynn K., Marcus Perlman & Gary Lupyan. 2015. Iconicity in English and Spanish and its relation to lexical category and age of acquisition. PloS One 10(9). e0137147. DOI:  http://doi.org/10.1371/journal.pone.0137147

Preziosi, Melissa A. & Jennifer H. Coane. 2017. Remembering that big things sound big: Sound symbolism and associative memory. Cognitive Research: Principles and Implications 2(1). 10. DOI:  http://doi.org/10.1186/s41235-016-0047-y

Probst, Philipp, Marvin N. Wright & Anne-Laure Boulesteix. 2019. Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9(3). e1301. DOI:  http://doi.org/10.1002/widm.1301

R Core Team. 2019. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Rhodes, Richard. 1994. Aural images. In Leanne Hinton, Johanna Nichols, & John J. Ohala (eds.), Sound symbolism, 276–292. Cambridge, UK: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511751806.019

Sapir, Edward. 1929. A study in phonetic symbolism. Journal of Experimental Psychology 12(3). 225–239. DOI:  http://doi.org/10.1037/h0070931

Scott, Graham G., Anne Keitel, Marc Becirspahic, Bo Yao & Sara C. Sereno. 2019. The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research Methods 51(3). 1258–1270. DOI:  http://doi.org/10.3758/s13428-018-1099-3

Shih, Stephanie & Deniz Rudin. 2020. On sound symbolism in baseball player names. Names, 1–18. DOI:  http://doi.org/10.1080/00277738.2020.1759353

Shinohara, Kazuko & Shigeto Kawahara. 2010. A cross-linguistic study of sound symbolism: The images of size. In Annual Meeting of the Berkeley Linguistics Society 36. 396–410. DOI:  http://doi.org/10.3765/bls.v36i1.3926

Shintel, Hadas, Howard C. Nusbaum & Arika Okrent. 2006. Analog acoustic expression in speech communication. Journal of Memory and Language 55(2). 167–177. DOI:  http://doi.org/10.1016/j.jml.2006.03.002

Sidhu, David M., Chris Westbury, Geoff Hollis & Penny M. Pexman. 2021. Sound symbolism shapes the English language: The maluma/takete effect in English nouns. Psychonomic Bulletin & Review, 1–9. DOI:  http://doi.org/10.3758/s13423-021-01883-3

Sidhu, David M., Gabriella Vigliocco & Penny M. Pexman. 2020. Effects of iconicity in lexical decision. Language and Cognition 12(1). 164–181. DOI:  http://doi.org/10.1017/langcog.2019.36

Sommer, Ferdinand. 1933. Lautnachahmung. Indogermanische Forschungen 51. 229–268. DOI:  http://doi.org/10.1515/if-1933-0164

Stoddart, John. 1858. Glossology: or, The historical relations of languages. London: R. Griffin and Company.

Strobl, Carolin, Anne-Laure Boulesteix, Achim Zeileis & Torsten Hothorn. 2007. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8(1). 1–21. DOI:  http://doi.org/10.1186/1471-2105-8-25

Strobl, Carolin, James Malley & Gerhard Tutz. 2009. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods 14(4). 323. DOI:  http://doi.org/10.1037/a0016973

Tarte, Robert D. 1982. The relationship between monosyllables and pure tones: An investigation of phonetic symbolism. Journal of Verbal Learning and Verbal Behavior 21(3). 352–360. DOI:  http://doi.org/10.1016/S0022-5371(82)90670-3

Tarte, Robert D. & Loren S. Barritt. 1971. Phonetic symbolism in adult native speakers of English: Three studies. Language and Speech 14(2). 158–168. DOI:  http://doi.org/10.1177/002383097101400206

Taylor, Insup Kim. 1963. Phonetic symbolism re-examined. Psychological Bulletin 60(2). 200–209. DOI:  http://doi.org/10.1037/h0040632

Thompson, Arthur Lewis & Youngah Do. 2019. Defining iconicity: An articulation-based methodology for explaining the phonological structure of ideophones. Glossa: A Journal of General Linguistics 4(1). 72. DOI:  http://doi.org/10.5334/gjgl.872

Thompson, Patrick D. & Zachary Estes. 2011. Sound symbolic naming of novel objects is a graded function. The Quarterly Journal of Experimental Psychology 64(12). 2392–2404. DOI:  http://doi.org/10.1080/17470218.2011.605898

Thorndike, Edward L. 1945. On Orr’s hypotheses concerning the front and back vowels. British Journal of Psychology 36(1). 10. DOI:  http://doi.org/10.1111/j.2044-8295.1945.tb01098.x

Torchiano, Marco. 2019. effsize: Efficient effect size computation. DOI:  http://doi.org/10.5281/zenodo.1480624

Ultan, Russell. 1978. Size-sound symbolism. In Joseph H. Greenberg (ed.), Universals of human language, 525–568. Stanford, CA: Stanford University Press.

Vinson, David, Robin L. Thompson, Robert Skinner & Gabriella Vigliocco. 2015. A faster path between meaning and form? Iconicity facilitates sign recognition and production in British Sign Language. Journal of Memory and Language 82. 56–85. DOI:  http://doi.org/10.1016/j.jml.2015.03.002

Waugh, Linda R. 1994. Degrees of iconicity in the lexicon. Journal of Pragmatics 22(1). 55–70. DOI:  http://doi.org/10.1016/0378-2166(94)90056-6

Wedgwood, Hensleigh. 1845. On Onomatopoeia. Proceedings of the Philological Society 2(34). 109–118. DOI:  http://doi.org/10.1111/j.1467-968X.1845.tb00049.x

Wescott, Roger W. 1971. Linguistic iconism. Language, 416–428. DOI:  http://doi.org/10.2307/412089

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Jim Hester, et al. 2019. Welcome to the Tidyverse. Journal of Open Source Software 4(43). 1686. DOI:  http://doi.org/10.21105/joss.01686

Winter, Bodo. 2019. Sensory linguistics: Language, perception, and metaphor. Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/celcr.20

Winter, Bodo, Marcus Perlman, Lynn K. Perry & Gary Lupyan. 2017. Which words are most iconic? Iconicity in English sensory words. Interaction Studies 18(3). 433–454. DOI:  http://doi.org/10.1075/is.18.3.07win

Wissemann, Heinz. 1954. Untersuchungen zur Onomatopoiie. Heidelberg: Carl Winter Verlag.

Wright, Marvin N. & Andreas Ziegler. 2017. A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software 77(1). 1–17. DOI:  http://doi.org/10.18637/jss.v077.i01