Familiar vs. unique in a diachronic perspective. Case study of the rise of the definite article in North Germanic

The aim of the present study is to trace the development of the suffixed definite article in North Germanic, with particular attention to the unique reference expressed by the nascent article. The study is based on corpora of Old Swedish, Old Danish and Old Icelandic texts written between 1200 and 1550. Both qualitative and quantitative methods, such as logistic regression models, are applied. The study is grounded in the notions of familiarity and uniqueness, which we explore diachronically. The results indicate that the definite article is used much more frequently in familiar than in unique contexts in North Germanic in the periods studied: a greater proportion of NPs with direct anaphors than of NPs with unique referents is definite, both in the oldest extant texts and throughout the later periods. NPs with unique referents are further shown to constitute a non-uniform group, in which the ‘more local’ unique NPs (grounded in specific knowledge) appear more frequently with a definite article than the ‘more global’ unique referents (grounded in encyclopaedic knowledge).


Introductory remarks
The discussion of the meaning of definiteness dates back to Frege's classic example 'The Morning Star is the Evening Star'. It was further fuelled by the debate between Russell (1905) and Strawson (1950) and has since focused mainly on why some discourse referents may be definite: either because they are familiar or because they are unique. In languages which have developed a definite article, some uses of the article may be explained by invoking familiarity, others by invoking uniqueness. A number of attempts have been made either to subsume all uses of the definite article under one notion or the other (e.g. Christophersen 1939 opts for familiarity), or to reconcile the notions by widening the scope of their meaning, as in weak familiarity (Roberts 2003). Enlightening data come from studies of languages with more than one definite article and of languages in which the definite article displays different patterns of behaviour depending on the type of context and the semantic notion invoked, e.g. the German definite article, which may be contracted with a preposition or remain a full lexeme, e.g. im (= in+dem) vs. in dem 'in the'. In recent years, a number of authors have suggested that the two notions cannot be reconciled and that there are instead two types of definiteness, instantiated by two definite articles, weak and strong, whose distribution roughly corresponds to the contexts relying on the semantic notions of uniqueness and familiarity respectively. This observation is grounded in empirical studies, most notably Ebert (1971a), and has been further explored in theoretically oriented works such as Hawkins (1978) and Löbner (1985). The terms weak and strong were proposed by Schwarz (2009) in his work on definite articles in German. We discuss this in more detail in section 2.
In the following we differentiate between: a) semantic concepts of familiarity and uniqueness, b) morphosyntactic representation, i.e. definite articles and c) contexts in which the articles may or may not be used, such as anaphora, unique reference and larger situation use.
The meaning of definiteness has so far mainly been unravelled in synchronic studies. The majority of these are based on constructed examples, which are studied in detail, with possible and probable contexts constructed around them (Hawkins 1978; Heim 1988). Fewer studies have been based on actual corpus data (famously Fraurud 1990 and her later publications; cf. Löbner 2003), and these have revealed a number of counter-intuitive facts, e.g. that nominal phrases with a definite article (defNPs) used to introduce new discourse referents, i.e. first-mention definites, are in fact more frequent than definites used anaphorically (Fraurud 1990). The majority of these first-mention uses were established by bridging (Hawkins' associative anaphora), a use that is grounded in both familiarity (via an anchor) and uniqueness (the only possible discourse referent), as in (1).
(1) John has bought a new house. The roof is green.
The different uses of NPs with a definite article seem to be ordered diachronically, from those in which the definite article is used because the referent can be subsumed under some notion of familiarity (e.g. anaphora) to those in which the referent can more easily be subsumed under the notion of uniqueness (e.g. prototypically unique entities in a context, such as the king in a country). In their model of definite article grammaticalization, De Mulder and Carlier place direct anaphora (co-referring NPs) at the onset of the grammaticalization of the definite article (De Mulder & Carlier 2011; cf. Himmelmann 1997; Lyons 1999: 158ff; see also Simonenko & Carlier 2016). They treat the use with unique referents as the final stage of definite article grammaticalization and the bridging uses as an intermediate stage.
In the following discussion we take Hawkins' typology of the major usage types of the definite article as our starting point (Hawkins 1978: 106-129). This typology includes the following subtypes, illustrated by relevant examples quoted after Hawkins:
(2)
1. anaphoric and immediate situation uses, e.g. Fred was wearing trousers. The pants had a big patch on them. (Hawkins 1978: 107)¹
2. larger situation uses (relying on specific or general knowledge), e.g. The Prime Minister has just resigned. (Hawkins 1978: 116)
3. associative anaphoric uses, e.g. The man drove past our house in a car. The exhaust fumes were terrible. (Hawkins 1978: 123)

¹ Note that Hawkins (1978) groups anaphoric and immediate situation uses together; in this paper, however, these terms are not treated as interchangeable.
For the present study we are mainly concerned with the second type, i.e. larger situation uses. These differ from immediate situation uses, in which the unique referent is present, though not necessarily visible, in the situation in which the utterance is made (a requirement that the referent be visible is important for demonstratives, not for articles). Thus, in the larger situation use, people in the same village can talk about the church, the pub, the village green; members of the same nation can talk about the Queen, the navy, the Prime Minister; all people can talk about the sun, the moon, the planets. These larger situations can be of varying size, but they all have as their focal, defining point the immediate situation of utterance in which the speech act is taking place (Hawkins 1978: 115). In this sense the definite article retains its link with the original demonstrative. Depending on the size of the situation, definite reference may be based on specific knowledge (we are in a village we are familiar with, we know there is a church in it, so we can talk about the church) or on general knowledge (we do not know the church, but we know that there usually is one in a village). In all these uses the referent in question has to be unique in the situation, regardless of the situation's size.
The aim of the present study is to consider the uses of the incipient definite article in North Germanic, based on a corpus of Danish, Swedish and Icelandic texts written between 1200 and 1550. We focus on NPs with a suffixed definite article used to refer to unique discourse referents. We argue that unique discourse referents do not form a homogeneous group, as the semantic concepts invoked for their resolution also involve familiarity. In our typology of definite article uses we follow Hawkins (1978) (see example 2). We regard an NP as familiar if it is co-indexed with another NP that precedes it in the text. Where specified, we also use the notion of familiarity in a broader, pragmatic sense, i.e. an NP is familiar if its referent is associated with the context of the situation, but not necessarily found in the text (Jespersen's 1943 situational basis, cf. Lyons 1999: 254). Consider, e.g., the defNP 'the king' used in a text discussing a specific country. The definite article is used because the referent is unique; however, it may be considered unique only with respect to this particular country. In this sense it is made familiar by the familiarity of the country under discussion, or of the country in which the discussion takes place, and its uniqueness is 'local' rather than 'absolute'. We assume that uses of definite NPs can be placed on a scale from more familiarity (e.g. anaphora) to less familiarity (e.g. global uniqueness), with other categories (such as bridging, local uniqueness and immediate situations) falling in between.
The main issue we wish to address is whether there is a difference in expression between the types of larger situation use that rely on specific knowledge (more 'local' uniqueness, as it were) and the types that rely on general knowledge (more 'global' uniqueness) in a diachronic context. We assume that such differences exist, and our hypothesis is that the more local types of reference predate the more global ones in terms of definite article presence, which can be measured by the frequency of the incipient definite article in these contexts. A significant rise in the frequency of a grammaticalizing item is often considered an important indicator of an ongoing grammaticalization process (Bybee et al. 1994), although it is not a prerequisite. It can be hypothesised that certain uses of the incipient definite article will display higher frequencies earlier than others. Specifically, we hypothesise that the uses in which the semantic concept of familiarity may be involved will show higher frequencies than the uses involving semantic uniqueness only.
The paper is organized as follows: section 2 introduces the theoretical tenets of the paper; section 3 gives a short presentation of definiteness marking in modern North Germanic; section 4 presents the corpus and the annotation method applied in the present study; section 5 presents and discusses the results of the corpus study; and section 6 concludes the paper.

Familiar vs. unique in a diachronic perspective
The observation that not all uses of the definite article can be subsumed under one category is of particular interest to a language historian. The diversity of definite article uses naturally leads to the assumption that not all uses emerged at the same time, but rather that some predate others in the grammaticalization of the definite article.
Piotrowska and Skrzypek Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1178
The traditional model of definite article grammaticalization recognizes the original deictic meaning (most definite articles are derived from demonstratives or other deictic elements, see Lyons 1999) and treats the first uses of the demonstrative to point within texts rather than within situations (i.e. direct anaphora) as an extension of this original meaning. Demonstratives can be used as anaphoric markers in any language without this necessarily indicating that they are on the verge of grammaticalizing into definite articles. The interpretation of the definite form in such contexts is based on the semantic concept of familiarity (cf. Hawkins' anaphoric and immediate situation use).
In the next stage, however, the original demonstrative comes to be used with new discourse referents which are in some way grounded in the preceding discourse (Hawkins' associative anaphoric use). Despite several attempts to prove the contrary, demonstratives can appear in these contexts only under restricted circumstances (e.g. in Japanese, see Takamine 2014: 39; see also Diessel 1999: 24), and the fact that a form is used in the associative anaphoric context may be taken as a symptom of its ongoing grammaticalization into a definite article. So far, the mechanism of the transition from direct to indirect (or associative) anaphora remains unclear. Indirect anaphora itself is a heterogeneous and complex context, which seems to be based partly on familiarity and partly on uniqueness. Some attempts to describe the diachronic path through indirect anaphora have been made (Skrzypek 2020), but many questions remain unanswered.
The final stage of definite article grammaticalization is the possibility (and later the obligation) to use the original demonstrative with referents which are unique within the utterance situation, or which are inherently unique (Hawkins' larger situation use). This use is clearly at odds with the original meaning of the form: the demonstrative is used to pick out a discourse referent from among other potential referents, while the definite article in the larger situation use assures the hearer of the referent's uniqueness and confirms that it cannot be confused with another referent. This use is based solely on uniqueness.
The strong-weak dichotomy proposed by Schwarz (2009) classifies the definite article used anaphorically as strong and the definite article in the larger situation use as weak. Definite articles in associative anaphoric uses are classified by Schwarz (2009) partly as strong and partly as weak. In languages with two distinct definite articles, such as North Fering (Ebert 1971b) and some German dialects (Hartmann 1967; 1978), we may assume two separate grammaticalization processes, one for each definite article (the concept of two definite articles is also explored in Breu 2004 and Wespel 2008; see also Dahl 2015 on the grammaticalization of the definite articles in Swedish). In languages with one definite article which unites both familiar and unique uses, we propose instead a single grammaticalization process, whose sub-stages include the rise of strong and weak uses of the grammaticalizing article. Given that the definite article typically derives from a deictic element, usually a demonstrative pronoun, it is natural to assume that grammaticalization proceeds from uses based on the semantic concept of familiarity to uses based on uniqueness, a claim supported in this paper (and, among others, in De Mulder & Carlier 2011).
We will address this hypothesis with a statistical analysis of the North Germanic data in section 5.
We further want to address the issue of the bridge between the textually grounded uses of the developing definite article and the larger situation uses that seem to be free of such grounding. How exactly does the development proceed from one type to the other? Since there are different types of larger situation uses (depending on how large the situation under consideration is), which are the earliest to adopt the definite article and which lag behind? These questions are addressed in section 5, where we present an analysis of the North Germanic data.

Definiteness marking in present-day North Germanic languages
An overview of the NPs in North Germanic is given in Table 1 along with the respective form of definite and indefinite articles.
In present-day North Germanic languages there are two definite articles: postposed and preposed.
The postposed definite article is a suffix, always attached enclitically to the noun (in the Insular Scandinavian languages, Icelandic and Faroese, the article attaches to the case-inflected form of the noun), as illustrated in examples 3-5 below. The origins of the postposed definite article are to be sought in the distal demonstrative hinn 'yon' (Perridon 1989; Skrzypek 2012). Apart from the postposed article, the Mainland Scandinavian languages (Danish, Norwegian, Swedish) and Faroese have a preposed definite article den 'this', which generally occurs in NPs where the noun is accompanied by an attribute. In the present study we are only concerned with the suffixed definite article.

(3) Swedish
Jag ha-r köp-t hus-et.
1sg have-prs buy-pst house-def
'I have bought the house.'

(4) Danish
Jeg ha-r køb-t hus-et.
1sg have-prs buy-pst house-def
'I have bought the house.'

(5) Icelandic
Ég hef keyp-t hús-ið.
1sg have.1sg.prs buy-pst house-def
'I have bought the house.'

The postposed definite article in North Germanic languages may be used deictically, to refer to objects in the speaker's or hearer's direct vicinity, as in 6. Another function of the definite article is direct anaphoric reference, which can be seen as the textual form of deictic reference, as it refers back to referents already accessible in discourse, as in example 7.
The definite article is also widely used in indirect (or associative) anaphora, in which the referent is anchored in another discourse referent, as in 8. The last two contexts in which the definite article is used in North Germanic are unique reference, as in the sun, the king, the Prime Minister, where the uniqueness of the referents is established relative to space/time coordinates, and generic reference, as in example 9. However, the definite article is not the only option for expressing generic referents, as they may also be marked by the indefinite article or appear as bare NPs with no article (in both singular and plural forms).

(9) Swedish (Teleman et al. 2010: 150)
Piano-t är ett stränginstrument.
piano-def be.prs indf string.instrument
'The piano is a string instrument.'

The case of bare nouns in North Germanic is worth exploring, as they are quite frequently used despite the existence of definite and indefinite articles (except in Icelandic, which did not develop an indefinite article). Bare nouns are used, for example, as predicates stating professions, where English requires an indefinite article, as in 10.
(10) Danish
Jan er præst.
Jan be.prs priest
'Jan is a priest.'

The use of bare nouns does not always correspond to English indefinite articles and does not necessarily involve existential interpretation per se. Consider examples 11-12. The reference of the NPs in these examples is based neither on familiarity nor on uniqueness; rather, the noun is incorporated into verbs ('to house-buy', 'to elk-hunt') or nouns ('elk-tracks', 'dog-tracks'). These constructions have previously been analysed as pseudo-incorporation (cf. Asudeh & Mikkelsen 2000) and are considered to be neutral with respect to definiteness and number, i.e. they are formally singular nouns, but they do not necessarily refer to only one entity (cf. Pettersson 1976).

(11) Swedish
att köpa hus / *ett hus / *hus-et
to buy house / indf house / house-def
'To be house-buying' (to be looking for a house with the prospect of buying it)

(12) Swedish
att jaga älg / *en älg / *älg-en
to hunt elk / indf elk / elk-def
'To be elk-hunting'

The corpus and the method
The study is based on corpora of Old Swedish, Old Danish and Old Icelandic texts written between 1200 and 1550. We omitted Norwegian texts since, due to Norway's political situation, the extant texts from 1350-1550 represent one genre only, i.e. the so-called diplomas. As these are short legal documents with formulaic openings and closings, it is difficult to extract from them fragments of high narrativity, which was the goal of the project. We compiled the corpus ourselves from available online sources and corpora, such as Fornsvenska textbanken² for Swedish texts, Middelalder og renaessance³ for Danish texts and The Icelandic Parsed Historical Corpus⁴ for Icelandic texts. The period 1200-1550 is a time of many rapid changes in the structure of nominal phrases in North Germanic languages, so in order to make the corpus more comparable across the three languages we have divided it into three periods, i.e. Period I (1200-1350), Period II (1350-1450), and Period III (1450-1550). The texts chosen for the corpus represent four genres, namely legal prose, religious prose, profane prose and sagas. Fragments of high narrativity were chosen to ensure examples of multiple references to salient discourse referents in different contexts. The texts of different genres are not entirely comparable across all the periods studied due to the varied availability of source texts from different epochs. In the first period the majority of the texts are examples of legal prose. These are the oldest extant texts written in Nordic languages in the Latin script, and they are original Scandinavian texts, not translations. For these reasons they must be included in the study, even though they represent a very specific genre. The corpus for Periods II and III includes profane and religious prose, the majority of which are texts translated or adapted from foreign models, with the exception of the Icelandic sagas, which are original texts.
An overview of the corpus used in the study is presented in Table 2. The corpus consists of ca. 280 000 words. The length of the corpus for each language varies considerably, with Old Icelandic represented by the longest texts and Old Danish by the shortest. This disparity is of little importance, however, as the number of NPs extracted from each part of the corpus is comparable across the three languages: we aimed to obtain ca. 3 000 NPs in each language. The total number of noun phrases annotated in the corpus amounts to 9 363.
The annotation included tagging the relevant NPs as U (unique). An NP was tagged U when all of the following conditions held: it had no antecedent and no anchor; it was a first-mention use with no introduction in the earlier (or later) text; and it did not have a generic reading.
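The U-tagging condition is a conjunction of negative checks, which can be made explicit as a predicate. The following is a minimal sketch, assuming a hypothetical per-NP record with four boolean fields; the names `NPRecord`, `has_antecedent`, `has_anchor`, `introduced_elsewhere` and `generic` are our illustrative labels, not those of the annotation tool used in the project.

```python
from dataclasses import dataclass

@dataclass
class NPRecord:
    # Hypothetical annotation fields; names are illustrative only.
    has_antecedent: bool        # co-indexed with an earlier NP in the text
    has_anchor: bool            # bridging anchor present in the text
    introduced_elsewhere: bool  # referent introduced earlier or later in the text
    generic: bool               # NP has a generic reading

def is_unique_reference(rec: NPRecord) -> bool:
    """Return True only if the NP meets every condition for the U (unique) tag."""
    return not (rec.has_antecedent or rec.has_anchor
                or rec.introduced_elsewhere or rec.generic)

# Usage: a first-mention 'the king' in a law text vs. a bridged 'the roof'.
king = NPRecord(False, False, False, False)
roof = NPRecord(False, True, False, False)
print(is_unique_reference(king), is_unique_reference(roof))  # True False
```

Writing the rule as a single predicate makes explicit that failing any one condition (for instance, a bridging anchor) excludes an NP from the U category.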
Among the annotated noun phrases there are 677 instances of unique referents, which constitute 7.2% of all NPs. Most NPs with unique referents are found in Old Danish and Old Swedish, where they constitute 10.5% of all NPs, while the Old Icelandic texts show a very low frequency of NPs with such referents. The number and proportion of NPs with direct anaphoric reference are also given in Table 2, as the comparison between the two types of reference will be explored in section 5. Anaphoric NPs are evidently more frequent in the corpus: all three languages display a similar average proportion of ca. 22% of NPs with direct anaphora among all NPs.
In gathering the data on NPs with unique referents it became clear that one item is overwhelmingly frequent in each language, namely the referent gud 'God', which consistently appears in its bare form throughout all the periods studied. If included, this item would constitute 25.2% (N = 228) of all instances of NPs with unique reference (N = 905), visibly skewing the results in favour of BNs, especially in Period II, which includes many texts of a religious nature. Given this, and given that gud 'God' is one of the few lexemes that remain bare in present-day North Germanic, we exclude all instances of this item from the present study, which leaves us with 677 NPs with unique referents (see Table 2).
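The counts reported in the two paragraphs above are internally consistent, as a short check shows. The sketch below only re-derives the quoted percentages from the counts given in the text; it uses no other data.

```python
# Counts as reported in the text.
total_nps = 9363          # all annotated NPs in the corpus
unique_with_gud = 905     # NPs with unique reference, including gud 'God'
gud = 228                 # instances of gud 'God'

unique_without_gud = unique_with_gud - gud          # NPs kept for the study
share_gud = 100 * gud / unique_with_gud             # % of unique NPs that are gud
share_unique = 100 * unique_without_gud / total_nps # % of all NPs that are unique

print(unique_without_gud)       # 677
print(round(share_gud, 1))      # 25.2
print(round(share_unique, 1))   # 7.2
```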
To facilitate the manual annotation process, we used dedicated annotation software. NPs were annotated on different levels of linguistic information, such as the type of article, type of reference, grammatical information (case, gender and number), syntactic role and semantic information (e.g. animacy). The tool provides several features that improve efficiency: for most annotation levels the system displays a context-sensitive list of available annotation tags, as well as tag suggestions based on previously annotated words. Each annotation decision is saved automatically in a periodically backed-up database. The tool also generates simple statistics that help us analyse the nature of noun phrases in Old North Germanic texts. The texts were annotated primarily between February 2017 and February 2018. In the analysis several statistical tests are used, such as contingency tables with a Chi-square Test of Independence and Binary and Multinomial Logistic Regression models. All tests were conducted with IBM SPSS Statistics.
5 Results of the corpus study

General results: raw frequencies
Tables 3 and 4 illustrate the results of the corpus study.We compare the frequencies of bare nouns (BN) and the grammaticalizing suffixed definite article (DEF) in two types of uses: in NPs with unique reference, i.e. based on uniqueness (Table 3), and in NPs with direct anaphoric reference, i.e. based on familiarity (Table 4).At this point, we are not yet differentiating between different types of larger situation use.Our main concern is to examine whether the postulated familiar-unique division in the frequency of use of the definite form can be found in the diachronic study and whether the uses of the definite article are associated with period in the given contexts.
According to our annotation guidelines, the NPs annotated as unique reference have no co-referring NP in the preceding text, i.e. no antecedent (see section 4). [...] genitives, adjectives, etc.) but no explicit definite article; as we will discuss in section 5.2, the contribution of the OTHER category is not significant for NPs with unique referents.
A Chi-Square Test of Independence was performed to test the association between the two variables in the tables, namely period (an ordinal variable) and type of article (a nominal variable). The null hypothesis for this test is the following:
(i) Period is not associated with the type of article in Swedish, Danish, and Icelandic.
Since the p-values reported for the tables are small, we reject the null hypothesis (i) for Swedish and Danish: the probability of a Type I error is very small. There is thus a statistically significant association between period and article type in NPs with unique referents in these two languages. The test could not be performed for Icelandic because of insufficient data, so the Icelandic data provide no evidence either way for an association between period and the presence (or absence) of a definite article in NPs with unique referents.
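To show what the test computes, here is a minimal standard-library sketch of Pearson's chi-square test of independence for a period × article-type table. It is not the SPSS procedure used in the study: the helper names `chi_square_independence` and `chi2_sf` are ours, no continuity correction is applied, the p-value comes from a power-series expansion of the regularized incomplete gamma function (adequate for moderate statistics), and the counts in the usage example are invented for illustration rather than taken from Tables 3 or 4.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df
    degrees of freedom, computed as 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function (power-series expansion)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total and n < 10000:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts
    (rows = periods, columns = article types), without continuity correction.
    Returns (statistic, degrees_of_freedom, p_value)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Invented counts (rows: Periods I-III; columns: BN, DEF, OTHER).
toy_table = [
    [90, 6, 4],
    [80, 14, 6],
    [40, 12, 3],
]
stat, df, p = chi_square_independence(toy_table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value here means that the observed row-by-column counts are unlikely under the hypothesis that article type is distributed the same way in every period, which is exactly hypothesis (i).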
Figures 1 and 2 illustrate the distribution of the types of NPs among NPs with unique reference over time.⁵ In the case of NPs with unique referents (see Table 3 and Figures 1 and 2), Danish and Swedish exhibit similar developments. BNs are the most frequent type in both languages in nearly all periods (the exception being Period III in Swedish).⁶ The highest proportion of BNs in Danish and Swedish occurs in Period I and decreases continuously through Periods II and III. Conversely, NPs with a definite article have a relatively low frequency with unique referents in Period I, but rise somewhat through Periods II and III, especially in Danish. On average only 12.3% and 16.1% of NPs with unique referents display the suffixed definite article in Danish and Swedish respectively. In the Icelandic corpus there were extremely few instances of NPs with unique referents compared to the Danish and Swedish corpora. No patterns of change or conclusions can be drawn from the Icelandic data; we thus exclude Icelandic NPs with unique referents from the present analysis. The lack of unique reference NPs is most likely due to the composition of the corpus: the Icelandic texts include only one legal text and a few religious sagas, while the majority of the corpus is composed of chivalric sagas. The Danish and Swedish corpora comprise many more legal and religious texts, in which NPs with unique referents are found more often.⁷

⁵ Period is treated here as a continuous variable with 50-year intervals.
⁶ In the texts from Period III in Swedish there are, unfortunately, very few NPs with unique referents, so the proportions in this period are not statistically reliable.
⁷ Relative to contexts with direct anaphora, NPs with unique referents are 4.8 times more likely to appear in legal texts than in profane prose, and 2.2 times more likely to appear in religious texts than in profane prose in the corpora studied.

As a comparison we also present the data on NPs with direct anaphoric reference, a use that is often given as an example of familiarity as the underlying meaning of definiteness (see Table 4 and Figures 3-5). In this use the frequency of NPs with the suffixed definite article fluctuates, but overall it increases between 1200 and 1400 in Swedish and between 1200 and 1450 in Danish, while the use of bare nouns in direct anaphoric contexts decreases. On average, the incipient definite article is used in 34.5% of all anaphoric NPs, while bare nouns are used in 21.6% of anaphoric NPs. Compared to the average of 74.0% of NPs with unique referents that are expressed as bare nouns (excluding Icelandic), it is clear that BNs are quite strongly disfavoured in the context of direct anaphora in Swedish and Danish as early as Period II. The pattern of bare NPs with anaphoric referents in Icelandic is quite the opposite, as they still constitute a large proportion of NPs relative to definite-marked NPs. However, the frequencies here are misleading. A closer inspection of Icelandic NPs with anaphoric referents in Period III reveals that the bare NPs used anaphorically comprise only two lexical items: kóngur 'king' and drottning 'queen', as in 13, in which the referent is introduced in the first line as Ríkharður kóngur 'King Richard' and later referred back to by the bare NP kóngur 'king'. Since there is a clear co-referring antecedent, such examples are annotated as NPs with direct anaphora, even though they are ambiguous and may waver between an anaphoric and a unique reading.

Tables 5 and 6 illustrate the make-up of the OTHER category in both contexts, namely NPs with unique reference and NPs with anaphoric reference. The two uses differ in their use of OTHER constructions, i.e. constructions with neither the suffixed definite article nor a bare noun. As regards the OTHER category among NPs with unique referents (Table 5), it is quite homogeneous, as it almost exclusively includes definite-marked NPs other than NPs with the suffixed article. The data here are, however, too scarce to draw any conclusions. The OTHER category among NPs with direct anaphora (Table 6) displays a similar make-up, although more adjectival NPs with no determiners can be found here, especially in the first period.
The results of binary logistic regression are presented in the next section.

Time period as a factor influencing the presence of article type: binary logistic regression
To assess how strongly the time period affects the presence of a given article type (BN, DEF and OTHER) in NPs with unique referents and NPs with direct anaphoric referents, we report the results of binary logistic regression in this section. We fitted binary logistic regression models with Period as a single continuous independent variable.⁸ All three languages are taken together here, since excluding the Icelandic data did not change the results in any significant way. This method allows one to build a predictive model that shows how the odds of a particular article type change as the Period variable increases (this information is provided by odds ratios). We use binary logistic regression rather than, for example, linear regression, because the dependent variable is categorical and dichotomous rather than continuous or ordinal.
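As a rough illustration of what such a model estimates, the sketch below fits a one-predictor binary logistic regression by Newton-Raphson (equivalently, iteratively reweighted least squares) in plain Python. It is a stand-in for the SPSS routine under stated assumptions, not a reproduction of it: the function names are ours, the data must contain both outcomes and must not be perfectly separable, and the per-period counts in the usage example are invented.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, max_iter=100, tol=1e-10):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    xs: predictor values (e.g. Period); ys: 0/1 outcomes (e.g. 1 = DEF).
    Requires both 0 and 1 outcomes. Returns (b0, b1); b1 is the
    B coefficient and math.exp(b1) the odds ratio per unit of x."""
    mean_y = sum(ys) / len(ys)
    # Start from the intercept-only solution (the logit of the mean).
    b0 = math.log(mean_y / (1.0 - mean_y))
    b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix X'WX
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: inverse(X'WX) @ gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Invented data: (period midpoint, NPs with DEF, all NPs) per period.
toy_counts = [(1275, 5, 100), (1400, 12, 100), (1500, 20, 100)]
xs, ys = [], []
for year, n_def, n_all in toy_counts:
    xs += [year - 1400] * n_all          # centring changes b0 only, not b1
    ys += [1] * n_def + [0] * (n_all - n_def)

b0, b1 = fit_logistic(xs, ys)
print(f"B = {b1:.4f}, Exp(B) = {math.exp(b1):.4f} per year")
```

Centring the years is a numerical convenience: the slope B, and hence the odds ratio Exp(B), is the same whether Period is coded as raw years or as years relative to 1400.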
The B coefficient should be interpreted as the rate of change in the log-odds per unit of the independent variable, and its exponent, Exp(B), as the corresponding odds ratio. As regards the model for BNs among NPs with unique referents (see Table 7), as the Period variable increases by one unit (one year), the odds of bare NPs occurring in this context decrease by 0.8%. Conversely, as Period increases by one year, the odds of NPs with a definite article occurring among NPs with unique reference increase by a factor of 1.006, or in other words by 0.6%. We observe exactly the same result for OTHER NP types in this context. The changes presented here are very small because the Period variable is coded as continuous, so the model estimates the change in odds for one-year intervals. The overall trend is that the odds of BNs occurring decrease by 0.8% each year, while the odds of DEF occurring increase by 0.6% each year.
We observe the same overall tendency among NPs with direct anaphoric referents (see Table 8).
Firstly, as the Period variable increases by one year, the odds of bare NPs occurring in this context decrease by 0.7%; the result is thus nearly identical to that for NPs with unique referents, although the decrease in the odds of BNs is marginally less steep here. Secondly, as Period increases by one year, the odds of an NP with a definite article occurring among NPs with anaphoric reference are multiplied by 1.004, in other words they increase by 0.4%. This increase is smaller than for NPs with unique referents. The odds ratios for the OTHER NP type are not statistically significant in this context.
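Because B is additive on the log-odds scale, a per-year odds ratio compounds multiplicatively over longer spans. The sketch below takes the Exp(B) values reported above (the BN values 0.992 and 0.993 are inferred from the reported 0.8% and 0.7% decreases) and converts them to percentage changes and to century-long odds ratios. The function names are hypothetical. Since the reported values are rounded to three decimals, long-span products are approximate: a true per-year value anywhere between 1.0055 and 1.0065 gives a century factor between about 1.73 and 1.91.

```python
import math

# Per-year odds ratios Exp(B) as reported in the text (rounded to three decimals).
# The BN values are inferred from the reported 0.8% and 0.7% decreases in the odds.
PER_YEAR_OR = {
    ("unique", "BN"): 0.992,
    ("unique", "DEF"): 1.006,
    ("anaphoric", "BN"): 0.993,
    ("anaphoric", "DEF"): 1.004,
}


def percent_change(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def cumulative_or(per_year_or, years):
    """Odds ratio over `years`: B = ln(OR) adds up on the log-odds scale, so ORs multiply."""
    return math.exp(years * math.log(per_year_or))


for (context, article_type), odds_ratio in PER_YEAR_OR.items():
    print(
        f"{context:9} {article_type:3}: {percent_change(odds_ratio):+.1f}% per year, "
        f"x{cumulative_or(odds_ratio, 100):.2f} over 100 years"
    )
```

For NPs with unique referents, a per-year Exp(B) of 1.006 corresponds to odds of DEF roughly 1.8 times higher after a century, while 0.992 leaves the odds of BNs at roughly 0.45 of their starting value.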

8
We fitted binary regression models to three dependent variables in each context. The binary dependent variables correspond to the use/non-use of a given article type: for instance, all bare NPs in the dataset are coded as 'yes' in the BN variable, and all other NPs are coded as 'no' in that variable. The same holds for the remaining variables, DEF and OTHER. The independent variable (Period) is coded as a continuous variable.

Further, to check whether Period has a greater impact in one of the contexts (i.e. NPs with unique referents or NPs with direct anaphoric referents) on the selection of the definite article, we built a binary logistic model with DEF as the dependent variable and Period (continuous variable) and Context (nominal variable: anaphoric vs. unique) as independent variables. Firstly, Figure 6 presents a simple scatter plot with regression lines for both contexts. The y-axis corresponds to the predicted probability of the definite article occurring and the x-axis to the Period variable; the data points represent the predicted probabilities of actual cases (since many cases have the same value, the dots are superimposed, resulting in a plot that seems to contain only a few data points). Both regression lines display the trend already presented in Tables 7 and 8, namely that as the years increase, so does the probability of the definite article occurring. The slopes of the regression lines appear very similar, indicating that context does not significantly modify the effect of Period, or in other words that Period has the same effect on the presence of the definite article irrespective of the context.
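The additive model behind Figure 6 (Period plus Context, no interaction) can be illustrated with a short sketch. All three coefficients below are hypothetical, chosen only to produce plausible probabilities; they are not the estimates behind Figure 6. The point it shows is that an additive logit model imposes the same slope on the log-odds scale, so the vertical gap between the two contexts is constant in log-odds, while the corresponding probability curves need not be exactly parallel.

```python
import math

# Hypothetical coefficients for an additive model (no interaction term):
#   logit P(DEF) = B0 + B_PERIOD * year + B_ANAPHORIC * anaphoric
# Illustrative values only; NOT the estimates behind Figure 6.
B0 = -10.0
B_PERIOD = 0.006
B_ANAPHORIC = 1.5


def logit(year, anaphoric):
    """Linear predictor (log-odds of DEF) for one year and context."""
    return B0 + B_PERIOD * year + B_ANAPHORIC * (1 if anaphoric else 0)


def predicted_probability(year, anaphoric):
    """Inverse-logit of the linear predictor: predicted probability of DEF."""
    return 1.0 / (1.0 + math.exp(-logit(year, anaphoric)))


for year in (1250, 1400, 1550):
    p_unique = predicted_probability(year, anaphoric=False)
    p_anaphoric = predicted_probability(year, anaphoric=True)
    gap = logit(year, True) - logit(year, False)
    print(f"{year}: unique {p_unique:.2f}, anaphoric {p_anaphoric:.2f}, log-odds gap {gap:.2f}")
```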

Regression model
Secondly, to check whether the effect of Period on the definite article is independent of context, an interaction term Period*Context is added to the logistic regression model. The results are reported in Table 9.
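The model with the interaction term can be written out as follows. This is a sketch assuming Context is dummy-coded 1 for anaphoric and 0 for unique NPs (the coding is not stated in the text, but it matches the direction of the reported effect). Under this coding, the interaction coefficient is the difference between the two per-year slopes, and the exponent of the Context coefficient is the odds ratio between the contexts at Period = 0.

```latex
\begin{aligned}
\operatorname{logit} P(\mathrm{DEF}) &= \beta_0 + \beta_1\,\mathrm{Period} + \beta_2\,\mathrm{Context} + \beta_3\,(\mathrm{Period} \times \mathrm{Context}),\\
\text{unique } (\mathrm{Context}=0):\quad \operatorname{logit} P(\mathrm{DEF}) &= \beta_0 + \beta_1\,\mathrm{Period},\\
\text{anaphoric } (\mathrm{Context}=1):\quad \operatorname{logit} P(\mathrm{DEF}) &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\mathrm{Period},\\
\beta_3 &= (\beta_1 + \beta_3) - \beta_1, \qquad e^{\beta_2} = \text{odds ratio (anaphoric vs. unique) at } \mathrm{Period} = 0.
\end{aligned}
```

A non-significant \(\beta_3\) is therefore a test of whether the two regression lines in Figure 6 share a slope on the log-odds scale, while \(\beta_2\) captures the difference in their intercepts.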
The results indicate that, if Period had a value of 0 (year 0, a purely hypothetical point far outside the period studied), the odds of the definite article would be over 55 times higher for NPs with direct anaphoric referents than for NPs with unique referents. The interaction term Period*Context is the difference between the log-odds ratio corresponding to an increase in Period by one year among NPs with direct anaphora and the log-odds ratio corresponding to an increase in Period by one year among NPs with unique reference. The interaction between Period and Context is not statistically significant (p = 0.118), indicating that the regression slopes presented in Figure 6 do not differ significantly. Thus, even though Context is a significant variable on its own and the regression lines in Figure 6 have different intercepts (the line for NPs with anaphoric referents is consistently higher than that for NPs with unique referents), the Period variable does not have a greater impact on either of the contexts.

Lastly, we will closely examine NPs with unique referents. As Table 3 (see 5.1) illustrates, the number of bare NPs drops so markedly across the three periods in both Danish and Swedish (while at the same time the overall number of NPs with unique referents also decreases) that the proportion of NPs with the suffixed definite article rises even though the raw numbers do not change much or actually drop (see Figures 1 and 2). With the exception of the two abovementioned lexical items, the NPs with unique referents are predominantly bare NPs.

Overall, the two contexts explored here, namely NPs with anaphoric and unique referents, show corresponding patterns, i.e. over time more NPs appear with the definite article and fewer as bare NPs. NPs with the definite article gain in frequency at more or less the same pace in both contexts, as illustrated by the regression slopes in Figure 6. However, the frequency of NPs with a definite article is consistently higher in the context of direct anaphora than in the context of unique referents. Already in Period I, the definite article was used more frequently in anaphoric contexts than in unique contexts. While we cannot claim that the use of the definite article with anaphoric referents predates its use with unique referents, the differences in frequencies are quite striking. These differences may be due to the fact that the anaphoric context, while based on familiarity, may also satisfy uniqueness conditions, and would thus be more likely to appear as definite at the time of language change (see also Simonenko & Carlier 2020). This assumption is further explored in 5.3, and we will return to it in the concluding section.

Further analysis of NPs with unique referents: multinomial logistic regression
In this section we explore NPs with unique referents in more detail to see which factors apart from time period affect the presence of the suffixed definite article. Firstly, all the NPs with unique referents in our dataset can be divided into three domains based on what they refer to. This division is based on the semantics of the head noun, and each NP was annotated for the variable Domain with one of three categories: nature (for example, sun, earth, air, nature), religion (church, Bible, devil, heaven, hell, faith, etc.), and law (law, king, mayor, emperor, etc.). The majority of these belong to the category of larger situation use (in terms of Hawkins 1978), which relies on general (rather than specific) knowledge. Some NP referents, such as the sun or the Pope, may also be considered absolute uniques, unvarying for all people (see Lyons 1999: 8). The aim is to see whether any of the three domains shows a significantly higher frequency of definite forms.
Further, in each domain we can distinguish between referents that are absolute uniques, namely uniques that rely on general encyclopaedic knowledge (what we call here 'global' uniques), and those that rely on specific knowledge ('local' uniques). We annotated each NP with a unique referent for a variable Unique type with two categories, local and global uniques. To operationalize the annotation of this variable, we assumed that all NP referents common to at least the whole country (for instance, the king, the religion, the earth) are global uniques, and those common to smaller groups of people, such as towns or villages, are local uniques (for instance, the chieftain, the priest, the forest).

Piotrowska and Skrzypek Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1178
We hypothesize that, since the definite article was not fully grammaticalized in the unique context in the periods we study, it would occur more often in contexts that rely on both familiarity and uniqueness semantics (see also 5.2 and 6). NPs with local unique referents, while clearly unique in the sense of having no direct or indirect antecedents, can still fulfil the familiarity condition in the sense of pragmatic familiarity, in which the referent is linked to the utterance situation. We therefore expect NPs with local uniques to occur in the definite form more frequently than those with global uniques, since they rely on familiarity with the context. As for which domain might attract the definite form more strongly, we hypothesize that the domain of law will exhibit a higher frequency of definite forms. Overall, all the predictor variables entered into the model are statistically significant.¹⁰ The factor of Period is also included and controlled for, but we omit it from Table 10 so as not to repeat the results presented in 5.2.
As regards NPs with a definite article relative to bare NPs, they are less likely to occur in the Danish than in the Swedish corpus texts. Their odds relative to BNs are also over 3 times higher in religious texts than in profane texts. As for Domain, with all the other factors held constant, NPs assigned to the religion category are significantly less likely to be definite than NPs in the nature category: the odds of the definite article relative to BNs are 76.6% lower in the religion category than in the nature category. Lastly, the odds of a definite rather than a bare form are 4.6 times higher for local unique referents than for global unique referents.
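Since the multinomial coefficients are odds ratios, "4.6 times" refers to the odds of DEF versus BN, not to the probability. The sketch below makes the difference concrete. The baseline odds of 0.2 for global uniques are invented for illustration and are not taken from Table 10; only the odds ratios 4.6 and 0.234 (the value implied by "76.6% lower odds") come from the text. Within the DEF-versus-BN comparison, the probability is P(DEF | DEF or BN) = odds / (1 + odds).

```python
# Hypothetical baseline: odds of DEF vs. BN for global unique referents.
# This value is invented for illustration; it is NOT taken from Table 10.
BASELINE_ODDS = 0.2
OR_LOCAL = 4.6       # Exp(B), local vs. global unique referents (as reported)
OR_RELIGION = 0.234  # Exp(B) implied by "76.6% lower odds", religion vs. nature


def odds_to_prob(odds):
    """Convert odds to a probability (here: P(DEF | DEF or BN))."""
    return odds / (1.0 + odds)


p_global = odds_to_prob(BASELINE_ODDS)
p_local = odds_to_prob(BASELINE_ODDS * OR_LOCAL)
print(f"P(DEF | DEF or BN): global {p_global:.3f}, local {p_local:.3f}")
print(f"probability ratio {p_local / p_global:.2f} vs. odds ratio {OR_LOCAL}")
print(f"religion vs. nature: {(OR_RELIGION - 1) * 100:+.1f}% in the odds")
```

With any positive baseline, the ratio of probabilities is smaller than the odds ratio. With the invented baseline used here it is 2.875 rather than 4.6.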
As regards OTHER NPs relative to BNs, only two variables are selected as significant. OTHER NPs relative to BNs are less likely to occur in Danish than in Swedish texts. OTHER NPs also strongly favour local unique referents: their odds relative to BNs are 5.6 times higher in local than in global contexts.
The model's overall predictive accuracy is 74.9%; the model is very good at predicting BNs (95.9% of cases classified correctly) but much less successful at predicting DEF or OTHER (18.4% and 20.5% of cases predicted correctly, respectively). As far as relative variable importance is concerned, Genre proves to be the strongest predictor in the dataset (legal texts very strongly favour BNs over other article types, which confirms the somewhat archaic character of these texts and the specific features of the genre as such, which is characterized by, among other things, a high frequency of BNs in Modern Swedish as well, cf. Gunnarsson 1982), followed by Domain, Period (see section 5.2) and, lastly, Unique type.
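Overall accuracy and per-class accuracy can diverge sharply when one outcome category dominates, as BNs do here. The sketch below computes both from a confusion matrix and compares them with a majority-class baseline (always predicting the most frequent category). The matrix is entirely hypothetical and is not the study's classification table; it only illustrates how a model can classify almost all BNs correctly while recovering only a fifth of DEF and OTHER cases.

```python
LABELS = ("BN", "DEF", "OTHER")

# Hypothetical confusion matrix (rows: observed, columns: predicted).
# Invented counts for illustration; NOT the study's classification table.
CONFUSION = [
    [190, 6, 4],
    [38, 10, 2],
    [22, 2, 6],
]


def overall_accuracy(matrix):
    """Share of all cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix, labels):
    """Share of observed cases of each category predicted correctly (recall)."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


def majority_baseline(matrix):
    """Accuracy of always predicting the most frequent observed category."""
    totals = [sum(row) for row in matrix]
    return max(totals) / sum(totals)


print(f"overall accuracy: {overall_accuracy(CONFUSION):.3f}")
print(f"majority baseline: {majority_baseline(CONFUSION):.3f}")
for label, acc in per_class_accuracy(CONFUSION, LABELS).items():
    print(f"{label}: {acc:.1%} correctly predicted")
```

Comparing overall accuracy against the majority baseline is a simple check on how much the predictors add beyond the category distribution itself.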
In conclusion, the variable of Domain is not epiphenomenal to the variable of Unique type (the local/global distinction), as it is the nature category that strongly favours definite forms, and this category is in no way correlated with local uniques (i.e. the other category that strongly favours definite forms). In the nature domain we find a relatively large number of NPs with a definite article whose referents are considered absolute uniques, common to all people, such as the sun, the earth, the world, etc. These lexical items, in particular the earth and the world, are high-frequency NPs; together they constitute 48.9% (23 out of 47 examples) of all examples within the nature category, which is itself a relatively small category in the dataset. The nature group thus displays a relatively high frequency of the definite article with global unique referents, but since the category is a small part of the dataset, it does not change the fact that local unique referents, relative to global referents, strongly favour NPs with a definite article.
While the model controls for the variable of Domain, the variable of Unique type is highly significant, confirming the hypothesis that NPs with unique referents fulfilling both familiarity and uniqueness conditions (i.e. local uniques) are significantly more likely to appear in the definite form or in OTHER forms (which predominantly include other determiners, as mentioned in section 5.1) than in bare forms. We now turn to a closer examination of a particular context in the corpus texts, namely coordinated NPs of the type to the king, the bishop and the district, which are regularly found in legal prose, stipulating to whom taxes or fines were to be paid. Since the payments were typically divided three ways, such coordinated NPs join a number of referents which are 'differently' unique. In Period I, NPs with unique referents tend to be unmarked and appear as BNs irrespective of the scope of their uniqueness, as in (17), where two local unique referents are followed by a global referent.
(17) Swedish (Äldre Västgötalagen KB: 7, 1225)
Uerdher kyrkia brut-in oc maess-u fat stol-en […] þat er
be.prs church break-ptcp and mass-obl plate steal-ptcp […] this be.prs
niv march-a sak kyrky swa haereþe sva konogge.
nine mark-pl fine church so district so king
'If a church is broken into and the Mass plate stolen […]. It is nine marks fine, to church, and so to district and so to king.'

can 2sg give 1sg.refl newspaper-def
'Can you give me the newspaper?'

(7) Swedish
Han ha-de en hund. Hund-en het-te Bella.
3sg.m have-pst indf dog dog-def call-pst Bella
'He had a dog. The dog was called Bella.'

Figure 1 The raw frequencies of NP types (DEF, BN and OTHER) across time among NPs with unique reference in Danish.

Figure 2 The raw frequencies of NP types (DEF, BN and OTHER) across time among NPs with unique reference in Swedish.

Figure 4 The raw frequencies of NP types (DEF, BN and OTHER) across time among NPs with direct anaphoric reference in Swedish.

Figure 3 The raw frequencies of NP types (DEF, BN and OTHER) across time among NPs with direct anaphoric reference in Danish.

Figure 5 The raw frequencies of NP types (DEF, BN and OTHER) across time among NPs with direct anaphoric reference in Icelandic.

Figure 6 Scatter plot with regression lines for predicted probability of DEF by Period in two contexts: NPs with unique referents and NPs with anaphoric referents.

Table 1
An overview of the NPs in North Germanic.

Författar-en skriv-er riktigt bra.
1sg read-prs indf new book author-def write-prs really good
'I am reading a new book. The author writes really well.'

Table 3
Bare nouns and the suffixed definite article in NPs with unique reference.

Table 6
OTHER NP types in NPs with direct anaphoric reference.

Table 7
Binary logistic regression models for NPs with unique referents.

Table 9
As Figures 1 and 2 show, NPs with a definite article are few compared to bare NPs, so finding lexical items that display a clear development from being predominantly bare in Period I to being predominantly definite in Period III is not feasible. For each lexical item among NPs with unique reference that occurs in the corpora more than once, such as king, heaven, hell, bishop, etc., bare NPs constitute the majority in each period. The only exceptions are the lexical items djaevel 'devil' and waeruld 'world', both of which are predominantly definite in the corpus.

obl land-gen lord and world-def.gen saviour
'Joseph, the land of Egypt's lord and the world's saviour.'

This hypothesis is, however, again based on the local/global distinction; the domain of law has the highest proportion of local unique referents compared to the other domains,⁹ so the variable of Domain, if significant, might be epiphenomenal to the variable of Unique type. To check these hypotheses we use a multinomial logistic regression model with Article type as the dependent variable (with three categories: BN, DEF and OTHER). Multinomial regression allows us to choose a reference category of the dependent variable to which the other two categories are compared; here we choose BN as the reference group. The model tests the probability of definite forms (DEF) occurring relative to BNs, and of OTHER forms occurring relative to BNs, as a function of predictor factors such as Language, Genre, Domain and Unique type. Importantly, the regression model shows which of the variables is the most prominent while simultaneously controlling for all the other factors in the model.
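With BN as the reference category, a multinomial (baseline-category) logit model has one linear predictor per non-reference category, each giving the log-odds of that category relative to BN; the three probabilities follow from a softmax in which BN's predictor is fixed at 0. The sketch below assumes hypothetical intercepts (not the estimates in Table 10) and sets the Unique type effects to ln(4.6) and ln(5.6), the odds ratios reported for DEF and OTHER. The helper names are hypothetical; only the Unique type predictor is included, whereas the actual model also contains Language, Genre, Domain and Period.

```python
import math


def multinomial_probs(eta_def, eta_other):
    """Category probabilities in a baseline-category logit model with BN as reference.

    eta_def = log(P(DEF) / P(BN)) and eta_other = log(P(OTHER) / P(BN)).
    """
    denom = 1.0 + math.exp(eta_def) + math.exp(eta_other)
    return {
        "BN": 1.0 / denom,
        "DEF": math.exp(eta_def) / denom,
        "OTHER": math.exp(eta_other) / denom,
    }


# Hypothetical intercepts (global uniques); NOT the estimates in Table 10.
INTERCEPT = {"DEF": -1.5, "OTHER": -2.0}
# Unique type effect (local = 1, global = 0), using the reported odds ratios.
B_LOCAL = {"DEF": math.log(4.6), "OTHER": math.log(5.6)}


def probs_for(local):
    """Predicted BN/DEF/OTHER probabilities for a local (1) or global (0) unique referent."""
    return multinomial_probs(
        INTERCEPT["DEF"] + B_LOCAL["DEF"] * local,
        INTERCEPT["OTHER"] + B_LOCAL["OTHER"] * local,
    )


for local in (0, 1):
    label = "local" if local else "global"
    probs = probs_for(local)
    print(label, {k: round(v, 3) for k, v in probs.items()})
```

Whatever the intercepts, the ratio of P(DEF)/P(BN) between local and global referents stays at 4.6, which is why the coefficients are reported relative to the BN reference group.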
Table 10 illustrates the regression model for NPs with unique referents. Statistically significant results are in bold.

9
In the domain of law 22.5% (87 out of 387) of all NPs are annotated as local. In the domain of religion 4.9% (12 out of 243) of all NPs are annotated as local. In the nature domain 6.4% (3 out of 47) of all NPs are annotated as local.
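The proportions in footnote 9 can be recomputed directly from the counts given there, which makes the potential confound between Domain and Unique type easy to see: law has a far larger share of local unique referents than religion or nature. The sketch only uses the counts stated in the footnote; the helper name `local_share` is hypothetical.

```python
# Counts from footnote 9: (local NPs, all NPs with unique referents) per domain.
LOCAL_BY_DOMAIN = {
    "law": (87, 387),
    "religion": (12, 243),
    "nature": (3, 47),
}


def local_share(counts):
    """Percentage of NPs annotated as local unique referents in each domain."""
    return {domain: 100.0 * local / total for domain, (local, total) in counts.items()}


shares = local_share(LOCAL_BY_DOMAIN)
for domain, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {pct:.1f}% local")
```

This prints law first (22.5%), then nature (6.4%) and religion (4.9%), matching the footnote.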