1 The basic meaning of ‘perspective’

The basic meaning of ‘perspective’ refers to the insight that in visual perception an object is visible only in certain aspects and cannot be viewed in its entirety without changing the observer’s point of view. In the real world, we can see either the inside or the outside of a house, so the observer has to change position in order to take another perspective. In an abstract sense, ‘perspective’ can thus be seen as a two-point-relation between an observer’s origo and (the aspects of) an object in focus, whereas the process of perspectivization refers to the selection of a viewpoint that effects a restriction of the aspects seen (cf. e.g. Graumann 1989: 96).

In this very general sense, the concept of perspective has been metaphorically transferred to the domain of language: language is perspectival in that various options on the lexical, grammatical, contextual, and conversational level allow for conceptualizing the same extra-linguistic situation in different manners, so that every utterance is based on a set of choices by the speaker. As a result, the concept of ‘perspective’ has been applied to various linguistic phenomena in their different aspects (for an overview see e.g. Klein & von Stutterheim 2002; Verhagen 2007; von Stutterheim & Carroll 2007). Furthermore, recent studies have shown that even “seemingly ‘innocent’ or ‘viewpoint-neutral’ lower-level constructions” such as determiners or negation can function as viewpoint markers (e.g. Dancygier 2005; 2012a; b; c; Dancygier & Sweetser 2012; Sweetser 2012; Dancygier & Vandelanotte 2016). This insight into the ubiquity of viewpoints has made the investigation of multiple viewpoints and their constellations a major research topic.

The chapter builds on these studies by taking for granted the ubiquity of viewpoints and the necessity of investigating multiple viewpoints and their constellations (Dancygier & Vandelanotte 2016). Yet, it is committed to a slightly different perspective. The aim is not to study different linguistic phenomena with respect to their specific viewpoint potential in discourse, but rather to focus on the core concept of perspective itself. In this respect, the main thesis is that the concept of perspective-taking is not sufficient to capture the specific aspects of linguistic and cognitive perspectivization but has to be complemented by the concept of a meta-perspective that is capable of integrating more than one point of view at a time. So while in reality the local position of the observer restricts her view to perceiving either the inside or the outside of a house, cognitive and linguistic representations not only enable us to free ourselves from the here and now of the actual perceptual situation but also open up the possibility to take and evaluate an internal and an external view simultaneously. As a result, a model of L(anguage)-perspectivization has to go beyond the concept of perspective-taking by taking into account the integration of confronting perspectives.

In order to pursue this line of argumentation, the chapter is organized as follows. Based on the assumption that perspectivization in its abstract sense is a fundamental concept which is shared by both language and cognition, Section 2 argues that its basic mechanisms can be isolated by a comparison of cognitive and linguistic perspectival tasks of different complexity. With respect to C(ognition)-perspectivization, Section 3 shows that the intricacy of Theory of Mind (ToM) tasks does not primarily lie in the capability of switching from one perspective to another, but in evaluating confronting perspectives from an additional, external viewpoint. This leads to a differentiation between the concept of perspective-taking as a binary relation between a viewpoint and an object in focus and the ternary concept of confronting perspectives. Such a refinement is, as Section 4 argues, also needed in order to model the different degrees of perspectival complexity in language. A comparison of propositional attitude ascriptions (4.1), epistemic meanings of modal verbs (4.2), and narrative discourse comprehension (4.3) shows that these different phenomena all integrate both external and internal viewpoints on potentially truth-incompatible perspectives. Section 5 brings the perspectives together by deriving the commonalities between cognitive and linguistic perspectivization. In sum, the analysis leads to a multi-stage-model of different degrees of perspectival complexity that not only refines the concept of perspectivization but also offers a tertium comparationis for complex viewpoint constellations in language and cognition.

2 The intersection of cognitive and linguistic perspectivization

It is well known that in the perceptual domain, children only gradually come to understand that an object can be perceived in a different way from a different point of view, and it is even later that they can judge how an object is seen from a different angle (cf. e.g. Flavell 1992; Michelon & Zacks 2006; Moll & Meltzoff 2011a; Surtees et al. 2013). In the tradition of the studies of Flavell (e.g. Flavell et al. 1981; Flavell 1992), the acquisition of visual perspectivization is commonly described by referring to the distinction between Level 1 perspectivization, i.e. the ability to understand that someone else might not see an object that oneself can see, and Level 2 perspectivization, i.e. the ability to understand that “an object simultaneously visible to both the self and the other person may nonetheless give rise to different visual impressions or experiences in the two if their viewing circumstances differ” (Flavell et al. 1981: 1). While Level 1 perspective-taking is evident by the age of 24 months (Moll & Tomasello 2006), Level 2 perspectivization develops later at around four years of age (Flavell et al. 1981).1 The later stage of visual perspectivization has been seen in close relationship to the emergence of cognitive non-spatial perspectivization tasks, most prominently referred to in terms of Theory of Mind (ToM) (cf. for an overview Farrant et al. 2006). ToM refers to the capability of ascribing mental states to other people’s minds and is as such a genuine task of perspective-taking. It is, however, not an unproblematic concept since it is controversial which perspectival capabilities in particular lie behind the understanding of others’ beliefs.2 Furthermore, ToM has been challenged by approaches which put emphasis on the fact that understanding the belief of others is just one specific aspect of intersubjectivity and has to be seen in the context of other intersubjective capabilities like the comprehension of emotions, attentional foci, intentions, and social embodied interaction (cf. Gärdenfors 2008; Zlatev et al. 2008). One of the main differences in contrast to classical ToM accounts lies in the fact that, instead of focussing on the ‘great divide’ at four years, early capabilities like joint attention are also seen as capabilities of perspectivization. ToM is thus not seen as a “monolithic” ability that “species either do or do not have” (Tomasello et al. 2003: 204), but rather “as configurations of features that constitute a family of related perspectivization capabilities of different degrees of complexity” (Verhagen 2008: 139).

Although the concepts of ToM and ‘intersubjectivity’ differ in several respects (see Zlatev et al. 2008), the gradual development of perspectival capacities is seen as significant in most of these approaches,3 as seen in the multi-stage-models proposed to account for the different degrees of perspectival tasks (cf. Perner 1991; Tomasello 1999: 179; Gärdenfors 2008; Zlatev 2008). All these models take for granted a crucial developmental step at the age of around four years.4 First, whereas children under the age of four already manage shifts of perspective and hypothetical representations (Astington 1990: 157), it is at this time that children begin to pass the verbal test of ‘false belief’ that assesses whether they can attribute a false belief to another person and reason about the mental states of others (cf. e.g. Astington 1990; Leslie 2000; Rosenthal 2000; Gallagher & Hutto 2008). Second, this step is seen in close relationship to the development of more complex visual Level 2 perspectivization and linguistic perspectival tasks such as the usage of propositional attitude verbs, syntactic embedding, and grammatical means of epistemicity and evidentiality (cf. e.g. Harris 1996; Leslie 2000; Papafragou 2001; Astington & Baird 2005; de Villiers 2005; Moll & Meltzoff 2011a; b; c; Leiss 2012) as well as narrative comprehension (cf. e.g. Astington 1990; Feldman et al. 1990; Nelson 2003; Gallagher & Hutto 2008: 29; Goldie 2007).

In order to address the question of what triggers the increase of perspectival complexity, it thus seems promising to look at the common features of complex linguistic and cognitive perspectivization tasks. The basic idea in the following is that if we “break down both language and theory of mind into more basic components” (Lohmann, Tomasello & Meyer 2005: 263), this will offer insights into the basic principles of perspectivization shared by language and cognition and their different degrees of perspectival complexity.

3 The basic components of C(ognition)-perspectivization

In order to determine the basic components of C-perspectivization, it is instructive to look at developmental studies in which the gradual unfolding of perspectival abilities is described in multiple stages. According to these models, a child’s viewpoint is at the primary level linked to the here-and-now of the real situation, so that “[t]he perceiver has no option of representing anything but current reality” (Perner 1991: 66f.). So while young children are able to share affective and perceptual experiences in terms of intersubjectivity and joint attention (cf. e.g. Gallagher & Hutto 2008: 21; Zlatev 2008: 224; Moll & Meltzoff 2011b: 394), such early forms of interpersonal interaction do not allow any detachment from the real context situation. A relevant step in the acquisition of perspectival capabilities is hence the mental decoupling from the real world, as seen in children’s capabilities of differentiating between present and past as well as switching between actual and imaginary hypothetical situations. This can be seen as a form of perspective-taking since the decoupling from reality allows for alternate viewpoints: a banana can be seen either as a banana in the real world or as a telephone in an imaginary setting, just as the child can choose to be a prince or a princess or someone else in a play. Children are thus able to select one possibility out of multiple possible situation models by choosing either one option or another (cf. Perner 1991: 50).5

While children at this age are thus capable of taking perspectives, they yet experience difficulties in more complex perspectival tasks such as the verbal false belief test.6 This standard test relies on an experimental set-up with two characters A and B and an object which is located at a specific place X. When A leaves the room, B moves the object from X to another place Y. When A comes back, the testee is asked to predict where A will look for the object and where the testee thinks that A believes the object is. In their answer, four-/five-year-olds usually consider the state of knowledge of the character who has left the room and point at the location where the object is meant to be according to A’s knowledge. Younger children, however, tend to point at the location where the object really is according to their own knowledge. These different reactions are not trivial to explain since children younger than four to five years of age are able to leave their own perspective behind, and even to apprehend that others may see things differently (cf. Moll & Meltzoff 2011a; b; c).

According to Moll & Meltzoff (2011b; c), the intricacy in managing the false belief task does not lie in “the ability to take another’s perspective on an object” (Moll & Meltzoff 2011c: 299) but in the more challenging competence “to confront two (or more) perspectives on the selfsame object at the same time” (Moll & Meltzoff 2011b: 403; emphasis in original). Referring to Flavell’s terminology, Moll & Meltzoff thus draw a distinction between Level 2a and Level 2b of perspective-taking, whereby only Level 2b requires confronting perspectives (Moll & Meltzoff 2011c: 299). The false-belief test requires this more complex Level 2b, since the testee has to handle two diverging systems of knowledge: she knows that A thinks that the object is in location X, while B knows that the object is in location Y – and she knows at the same time that only one possibility is correct, since she knows where the object really is. ToM tasks thus “require the ability to simultaneously represent conflicting information: the protagonist’s (or one’s) own belief about a past situation and the current true situation” (Plaut & Karmiloff-Smith 1993: 70). In order to give the right answer, she not only has to suppress her own state of knowledge but also needs to evaluate the two conflicting perspectives from her external stance.
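The set-up of the test and the two answer patterns can be made concrete in a small toy model. The following is only an illustrative sketch under simplifying assumptions: the class and function names (Character, run_task, answer_from_own_knowledge, answer_by_confronting) are hypothetical and not taken from the cited studies, and beliefs are reduced to a single location label.

```python
from dataclasses import dataclass


@dataclass
class Character:
    """A character in the task together with a belief about the object's location."""
    name: str
    believed_location: str
    in_room: bool = True


def run_task():
    """Play through the set-up: A leaves, B moves the object from X to Y."""
    a = Character("A", believed_location="X")
    b = Character("B", believed_location="X")
    actual_location = "X"

    a.in_room = False              # A leaves the room
    actual_location = "Y"          # B moves the object from X to Y
    for character in (a, b):
        if character.in_room:      # only those who witness the move update their belief
            character.believed_location = actual_location
    a.in_room = True               # A comes back
    return actual_location, a, b


def answer_from_own_knowledge(actual_location, a):
    """Answer pattern of younger children: report where the object really is."""
    return actual_location


def answer_by_confronting(actual_location, a):
    """Answer pattern of older children: hold A's belief and the actual
    location side by side and evaluate A's belief from an external stance."""
    return {
        "A_will_look_at": a.believed_location,
        "object_is_at": actual_location,
        "A_belief_is_true": a.believed_location == actual_location,
    }


actual, a, b = run_task()
print(answer_from_own_knowledge(actual, a))
print(answer_by_confronting(actual, a))
```

The first answer needs a single value only, whereas the second cannot be computed without access to both A's belief and the actual location at the same time.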

With respect to the concept of perspectivization, this has important implications. First, a model of perspective as a two-point-relation becomes insufficient since it has to be complemented by a third reference point from which the contrasting viewpoints are evaluated, cf. Figure 1.7

Figure 1: Concept of perspective as two-point vs. three-point relation.8

Second, the three viewpoints are not equivalent but display a hierarchical difference, since the third reference point is superordinated to the other alternate ones. The hierarchy of the viewpoints requires the simultaneous activation of both levels of the viewpoint constellation and their relationship to each other. According to Perner (1991), perspectivization is thus closely linked to metarepresentation as “a representation of a representation as a representation” (Perner 1991: 35), since it represents the representational relation itself:9

[…] a child capable of mental metarepresentation who, for instance, represents that a picture is a representation needs to construct a mental model containing two substructures and their relationship. One structure has to represent the picture (as a physical entity) and the other what the picture depicts (its interpretation), and, very importantly, the model has to include links between these two structures representing how the picture relates to the depicted. (Perner 1991: 83)

For complex perspectivization tasks it is thus not sufficient to select one out of different equivalent viewpoints; rather, the whole viewpoint constellation has to be held in mind by evaluating contrasting alternatives from an external reference point with respect to their relationship to each other. As a result, the relationship between the different viewpoints itself becomes the object of the perspectivization process. The observations on the development of cognitive perspectivization tasks thus show that what such tasks require is “not perspective taking but the explicit acknowledgement that a given object may be seen in alternative ways” (Moll & Meltzoff 2011b: 408).

The ‘full’ concept of perspective thus comprises different degrees of perspectival complexity (cf. Table 1): The primary prerequisite is the givenness of more than one viewpoint that offers equivalent alternatives (i). While perspective-taking implies that one viewpoint is picked out by rejecting the other possible options in the sense of either-or (i.e. either taking the banana as a banana or as a telephone), more complex perspectival tasks display more complex constellations regarding the relationship between the viewpoints involved. This is already the case in choosing between real and hypothetical situations, where the viewpoint shift involves a decoupling process from the actual situation and can in this respect be paralleled to Flavell’s Level 1 of visual perspectivization; yet, the original viewpoint is not totally cancelled but suppressed, as seen in the fact that children are well aware that a banana is not a telephone in the real world. That is to say that the primary origo remains active in the background as a reference point (ii). This is the prerequisite for a hierarchical constellation of perspectival embedding of viewpoints at different levels since it presupposes that one viewpoint is superior to the other (iii). Yet, the complexity of ToM tasks such as the false belief test requires more than taking the perspective of another person. In the ToM task, one has to evaluate the diverging perspectives (of A, of B, and one’s own) with respect to their truth values, whereas the truth value in imaginary scenarios is quite clear. The processing of different perspectives thus requires reflection on the constellation of viewpoints (in analogy to Level 2b of visual perspective taking in the terms of Moll & Meltzoff 2011a; b). The qualitative difference between the viewpoints is linked to the emergence of a metarepresentational awareness with respect to the given potential of possible alternatives, i.e. an external third reference point arises from which the diverging viewpoints are evaluated and integrated in an overall picture, which corresponds to Level 2b of visual perspective taking (iv). This is also reflected by the results of recent studies on non-verbal versions of the false belief test (Onishi & Baillargeon 2005). As shown in Rubio-Fernández & Geurts (2013), children are able to pass the test (with a success rate of 80% in their test design) if they are encouraged to continuously track the perspective of the protagonist, while any kind of disruption risks failure on the test. This indicates that the test becomes more complicated if the mono-perspective is disturbed.

In sum, several degrees of complexity can thus be distinguished, cf. Figure 2.

Figure 2: Degrees of perspectival complexity.

These different degrees of perspectival complexity can also be observed in the emergence of pictorial perspectivization tasks. As Doherty & Wimmer (2005) have shown, the perception of ambiguous pictures such as the famous rabbit/duck figure discussed by Wittgenstein is at first restricted to one perspective only: Children perceive the picture as either a duck or a rabbit, even if they are instructed about the ambiguity of the picture (Doherty & Wimmer 2005: 408f.). It is only later that they experience such pictures like adults as oscillating between two possible interpretations. According to the authors, this is based on the same capabilities that are also necessary for more complex perspectival tasks, since it requires the understanding of the representational relationship between the figure and its two interpretations (Doherty & Wimmer 2005: 418). In a comparable way, drawings by young children represent individual objects from one particular perspective only. Fish, for example, are commonly depicted in profile view, while lizards are drawn as seen from above. Such early pictures thus juxtapose different viewpoints that would be seen as contrasting from an overall integrating viewpoint. Yet, the different perspectives are not perceived as conflicting. It is in fact a rather late development that children integrate the different perspectives under one superordinate viewpoint, as is the case for linear perspective (cf. Köller 2004: 54).

It thus becomes clear that also with respect to pictorial representations perspectivization is more than perspective-taking in the sense of a selection of a certain viewpoint that restricts the aspects of the object in focus. My claim in the following will be that for L(anguage) perspectivization, the same mechanisms of viewpoint embedding and viewpoint integration as outlined in Figure 2 are necessary in order to account for viewpoint constellations in grammar and narrative discourse.

4 L-perspectivization

Against the background of the previous section, it will be shown that complex perspectival phenomena of L-perspectivization such as propositional attitude ascriptions, grammatical means of epistemicity and evidentiality, and narrative comprehension share similar viewpoint constellations and are based on the mechanism of confronting perspectives.

4.1 Viewpoint constellation of propositional attitude ascriptions

It is a commonplace that language as “an elaborated metarepresentational device” (de Saussure 2013: 68) is closely interrelated with cognitive capacities such as perspective-taking and Theory of Mind, and various studies have argued for a correlation between linguistic abilities of propositional attitude ascriptions and the performance on ToM tasks such as the false-belief test (cf. for overviews Astington & Baird 2005a; Jacques & Zelazo 2005: 144–146; Paynter & Peterson 2010). While it is controversial which capacities exactly can be paralleled with ToM capabilities, there is a close link between the structural components of ToM tasks and syntactic embedding under non-factive mental state verbs. This parallelism is already indicated in the definition by Dunbar (2006: 172):

Theory of mind is the ability to mind read or imagine how another individual sees the world. It is encapsulated in the statement: “I believe that you think the world is flat.”

The analogy seems quite obvious since complement clauses like the one in (1)

    (1) I believe that you think the world is flat.

seem to reflect ToM as the ability to imagine another individual’s mental state insofar as the embedding structure of two mental state predicates puts two different subjects of consciousness on stage (i.e. I believe vs. you think). This suggests that there is a viewpoint switch from the speaker-I to another person’s stance, and, hence, perspective-taking. However, there are two aspects that go beyond a perspective switch. First, taking another person’s stance does not mean to take another person’s position and forget about one’s own. Rather, it is ‘taken’ while maintaining the relation to one’s own state of belief. The primary viewpoint and the second viewpoint are hence not equivalent options, but display a qualitative difference whereby one viewpoint is subordinated to the other. Second, both primary and secondary viewpoints are based on the complement structure of propositional attitude ascriptions: non-factive verbs of perception, cognition, or communication in the matrix clause indicate the perspective on the event, while the embedded clause contains the perspectivized entity (cf. Astington & Baird 2005b: 165; Verhagen 2005: 78). This is seen in the fact that utterances like (2) can be evaluated with respect to two different veridicality attributions but that only (2b) would be a natural reaction in discourse.

    (2) He believes    that p [the world is flat]
        modus          dictum
        a. “Yes, he does.” (referring to the character’s intensional state)
        b. “No, it is not.” (referring to the truth value of the embedded proposition)

Yet this complement structure is not a given. When children start producing sentences like I think p between the ages of 2 and 3 years, mental state predicates are first used like epistemic stance markers (Diessel & Tomasello 2001). The subordination of the complement under the main clause is thus a later development. As Verhagen argues, it is only when this ability has developed that “it also becomes possible for a conceptualizer, in uttering I think, to construe his own thinking as an object of conceptualization for specific purposes, as in I think he will arrive on time, but I am not sure/but John is skeptical […]” (Verhagen 2007: 71). This is a crucial step since it allows one to detach from one’s own viewpoint and to consider one’s own perspective (i.e. the relation between the origo and the perspectivized entity) as an object of evaluation. As a result, one’s own viewpoint becomes the origo and (part of) the object of evaluation at the same time. This recursive structure calls for a three-point concept of perspectivization, since the evaluation of one’s own viewpoint presupposes the awareness that there are other possible perspectives available, in other words: that p [the world is flat] could be false. According to de Villiers et al. (2014: 229), it is thus the recognition that the complement is false, and hence the contrast of truth-values between clauses, that triggers the recursion of complementation structures.

Note that the observations so far hold for first-order sentences like (2), which refer to one subject of consciousness only. In (1), the structure is even more complex since it involves the embedding of another person’s mental state and thus multiplies the relations between the different viewpoints. There is (i) a potential contrast between the two speaker beliefs (I believe vs. you think), and (ii) a potential contrast with respect to the embedded proposition that can be either true or false. While both contrasts are relevant with respect to the utterance, they do not display the same perspectival constellation. Contrast (ii) between The world is flat vs. The world is not flat refers to two possible options that are structurally equal, but contradict each other in that only one of them can be true at the same time (see also Dancygier & Sweetser 2005; Dancygier 2012b for modelling the concept of ‘alternativity’ in mental space theory). Contrast (i) between I believe and you believe, on the other hand, relies on a contrast, but not a contradiction, as trivially, I can believe something other than you do – in the same way as it may be raining in Paris and not raining in New York. According to Perner et al. (2003), ‘real’ perspective problems are hence defined by the fact that the different viewpoints cannot be aligned by coordination.

    (3) a. The world is flat and the world is not flat. → contradicting
        b. I believe that the world is flat, and you believe that it is not. → contrasting

The concept of perspective-taking as defined in Section 3 presupposes that there are (at least) two alternative equal viewpoints to choose from. That is to say, there is a choice between either option A or option B, as given in (3a). The structure in (3b) integrates both the contradiction on the propositional content level as well as the contrast between the two states of mind which shows that embedding structures can integrate more than one viewpoint at the same time. This becomes clear by the fact that the second viewpoint can be filled out by an external ‘speaker’ which is not explicitly marked within the sentence, cf. the example in (4).

    (4) Little Red Riding Hood
        a. believes that the wolf is her grandmother.
        b. believes: *The wolf is my grandmother.
        internal perspective (PoV = Little Red Riding Hood)
        external perspective (PoV = speaker)

In (4), Little Red Riding Hood believes that the one lying in the bed is her grandmother, whereas the reader knows that this is not in line with the reality of the story world. The denotation of the one lying in the bed as “the wolf” thus cannot reflect the perspective of Little Red Riding Hood, but an external viewpoint which has to be reconstructed from the communicative context. This is seen in the fact that the denotation as “the wolf” is not possible in the direct speech construction in (4b) that reflects Little Red Riding Hood’s perspective. (4a) thus integrates two different perspectives: the viewpoint of Little Red Riding Hood whose belief system does not contain the fact that the one lying in her grandmother’s bed is the wolf, and an external point of view which contains the knowledge about the fact that the one lying in bed is actually the wolf. The latter is not explicitly marked in the linguistic structure but can be reconstructed from ‘outside’.

At this point, it is important to be precise about the notion of internal vs. external. As applied to example (4), the distinction refers to the structural difference between two embedded viewpoints. In this sense, internal₁ captures the fact that the viewpoint is a second, displaced viewpoint that is viewed from outside by another, external₁ viewpoint that is situated at a different level. This distinction is independent of whether the viewpoint is linked to e.g. proprioceptive, kinaesthetic, and cognitive processes ‘inside’ one’s mind, i.e. to private feelings and thoughts that can be accessed directly via introspection (e.g. ‘I have a headache’) (i.e. internal₂). As will be seen in Section 4.3, both distinctions are relevant with respect to the perspectival structure but can lead to different evaluations of the viewpoint constellation.

In sum, it is thus trivially true that (1) is perspective-taking in the sense that a speaker uttering it chooses a certain perspective. But that is not the interesting point. Rather, a look at propositional attitude ascriptions shows that linguistic perspectivization is more than perspective-taking for more than one reason. First, as pointed out in the literature (Dancygier & Sweetser 2012; Dancygier & Vandelanotte 2016), even rather straightforward examples display not only one, but different viewpoints. Second, as also shown by Dancygier & Vandelanotte (2016), these viewpoints display hierarchical contrasts. With respect to the constellation of viewpoints, it is thus necessary to differentiate between equal alternatives that allow for perspective-taking in terms of an either-or distinction (i.e. stage (ii) in Figure 2) and vertical relations of embedding (iii). Third, the integration of viewpoints requires a third external reference point from which the potentially contrasting viewpoints can be evaluated (iv). This calls for a three-point concept of perspective. As will be shown in the following section, the same is true for perspectivization in grammar.

4.2 Viewpoint constellation of epistemic modality

In a wide sense of the term, grammar is genuinely perspectival in that every grammatical paradigm offers a choice of alternatives. In a narrower sense, grammar is linked to perspectivization insofar as it locates the speaker’s deictic origo that determines the ‘view’ of the verbal event situation (cf. the approaches by Leiss 1992 and Diewald 2010 in the tradition of Bühler and Jakobson). In cognitive linguistics, this is captured in the premise that “[i]nherent in every usage event is a presupposed viewing arrangement, pertaining to the relationship between the conceptualizers and the situation being viewed” (Langacker 2001: 16). It follows from this that grammatical markers do not indicate a certain viewpoint, but rather a viewpoint constellation (see Verhagen 2016; Dancygier & Sweetser 2012; Sweetser 2012; Dancygier & Vandelanotte 2016 for examples). This can be illustrated by a look at the category of tense. At first glance, tense markers like the simple past are “shifters” (Jakobson 1957) since they relocate the reference point from a present utterance time to the past. Hence, they could be seen as instances of perspective-taking. Yet, such a displacement of the origo in the sense of “Deixis am Phantasma” (Bühler 1934 [1999]) relies on the persistence of the primary origo in the background. The temporal perspectivization thus refers to the relation between primary and secondary reference points and, hence, to a multiperspectival constellation between viewpoints of different qualities. As such, it requires a three-point description, as reflected in the ternary system of Reichenbach (1947), which models tense as relations between the time of event and the time of speech as perspectivized by a third reference point (cf. in detail Zeman 2015; see also Evans 2005 with respect to the multiperspectivity of complex tenses).
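Reichenbach’s three time points are conventionally abbreviated as E (time of event), R (reference point), and S (time of speech). As a rough illustration, in a notation that is not used in this chapter, some English tenses are standardly rendered in this system as follows, where ≺ stands for temporal precedence and = for simultaneity:

```latex
\begin{align*}
\text{simple present:}  &\quad E = R = S \\
\text{simple past:}     &\quad E = R \prec S \\
\text{present perfect:} &\quad E \prec R = S \\
\text{past perfect:}    &\quad E \prec R \prec S
\end{align*}
```

In the simple past, R is displaced together with E while S persists as the primary origo; in the past perfect, all three points are distinct.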

Nevertheless, tense markers are prototypical shifters insofar as they shift the focus to the displaced viewpoint, whereas the primary origo is only maintained in the background. This is different for markers of evidential and epistemic modality, which display a more complex perspectival structure. As laid out in the introductory section, there seems to be a strong link between the development of ToM capabilities and the comprehension of epistemic and evidential modality, which is a rather late development in language acquisition (Perner 1991: 150; Papafragou 1997: 16; Papafragou 2001; Leiss 2012). This is understandable given that epistemic and evidential meanings share with propositional attitude ascriptions the feature that they are modifiers of propositions and, as such, overt markers of speakers’ attitudes. Both evidentiality and epistemicity thus display a similar perspectival structure by embedding veridical statements that may be true or false. This is seen in the fact that only (5b), but not (5a), can be translated into the structure of an epistemic modal verb.

    (5) a. I think of cake and wine.
           *That must be cake and wine!
        b. I think that the one lying in her grandmother’s bed is the wolf.
           The one lying in her grandmother’s bed must be the wolf!
           I think that p [the creature is the wolf]

In contrast to (5a), (5b) leads to the potential for viewpoint contrasts between the level of the sentence subject and the level of the speaker and, as such, the potential for an internal₁ and an external₁ view in the structural sense (i.e. inside vs. outside the proposition). This also marks a crucial difference from the perspectival structure of root modals, cf. (6).

    (6) a. Little Red Riding Hood must walk through the woods in order to see her grandma.
           → ‘She is obliged to do so.’
        b. Little Red Riding Hood must be walking through the woods right now.
           → ‘I (i.e. the speaker) guess that she is doing so right now’ (but reality could teach us otherwise).

Root modals as in (6a) are characterized by their biphasic structure, since they refer to two different time intervals: the temporal interval of the modal in the real world (i.e. the time for which the obligation holds) and the time interval of the event denoted by the infinitive complement in a possible world. As such, they display the potential for a focus shift from the present viewpoint to the future event, as seen in the fact that modals constitute a grammaticalization source for future tenses (cf. Zeman 2013). The main difference from the epistemic example (6b) is, however, that the latter introduces the speaker’s viewpoint and, hence, an additional potential contrast between two belief systems (external₂ vs. internal₂): while the obligation in (6a) holds for the subject, i.e. Little Red Riding Hood, the modal meaning in (6b) scopes over the whole proposition p [Little Red Riding Hood is walking through the woods] and thus requires an external viewpoint of evaluation.

    (7) Biphasic structure of (present tense) modal verbs:
        a. root meaning
        b. epistemic meaning
        (I = interval; ts = speech time; te = event time; P = proposition)

The perspectival complexity of (7b) is thus based on the structural embedding of viewpoints (i.e. inside vs. outside the proposition), the distinction between two belief contents, and a contrast of veridicality on the propositional level, i.e. the fact that the proposition could be false. So once again, the construction integrates external and internal viewpoints at the same time. As such, it displays a pattern of viewpoint integration (i.e. pattern (iv) in Table 1; Level 2b in the terminology of Moll & Meltzoff 2011). The difference in degree of complexity between (7a) and (7b) is also reflected in language acquisition. As laid out in Section 4.1, complementation structures like ‘think that p’ are a later development than constructions like ‘think of p’, which can be analysed in terms of perspective-taking. Analogously, verbs of volition are used prior to verbs of belief, and the usage of root modals precedes the comprehension of epistemic and evidential modals (cf. de Villiers 2005; Perner et al. 2005). In sum, it is not the potential viewpoint switch, and thus not perspective-taking, that is crucial for the difference between the two constructions, but the viewpoint constellation, which is based on the contrast between two different hierarchical levels (internal₁ vs. external₁), two different belief systems (internal₂ vs. external₂), and the contrast with respect to the veridicality attribution on the propositional level (true vs. false).

4.3 Viewpoint constellation in narrative discourse

The degrees of perspectival complexity are also identifiable for perspectivization in narrative contexts. Narratives are perspectival structures par excellence since they integrate viewpoints of different characters and narrators and thus offer a set of possible alternate perspectives that allow for viewpoint switches within the text. However, narrative perspectivization, too, is more than perspective-taking, for at least two reasons. First, the different viewpoints are not equivalent alternatives, since the viewpoints of characters and narrators can be embedded in each other (cf. Dancygier 2012a; Dancygier & Vandelanotte 2016). This is particularly relevant with respect to the structural difference between narrator and character level, since narration is characterized by the fact that the narrator level scopes over the event level of the protagonists (cf. also Zeman 2016; to appear).¹⁰ Second, these different viewpoints additionally require a global perspective from which the different perspectives are organized (see also Dancygier 2012; this volume; Dancygier & Vandelanotte 2016):

[…] viewpoints are organized hierarchically and in terms of a network, with local viewpoint choices achieving overall coherence in what one might call a top-level or ‘Discourse Viewpoint’ space, from which lower-level viewpoint choices are overseen. (Dancygier & Vandelanotte 2016: 14)

As a result, different layers of perspectivization arise. This can best be illustrated by a look at Free Indirect Discourse (FID), a perspectival pattern that is restricted to the narrative discourse mode and thus allows for some insights into the structure of narration in general, cf. (8).

    (8) She was glad. Tomorrow was finally the day she was to see her granny again!

As (8) shows, FID blends together two different viewpoints (cf. Pascal 1977; see also Fludernik 1993: 316ff. for an overview; Vandelanotte 2009: 246–251 for discussion of ‘dual voice accounts’). While deictic elements such as ‘tomorrow’ and the emotive choice of kinship terms like ‘granny’ match the character’s perspective, personal pronouns and tenses are linked to the level of the narrator and thus allow for the reconstruction of the narrator’s viewpoint (cf. in detail Schlenker 2004; Sharvit 2008; Maier 2012; Eckardt 2014). The relationship between these two perspectives has been accounted for in terms of a double context and is preferably analysed as a (partial) context shift. In this respect, Eckardt (2014) has proposed evaluating sentences in FID relative to a pair of contexts: “an external context C that is shared by the narrator and the reader, and an internal context c corresponding to a situation in which a protagonist of the story talks or thinks, as part of the story” (Eckardt 2014: 60).

As laid out before, the distinction between ‘internal’ and ‘external’ can refer to two different aspects, namely (i) the hierarchical difference between the communicative levels of discourse, whereby the narrator naturally has an ‘outside’ view on the character on the discourse level (whether this character is referentially himself or not), and (ii) the question of whether the contents on the propositional level are ‘thoughts’ that can be accessed directly (‘internally’). Due to its functional ‘outside’ position, the perspective of the narrator allows for simultaneous knowledge about his commitment towards the proposition and about the course of the story, which can include the mental contents of the protagonists. The narrator knows what the protagonist knows, while the reverse would lead to metaleptic structures. In a structural sense, then, FID integrates both internal₁ and external₁ viewpoints at the same time.

On the other hand, FID is also characterized by the fact that on the discourse level only one perspective is foregrounded, i.e. the character’s point of view, while the narrator’s voice remains ‘invisible’ and can only be traced back within the grammar, i.e. the use of tenses and pronouns.¹¹ As a result, the reported propositional content in (8) is perceived as dependent on the character’s viewpoint, which may or may not be reliable. So while FID structurally integrates two different perspectives (i.e. internal₁ vs. external₁), it is at the same time committed to the perspective of just one subject of consciousness that is restricted to mental contents of speech and thought (i.e. internal₂) (cf. Vandelanotte 2009: 246–251 for the argument that FID could be considered both “bivocal” and “univocal”). As a result, grammatical constructions that are bound to the narrator’s perspective are incompatible with FID. This can be exemplified by the German ‘future of fate’ (FoF) construction, which displays the complementary perspectival effect to FID:

    (9) Am nächsten Tag sollte Rotkäppchen seine Großmutter wieder sehen.
        ‘The next day, Little Red Riding Hood was to see (literally: ‘should see’) her grandma again.’

While FID foregrounds the perspective of the character, the modal verb construction in (9) restricts the perspective to the narrator’s viewpoint. He knows what will happen next, whereas the character on the story level is unaware of the events to come. Such a view from outside is incompatible with the character perspective, which seems to be the reason why the FoF reading is blocked in FID (cf. Eckardt 2012: 185). In a sentence like ‘She was glad. Tomorrow she should see her again!’, the event to come is focalized from the character’s perspective and is linked to a lower degree of certainty than the proleptic prediction by the narrator. The hierarchy of perspectives is seen in the fact that the character is subordinate to the force of the narrator’s point of view, whose controlling authority is grounded in its functional role within the narrative discourse. The authority of the narrator makes the realization of the event highly certain and rules out any intervention of the character on the story level. Hence, both constructions, FID and FoF, act on the double context of narration and structurally integrate a potential contrast of viewpoints, but the viewpoint constellation is reversed in that one perspective is suppressed in favor of the other on the discourse level.

So is this perspective-taking? Yes and no. Apart from the trivial fact that it is perspective-taking on the part of the speaker, who selects a certain perspective in choosing either FID or FoF (or any other option), one could argue that it is perspective-taking in the sense that the perspective of either the narrator or the character is ‘taken’. On the other hand, it is more than just perspective-taking, since both (8) and (9) refer to more than one subject of consciousness (i.e. internal₂ PoVs that are linked to mental agents) and rely on a hierarchical contrast between external₁ and internal₁ PoVs (in the structural sense of the term).

Furthermore, both constructions are based on a contrast between the narrator’s and the character’s viewpoint. In FoF, the divergent states of knowledge trigger narrative tension, since the narrator (and the reader) knows more than the character about the things to come. In FID, the report from the character’s perspective could turn out to be unreliable or create ironic effects in contrast to the narrator’s viewpoint. In both cases, the potential contrasts thus call for a third, evaluating viewpoint on a global level from which the perspectives are monitored. This is in line with Klein & von Stutterheim (2002), who distinguish between local perspectives within the text and “a globally established perspective” (Klein & von Stutterheim 2002: 83) that maintains the quaestio of the story. With respect to narrative discourse structure in particular, Dancygier (2012a; this volume) has argued that narratives are structured in different narrative spaces that require “a high-level story-viewpoint (SV) space, which structures the text-wide viewpoint of the story” (Dancygier 2012a: 129f.). That is to say, a global viewpoint is required that monitors the different perspectives in order to establish a coherent view on the story.

The establishment of such a global viewpoint is also a fundamental developmental step, as it is well known that stories of young children commonly lack a coherent macro-organization (cf. von Stutterheim & Carroll 2007: Fn. 3, with reference to Berman & Slobin 1994; see also Büttner 2016 with respect to disorders of macro-structural planning from a neurolinguistic perspective). This affects not only the production but also the reception of stories. According to Goldie (2007), the capability of taking an “external perspective”, i.e. thinking about another person in terms of “‘he thinks that p’ and ‘he thinks that if p then q’” (Goldie 2007: 70), constitutes the prerequisite for the appreciation and evaluation of diverging perspectives in narratives (cf. similarly Zunshine 2006 with respect to the role of ToM tasks in narrative appreciation). Accordingly, the comprehension of ‘dramatic irony’ involves more than a shift of perspective, since the reader has to take into account (at least) two diverging viewpoints simultaneously in order to appreciate the story: the reader has to understand that the one in bed that Little Red Riding Hood is seeing is actually the wolf while knowing at the same time that Little Red Riding Hood thinks that it is her grandmother. Dramatic irony thus displays the same viewpoint constellation as laid out in Section 3 for false belief understanding: the beliefs of two mental subjects are alternate viewpoints that have to be evaluated from a third reference point, both as contrasting in their propositional content and with respect to their relation to each other.

5 Modeling perspectival complexity in language and cognition

The comparison between propositional attitude ascriptions, epistemic modality, and narrative discourse structure has thus shown that the different linguistic phenomena display the same perspectival principles as complex cognitive perspectivization tasks. For all instances, it would thus be too simplistic to describe the perspectivization process as taking a perspective in the sense of selecting, out of several available viewpoints, one that restricts the aspects of the object seen. Rather, it is the constellation of potentially contrasting viewpoints that is relevant with respect to linguistic perspectival phenomena, whether we focus on grammar, complement structure, or narrative discourse. Based on these observations, we can thus derive the following scale of perspectival complexity as reflected in both language and cognition (cf. also Table 1 below).

(i) Alternate viewpoints

For both cognitive and linguistic perspectivization, the crucial prerequisite is the givenness of alternate, i.e. divergent, viewpoints. This constitutes the basis for perspective-taking in the sense of selecting one option out of equivalent possible alternatives in terms of either-or.

(ii) Viewpoint shift

As soon as there is more than one viewpoint available, the possibility of a viewpoint switch arises. This mechanism is reflected in hypothetical representations as well as in instances of direct discourse (Perner et al. 2002: 1466) and in the difference between present and past, all of which require a decoupling from the ‘real world model’ (Perner 1991) and from the hic et nunc origo in the sense of Bühler’s (1934 [1999]) ‘Deixis am Phantasma’.

Such forms of ‘shifting’ (Jakobson 1957) and ‘displacement’ (Bühler 1934 [1999]) already constitute more complex tasks, since the principle of ‘Deixis am Phantasma’ implies (i) that the original viewpoint is not necessarily cancelled but maintained, and (ii) that the real primary origo and the displaced secondary origo are not equal viewpoints on the same level but are dependently related to each other. As such, they provide the potential for viewpoint embedding.

(iii) Viewpoint embedding

The qualitative difference between primary and secondary viewpoints can lead to a hierarchical difference when one viewpoint is in the scope of another, as for example seen above in the distinction between character and narrator level in narrative discourse. This leads to a structural difference between internal₁ and external₁ viewpoints, in other words, “between a metarepresentation about a ‘belief world’ and a simulated assertion made ‘from within’ a belief world which we assume” (Recanati 2000: 61). ‘Shifts’ between hierarchical viewpoints are thus more complex than perspective-taking since they are based on a recursive structure where the external perspective includes the internal one by definition.

(iv) Viewpoint integration

Whereas viewpoint embeddings are more complex viewpoint constellations than viewpoint shifts, FID and FoF constructions, false-belief tasks, and narrative comprehension display an even more complex perspectival structure, since they are based on the maintenance of (at least) two diverging viewpoints and require the capability to process the diverging perspectives of internal₁/₂ and external₁/₂ contexts at the same time. This integration of (potentially) contrasting viewpoints requires a third, external reference point from which the constellation is evaluated. It is thus again the process of confronting perspectives that lays the foundation for perspectival complexity.¹²

| Stage | C-perspectivization | L-perspectivization |
| --- | --- | --- |
| (i) | Alternate (divergent) viewpoints (either or) | Alternate (divergent) viewpoints (either or) |
| (ii) | Viewpoint shift. Detachment of the ‘real world model’ (Perner 1991): past vs. present, real vs. hypothetical situations | Viewpoint shift. ‘Deixis am Phantasma’: grammatical shifting (tense, mood) |
| (iii) | Internal vs. external perspective: second-order beliefs | Internal vs. external perspective: propositional attitude ascriptions, grammatical shifting (epistemicity, evidentiality), character vs. narrator distinction |
| (iv) | Simultaneous activation of internal and external viewpoints (and, at the same time): false belief, metarepresentation | Simultaneous activation of internal and external viewpoints (and, at the same time): FoF construction, Free Indirect Discourse, ‘dramatic irony’ |

Table 1: Degrees of perspectival complexity in C- and L-perspectivization.

6 Conclusion: Perspectivization as a three-point structure

At the beginning of the chapter, we started with the rather trivial and uncontroversial fact that perspectivization can be seen as a selection of a viewpoint that restricts the aspects of the object in focus. In this sense, perspectivization seems at first to be a matter of either-or: a house is seen either from outside or inside, with respect to either its front or its back. This is true for perceptual viewing situations in the real world. When it comes to mental, pictorial, and linguistic conceptualization, however, perspectivization is more than just taking one perspective. In this respect, it has been shown that the different degrees of perspectival complexity are determined by (i) the qualitative difference between alternate viewpoints that are linked to a structure of subordination (e.g. primary real vs. secondary displaced viewpoints) and (ii) the establishment of a third reference point of evaluation that allows one to maintain more than one perspective on different hierarchical levels at the same time. The main point of the chapter has thus been that, if we metaphorically map the notion of perspective onto mental, pictorial, and linguistic representation systems, we have to take into account not only the two-point-relation between an observing subject and an observed entity, but also the meta-relation that integrates the two-point-relation(s) under a third reference point as an evaluating instance.

As a consequence, phenomena of L- and C-conceptualization such as false belief understanding, propositional attitude ascriptions, epistemic modality, and narrative discourse have been characterized by the fact that they maintain external and internal perspectives at the same time and thus require the capability of confronting perspectives. In this respect, they could be compared to cubist paintings that juxtapose different contrasting viewpoints of one object on the canvas. What makes L-perspectivization even more complex than cubist pictures is, however, the fact that, as seen above, it is necessary to distinguish between different local and global levels in order to account for the perspectival viewpoint constellations. L-perspectivization thus appears to be more comparable to the works of M.C. Escher, which integrate locally deviant perspectives that cannot be true at the same time but nevertheless constitute coherent pictures as seen from a global level. So, in sum, perspectivization seems to capture nothing less than the relational architecture of language. Seen from this viewpoint, perspectivization is indeed fundamental.