Do worlds have (fourth) walls? A Text World Theory approach to direct address in Fleabag

This article examines direct address, or ‘breaking the fourth wall’, in the BBC TV series Fleabag. It applies Text World Theory to telecinematic discourse for the first time and, in doing so, contributes to developing cognitive approaches in the field of telecinematic stylistics. Text World Theory, originally a cognitive linguistic discourse processing framework, is used to examine how multimodal cues contribute to the creation of imagined worlds. We examine three examples of direct address in Fleabag, featuring actor gaze alongside use of the second-person you or actor gaze alone. Our analysis highlights the need to account for the different deictic referents of you, with the pronoun able to refer intra- and extradiegetically. We also explore viewers’ ontological positioning because ‘breaking the fourth wall’ in telecinematic discourse evokes an addressee who is not spatiotemporally co-present with the text-world character. We therefore propose the concept of the split text-world, which assists in accounting for the deictic pull that viewers may feel during direct address and its experiential impact. Our analysis suggests that telecinematic direct address is necessarily world-forming but can ontologically position the viewer differently in different narrative contexts. While some instances of direct address in Fleabag position the viewer as Fleabag’s narratee and confidant, there is increasing play with direct address in the show’s second series and a destabilisation of this narratee role, achieved through the suggestion that Fleabag’s addressee may be more psychologically interior than they first appear.


Introduction
Fleabag (BBC Three, 2016-2019) is a television series written by, directed by, and starring Phoebe Waller-Bridge. The show received critical acclaim, with Waller-Bridge winning several awards including a British Academy of Film and Television Arts (BAFTA) award for Best Female Performance in a Comedy Programme for the first series. Fleabag tells the story of an unnamed woman (whom viewers assume is 'Fleabag') who lives in London. Fleabag is a troubled character, navigating various doomed romantic and sexual relationships, grieving the suicide of a close friend and maintaining awkward relationships with her father, sister and stepmother following her mother's death. The show is an example of what Palmieri (2016) calls 'sad comedy', a genre that represents 'abject' characters who disrupt viewers' sympathies (Woods, 2019: 198) as well as evoking laughs.
Fleabag has not yet received much scholarly attention, with the exception of Wilson-Scott's (2020) feminist consideration of the show's trope of the dead mother and Woods' (2019) discussion of the use of direct address. Indeed, as well as playing with the darker sides of comedy, the show is renowned for its use of direct address to camera, which is both loved and hated by viewers (see, for instance, the range of comments by Guardian readers in Obordo, 2019). This feature of the show is even emphasised in BBC iPlayer's (2019) description of the series, which reads: 'Meet Fleabag. She's not talking to all of us - she's talking to you. So why don't you pop your top off and come right in?' Woods (2019) and Birke and Warhol (2017) identify the use of direct address as a trend in contemporary narrative television. Woods argues that the direct address in Fleabag contributes to the affective dimensions of its comedy, creating both a sense of complicity with the main character and intense discomfort, weaving an 'intricate dance of closeness and detachment that constantly shifts our engagement with narrative action' (2019: 197). We build on Woods' observations by analysing three specific examples of direct address from across the first and second series of Fleabag and examining its experiential effects through a cognitive stylistic lens.
Although stylistics has always taken a keen interest in dramatic texts, early studies were confined to linguistic analysis of the script, the rationale being the constancy of the written text in comparison to theatrical performance, which is inescapably unstable and changeable (Cruickshank, 2014; Macrae, 2014; Short, 1998). More recently, stylistics has taken a multimodal turn with many scholars arguing for the importance of accounting for the totality of the text; thus, it is essential to also attend to the visual and haptic in literature (Gibbons, 2012; Nørgaard, 2019) and the sonic in music (Morini, 2013; West, 2019). The same is true for stylistic attention to drama, with McIntyre (2008) and Richardson (2010) explicitly calling for an integrated multimodal approach, leading to the development of a telecinematic branch of stylistics (Hoffmann and Kirner-Ludwig, 2020; Piazza et al., 2011). Current approaches in telecinematic stylistics tend to adopt corpus methods or take pragmatic and/or discourse analytical approaches, with cognitive stylistics currently only gestured to as an avenue for future research (Hoffmann, 2020: 13). Whilst existing work has necessarily represented critical advancement through multimodal analysis, the scarcity of cognitive investigations means that the experiential aspect of telecinematic discourse has been relatively neglected. An exception is Harrison's (2020) discussion of the voice-overs in and visual production of The Handmaid's Tale TV series. Harrison adopts conceptual metaphor theory, arguing that it is through recurrent split self and 'container' metaphors that The Handmaid's Tale creates interpretive effects, which characterise the story world as oppressive.
This article also takes a cognitive stylistic approach to telecinematic discourse, though we use the analytical framework of Text World Theory. Whilst we discuss preceding studies of drama that utilise the framework, Text World Theory has not previously been employed to analyse film and TV as multimodal products. As such, our primary aim in this article is to apply Text World Theory to multimodal telecinematic discourse and, in doing so, demonstrate its explanatory power in accounting for telecinematic experience. Through our analysis of Fleabag, we show that Text World Theory can uncover the complex mechanics of telecinematic direct address, how it positions viewers and creates shifting relationships between character(s) and viewers.
In the next section of this article, we discuss prior research on direct address in film and TV, as well as typologies of the second-person pronoun and its cognitive effects. In Section 3, we introduce the principles of Text World Theory and outline how it can be applied to telecinematic discourse. Our analysis of Fleabag, in Section 4, evidences the viability of Text World Theory as an approach to telecinematic texts and examines the shifting functions of direct address in the show.

Direct address, second-person reference and breaking the fourth wall
Direct address within fiction and drama is a form of ontological metalepsis, defined by Alber as 'jumps between narrative levels that involve actual transgressions or violations of ontological boundaries' (2016: 203). Such violations can cross the boundary between the real and fictional world(s) or between different fictional worlds. In the context of theatre and drama, metaleptic direct address to the audience can be traced back at least as far as Elizabethan theatre. It is colloquially referred to as 'breaking the fourth wall' whereby 'dramatic conventions governing the separation of real and fictional worlds are deliberately violated so that, for example, a character comments on story events in an aside to the audience or an omniscient narrator reports story events directly to the audience as fictional events' (Thomson-Jones, 2007: 92). The 'fourth wall' thus stands as a metaphor for the invisible stage boundary separating actors from audience in the theatre or for the spatiotemporal and technological gap, generated by the camera and television screen, between on-screen actors and viewers.
Within film studies, Brown's monograph Breaking the Fourth Wall: Direct Address in the Cinema (2012) offers an important account of telecinematic direct address. Brown defines direct address as a phenomenon in which on-screen characters 'appear to acknowledge our presence as spectators; they seem to look at us' (2012: x). Brown outlines that whilst it 'is often assumed that, for narrative film-making, this [direct address] destroys the illusion of the story world and, by acknowledging the technology behind the cinema (i.e. the camera), distances us from the fiction' (2012: x), telecinematic direct address can also work, to the contrary, to 'intensify our relationship with the fiction' (2012: x). In relation to this claim, Brown argues that there are seven predominant functional effects of direct address in film fictions: creating intimacy between character and audience; providing a particular character with principal agency in and of the narrative; placing a character in a superior epistemic position within the fictional world; acting as a gesture or expression of honesty; instantiation, in terms of instilling felt immediacy into the interaction; alienation, in the sense used by Bertolt Brecht to represent the audience's experience of estrangement to evoke sociopolitical reflection; and lastly stillness, when direct address occurs in a moment of reflection or a narrative pause in the action (2012: 13-18). Brown also highlights that not all looks to camera are instances of direct address since there exist examples in which 'the camera being looked at is meant to occupy the position of a person or an object within the film world' (2012: xi). This world-internal form does not 'break the fourth wall'; it is not an instance of metaleptic direct address since an ontological boundary is not crossed.
Brown focuses on what he calls 'wordless examples' (2012: x): it is the actor's gaze to the camera, unaccompanied by dialogue, which signals the direct address. As such, Brown does not analyse verbal resources. In contrast, Birke and Warhol (2017) offer an account of direct address using second-person you in contemporary TV, ultimately distinguishing three types: documentary, narratorial and dramatic. The documentary mode happens in shows such as Modern Family and is world internal with characters addressing 'an interlocutor who is present on the scene, whether on or off-camera (as in a talk show or some kinds of documentary) or the apparatus itself (as in news programmes)' (Birke and Warhol, 2017: 148). As with the above discussion of world internal looks to the camera, we do not consider this mode of you to be metaleptic direct address. The narratorial mode occurs through disembodied voice-over with a character commenting on the events depicted on screen, as in Sex and the City. In our view, whilst the TV show is a multimodal text, in the narratorial mode, the direct address itself is not multimodal as it occurs only within dialogue. Finally, the dramatic mode imitates stage performance in the sense that you aligns with 'the actual TV viewer' (Birke and Warhol, 2017: 151). In the dramatic mode, direct address is both multimodal and functions metaleptically as it 'seems to pierce the boundary of the television screen and enter the viewer's domestic space' (Birke and Warhol, 2017: 153). Birke and Warhol cite House of Cards as an exemplar of the dramatic mode, a show also explored by Sorlin in her stylistic account of the 'aesthetic manipulation' of the TV audience (2016: 193-215). Sorlin argues that unlike in written fiction, telecinematic second-person address is unambiguous: 'When Frank turns towards the video camera, the viewers are clearly invited to occupy the position of the "you" address' (2016: 201). 
Sorlin's reference to Frank's gaze at the camera alongside his employment of the second-person pronoun implicitly suggests that the directness of telecinematic address is enhanced through the multimodal combination of textual you and visual gaze.
Sorlin's comparison with the complexities of the second-person pronoun in written fiction relates to a history of scholarship in linguistics, stylistics and narratology on the referential multivalence of linguistic you (e.g. Fludernik, 1994; Kacandes, 2001). Herman's (1994) proposal of the five different deictic functions of textual you in second-person fiction, which we summarise below, is the most analytically incisive:
1. Generalised you, where the second-person functions impersonally like indefinite one;
2. Fictional reference, in which you has undergone a deictic transfer (from I), with you substituting for typical first-person narration and signifying a protagonist/character;
3. Fictionalised, horizontal address, wherein you is used by characters to address other characters within the fiction;
4. Apostrophic, vertical address: in essence, direct address from a fictional entity to the real reader;
5. Double deixis, which entails a simultaneous superimposition of two or more of the above deictic references of you, one of which must be internal to the fiction (types 2, 3, and, depending on usage, also 1) whilst the other must be external to the fiction (types 4 and 1, again the latter being context-dependent).
Because of its indefiniteness, type (1) can refer either internally or externally, whilst types (2), (3) and (4) have consistent ontological reference points. Although Herman speaks of type (2), fictional reference, as a case of 'displaced deixis' (1994: 392), we would emphasise that in telecinematic discourse (unlike in written fiction), the visual presence and/or spoken delivery of you makes the function of self-referential address more apparent. The use of you by a protagonist as self-address has previously been noted by Fludernik, who perceives it as occurring in both written and oral narratives and claims that although it is not common in literature, it tends to be used in interior monologue passages, instances of free indirect discourse and psychonarration (1993: 238-239). Also notable in Herman's outline of types (3) and (4) is his adoption of the spatial metaphor of horizontal and vertical address. Whilst fictionalised horizontal address is not metaleptic because of the absence of an ontological boundary crossing, apostrophic address 'exceeds the frame (or ontological threshold) of a fiction to reach the audience, thus constituting "vertical" address' (1994: 380). Herman's innovative contribution to research on second-person you is type (5), double deixis, 'in which we get a superimposition of virtuality (the fictional protagonist) and actuality (the reader)' (1994: 387). We draw on these categories of textual you in our analysis of Fleabag in order to generate a more fine-grained stylistic account of telecinematic direct address than hitherto provided. Sorlin, for instance, does not make a distinction between the different types of you that may be at work in House of Cards. As mentioned, she finds the use of you in the multimodal combination of telecinema less polysemous than in written narratives, arguing that 'the reference in House of Cards seems clear' (2016: 201).
She does, though, go on to cite Herman in her discussion of the impact of direct address in House of Cards, subsequently arguing that Frank's asides have 'the double contradictory effect of bringing the viewers in[to] the series and of making them aware of their being outside what constitutes fiction' (Sorlin, 2016: 202). Sorlin does not, however, use the term 'double deixis'. This is important, in our view, since although doubly deictic you is possible in telecinematic discourse, we do not consider the direct address discussed by Sorlin in House of Cards to be doubly deictic because it does not also take an observable fictional referent; rather, it is apostrophic in Herman's terms. The felt difference between the apostrophic address of telecinematic discourse and of written second-person usage is precisely the telecinematic context, which, as Sorlin notes, complicates the viewer's ontological relationship to the character: although not doubly deictic per se, telecinematic apostrophic direct address therefore has the experiential effect of seeming to address viewers both beyond the fiction in their domestic contexts and within the fictional world. In our ensuing analysis, we therefore explicate how direct address in telecinematic discourse achieves this strong sense of ontological duplicity.
Sorlin also makes brief use of terminology from Text World Theory (the framework is discussed in more detail in Section 3) to capture viewers' experience of the direct address in House of Cards, claiming that the metaleptic breach of the fourth wall 'disrupts the traditional fictional contract that institutes a clear separation between what Text World Theory calls "Text World" (the situation depicted by the Text) and "Discourse World" (the "situational context" surrounding the Text, including the interaction between Discourse participants like writer-readers for instance)' (2016: 201). Sorlin additionally cites Gavins' claim that a reader of second-person fiction 'transcends the ontological boundaries' of the text world (Sorlin, 2016: 201; cf. Gavins, 2007: 85). In fact, Gavins suggests that there are two possible reader experiences of second-person fiction (2007: 84-87): readers can feel utterly addressed by textual you, thus accepting the force of the identification, or if readers feel at odds with this you, they will not identify but will nevertheless have to 'follow the invited projection into the text-world and inhabit the deictic centre being described by the second-person references' (2007: 86). Gibbons makes the same point about you in multimodal printed fiction (2012), and we believe this also to be the case when viewers appear to be looked at and addressed as 'you' by a character: the gesture of breaking the fourth wall has a deictic pull.
Whilst Gavins uses Text World Theory to give a nuanced account of the projection relations of second-person you in written fiction, she does not consider multimodal or telecinematic forms; Sorlin, in comparison, offers a stylistic take on telecinematic you but does not develop a Text World Theory account of metaleptic direct address beyond this, opting instead to follow and advance film study descriptions based on communicative levels. We believe that Text World Theory, as an approach grounded in and developed from insights into the cognitive sciences, is best placed to account for the ontological complexity of direct address in telecinematic discourse as well as how telecinematic direct address is experienced by viewers. Text World Theory is therefore the framework we use to analyse Fleabag, and it is outlined in the next section.

Applying Text World Theory to telecinematic discourse
Text World Theory is a cognitive linguistic model of discourse processing first devised by Werth (1999) and later augmented, most prominently, by Gavins (2007). The strengths of Text World Theory include its ability to combine detailed linguistic analysis with consideration of the sociocultural contexts of interpretation and the experiential effects of discourse for participants. It is also particularly useful for analysing the ontological aspects of texts as it considers the status of text-worlds in relation to the discourse-world or the status of the imagined worlds referenced by the text in relation to the actual situational context of the discourse (Gavins, 2007; Gibbons, 2012, 2014, 2016). In this section of the article, we introduce the key principles of Text World Theory as a cognitive stylistic framework (Section 3.1) and offer some clarifications and augmentations for its use in the analysis of telecinematic discourse (Section 3.2).

Text World Theory: Key principles
Text World Theory was initially designed for the analysis of linguistic discourse, that is, linguistic communication between two or more human participants. Its main tenet is that processing linguistic cues prompts discourse participants to create rich mental representations, called 'text-worlds'. It also recognises that linguistic communication always occurs in context, and participants' mental representations of the communicative situation, including their relevant knowledge and perceptions, form the 'discourse-world' from which text-worlds originate. Discourse involves continual and dynamic interaction between discourse-world and text-world levels. It is rare for discourses to comprise a single text-world; the norm is for there to be multiple worlds referring to different times and places, or different perspectives and attitudes. Texts might involve flashbacks, flashforwards, hypothetical scenarios, negated scenarios, modalised assertions or shifts in location or perspective, all of which cue the creation of distinct text-worlds. Where such worlds are created to express attitude, they are known as 'modal-worlds'. For example, in a statement such as 'I might come to the party', the modal auxiliary verb 'might' indicates that the scenario is unrealised and being held up for comment by the speaker. In order to process such modality, discourse participants must conceptualise the speaker's uncertain attitude and also, in a separate modal-world, conceptualise the propositions being modalised (Gavins, 2005: 13). The multiple world structure thus reflects the different ontological levels of the discourse.
The text-worlds of all discourse are constructed from a combination of 'world-building' and 'function-advancing' elements. In Werth (1999), world-builders are linguistic cues indicating the time, place, entities and objects of a represented scenario and the relationships between them. Linguistically, world-builders might include: spatial or temporal locatives and adverbs, variations in verb tense, definite articles, demonstratives, noun phrases and personal pronouns (Gavins, 2007: 35-52; Werth, 1999: 180-190). Whilst world-builders set out the deictic parameters of the text-world, function-advancers portray actions or processes which 'propel a discourse forward' in some way (Gavins, 2007: 56), and thus, in language, are often verb phrases. Werth (1999) draws a clear-cut distinction between world-building and function-advancing propositions, but the strictness of this division has been subject to criticism in later applications: Gavins points out that in descriptive texts, certain textual elements can have both world-building and function-advancing roles (2007: 63); Lahey (2006) suggests that in lyric poetry, function-advancers can play a role in world-building, arguing that Werth's (1999) model is biased towards literary narrative. When applying Text World Theory to multimodal telecinematic discourse (see Section 3.2 below), a clear distinction between world-building and function-advancing elements is also difficult to uphold, though it is still useful to consider the way different elements of composition can signal both the background setting and the foregrounded action of a scene.
Since Werth's (1999) initial exposition of the framework, Text World Theory has been applied to a wide variety of linguistic communication (e.g. see: Gavins, 2007; Gavins and Lahey, 2016). However, the focus is typically monomodal linguistic texts, particularly written texts or transcripts. Of the handful of studies applying Text World Theory to drama and/or film, these tend to perpetuate the linguistic bias of stylistic approaches to drama by analysing the written play text rather than performance (Cruickshank and Lahey, 2010) or screenplay rather than film (Lugea, 2013). By restricting their focus to written texts, these studies do not account for the full range of compositional features that construct the text-worlds of drama, film or TV in performance. In contrast, our aim here is to use Text World Theory to study telecinema as multimodal discourse. Indeed, although Werth developed the framework through discussion of written narratives, the communicative model underpinning Text World Theory is based on the prototype of face-to-face interaction and thus encompasses multimodal communication. For instance, Werth argues that (1999: 212):
[in order to] construct a text world, the recipient must use all the information available, information which is presented first and foremost through the medium of the text. In face-to-face interaction, there may be other perceptual clues: body language, facial expression, situational circumstances and the like.
Consequently, when processing a multimodal text, text-worlds are generated from the combination of communicative modes at work.
In an article advocating a multimodal stylistic approach to the analysis of drama, McIntyre (2008: 322) draws on some of the foundational principles of Text World Theory. Borrowing concepts from film studies, McIntyre posits that in addition to linguistic cues, 'non-linguistic contextual cues' in film, such as aspects of the mise en scène (setting, costume and make-up, lighting and staging including movement and acting), 'effectively act as visual world-building elements' for a viewer's construction of the fictional worlds of televised drama (2008: 313). McIntyre also notes, in an instance of direct address in the film version of Richard III, that 'the position of the camera gives the illusion of there being a direct connection between the discourse world and the text world of which Richard is a part' (2008: 325). Despite using this Text World Theory terminology, McIntyre does not provide a detailed discussion of the applicability or usefulness of Text World Theory in the study of film/TV (as we do here), though his work is an important precursor to our discussion as it establishes the value of these concepts when approaching multimodal texts.
Text World Theory has been applied to multimodal discourse, primarily by Gibbons in her studies of multimodal printed literature (2012), mobile narratives (2014) and immersive theatre (2016). Even so, the multimodal application of Text World Theory remains a developing area to which the present article contributes. In her study of immersive theatre, Gibbons (2016) devotes much of her analysis to considering the text-worlds produced by linguistic dialogue. Nevertheless, she also considers the physical environment and copresence of actors and audience in Text World Theory terms. In doing so, she draws on Cruickshank and Lahey's (2010) concept of the staged-world and offers a new term, a re-presentation text-world. A staged-world is defined by Cruickshank and Lahey as 'a conceptual space which corresponds to a performative enactment of the play' (2010: 72). Since Cruickshank and Lahey conceive of the staged-world in relation to reading a written play text (rather than viewing or participating in a performance), they classify this conceptual space straightforwardly as a text-world generated by linguistic world-builders in the play text, such as act and scene numbers, and stage directions. Gibbons claims that when audience members notice aspects of the performance or choreography, their focus shifts to the staged-world, which in performance is 'a frame of representation anchored in the discourse-world' (2016: 83). Gibbons additionally proposes her concept of a re-presentation text-world, which signifies 'a mental representation of the discourse-world' (2016: 75), and claims that this is necessary for conceiving of the performative and fictional nature of live theatre since it 'represents the discourse-world as a text-world and thus a mental construct' (2016: 75). This is particularly relevant to immersive theatre because the audience is often the recipient of direct address and expected to respond and interact with the actors.
As such, the re-presentation world signals a cognitive construct whereby audience members acknowledge that both their own actions and those of the performers are part of the dramatic pretence. Re-presentation worlds are, therefore, also necessarily part of watching TV, film, theatre and drama and are foregrounded in moments in which viewers notice, for instance, the quality of the acting (the disruption of the pretence thus showing up the dual-world structure with the re-presentation world overlaying the discourse-world). Our analysis is not concerned with this layer of representation, however, and so this type of text-world is not explicitly treated.
Whilst Text World Theory has, therefore, begun to turn its attention to theatre, drama and telecinematic discourse, this work remains at an embryonic stage. In this article, we develop and apply Text World Theory to account for telecinematic discourse generally and for the use of telecinematic direct address within Fleabag more specifically. This requires some augmentation to Text World Theory.

The text-worlds of telecinema
There are multiple discourse-world participants involved in the communicative situation of telecinematic discourse, including the audience/viewers and also the producers, directors, actors, editors, writers, cameramen and other personnel involved in crafting the TV show. In film and TV scholarship, Richardson uses the term 'dramatists' to refer to the collective of producers, writers and directors involved in the creation of the textual whole (2017: 38) and Brown acknowledges that the very term 'direct address' does not transfer from predominantly linguistic text types to telecinematic media smoothly since '[l]ooking at the film audience is never "direct" in any material sense' (2012: x), not least because a performer's gaze is mediated through the material presence of camera and director. In his cognitive approach to political discourse, Browse also notes that film has manifold authorship (2018a: 33) and that the collaborative nature of political texts requires audiences to conceptually model multiple discourse producers (professional politicians, policy advisors, public relations officials, speech writers, etc.) (2018b). The discourse-world of telecinematic discourse is split in a similar way to that of written communication (Gavins, 2007: 27) as the audience/viewers do not occupy the same spatiotemporal coordinates as the production team (and indeed, there may be multiple splits in the discourse-world if the show has passed through several phases of production involving different teams). When focusing on the reception of the discourse, the intricacies of all splits and participants are not necessarily needed in analysis. As Browse notes, despite the 'real dispersed nature of its authorship, the film is perceived by the viewer as a finished product and is received as a unified whole' (2018a: 33).
Applying Text World Theory to telecinematic texts requires some adjustment to Werth's and Gavins' definitions of world-builders and function-advancers. As McIntyre (2008) suggests, world-building and function-advancing information is communicated via multiple modes, comprising not only linguistic elements but also paralinguistic and non-linguistic visual and audio cues which set the scene and create progression in a narrative (see also Toolan, 2001: 104). Table 1 outlines the multiple modes involved in telecinematic discourse. This is developed from McIntyre's extension and application of stylistic analysis to drama (itself drawn from film studies and multimodality studies), though framed in relation to Text World Theory by providing some examples of possible world-building and function-advancing cues. The listed features may perform either a building or advancing role depending on context. For example, the non-linguistic audio cue of ambulance sirens could function as a world-builder if used to indicate an urban inner city environment, or, in a scene involving an accident, could operate as a function-advancer signalling the arrival of relevant characters. The particular audio qualities of the sirens in relation to other cues would contribute to the role that they play: as a world-builder, ambulance sirens are likely to be backgrounded; whilst as a function-advancer, they are more likely to be foregrounded and become louder to indicate motion.
Telecinematic discourse has the potential to create an array of text-worlds via a number of modes. Consider this example from the opening scenes of Fleabag (to which our analysis will return in more detail in Section 4). Figure 1 depicts a still of the opening shot of the first episode in Series 1 (see Note 4). Here, visual world-building information suggests that we are in the interior of a hallway: there is a light switch, a door and the edge of a picture frame visible; it is dark outside so possibly night time. The shot moves in the manner of handheld camera footage, which is suggestive of a point-of-view (POV) shot. Audio world-building information intimates there is an entity present because viewers can hear breathing but cannot see any entities. Visual and audio world-builders therefore work together to construct this initial text-world (T-W1). The next shot, shown in Figure 2, changes perspective to suggest even more strongly that the initial shot was indeed a POV shot from a character's perspective. There is visual and audio world-building continuity with the previous shot which suggests a continuation of the space and time established in shot 1: the lighting has not changed, we are still located in a hallway, there are coat pegs and an interior door frame visible. The character is breathing heavily and her small movements can be heard, so the audio has not changed. However, we are located in a different viewing position in the hallway. The edge of the wall is close to us, out of focus, and an enactor is clearly visible: this is Fleabag. Fleabag is looking directly ahead (that is, not at the camera, but in the direction she is facing), presumably at the front door we saw in the previous shot. This perspective shift is world-switching, we argue, because we move out of the character's point of view and see her instead from the side.
This second shot therefore establishes a new text-world (T-W2) that is not filtered through the perspective of Fleabag, unlike the focalised epistemic modal-world which opened the scene.
Section 4 develops this application of TWT to Fleabag, paying particular attention to instances of direct address in Series 1. For the purposes of this article, our analysis works from transcripts of the television show that we have produced in the style of McIntyre (2006, 2008) (see Supplemental Appendices), rather than the TV script itself, but when cross-referenced, our identification of direct address in the TV production coincides with the stage directions 'to camera' in the official scripts (Waller-Bridge, 2019: 7-8, 266-267, 307-308).

Analysis: The shifting functions of direct address in Fleabag
Our analysis of direct address in Fleabag considers three extracts: the opening to Fleabag in which direct address is first established and two scenes which we see as pivotal in playing with and (re)negotiating the function of the direct address. The first extract begins where the above discussion left off.

Extract 1: Apostrophic 'you' and the creation of a 'split text-world'
As discussed, the initial text-world is located in a hallway, at night time, in a first-person point of view (see Supplemental Appendix 1 for extract transcription). The perceptual shift by which viewers subsequently see Fleabag in a third-person mode creates the first world-switch to T-W2. The shifting between the different point-of-view shots then has a toggling effect between these two initial text-worlds. It is only when Fleabag begins to speak that further text-worlds are created (Fleabag, Series 1, Episode 1):
You know that feeling when a guy you like sends you a text at 2 o'clock on a Tuesday night asking if he can 'come and find you' and you've accidentally made it out like you've just got in yourself, so you have to get out of bed, drink half a bottle of wine, get in the shower, shave everything, dig out some agent provocateur business suspender belt the whole bit and wait by the door until the buzzer goes… And then you open the door to him like you'd almost forgotten he was coming over… And then you get to it immediately.
Enactor discourse (that is dialogue and other meaningful paralinguistic communication) can relate to either the text-world that is visually displayed on the screen or to other situations, thus linguistically cueing separate text-worlds. We therefore propose a Text World system of representation that acknowledges that this discourse takes place in the visually framed text-world while potentially simultaneously generating other text-worlds. Because of the multimodal nature of telecinematic discourse, T-W2 is maintained visually whilst viewers also cognise further T-Ws from the dialogue. In Figure 3, T-Ws emanating from Fleabag's dialogue are coded in grey.
Fleabag's discourse is characteristic of telecinematic direct address, in that it uses second-person you accompanied by visual gaze at the camera. In doing so, it fulfils several of the functions suggested by Brown about filmic direct address: a feeling of intimacy is created between Fleabag and viewers and the immediacy of the interaction is instantiated. Fleabag's direct address also quickly establishes her as 'the principal agent of the narrative' (Brown, 2012: 13). Fleabag's use of you, however, is multifarious, thus substantiating our earlier claim (made in Section 2) for the need to account for different deictic referents of you in telecinematic contexts. The direct address inevitably means that you is functioning apostrophically (addressing the viewer). However, when Fleabag asks, somewhat rhetorically, 'You know that feeling…' the you has two further functions: it is generalised, implying a plural subjective experience, and it is self-addressing fictional reference, suggesting that this will be a feeling that Fleabag herself has experienced. As such, this first you is doubly deictic and thus triggers at least two separate text-worlds: one that causes the viewer to feel apostrophically addressed and another in which the self-reflexive and generalised you enacts Fleabag's developing narrative. In this narrative, a succession of shifts, linguistically cued by changes in time and modality, create a series of related text-worlds across which the growing specificity of the narrative action prompts an interpretation of the second-person pronoun that is increasingly self-reflexive. In other words, viewers progressively interpret Fleabag's dialogue here as a rhetorical description of what she herself has done, prior to this moment.
Figure 3. Text World diagram of extract 1.
The doorbell buzzer goes on cue after Fleabag's narratorial reference to it so, at this point, one of the functions of the direct address is to demonstrate Fleabag's superior epistemic position in the fictional world. As Brown puts it, 'characters who perform direct address generally know more – or are in a position of greater knowledge within the fiction – than other characters' (2012: 14). Viewers are returned, through visual means, first to the initial text-world of Fleabag's first-person perspective and then once again to the second text-world in third-person perspective. Fleabag's last words before opening the door, 'And then you open the door to him like you'd almost forgotten he was coming over', are figurative (through the simile structure with 'like'), and therefore create another fleeting text-world, before the action of 'get[ting] to it' begins in the second, visually maintained text-world.
In this analysis, we have suggested that apostrophic you with visual gaze to camera creates a text-world in which the viewer feels directly addressed by Fleabag. We believe that telecinematic direct address is necessarily world-building because it implicates the viewer in communicative discourse and suggests that the viewer is ontologically, but not physically, co-present with the speaker. In the Text World diagram shown in Figure 3, the text-world for the apostrophic address is represented as adjoined to the text-world in which Fleabag is visually represented in the scene. The join is depicted using a dotted line. This is because such 'breaking the fourth wall' in telecinematic discourse creates what we call a split text-world. Our conception of the split text-world relates to Gavins' account of the split discourse-world (2007: 26). In the latter, discourse-world participants are separated by time and space, whereas in our conception of the split text-world, it is text-world enactors who are not co-present (see Note 5). In Fleabag, there also seems to be a split form of communication taking place: Fleabag speaks from within the hallway in T-W2 but the apostrophic referent of you is not co-present with her. This is confirmed later in the scene because the male love interest does not acknowledge the camera and/or the viewer's presence (acting as another indication of Fleabag's greater narrative agency and superior epistemic position). Viewers' sense of immersion in the text-world(s) of Fleabag is, then, complicated by the direct address. Since the love interest appears not to be able to see Fleabag's addressee, viewers are somewhat alienated from the text-world in which Fleabag speaks. Instead, they project into a different text-world space, the apostrophic side of the split text-world, which through the direct address is connected to Fleabag's side not literally but ontologically.
No information is provided in the text concerning the world-building coordinates of the apostrophic side of the text-world. Whilst viewers could draw on discourse-world knowledge to fill this in, perhaps imagining that it resembles their own environment (e.g. their living room), this is not specified in the text, and thus, we conceptualise this side of the split as what Lahey termed an 'empty text-world' because minimal textual information means that the world in question 'is, in effect, deictically empty' (2004: 26). As in Gavins' argument for the text-world dynamics of second-person fiction, viewers can similarly feel self-implicated and thus identify with this you (as Fleabag's ideal narratee and confidant), but regardless they must track the deictic coordinates, in the process projecting into the empty side of the split text-world. This split text-world, and its relative emptiness on the side to which the apostrophic you refers, accounts for viewers' experience of the dual ontological force of apostrophic telecinematic direct address (including the effect noted by Sorlin in her discussion of House of Cards and discussed in Section 2).
In these opening shots, then, viewers are positioned through direct address as Fleabag's narratee. However, in the following extract, direct address works to create a different effect.

Extract 2: Destabilising the addressee
This second extract in our analysis comes from much later in Fleabag, specifically from episode 2 of series 2 (see Supplemental Appendix 2 for transcription). In this scene, Fleabag has gone to see a therapist after being given a voucher for the session by her father. The therapist tries to get Fleabag to open up about her emotional state by asking a series of questions which Fleabag does her best to evade. The therapist then offers a description of Fleabag as: 'Just a girl with no friends and an empty heart', and Fleabag attempts to counter this description.
Once again, the visual cues create a text-world (shown as T-W1 in Figure 4). Visual world-building devices set the scene in what quickly becomes apparent is a therapist's office which includes objects such as chairs, a coffee table and the ironically placed box of tissues. There are two enactors visually present in this text-world: Fleabag and the therapist (performed by Fiona Shaw). Discourse between these enactors again creates linguistically cued text-worlds. To differentiate the worlds created by each enactor, we have used a colour-coding scheme in Figure 4 with dark shading for the therapist and lighter shading for Fleabag. It is worth noting that although this does not occur here, we believe that enactors can also co-create and jointly elaborate text-worlds in much the same way as observed in collaborative reading group talk (Peplow et al., 2016: 179-187).
The performed discourse in this scene from Fleabag is rather combative. The therapist offers text-world accounts of Fleabag in continued attempts to elicit honest responses from her. The first text-world of this kind is marked in the diagram as T-W2 and the accompanying negation ('no friends', 'empty heart') creates negative-worlds (T-Ws 3 and 4), just as in the analysis of written discourse. Fleabag responds to this defensively, asserting 'I have friends' and in doing so creates a fifth text-world. Both of the therapist's follow-up questions generate text-worlds (T-Ws 6 and 8) that are epistemic in nature, questioning the veracity of Fleabag's assertion, whilst Fleabag's verbal responses elaborate T-W5.
Fleabag's response to the therapist's first follow-up, 'Oh, so you do have someone to talk to?' (T-W6), is 'yeah'. However, this is followed by Fleabag's shift in gaze from the therapist to the camera, at which Fleabag winks and makes a sucking sound with the side of her mouth before smiling and then returning her gaze to the therapist. Despite the absence of second-person you (or indeed linguistic dialogue), Fleabag's discourse here, specifically the combination of her direct gaze and paralinguistic gesture, generates another apostrophic text-world (T-W7) that is joined to and split from T-W1. The conspiratorial nature of the wink suggests that we as viewers are the 'friends' to which Fleabag previously referred and whom she can 'talk to'. This direct address thus has the functions of creating felt intimacy between Fleabag and the viewer as well as acting as a gesture of honesty, 'to express something internal to the character's fictional world (that is their own personal thoughts and feelings)' (Brown, 2012: 15). When the therapist asks, 'Do you see them a lot?' (T-W8), Fleabag replies 'Oh they're… they're always there… they're always there'. Fleabag is initially smiling and nodding uncomfortably in response to the therapist. However, as she speaks the second 'they're always there', her gaze again shifts to camera before returning to the therapist. Once again, this look suggests that we as viewers, in the text-world formed by her gaze, are these friends. However, her choice of pronoun ('they') is a distancing device as is her distal use of the spatial adverb 'there'. Fleabag thus appears to communicate with the viewer by acknowledging the text-world in which we feel addressed whilst also establishing a deictic distance because her dialogue is directed to the therapist and thus excludes us. The temporality of the adverb 'always' suggests a disconcerting constancy: friends cannot literally be with each other all the time.
In this scene, then, the mixture of Fleabag's description of us as constant friends and direct address that simultaneously invites viewers into communication whilst holding them at a distance has sinister implications. Although Fleabag's gaze at the viewer right from the start of the show indicates our constant presence alongside her, the narrative context of Fleabag's admission to the therapist in this scene nevertheless causes viewers to reinterpret their role. Viewers are repositioned in terms of their relationship to Fleabag: we no longer appear to be her intimate narratorial confidant but rather an inescapable persona within her psyche. Viewers flesh out the details and parameters of empty text-worlds through their own interpretive assumptions. It is precisely because the apostrophic side of the split text-world is empty that viewers can revise and 'repair' (Gavins, 2007: 141-142) their interpretations of their role as Fleabag's addressee. What changes in our experience of Fleabag here, then, is our sense of the ontological positioning of the direct addressee; rather than being only an extradiegetic narratorial confidant, viewers also become intradiegetic as a character Fleabag herself has perhaps imagined. At this point, therefore, even without the use of textual you, the telecinematic direct address is doubly deictic, both signalling to an apostrophic viewer and positioning them inside the fictional world as a figment of Fleabag's imagination. The final extract in our analysis occurs in the episode after extract 2 and shows even more play with direct address, confirming this unsettling repositioning of the viewer.

Extract 3: Further ontological play
In this third extract for analysis, Fleabag is chatting on a bench with the sustained love interest from series 2, who is a priest (see Supplemental Appendix 3 for transcription). Although there seems to be a mutual attraction and connection between the two characters, the priest has thus far rebuffed Fleabag's sexual and romantic advances in favour of his vow of celibacy. The extract opens with the priest reaffirming the platonic nature of their friendship (Fleabag, Series 2, Episode 3), saying 'I'd really like to be your friend though'. Fleabag agrees, and then turns to the camera to offer a sardonic comment about her chances of sexual conquest: 'We'll last a week'. Then, unexpectedly, the priest asks Fleabag, 'What was that?' followed by 'Where did you… Where did you just go?' and finally 'You just… went somewhere'. This is perturbing because, unlike other characters, the priest seems to notice some aspects of Fleabag's asides. Although he does not appear to have heard what she said, he senses a departure from her engagement in the conversation with him.
Because Fleabag remains physically on the bench with the priest, the spatial deixis in his utterances seems metaphorical and suggests that Fleabag has 'gone' somewhere psychological. This plays into the sense that we may be internal to Fleabag's consciousness as established in Extract 2. In fact, this psychological interiority was already intimated in the opening shot to the entire programme (discussed in 4.1 above) in which the initial text-world was Fleabag's POV. However, through scenes such as those we have discussed in 4.2 and 4.3, this interpretation becomes increasingly foregrounded in series 2 and causes viewers to repair their sense of the deictic and diegetic positioning of the empty side of the split text-world of the direct address. Moreover, our proposal for a split text-world with an empty side to represent the referent of the direct address is able to explain this doubly deictic effect; although viewers feel the force of direct address, the lack of textual information allows both for the apostrophic experience of being directly addressed and for the impression that the direct addressee is a fictional protagonist within Fleabag's psyche.

Conclusion
Previous studies have explored direct address, created through gaze alone or in combination with the second-person pronoun, in film and TV (e.g. Brown, 2012; Birke and Warhol, 2017; Sorlin, 2016), but these studies have not taken into account the polysemous and changing reference of textual you or made use of the stylistic apparatus of Text World Theory. In this article, we have pioneered the application of Text World Theory in analysing telecinematic discourse. In doing so, we have offered augmentations to the framework to account for the co-presence of text-worlds simultaneously evoked by different modes, specifically those created through visual world-building and those created through enactor discourse. Because of its cognitive foundations, Text World Theory augments existing stylistic scholarship on TV and drama in enabling a more nuanced account of the experiential effects of telecinematic direct address and the sometimes shifting ontological positioning of viewers.
As our analysis of Fleabag demonstrates, apostrophic direct address is necessarily world-forming, but direct address can position the viewer in ontological terms differently in different narrative contexts. Our analysis utilised Herman's categories of the various deictic references of textual you. This not only provided an incisive stylistic account of you in telecinematic dialogue but also shed light on viewers' interpretations both of you and of the ontological grounding of the direct addressee. Additionally, we posited that in TV drama, direct address (in the form of gaze and/or apostrophic textual you) produces split text-worlds. Our concept of the split text-world, derived from Gavins' split discourse-world (2007: 26; see also Werth, 1995: 54-55), acknowledges the spatiotemporal disjunct between Fleabag and her direct addressee. The 'emptiness' of the addressee side of this split text-world permits interpretive openness and therefore allows for and explains viewers' felt sense of being addressed by a fictional character as well as occasions in which the addressee is positioned doubly deictically, for instance both co-opted as Fleabag's confidant and an instantiation of her troubled mind.
Whilst we devised the split text-world to account for the effects of telecinematic direct address, we believe that the concept has further explanatory potential. For instance, in telecinema, when a split-screen shows two (or more) connected scenes simultaneously, such as when both sides of a telephone call appear on screen, the split text-world would also apply (though in this instance, neither side would be empty). In written discourse, the split text-world could also explain the narrative representation of telephone calls as well as instances of apostrophic address, by text-world narrators or character enactors, seemingly to real readers. In the latter case, the apostrophic side of the split text-world would, once again, be empty with readers infilling details based on their interpretations. Whilst the 'split' of split text-worlds represents disconnection in time and space, in the case of direct address (particularly as it occurs in telecinema, drama and theatre) the split is synonymous with the so-called breaking of the fourth wall. Rather than breaking the fourth wall, then, telecinematic direct address invites viewers into split text-worlds.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.

Supplemental material
Supplemental material for this article is available online.

Notes
1. Although Sorlin (2016: 201) attributes the words to David Herman, she is actually citing his translation of a quote by Jürgen Habermas about the second person in Italo Calvino's If on a Winter's Night a Traveller (Herman, 1994: 399). Nevertheless, Herman uses this quote in his discussion of the ontological duplicity of doubly deictic you.
2. In this quote, we have preserved Sorlin's original capitalisation of 'Text' and 'Discourse'.
3. Re-presentation worlds are more overt in immersive theatre when the interactive nature of the performance foregrounds the fictive role-playing undertaken by actors and, indeed, audience members. However, the visuality of telecinematic discourse means that re-presentation worlds are also applicable to theatre, film and TV drama since characters are necessarily performed by actors: viewers must, therefore, re-present that segment of the telecinematic (split) discourse-world that is captured on camera in text-world ontologies as part of their suspension of disbelief. In this way, viewers accept that a discourse-world participant is acting the role of a character enactor.
4. Complying with the principles of fair use of copyrighted materials, stills from Fleabag (Figures 1 and 2) are used in this article only for academic purposes.
5. Our concept of the split text-world also differs from Gibbons' notion of the re-presentation world. There is a re-presentation world at work here, of course, since viewers of Fleabag do have to re-present Waller-Bridge's words and motions in the discourse-world as fictive, choreographed (and thus text-world) dialogue and actions. Nevertheless, viewers are not able to interact with Waller-Bridge in the manner that audience members do in immersive theatre. In fact, our concept of the split text-world has an altogether different focus: it relates to a viewer's mental representation when they are addressed by Fleabag and their acknowledgement that this is somewhat impossible because they do not occupy the same spatiotemporal coordinates as her.