The Heartbeat of Poetry: Student Videomaking in Response to Poetry

This article contributes to an emerging body of scholarship on multimodal composition in the poetry classroom through a study of Finnish lower secondary students’ digital videomaking in response to poetry. The study explores students’ use of semiotic resources in their interpretive work in transmediating a poem into a digital video, with a particular interest in their use of sound elements. Based on social semiotic theory of multimodality, the analysis shows how the students in a variety of ways used sound elements, together with other semiotic resources, to explore their interpretation of the poetic text. Sound elements in particular became a key resource in the interpretive work, giving the students the opportunity to elaborate on topical issues of interest and importance to them while reinforcing their social agency. The study demonstrates the relevance of sound elements in students’ digital composing and explorations of poetry. Furthermore, it reveals how the students showed a capacity as well as a willingness to act, to have influence, and to make substantiated claims for recognition regarding critical issues related to sexuality and society.

A growing body of scholarship has explored the influence of new media on the research and teaching of literature (e.g., Romero et al., 2018). While literary interpretation through traditional written genres has generally remained foundational to literature curricula, particularly in secondary education, there has been a concurrent call for an expanded view of literacy, one that acknowledges meaning-making using multiple modes and media (Smith, 2019). To bridge the critical gap, Smith (2019) notes that researchers have begun to examine how students can interpret literature through digital multimodal composition. Composing multimodal texts in response to literature offers students opportunities to combine images, music, sound, and narration as layered and interwoven in creative ensembles (Schmidt & Beucher, 2018). This article contributes to an emerging body of scholarship on multimodal composition in the poetry classroom by reporting a study of eighth-grade students' use of semiotic resources during a multimodal composition process in which they transmediated a poem into a digital video. The guiding research questions are: How do students use semiotic resources, and sound elements in particular, in their digital video to respond to a poetic text, and what are the possible implications for educational practices as well as research on multimodal literacy and digital composition more broadly?
Although scholars have focused on poetry education since the beginning of the 20th century (Dressman & Faust, 2014), interest in researching poetry in educational contexts has gained critical momentum in recent years. Studies from both the United Kingdom and Sweden show that in certain classrooms and among certain teachers, poetry is alive and flourishing (Dymoke, 2017; Sigvardsson, 2020), and studies have explored students' and teachers' conceptions of poetry and poetry education (e.g., Benton, 1999, 2000). However, research has described how teachers often feel inexperienced as poetry readers (Hughes & Dymoke, 2011), how students have perceived poetry as inaccessible and difficult (Dressman, 2015; Jones & Curwood, 2020), and how poetry teaching may in fact be in decline (Creely, 2019).
Recently, both researchers and educators have recognized the need to reenergize traditional poetry teaching. Creely (2019) has called for "a more radical and disruptive pedagogy for bringing poetry to the classroom" (p. 116), while Dressman (2015) has foregrounded investigations that contribute to renewing and reinvigorating the place of poetry in school curricula. Such attempts to engage young people in poetry-related activities in the classroom have included research into spoken-word poetry (e.g., Dymoke, 2017; Jocson, 2006; Jones & Curwood, 2020) and different arts-based responses to poetry (e.g., Jusslin & Höglund, 2021). To answer this call, researchers have also begun to examine how students can interpret literature through digital multimodal composition (Smith, 2019). Several scholars have acknowledged the interdisciplinary and multimodal character of poetry, opening up new approaches to poetry that incorporate the affordances of digital media (e.g., Curwood & Cowell, 2011; Dymoke & Hughes, 2009; Kovalik & Curwood, 2019).
In the following sections, I review previous research on digital, multimodal composition in response to literature, particularly poetry. I then present a theoretical framework and describe the research context, procedures for data production, and analytical approach. Following this, I present the findings, before discussing the potential of digital, multimodal composition in students' interpretive work with poetry and its implications for educational practices as well as research on multimodal literacy and digital composition more broadly.

Digital Multimodal Responses to Poetry: A Literature Review
Previous research has examined how adolescents interpret literature through digital multimodal composition, considering, for example, students' compositions from novels to slide shows (Jocius, 2013; Ringler et al., 2014) as well as canonical drama texts to spoken-word performances (Anglin & Smagorinsky, 2014) and digital video compositions (Dallacqua & Sheahan, 2020; Vasudevan et al., 2010). Scholars have also studied the fusion of popular culture texts with canonical literature (e.g., Bowmer & Curwood, 2016; Burn, 2021), students' digital story compositions in response to historical fiction (Kesler et al., 2016), and adolescents' perspectives on their multimodal composing goals and designs when creating digital projects in response to renowned short stories (Smith, 2018).
Studies have highlighted a number of positive outcomes from multimodal student compositions in response to literature and, in turn, the pedagogical potential for multimodal literary interpretation. For example, studies have shown how digital, multimodal composition has enabled students to explore literary devices, characterization, and themes through various digital projects (e.g., Curwood & Gibbons, 2009; Jocius, 2013; Smith, 2019). Digital video composing, for instance, has been shown to offer students different ways to engage with literary texts, including analysis, synthesis, symbolic and metaphoric thinking, and thematic abstraction (Miller, 2011). Studies have examined students' multimodal composing goals during a literary analysis unit (Smith, 2018) and have shown how students have been attuned to choices related, for example, to video transition effects and juxtaposition of images with lyrics (Carey, 2012; Jocius, 2013). Kesler et al. (2016), meanwhile, have argued that project-based digital responses in collaborative groups have been beneficial to student comprehension of literary texts.
The research literature also includes studies that address multimodal composition in response to poetry. Previous research has shown how such an approach facilitates an in-depth exploration of the poetic text and fosters analytical and interpretive thinking. For example, in a study of student responses to a canonical poetic text through the use of drama and filmmaking, Coles and Bryer (2018) noted how students were afforded opportunities to look closely at the structure of the text, which enabled both close attention to the details in the text and critical interpretive acts. Elsewhere, Bryer et al. (2014) highlighted the range and depth of students' interpretative responses during a filmmaking project in response to canonical poetry. However, studies have also described how these projects were often time consuming, given the extended time needed for students to analyze and compose across modalities (Callahan & King, 2011; Curwood & Cowell, 2011).
Researchers have also pointed to how digital multimodal responses to poetry can prompt meaningful interaction between students and encourage student engagement (Curwood & Cowell, 2011). Changes to the poetry curriculum have been shown to boost students' interest in poetry and their engagement and motivation in the poetry classroom. Similarly, interpreting poetry through digital composition has enhanced both students' and teachers' appreciation of poetry (Callahan & King, 2011; Cowan & Albers, 2006). According to McVee et al. (2008), students with negative feelings about poetry redirected their emotions more toward communicating their understanding, which moved them away from fears of not producing a "correct" interpretation and promoted a more inclusive poetry education as a result.
Additionally, a multimodal approach to literature instruction has been considered crucial to increasing student agency, supporting them in interpreting literary texts and addressing social issues related, for example, to sexuality and social status (Ajayi, 2015). Studies have reported how responding to poetry through digital, multimodal composition can engender a sense of personal relationship with and ownership of the poem in question (Hirsch & Macleroy, 2020; McVee et al., 2008). These approaches empowered students to critically reflect on and extend their perspectives on themselves and their lives (Curwood & Cowell, 2011; McVee et al., 2008). Significantly, they also served as a counternarrative for students: a novel critical means to explore, analyze, and "push back against the master narratives" (Curwood & Gibbons, 2009, p. 60) related to, for example, race, class, gender, and sexual orientation, allowing them to (re)present their own identities. Vasudevan et al. (2010) observed the ways in which students adopted authorial stances in their multimodal composing practices, which the authors define as the practice of taking on literate identities and claiming a presence as an author and narrator of one's own experiences. Similarly, Schmidt and Beucher (2018) explored embodied literacy practices and multimodal composing processes in response to literature of Black girls and demonstrated the negotiations of power-laden racial discourses around school curriculum that both enabled and constrained student agency. These studies indicate the potential of multimodal composition to address issues related to social agency, where social agency here broadly refers to the ability or capacity to act, to have influence or to transform, to make representations of the self, or to make substantiated claims for recognition (see Trauger, 2009).
Although previous studies have contributed important insights into how multimodal composition in response to literature offers both close attention to the literary text as well as opportunities to address social issues, in the context of a rapidly changing digital culture, and increasing calls to address issues related to equity and justice, further research is needed to understand adolescents' digital, multimodal responses to poetry and their implications for literacy education.
The increased use of digital technologies and the dominance of visual media have led researchers to focus on communicating with a variety of modes. When it comes to previous studies of youth-produced digital video, Hull and Nelson (2005) greatly influenced the analysis of multimodal compositions based on their detailed analysis of a digital video, focusing on the visual and textual modes of a digital story. With their fine-grained analysis of the artist Randy Young's video, Hull and Nelson (2005) articulated the multimodal power of the video "Lyfe-N-Rhyme" that combined poetry with rap embedded with social critique. Halverson (2010) extended the analytical focus to include the mode of sound, turning to film theory to develop a coding scheme to support the analysis, and presenting a framework for analyzing young people's films (also see Halverson et al., 2012). Recently, Smirnov and Lam (2019) developed a fine-grained multimodal transcription scheme to examine how youth use media production to represent and reimagine the complex of social practices related to racial profiling of men of color in the United States.
Zooming in on the different phases of the video composing process, Ranker (2008) examined how the interface of a video editing program influences the composing process of two students. Similarly, Gilje (2010) investigated the role of semiotic tools during the phases of writing synopsis and making a storyboard in students' filmmaking practices. In a study of writing within and across modes in filmmaking, Gilje (2015) also investigated a group of students redesigning a canonical drama play into a music video. He specifically emphasized how digital editing software has changed postproduction in filmmaking, turning it into an iterative and replicating process as we think in tandem with the affordances of "the multimodal remixing desk" (Gilje, 2015, p. 159). Likewise, Burn and Parker (2001) have emphasized the availability of the tools for digital production, or what they refer to as digital inscription, as it affords composition that is highly plastic, fluid, and reversible.
Researchers have more recently begun to focus on sound elements when analyzing students' multimodal composing (Shanahan, 2012). For example, Wargo and Clayton (2018) examined how adolescents use sound as a modal resource for design to leverage school-based social action. Wargo (2017) also has studied how sound operates as a tool for attuning to cultural difference and community literacies and encourages educators to consider the modal affordances of sound composition. Sound can help communicate information, serve as a symbol or motif, situate time and place, and play an integral role in creating atmosphere or stimulating emotional investment (Collins & Kapralos, 2018). Yet, the communicative and interpretative possibilities of sound elements remain underresearched, especially in response to poetry. But why is this important, and how can these poetry-related and video composing literacy practices develop the notion of literacy and inform the work of literacy researchers and educators?
The present study aims to further the body of knowledge on multimodal composition in the poetry classroom by exploring students' use of semiotic resources in their interpretive work in transmediating a poem to digital video, with a particular interest in their use of sound elements. Following are the guiding research questions: How do students use semiotic resources, and sound elements in particular, in their digital video to respond to a poetic text, and what are the possible implications for educational practices as well as research on multimodal literacy and digital composition more broadly?

Theoretical Framework
This study approaches poetry not as static texts with an inherent "correct" meaning, but as dynamic texts with rich potential for multiple interpretations. In her influential work on transactional theory, Rosenblatt (1938/1994) already countered the notion of a predetermined text, given that the text is read differently by different readers; the subsequent meaning of any text lies not in the work itself but in the reader's transaction with it. Reading literature thus becomes a performative activity (Meyer & Rørbech, 2008) to be viewed as a relationship between text and readers, in a theoretical paradigm shift that has been advanced and incorporated in both research literature and teaching practices. In the present study, literary responses are understood as performative: meanings, interpretations, and texts are continuously constructed in and negotiated by social groups and communities (see, e.g., Meyer & Rørbech, 2008; Rørbech & Hetmar, 2012). Following a performative approach, literary interpretation is understood as a meaning-making process that is intertwined with social and cultural factors and will vary in time and space.
There are several ways to foster literary interpretation, and in recent years, researchers have found a growing interest in using different art forms to respond to literary texts (see, e.g., Jusslin & Höglund, 2021). Some scholars have referred to the recasting of meaning from one mode to another as transmediation, a process offering both the exploration of the original text and the creation of a representation in another sign system (Siegel, 2006). Research on transmediation of literary texts has demonstrated how transmediation processes expand the interpretive potential of the text under examination (e.g., Carey, 2012; Miller, 2011). To explore this process in detail, I adopted the social semiotic theory of multimodality to analyze student digital videomaking in response to poetry.
Scholarship on multimodality suggests that meaning emerges from the coordination of multiple semiotic resources or modes (Jewitt, 2009; Jewitt et al., 2016). One of its fundamental principles is that representation, communication, and interaction consist of multiple modes, all with the potential to make meaning. All modes are shaped by their cultural, historical, and social contexts (Jewitt, 2009), meaning they are subject to change over time and with use. Accordingly, the various resources that make meaning are not distinct, but instead combine into an integrated multimodal ensemble (Jewitt et al., 2016). This phenomenon becomes even more apparent with the development of digital technologies, which enable people to combine resources more efficiently and affordably than ever before (Jewitt et al., 2016).
Multimodality is a broad concept and, for scholars like Jewitt (2009), represents a research field rather than a theory or discipline and can be approached using different theoretical perspectives. The social semiotic theory of multimodality, specifically, is strongly associated with Gunther Kress and Theo van Leeuwen and extends Halliday's theories of social semiotics and systemic functional grammar to a range of modes (Kress & van Leeuwen, 2006; see also Jewitt et al., 2016). It focuses on the sign-makers, the process of meaning-making, and their social agency in order to understand the situational decisions people make to communicate and create meaning using the wide range of semiotic resources available to them (Jewitt et al., 2016).
In the social semiotic theory of multimodality, every text consists of three functions that are always performed simultaneously. These are referred to, following Halliday, as metafunctions and are termed representational, interactive, and compositional (Kress & van Leeuwen, 2006). Representational meaning focuses on the what: what people, places, actions, and things are represented through different semiotic resources. Interactive meaning focuses on the how: how relations between the digital video and the viewer are created. It deals with matters such as choice of camera angles, shot types, and camera movement, as well as different sound elements and written text elements. Three factors are crucial at the interactive level of analysis: contact, distance, and point of view; the use of semiotic resources has the potential to bring people, places, or situations close to the viewer, or to create remoteness or affect the notion of distance. Also, the choice of viewpoint implies the possibility of expressing subjective attitudes toward issues of representation. Compositional meaning also focuses on the how: how the structure of the text is created. A digital video is consequently composed of both temporal and spatial aspects; the video is created partly by the actions represented in the individual clips and the merging of individual clips into a whole, and partly by spatial composition of the individual clips in terms of what is placed where in the image frame. Temporal aspects deal with the semiotic rhythm and how different modes are organized to structure the digital video coherently. Spatial aspects emphasize the two central principles for composition: information value and salience. The principle of information value deals with the placement in the various "zones" of the image: left or right, top or bottom, center or margin. The principle of salience deals with how attention is drawn to particular elements, using resources such as color, sound, light, or zooming.
According to Burn and Parker (2003), the metafunctions offer a valuable analytical tool for exploring meaning-making at different levels in communicating through film and other genres that incorporate moving images. Building on previous research on young people's engagement with the moving image, they wanted to complement these accounts with a theory of signification. They specifically widened the scope of analysis to include the moving image, coining the term "kineikonic mode" to expand Kress and van Leeuwen's work on visual design (Burn & Parker, 2001, 2003). The moving image applies to filming and editing, which Burn (2016, p. 313) refers to as the "orchestrating modes" of the moving image. The arrangement of these modes occurs in both spatial and temporal dimensions: spatial logic dominates the composing of an individual frame when filming, whereas temporal logic dominates the editing stage. The nature of the moving image is the relation between these two modes, the two modes that Burn (2016) refers to as Filming and Editing. These two overarching modes in turn include what Burn (2016, p. 313) refers to as "contributory modes." Contributory modes to Filming, for example, include framing, camera angle, and camera movement as well as setting, possible actors, and action. Contributory modes to Editing include temporal framing, transitions between cuts as well as the assembling of, for example, sound effects and music.
From the perspective of semiotics, van Leeuwen (1999) deliberates on the communicative use of sound. The present study draws on two concepts in particular that complement the metafunctions and kineikonic mode: perspective and social distance. According to van Leeuwen (1999), perspective hierarchizes certain elements by placing some in the foreground and some in the background, for example, in film where there might be dialogue in the foreground and music in the background. He distinguishes between Figure, Ground, and Field, where if a sound is positioned as Figure, it is treated as the most important sound, one that the listener must react to and/or act upon. If sound is positioned as Ground, it is treated in a minor or less involved way, whereas if sound is positioned as Field, it is treated as existing but not in the listener's social world. Social distance creates relations of different degrees of formality between what is represented and the viewer or listener, such as intimacy, informality, or formality. Both perspective and social distance in van Leeuwen's theory of sound have a close connection to the metafunctions but are more elaborated regarding the semiotic resources of sound.
The present study draws particularly on social semiotic theory of multimodality for multiple reasons, including its emphasis on the agency of signmakers and its focus on modes, their affordances, and their social uses (Jewitt et al., 2016; Kress & van Leeuwen, 2006). As a framework for exploring student multimodal design in videomaking practices, it specifically provides an attentiveness to students' usage and perspective as well as the interplay between different modes and their affordances. Together with principles of the kineikonic mode (Burn & Parker, 2001, 2003) and the semiotic resources of sound (van Leeuwen, 1999), it formed the basis for an analysis of students' use of semiotic resources in their videomaking work in response to a poem, with a particular interest in their use of sound elements.

Research Design, Data, and Analytical Approach
The study took place in an eighth-grade classroom at a Swedish-speaking school in Finland. Inspired by poet Molly Peacock's reference to poetry as "the screen-size art" (Hughes, 2008, p. 149), with its brevity and concision of form but not of content, I was interested in exploring the use of visual responses as a means of interpreting poetry. Then again, as Duncum (2004) points out, "visual culture isn't just visual" (p. 252): contemporary cultural forms that are viewed in everyday speech as visual in fact feature multiple communication modes. Film and digital video are more than images; they also include music, sound effects, and spoken voice, and as such are multimodal by design. Investigation of how students use semiotic resources to respond to literary texts required a research design that gave recognition to and acknowledged student meaning-making using a multiplicity of modes. The multimodal approach functioned as an entry point to explore poetry in lower secondary education through a project titled Video Poetry. The project ran over a 5-week period and comprised five 90-minute lessons.
Data for this study was produced collaboratively by two teachers and me as a researcher. The teachers participated voluntarily and were recruited after responding positively to a request sent to the school. Material for analysis took the form of video observations of a group of four students, their collective process of digital videomaking, and their resulting digital video. This material was produced by students in the eighth grade (aged 14-15 years) and followed the guidelines and ethical principles of research with human participants set by the Finnish National Board on Research Integrity (2019).
Throughout the process of data production, I was interested in the composing process, not only the final digital video output. Therefore, I chose to train the camera on a focus group of four students and attached a wireless microphone to one of the students in the group. The choice here was grounded in research ethics: these students and their parents had granted permission for them to take part and for the data to be used as examples in research publishing, teacher education, and/or in-service teacher training. For this particular study, the empirical material for analysis is the students' digital video (for analyses of the collective working process, see Höglund, 2017).
During the video observations, I kept a low profile and leaned toward the observer side of the participant-observer continuum (Schwartz-Shea & Yanow, 2012), mostly managing the recording equipment. During the actual video recordings, I did not intervene if I was not directly addressed. However, I would not consider myself as an "invisible" or "unnoticeable" researcher; on the contrary, the students were well aware of my presence, and I interacted with them before the lessons started in situations such as attaching the microphone. My presence and role as a researcher are thus not to be considered either "invisible" or "disturbing"; rather, they follow an interpretive research approach that "accompanies the researcher's physical, cognitive, and emotional presence in and engagement with the persons and material being studied" (Schwartz-Shea & Yanow, 2012, p. 98).
The students participating in this study, Catrin, Linda, Casper, and Philip (all names are pseudonyms and photos presented have been sketched and blurred to secure personal integrity), worked with the poem "I want to meet . . ." 1 by the Swedish poet and novelist Karin Boye, first published in The Hearths (Härdarna) in 1927. The focus on this particular group of students was ethically grounded; the parents had granted their permission for these particular students to be part of research reporting. The students had individually, based on a previous assignment, chosen a poem beforehand that in some ways spoke to them, and the final choice of poem was made collectively in the group. The students had been able to make their individual choices based on the teacher's selection of poems in different styles and from different periods gathered in a booklet. During the lesson, they were to agree on one poem to work with; the group settled on the Karin Boye poem almost immediately. Their rationale was not further elaborated at this point; however, Casper acknowledged the message or statement of the poem. This final choice was made without further discussion, but with what seemed to be common agreement.
Preceding these lessons, the teachers introduced literary concepts such as imagery, metaphor, and simile and discussed different formats of poems as well as rhythm, rhyme, and tone. The teachers emphasized an open approach to interpreting poetry and the figurative meaning of poetic language. They also assigned the students to compose the digital video by going through four different phases: initial responses and writing a synopsis, making a storyboard, filming, and editing. Besides these instructions and some explanations on the format of the storyboard and a short technical introduction to the camera and editing software, the students were not given any particular guidelines for the task; rather, they were given free rein and room for initiatives throughout the project. The students had, as far as I know, no prior experience of this type of project.
The analytical framework for this study is based on the metafunctions of text developed within the social semiotic theory of multimodality (Kress & van Leeuwen, 2006), including insightful elements from other studies of digital video produced by young people (e.g., Burn & Parker, 2003; Gilje, 2010; Halverson, 2010; Ranker, 2008). In the analysis at the representational level, I focused on how different modes were used and how various persons, settings, and objects were represented through different semiotic resources. In the analysis at the interactive level, I focused on how relationships were forged between the digital video and the viewer. Besides the students' choice of camera angles and shot types, close attention was paid to the students' choices of voice-over, sound effects, and other means of interacting with the viewer in relation to the crucial parts of the interactive level: contact, distance, and point of view. In the analysis at the compositional level, meanwhile, I focused on how the digital video was structured to compose a cohesive text, specifically the structure and composition of both temporal and spatial aspects. The analysis of the temporal aspects considered the semiotic rhythm and how different modes were organized to structure the digital video coherently. The analysis of the spatial aspects emphasized the two central principles for composition: information value and salience.
For transcription of the students' digital video, I developed a transcription system based on the kineikonic mode (Burn, 2016; Burn & Parker, 2003) with inspiration from the way Halverson (2010) applies the kineikonic mode in the analysis of youth films. The transcription system was structured around the two central representational systems within the kineikonic mode: filming and editing. The transcription attended both to the content and the form of the students' digital video. Based on the interest of the study, I found that by approaching the digital video from the two representational systems of filming and editing, the interplay of the semiotic resources was more relevant than the breakdown of smaller elements. That is indeed the very focus of the multimodal approach: the way that different modes interact with one another and what is created as a result of their interaction (Jewitt et al., 2016). This analytical choice made it possible to attend to the students' digital video openly without locating modes made up beforehand.
In the category filming, I noted resources possible in the filming phase, for example, camera movement, camera angles, length of shot, audio (such as dialogue or sounds occurring during the filming phase), cuts, settings, actors, and the action taking place. In the category editing, I noted resources possible in the editing phase, for example, audio (such as music, voice-over, or sound effects), written text, transitions, and special effects (such as slow motion or freeze-frame).
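For readers who want to keep such a transcript in machine-readable form, the two categories can be sketched as a simple record structure. This is only an illustrative sketch, not the transcription system used in the study: the class names, field names, and the helper `resources_in_use` are hypothetical, and the example row is a loose paraphrase of the opening scene of the students' video.

```python
from dataclasses import dataclass


@dataclass
class FilmingResources:
    """Resources noted for the filming phase (field names are hypothetical)."""
    camera_movement: str = ""
    camera_angle: str = ""
    shot: str = ""       # shot type or length
    audio: str = ""      # dialogue or sounds recorded while filming
    cut: str = ""
    setting: str = ""
    actors: str = ""
    action: str = ""


@dataclass
class EditingResources:
    """Resources noted for the editing phase (field names are hypothetical)."""
    audio: str = ""      # music, voice-over, or sound effects
    written_text: str = ""
    transition: str = ""
    special_effects: str = ""  # e.g., slow motion or freeze-frame


@dataclass
class TranscriptRow:
    """One row of the transcript: a scene described in both categories."""
    scene: str
    filming: FilmingResources
    editing: EditingResources


def resources_in_use(row):
    """List the non-empty resources of a row, prefixed by their category."""
    used = []
    for category, record in (("filming", row.filming), ("editing", row.editing)):
        for name, value in vars(record).items():
            if value:
                used.append(f"{category}.{name}")
    return used


# Illustrative row only, paraphrased from the description of the first scene.
first_scene = TranscriptRow(
    scene="Scene 1",
    filming=FilmingResources(
        camera_angle="slightly high angle",
        shot="full shot",
        setting="school stairwell",
        actors="group of six persons and a teenage girl",
        action="the group gathers around the girl",
    ),
    editing=EditingResources(
        audio="loud monotone sound; male voice reciting the poem",
        transition="hard cut",
    ),
)

print(resources_in_use(first_scene))
```

Listing the resources per row keeps the focus on how modes co-occur within a scene rather than on isolated elements, which is the emphasis of the analytical choice described above.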
The analytical approach used for the digital video acknowledges that the students' work is based not only on content but also on how they choose to represent it. But more importantly, it acknowledges social agency (Jewitt, 2009): students are viewed as active meaning-makers, acting according to their interests in a specific situation and their contextual use of semiotic resources.

Exploring Issues of Identity
Analysis at the representational level focused on the choices of persons, settings, and objects, and the semiotic resources through which they are represented, with particular emphasis on the students' use of sound elements. The video opens with a clip of a black screen on which the white text "I Want to Meet . . ." (the title of the poem) fades in from left to right. The text remains on screen for a few seconds before it fades out from left to right. There is no sound on the audio track. The silence is then broken abruptly by a transition to the first scene, a hard cut with added audio.
The first scene (Figure 1), which includes both exposition and rise of action, begins with a loud, intense, monotone sound that can be described as threatening or intimidating. This sound continues throughout the scene. Shortly after the sound starts, a male voice starts to recite the poem: "Armed, erect and closed in armour." Simultaneously with the loud sound, the image trace shows, in a full shot from a slightly high angle, a group of six persons gathered around a teenage girl. The setting is a stairwell in a school environment. The school itself holds meaning potential since it represents a social setting in youth culture where constructions, explorations, and expressions of identities are exposed. Furthermore, the stairwell suggests a public and social space or arena within the school environment. The intimidating sound, together with the group's encircling of the girl, implies that she is somehow exposed or in danger. The male voice continues ("forth I came-"), and at the same time, the girl turns around to face the group. The narration continues ("but of terror was the mail-coat cast/and of shame") as the people in the group flinch backward as if reacting strongly to something the girl has said.
One among the group then pulls his hoodie over his head, and three others cross their arms: two explicit gestures that signify distancing or dissociating oneself from something. The girl looks at them for a moment, then turns around and runs off while the group watches her closely. At the same time, the male voice continues: "I want to drop my weapons,/sword and shield./All that hard hostility/made me cold." Together, these three elements (the threatening sound effect, the nondiegetic narration of the poem, and the characters' actions) create a representation of a serious situation marked by disapproval and rejection.
The image fades to the following scene, scene two, where the setting is somewhat different (Figure 2). The environment is still a school, but now a corridor with lockers in the background, a more withdrawn space than the stairwell's public arena. The frame depicts the girl sitting huddled up on a bench with her head down. The intimidating sound used in the previous scene still plays on the soundtrack, at high volume. The girl's body position, together with the sound effect, evokes exposure and vulnerability. Suddenly, the sound effect fades out, and another teenage girl approaches the girl on the bench from behind. The newcomer places her arm around the shoulders of the huddled girl, who looks up and smiles. The other girl leans against her in a gesture like an embrace, and the voice-over recites the lines: "Mightier than iron/is life's tenderness,/driven forth from the earth's heart/without defence./The spring dawns in winter's regions,/where I froze." Together with the recitation, the girls' actions and the fading out of the intimidating sound create an atmosphere of care and considerateness.
The image fades to the third and final scene, in which the two girls stand facing each other (Figure 3). From this point on, the scene is played in slow motion. As the voice-over continues ("I want to meet life's powers/weaponless"), the girls reach out, clasp each other's hands, then turn their backs to the camera and run off hand in hand down the corridor. After a couple of steps, they jump up in the air, and while they are still airborne, the clip is cut, a black frame appears, and the credits start to roll. When the girls take each other's hands, the voice-over finishes the recitation and the sound of a beating heart, present since the beginning of the scene, becomes distinct as all other sounds are silenced. The girls' actions, the sound of the heartbeat, and the narrated lines from the poem combine to create a sense of hope and change.
The relationship between the two girls can be interpreted as friendship, the storyline likewise as one of finding friendship or acceptance. The sound effect of the pounding heart can be interpreted as a signal of care and consideration in a nonromantic sense. However, the sequence of "failed scenes" that follows the main narrative intimates that the relationship between the two girls is something more than friendship. This thematic choice becomes apparent when listening to the students' discussions during the videomaking (see Höglund, 2017, pp. 133-134), but is left more ambiguous in the digital video.
In the first sequence of the "failed scenes," a reference to marriage is apparent in the use of the sound effect of church bells. During the editing of their digital video, the students explored different sound effects, and finding the sound effect of church bells led them to an elaboration on the poem's topicality in contemporary society; they relate their interpretation of finding and showing one's true self in relation to sexuality and comment on the relationship between the church bells, marriage, and Finnish marriage law which, at the time, did not allow people of the same sex to wed (see Note 2; see also Höglund, 2017, pp. 132-133). The students' reference to marriage through their use of church bells is an example of their awareness of how sound effects offer narrative elements and carry particular social meanings.
The most explicit reference to, and standpoint on, homosexuality occurs in the final clip of the video, a frame of white text on a black background: "Thanks to Karin Boye (who was gay)" accompanied by the sound effect of applause. Using various semiotic resources, the students have created their thematic interpretation of the poem as a storyline about a queer person who triumphantly reveals her sexual orientation. Thus, the digital video brings up explorations of identity, exemplified by, but not limited to, the act of coming out.
Over the course of the video, the students use acting, sound, and visual effects to represent a storyline of a person revealing her sexual orientation. They also use written text, mainly to substantiate and clarify a given thematic interpretation. Significantly, though, it is their use of sound elements that forms a central part of the digital video and, together with the other modes used, serves to narrate the story.

Creating Closeness and Taking a Stand
Analysis at the interactive level focused on how the relationship between the digital video and the viewer is bound by three key analytical concepts: contact, distance, and point of view. The students use various camera angles, camera movement, and shot types to create contact with the viewer by reducing distance, for example, by drawing the viewer closer (Höglund, 2017). However, sound elements are also used as part of this dialogue. Using the sound effect of the pounding heart in the third scene, for example, creates a sense of intimacy, both physically and emotionally; the viewer is brought in so close that she or he can hear the girls' heartbeats. The students also use sound effects to suggest a specific desired viewpoint, such as the use of applause that greets the text "Thanks to Karin Boye (who was gay)." In combination with the sound effect of applause, the comment in the written text not only suggests an attitude the viewer should take toward the issue but also clarifies the students' own standpoint. During the students' videomaking process (see Höglund, 2017), the students comment on the relationship between the church bells, marriage, and Finnish marriage law, which at the time of composition was the subject of widespread debate in both media and politics. The students hence use sound effects as a means of taking a stand on a topical matter, as well as creating closeness and contact with the viewer. In this way, the analytical concepts of contact, distance, and point of view are closely intertwined.
In sum, the students use different semiotic resources to create contact and distance: variation in frame shots creates and reduces distance; the sound effect of a pounding heart evokes proximity and intimacy; and the inclusion of failed scenes serves as a metacomment on both the video and their working process. Their use of point of view allows the students to foreground certain viewpoints and also to take a stand on issues related to homosexuality and marriage laws in Finland.

Creating Cohesiveness and Salience
Analysis at the compositional level focused on the composition of the video as a whole, specifically how written text, sound, scenes, and clips are structured to produce a cohesive ensemble. The analytical approach emphasized the structure and composition of both temporal and spatial aspects of the digital video. The analysis of the composition of temporal aspects focused on the semiotic rhythm: how the video is structured into a coherent piece. The analysis of the spatial aspects, meanwhile, focused on information value and salience, with particular attention paid to sound elements.
The digital video has a clear narrative structure and follows the typical conventions of exposition, conflict, rise of action, denouement, and coda. The coda is placed in what the students call the "failed scenes," which appear after the credits. These failed scenes are recorded, but unsuccessful, clips from the filming phase; the students use some of these clips as a metacomment partly on the videomaking process and partly on the poem and their interpretation of it.
As demonstrated above, three different scenes move the video ahead in narration and time and create a narrative structure. In the first scene, the viewer is confronted with the conflict: the students represent this by staging an intense, confrontational situation in which the girl faces rejection by her peers. This intensity is engendered through the disapproving actions of the group, together with the intimidating choice of sound effect and the voice-over recitation of the poem. In the second scene, a resolution to the conflict is presented to the viewer; the third scene stages a somewhat open ending, at least from a narrative point of view, where the two girls grab each other's hands and run away together in slow motion to the sound of a beating heart.
Transitions play an important role in weaving together the various scenes and clips that make up the digital video. The most visible transition is in space: the locations and settings are different in all three scenes. However, there is also a transition in time, even though the length is not specified. The time that passes between the first two scenes, that is, between the girl's announcement of her 'secret' (first scene) and the contact between the two girls (second scene), is open for interpretation. The transition in time could be considered as minutes or weeks, for example. Technically, the transitions are mostly hard cuts, but the sound on the audio track connects the different scenes into a cohesive text. This cohesion is further reinforced by the voice-over reciting the poem throughout the three scenes.
The matching of the voice-over to the visual took up much of the students' time and effort. The students were meticulous about the pace and rhythm of the edits in matching the voice-over with the visual, adjusting the length of the clips to match the reading of the poem. They tested different alternatives, and Casper reread the poem aloud several times to sync the stanzas with the different scenes to get the accentuation the way they wanted, as well as to achieve the right sound quality on the recording of the reading.
Consequently, semiotic rhythm is established using the narrative structure, filmic compositions such as title page and credits, transitions in time and space, and particularly the voice-over recital of the poem, all of which hold the digital video together as a cohesive text. Notably, the coda is placed outside this semiotic rhythm, indicating its function as a metacomment that should be viewed distinctly from the main narrative.
Regarding the spatial aspects, the students use different semiotic resources to create salience, and this is especially noticeable in their meticulous editing work (see Höglund, 2017). For example, they use the special effect of slow motion to draw attention to the two girls as they reach out to each other and run off together in the closing scene (Höglund, 2017). The intimidating sound effect used in the first scene is a result of the students testing different elements; when Casper finds the intimidating and threatening sound effect, he instantly relates it to the first scene portraying the rise of action. The students' use of sound effects shows their awareness of how auditory elements can establish tone and mood.
Consequently, the students use sound effects to foreground specific issues: the intimidating sound evokes tension in the first scene; the sound of a beating heart indicates care, intimacy, and possibly love between the two girls; the church bells signify the connection to marriage; and the sound of applause serves as an interpretive standpoint on issues related to gay marriage and homosexual relationships in the Finnish context.

Discussion
These findings demonstrate the ways in which the students use various semiotic resources in their digital video and how sound elements represent a key meaning-making resource in the students' digital, multimodal response to the poem. Sound elements are used to make meaning in various ways throughout the digital video, including the threatening sound in the opening scene, the beating heart in the third scene, the church bells in the failed scenes, and the applause in the final clip. The use of voice-over to recite the poem forms a central part of the digital video and, together with the other modes used, serves to narrate the story. Also, the students' use of sound elements serves as a means of reducing the distance to the viewer and of establishing tone, mood, and social meaning. Interestingly, the use of the heartbeat performs all these functions: it reduces the distance to the viewer by creating a sense of intimacy and closeness; it serves as a narrative device since it becomes more distinct as the girls grab each other's hands and run off together; and it ascribes social meaning, as it can be interpreted as indicating/supporting love between two people of the same sex.
The use of sound elements is also pivotal in creating salience; the students use sound effects to draw attention to several issues. Perhaps most significantly, they choose the sound of church bells to suggest a connection to marriage and the sound of applause as a standpoint on the issue of homosexuality. As mentioned earlier, at the time of composition the issue of same-sex marriage was the subject of widespread debate in both media and politics. Considering the urgency and topicality of this issue at the time of composition, this use of sound is a clear example of topical commentary and social agency.
Previous research has demonstrated how meticulous students are in their digital work in response to literature; for example, they are acutely aware of the choices of video transition effects and juxtaposing images with lyrics (Carey, 2012; Jocius, 2013). This study adds to these previous findings by demonstrating how this meticulousness extends to the sound elements of student digital videomaking. In this study, the students were given tools to explore a poetic text while simultaneously exploring different semiotic resources in their interpretive work. The students went beyond literal meanings and co-constructed and negotiated the poem; they interpreted the poem in terms of intended and unintended decisions about use of sound, acting of the represented participants and linking of voice-over with image sequencing, and they showed awareness of how resources such as sequencing, framing, color, angle, transitions, and sound affected the meaning. In their use of an array of semiotic resources, and sound elements in particular, the interpretive work became a continuous negotiation, a trend that follows similar findings from previous studies in which digital video is seen to transcend mere illustration and move toward close reading and interpretation of new meaning (e.g., Curwood & Gibbons, 2009; Jocius, 2013; Miller, 2011; Smith, 2019).
However, students did not receive explicit instruction about the metafunctions, the affordances of different semiotic resources, or how they can be combined for different purposes. Still, they showed a wide array of different ways semiotic resources could be used to make meaning. Importantly, most of the sound elements were explored and applied during the editing phase, stressing the importance of the editing phase as well as of the digital software and the availability of the tools for digital editing, that is, the multimodal remixing desk (see Burn & Parker, 2001; Gilje, 2015).
Given how little attention sound has received in students' digital composition (Shanahan, 2012), and in response to poetry in particular, this study's findings contribute valuable insights. This study illustrates how sound elements are an essential consideration in multimodal responses to literary texts; rather than a decorative add-on, they represent a key meaning-making resource in the poetry classroom. Following van Leeuwen's (1999) terminology for sound, the students often used sound elements positioned as Figure, that is, important and something that the viewer/listener must react to and upon, not something positioned in the background. Furthermore, the analysis shows how sound elements were mostly used to create a sense of intimacy or a personal relation; what is presented by the sound is regarded as one would regard someone with whom one is intimate or with whom one can discuss highly personal matters (see van Leeuwen, 1999).
Previous research has shown how student responses to poetry through digital, multimodal composition create a personal relationship with and ownership of the poem (Hirsch & Macleroy, 2020; McVee et al., 2008), and how these approaches can empower students to reflect critically on and extend perspectives on themselves and their lives (Curwood & Cowell, 2011; McVee et al., 2008). These points are further substantiated in the findings of this study. This type of instructional design offers students greater personal engagement in the poetry classroom, which represents a pressing matter in poetry pedagogy (e.g., Bowmer & Curwood, 2016; Creely, 2019). Multimodal composition in response to poetry can thus be viewed as a possible approach to poetry instruction that supports students' interpretation of literary texts and engagement with social issues and that addresses issues related to social agency (Ajayi, 2015; Schmidt & Beucher, 2018).
This study demonstrates how videomaking enables student commentary on a social issue of topical interest and importance to them. In emphasizing their social agency, the students become active meaning-makers whose interests in a specific situation inform their situated use of semiotic resources. As well as being an issue commonly related to identity explorations among adolescents, the trope of finding and showing one's true self and the wider theme of homosexuality were, in this case, of particular topical interest in the context of the same-sex marriage debate in Finland. For the students, Video Poetry offered the opportunity to take a clear political stance on an issue important to them, in a display of their social agency. As emphasized in previous research, young people's use of digital media is not just about learning the tools, but about actively using these new tools to represent visions of themselves and the world (cf. Smirnov & Lam, 2019). The students showed a capacity as well as a willingness to act, to have influence, and to make substantiated claims for recognition (cf. Trauger, 2009).

Conclusions
This study furthers the understanding of multimodal composition in the poetry classroom by analyzing the use of semiotic resources among a group of eighth graders in their interpretive work from poem to digital video. It also shows us how students use multimodal composing strategies to engage with pressing matters. Its findings can support both researchers and educators in applying multimodal composition in poetry education. With a particular interest in the students' use of sound elements, this study highlights an aspect of multimodal composition in response to poetry that has not been targeted to any significant degree in the existing literature. In turn, this study points to the value of emphasizing the use of sound in the poetry classroom: Sound elements here represent a key resource in student multimodal composition in response to the poem. These findings open up exciting and fruitful possibilities to work with rhythm and sound, two of the key components of poetry, in the poetry classroom, both in research and practice.
Studying how students explore poetry through a process of digital composition is also valuable in its recognition of how students represent what they know and what constitutes multimodal composition and writing more broadly. This study underlines the complexity of videomaking, both in terms of the breadth of representational resources the students demonstrate in their videomaking and the meticulous and deliberate choices that underscore their work. Ultimately, the study recognizes and distinguishes the students' use of a variety of semiotic resources and the considered, intentional nature of their work, particularly regarding sound elements. As such, this study adds to the dialogue on the importance of understanding student digital composition, which researchers have argued is a key part of contemporary writing (e.g., Gilje, 2015) and multimodal literacy (e.g., Smith, 2018).
This study offers methodological considerations for future research focused on adolescents' digital video composing, particularly regarding sound elements. However, methodological attention should be given to further developing ways of studying the complex aspects of adolescents' composing processes with regard to sound. Given the growing interest in including sound elements when analyzing children's and adolescents' multimodal composing (e.g., Wargo, 2017; Wargo & Clayton, 2018), this study suggests a continuing need to further explore and account for adolescents' use of sound in digital composition processes.
Finally, my hope is that this study can raise awareness, among researchers and educators, of the relevance of sound elements in students' digital composing and explorations of poetry. Also, and maybe even more importantly, I hope that this study can raise awareness of how the students' meticulous and varied use of sound activated their social agency in commenting on a social issue of topical interest that was clearly connected to their daily lives. As researchers and educators, we must pay attention to issues that resonate with students and that connect to the broader social and cultural contexts in which they are composing. To do so is to listen to students, to amplify their voices, and to attend to the ways they engage with their own lived experiences in the world. Let us pay attention to the issues close to students' hearts and listen to their novel interpretations of the heartbeat of poetry.

Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author received no financial support for the research, authorship, and/or publication of this article.

Notes
1. The poem was originally written in Swedish and titled "Jag vill möta . . ." The English translation is by David McDuff. The translated version, as well as the original in Swedish, is available here: https://www.karinboye.se/verk/dikter/dikter-mcduff/i-want-to-meet.shtml
2. This issue has changed since the study took place; same-sex couples can now legally marry in Finland.