Developing a Coding System for Sulking Behavior in Young Children

Children’s sulking behavior is a salient yet understudied emotional phenomenon. It has been hypothesized to result from hurt feelings, humiliation, and anger, and might thus function as a nonverbal measure in the behavioral studies of these emotions. We conducted three studies that served to develop a comprehensive coding system for children’s sulking behavior. The first study explored sulking features in an online survey that used parental and teacher reports. In an event-based parental diary study, we reevaluated the importance of each feature based on its frequency across episodes of sulking behavior and analyzed the time course of sulking episodes. Finally, we analyzed YouTube videos and demonstrated that the coding system could be reliably applied. We also determined a minimal number of necessary features as a classification threshold. The resulting coding system includes the following features: becoming silent, distancing, turning away, gaze avoidance, crossing arms, lowering head, pouting lips, lowered eyebrows, and, probably, utterances of illegitimate devaluation, and relational distancing. Thus, all varieties of sulking seem to have withdrawal from an ongoing interaction in common.


Introduction
In a preschool class, we observed the following interaction: Two 4-year-old friends, F. and M., are playing with a toy car. A conflict regarding who gets to hold the toy emerges. Suddenly, F., who is about to lose the conflict, crosses his arms. He then walks away from M. but does not begin a new activity. M. approaches F. and says, "You can have the car." After that, M. gives F. the toy, and F. resumes playing with M. This interaction is puzzling: Without words and without physical force, one child (F.) influences another child (M.) to hand over a beloved toy. F. manages to do this because of his particular behavior, which is referred to as pouting or sulking. Here, we use sulking to refer to a behavioral category and pouting to refer to a facial expression that typically occurs as part of sulking behavior. The Merriam-Webster (2020) Online Dictionary defines sulking as being "moodily silent," and Eibl-Eibesfeldt (1980) describes sulking as a form of withdrawal that signals "that the channels are closed for communication" (p. 68) and thus threatens to sever a bond. This threat might "give rise to efforts toward bond repair" (p. 68) or, similarly, influence the other person "until she makes it up to the child" (Mendell, 2002, p. 189). Threats of withdrawal of affection present a paradox, as they seem to be used by individuals who feel dependent or are in a less powerful role than the "perpetrator" (Lazarus, 1991; Mendell, 2002). Accordingly, sulking seems to involve bluffing in some sense because breaking up the relationship would also lead to negative consequences for the sulking individual.
From an evolutionary perspective, we argue that sulking can be classified as a resource control strategy, which is one specific type of strategy that individuals use to gain social influence. Hawley (1999) classified such strategies as either coercive (aggressing, insulting, and threatening), that is, pursued "without regard to peer evaluation and current and future social relationships" (p. 109), or prosocial (persuading, helping, and cooperating), that is, aiming at positive relationships. We argue that sulking falls into neither of these categories. Although it involves threat as a coercive element, it makes sense only within relationships and thus depends on them. In contrast to coercive and prosocial strategies, however, it is an open question how often sulking behavior helps individuals to achieve their goals and how it influences social status over time.
Apart from this specific question, sulking behavior has mostly escaped the attention of developmental research more generally, although sulking behavior is a pervasive emotional behavior in childhood. To investigate sulking as a type of behavior, to find out about its nature and social functions, and how its frequency is influenced by and influences important developmental outcomes (e.g., social status, likability), it is first necessary to establish a thorough description of it. A detailed coding system on sulking could not only allow us to study the development of anger, but also the development of neglected emotions such as hurt feelings and humiliation as researchers have hypothesized that sulking behavior results from these emotions (Eibl-Eibesfeldt, 1989; Hardecker & Haun, 2020; Lazarus, 1991; Mees, 1992; Mendell, 2002; Tremblay, 2000). As previous studies have shown, these emotions are, presumably, associated with the same appraisals as sulking: devaluation, unfairness, injustice, or illegitimacy (Elshout et al., 2017; Feeney, 2005; Fernández et al., 2015; Fischer & Manstead, 2008; Fischer & Roseman, 2007; Leary et al., 1998, 2006; Lemay et al., 2012; Van Dijk & Zeelenberg, 2002). Moreover, a detailed coding system could potentially allow us to infer which of these emotions, in particular, is present. These emotions have essential communicative functions and seem to have a substantial impact on the regulation of social interactions (e.g., Walle et al., 2017). Concerning social development, these emotional reactions frequently occur in parent-child and peer-to-peer interactions (e.g., MacEvoy & Asher, 2012; Mills et al., 2002; Whitesell & Harter, 1996) and seem to have a tremendous impact on children's development (Holodynski & Friedlmeier, 2006). However, the literature has not provided comprehensive descriptions of sulking behavior although some studies have included sulking as a behavioral category (Cole et al., 2006; Shipman et al., 2003).
Thus, in contrast to similar studies that have focused on the development of a coding scheme for a particular emotion (e.g., Tracy & Robins, 2004), we could not rely on extensive suggestions in the literature (see also Witkower & Tracy, 2018, for a recent review of available coding systems for behavioral expressions of emotions). We, therefore, selected a naturalistic approach, which allowed us to explore this behavior in its entirety. With the naturalistic approach, we also aimed to develop a coding system that would potentially allow us to reliably distinguish sulking behavior from anger (e.g., showing teeth, pressed lips, and energetic body movements), disappointment (e.g., corner of the mouth lowered, inner eyebrows up, and gaze downward), and shameful behavior (e.g., corners of the mouth lowered, rolling lips inward or biting the lips, and gaze avoidance) in children. It would thus provide a potential extension of EMOS, a coding system that allows identifying anger, disappointment, shame, and pride in young children (Holodynski, 1992, 2006). As a part of our investigation, we first conducted preliminary naturalistic observations in which we collected possible features of sulking in spontaneously emerging interactions between children in day care centers. After that, we distributed an online questionnaire in which parents and teachers indicated the features by which they would recognize sulking (Study 1). Using the data from an event-based parental diary study (Study 2), we determined the relevance of each feature and analyzed combinations and sequences in which they occur. Finally, to judge the quality of holistic ratings, to determine a minimum number of features that are sufficient to identify a behavior as sulking, and to show that those features can be reliably observed, we analyzed YouTube videos with sulking, disappointed, and angry children (Study 3).

Study 1: Parental and Teacher Reports Through an Online Questionnaire on Sulking
Because sulking behavior has not yet been empirically studied in detail, we constructed an online questionnaire, both in English and German, for parents and teachers of children between the ages of 1 and 8 years. Parents and teachers interact with children daily; therefore, we expected them to have a concept of sulking, which is grounded in extensive experience.

Method
Questionnaire construction. To create an initial list of potential features, we first observed sulking behavior in one of its natural contexts: a day care center. One of the authors visited a day care center in a midsized German city with children aged between 1 and 6 years. We followed an opt-out parental consent procedure, and none of the parents objected. We used continuous behavior sampling, that is, we watched groups of children and noted what came across as sulking behaviors during a specified period (Martin & Bateson, 2007). Observations were coded live. Children were observed for 9 days over 3 months, for a total of 14 hr. In total, 14 potential episodes of sulking were observed, with 11 episodes among children aged between 3 and 5 years (3;0-4;11 years, months) and three episodes among children aged between 5 and 6 years (5;0-6;11 years, months). From those observations, we constructed a list of features that could be associated with sulking. This list formed our starting point for the studies that followed. Combining these features with features of anger, disappointment, and shame, we created a first version of the questionnaire. During piloting, the questionnaire was improved and enriched based on suggestions from parents and teachers. Apart from the general hypothesis that sulking involves withdrawal from an interaction, we did not formulate specific hypotheses but provided participants with a large pool of items.
Participants. We distributed the online questionnaire through email and social media. More specifically, we posted the questionnaire in Facebook groups that included large numbers of parents or teachers. One hundred thirty-nine participants, comprising 107 parents (86 from Germany, 15 from the United States, and seven from other diverse countries, namely, France, India, Israel, Netherlands, South Africa, and the United Kingdom; 92 females and 15 males) and 31 teachers (13 from Germany, 1 from Austria, and 17 from the United States; 29 females and two males; with an average class size of 19.3 children [SD = 6.1]) completed the questionnaire over a predefined period of 5 months. Fourteen parents were excluded from the analysis because they either reported that their children had not sulked or that their child was older than 8 years of age. We informed the participants that their answers would remain confidential. Consent was obtained by virtue of completion and submission of the questionnaire.
Material. We asked parents how often their children would sulk on average (rarely, once every month, once every week, multiple times per week, once every day, or multiple times a day) and we asked teachers how many sulks they observed on average per day in their class (open question). Centrally, we asked parents and teachers how they would recognize a sulking child first in an open question, and then by presenting them a list of features with a total of 38 items. For each item, the answer options consisted of a 4-point scale (disagree, somewhat disagree, somewhat agree, and agree). Aside from a list of features of which some might be indicative of sulking, we included three positive control items ("is happy and open-minded," "laughs," and "jumps around") that informed us of whether participants had carefully filled in the questionnaire. Three test items could potentially falsify the general hypothesis that sulking consists mainly of withdrawal ("turns toward the one who caused him or her to sulk," "seeks eye contact," or "widely opens eyes"), four items were for anger ("lowers his or her eyebrows," "narrows his or her eyes to slits," "makes a fist," or "stomps his or her foot"), three items were for disappointment ("lowers the corner of his or her mouth," "slouches his or her shoulders," or "cries"), and six items that were typical for shame ("bites on his or her lips," "rolls his or her lips inward," "lowers the corner of his or her mouth," "sinks down [body posture slumps]," "hunches his or her shoulders," or "avoids eye contact"; Holodynski, 1992). Another open question in the questionnaire read as follows: "What are typical sentences your child says when he or she sulks?" A native English speaker and psychologist proofread the English version of the questionnaire.

Results and Discussion
Sixty-five percent of the parents reported that their child would sulk multiple times a week or more, 23% reported that their child would sulk once every week (n = 23), and only 10% reported that their child would sulk once a month or less frequently. Teachers reported that they observed 4.4 sulks per day on average (SD = 6.9). These results indicate that sulking is a quite frequent emotional phenomenon in children's lives.
Next, the first author and a research assistant coded the answers to the open question, "How would you recognize your/a child is sulking?" Answers were available from 120 participants (parents and teachers). The two coders first split the answers into units (438 vs. 445 units across the two coders). A unit was defined as the smallest facial, gestural, vocal, or behavioral part of the answer and was largely derived from punctuation (commas, bullet points) or conjunctions (AND, OR) in the answers. The first author and the research assistant then reviewed the differences between their codings and retained the smallest units (n = 458). Units that related to utterances (n = 36, Cohen's κ = .86) and units that were classified as too broad or vague (n = 139, Cohen's κ = .81) were excluded from the analysis. Of the remaining units, 18.7% (56) did not match any feature provided in the closed question of the questionnaire (Cohen's κ = .75). Of these, the following were reported more than 3 times: throwing something or oneself at the floor (10), slamming the door (4), pausing/freezing (5), hitting (6), and shouting (5). Second, we grouped the answers for each item of the closed question into two categories (agree/disagree) and separately calculated one-sided binomial tests for both parents (n = 93) and teachers (n = 31), with the hypothesized probability of success at .5 and a significance level of α = .05. As this is the first study that explores the features of sulking, we did not want to increase the probability of Type II errors and therefore did not correct for family-wise errors. Results are represented numerically in Table 1 (except for the control items). The three positive control items resulted in 100% disagreement from both parents and teachers, suggesting that participants carefully filled in the questionnaire.
In addition, the control items that could potentially falsify the hypothesis that sulking involves withdrawal ("turns toward the one who caused him or her to sulk," "seeks eye contact," "widely opens eyes," and "leans forward") resulted in an average of 78% disagreement, which was significant for both parents (x = 72, n = 92, p < .001) and teachers (x = 21, n = 29, p = .012). The following facial and gestural features were significantly associated with sulking for both parents and teachers: pouting (pushes lower lip forward), pushes both lips forward, crosses arms, lowers corners of mouth, and lowers eyebrows. Notably, the latter two features are also characteristic of other negative emotions. The following behaviors were also significantly associated with sulking: turns away, turns head sideways, goes away, leaves the area, talks less, stops talking, talks with a strained voice, refrains from participating in joint activities, cries, lowers head, and avoids eye contact. The last three features are not specific to sulking as they are also indicative of disappointment and shame. However, those listed are consistent with the idea that sulking is related to withdrawing from an interaction. Interestingly, the results for six items were significant for teachers but not for parents, whereas no item was significant for parents but not for teachers. This suggests that teachers may have either a more detailed representation or a broader concept of sulking than parents. Among the items, the following features, which are typical for shame and disappointment, were only significant for teachers (see Table 1): hunches his or her shoulders, slouches his or her shoulders, and sinks down (body posture slumps). Furthermore, "narrows his or her eyes to slits" and "gazes upwards with head tilted toward the ground" were also only significant for teachers.
The following items that are characteristic of shame and disappointment were not associated with sulking at all: rolls his or her lips inward, bites on his or her lips, and purses his or her lips.
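The one-sided binomial tests reported above can be reproduced with a few lines of code. The following is a minimal sketch in Python (standard library only), assuming that x counts the responses in the tested direction out of n valid responses; the function name `binomial_test_greater` is illustrative and not part of the original analysis. It uses the teacher counts reported for the withdrawal control items (x = 21, n = 29).

```python
from math import comb

def binomial_test_greater(x: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= x) under H0 with success probability p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))

# Teacher counts for the withdrawal control items as reported in the text.
p_teachers = binomial_test_greater(21, 29)
print(f"p = {p_teachers:.3f}")  # prints p = 0.012
```

The result agrees with the reported value for teachers (p = .012); the same function applied to the parent counts (x = 72, n = 92) yields a value well below .001.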
Typical statements related to sulking, as reported in the online questionnaire, included 194 utterances in total, as provided by 103 parents and teachers. To construct categories, we used typical sentences, that is, sentences that appeared more than 2 times: "I want to/don't want to" (20), "That's mean/unfair" (19), "No" (15), "You're mean/unfair" (14), "Leave me (alone)" (18), "Go away" (11), "We are no longer friends" (9), "I won't play with you any longer" (4), and "I don't mind" (4). Thus, a sulking utterance expresses one of the following:
1. Perceived unfairness and devaluation related to a situation, an action, or a person (e.g., "That is unfair," "You are unfair") or some kind of injustice-related statement (e.g., "It's your fault," "It's always me").
2. Autonomy, protest, or defiance (e.g., "But I want . . .," "No").
3. Relational distancing, that is, a threat to end the interaction or relationship with the perpetrator ("I will not invite you to my birthday") or a demand for the other person to distance himself or herself ("Go away," "Leave me alone").
The first author performed the initial coding. One research assistant coded all of the sentences a second time. The interrater agreement was high (Cohen's κ = .89). The frequencies for these categories are shown in Table 2. Based on our findings, sulking seems to correspond with certain kinds of statements, two of which (illegitimate devaluation and relational distancing) are in line with theories on sulking (Eibl-Eibesfeldt, 1980, 1989; Mendell, 2002).
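For readers who want to compute this kind of agreement on their own codings, Cohen's κ compares the observed agreement between two coders with the agreement expected by chance from each coder's category frequencies. The sketch below is illustrative only: the function name and the six example labels are hypothetical and are not data from this study.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one label per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of units.")
    n = len(coder_a)
    # Proportion of units on which both coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the marginal category frequencies of each coder.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six utterances into the three categories above.
a = ["devaluation", "devaluation", "autonomy", "distancing", "autonomy", "distancing"]
b = ["devaluation", "autonomy", "autonomy", "distancing", "autonomy", "distancing"]
print(cohens_kappa(a, b))
```

In this toy example the coders agree on five of six utterances, and chance agreement is one third, which gives κ = .75. The sketch does not handle the degenerate case in which chance agreement equals 1.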

Study 2: Event-Based Parental Diaries
We reviewed event-based parental diaries in part to reevaluate the importance of sulking features in the coding scheme and also to explore typical sequences of sulking features. Diary studies are, in general, high in ecological validity (Bolger et al., 2003) and reduce recall biases that might have been present in Study 1. An additional advantage of this method is that the sulking individuals (i.e., children) did not keep the diary themselves; instead, their parents were the observers. As such, this diary study is observational rather than self-reporting/introspective but thus also susceptible to observational biases.

Method
Participants. To address a broad range of potential participants, we placed flyers in six pediatrician practices, placed online advertisements in Facebook groups that primarily included parents, and phoned parents from the department's database. Within a predefined period of 6 months, we recruited 23 German parents (21 mothers, two fathers) with a total of 40 children (M = 4;2 years, months; age range = 1;4-7;5 years, months; 17 females), who volunteered to participate in the study. The parents were highly educated (higher education [German Abitur] 87.5%; secondary school certificate 12.5%; 66% also had a degree from university). Accordingly, the sample of children was not representative concerning the educational background of the parents and was constrained to a sample from one European culture. All parents who attended the introduction session stayed involved for the full duration of the study. We obtained written informed consent from the parents at the end of the introduction session.

Materials. As there is no superior diary format (Takarangi et al., 2006), we decided to use paper diaries instead of digital diaries, thus following the advice of parents in the pilot study. The diary sheets contained 13 questions about the event (e.g., "Describe the situation immediately before the sulking started. What did elicit the sulking?" "Presumably, who did elicit the sulking in your child?" "How did your child react and if they were talking, what did they say?" "How did others respond to the sulking?" and "How did your child react next on their part?"). In one closed question, parents had to rate which of 48 features (behaviors, postures, and facial expressions) were present. We took these features not only from Study 1 (both from the open and the closed question), but also from another coding scheme regarding body posture (Dael et al., 2012) and from collegial feedback.
Procedure. Parents attended a 90-min introduction meeting in groups of three to seven individuals. We informed them that the study aimed at describing sulking behavior and that sulking appeared to be often associated with withdrawal. Furthermore, we notified them that we assume sulking to be different from disappointed, angry, and shameful behaviors, although it might often overlap with these. We carefully trained the parents to fill in standardized diary sheets. First, we explained to them each question on the sheet. Second, we instructed them to use descriptive-observational language (e.g., "child goes away," "child looks away") instead of interpretative (e.g., "child wants to be alone," "child feels lonely") or normative-evaluative language (e.g., "child does something immature") for the open questions. Subsequently, parents completed an exercise in which they rated whether each of six sentences would count as descriptive or not. The mean accuracy rate was 82% per person. Third, we trained the parents to fill in the closed question on facial, postural, and behavioral features by presenting them with a picture showing a disappointed adult. In the test phase, parents rated which features were shown by an adult who posed an angry expression. The mean accuracy rate per person in the test phase was 83%. We instructed parents to observe their children for a total of 21 days and to fill in, as soon as possible, a standardized diary sheet not only every time they identified their child as sulking but also when they were uncertain whether they should classify an instance as sulking. After they made their first entry, participants contacted the principal investigator and discussed the entry on the phone. At a final meeting shortly after the 21 days, parents were asked to assess their diary-keeping critically. They answered how easy or difficult it had been for them to recognize the sulking scenes, how easy or difficult the writing had been for them, how many observations they did not manage to write down, and whether they thought their diary-keeping had any effect on the child's sulking behavior. We stressed that adequate answers to these questions were of central interest in this study.

Results and Discussion
In this section, we analyze whether all candidate features of sulking are sufficiently independent of each other. We report the probability that a child shows a particular feature when sulking, as well as temporal patterns of sulking sequences.
Quality of diary keeping. On a 5-point scale ranging from 1 (very easy) to 5 (very difficult), parents reported that on average it had been easy to recognize sulking behavior (M = 2.1, SD = 0.91) and that it had been easy to write down what they had observed (M = 2.04, SD = 0.71). Four parents reported that they had not written down every episode they observed (n = 14 episodes). Three parents thought their diary-keeping had an influence on the frequency of the sulking of their children in the way that their children sulked less than usual. Furthermore, parents documented the event times and the time at which they wrote down the episode. They wrote down 92% (n = 98) of the episodes on the same day, 7% 1 day later (n = 8), and only 1% (n = 1) of the episodes 2 days later. The average time delay was 4:30 (hr:min), with a standard deviation of 5:08 (hr:min).
Overall, these cues indicate that parents seemed to maintain their diaries effectively.
Frequency of features. Ten of 40 children did not show any sulking behavior; eight of them, according to their parents, had never sulked before, presumably due to their young age. Thus, only children who had at least one sulking episode were included in the analysis. Eight episodes were dropped because they did not involve a single sulking behavior that had been significant in Study 1. The 30 children included had an average of 3.8 episodes (SD = 3.1) in the 3 weeks of the study and a total of 107 episodes. Sulking was equally frequent across gender (r = −.02) but increased with age (r = .46). As the data were nested within individuals, we first calculated the mean frequency of each sulking feature for every child and averaged them across individuals. The resulting means thus represent the average likelihood for a child to show a given feature in a sulking episode. They are reported in Table 1, with the exception of the verbal features, which are represented in Table 2.
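The two-step averaging described above (first within each child, then across children) keeps children with many episodes from dominating the estimate. The following minimal sketch illustrates the idea; the data layout, the function name `feature_likelihood`, and the example episodes are assumptions made for illustration and are not taken from the study data.

```python
from collections import defaultdict
from statistics import mean

def feature_likelihood(episodes, feature):
    """Average likelihood of a feature per episode, weighting every child equally.

    episodes: iterable of (child_id, set_of_features_present) pairs.
    """
    per_child = defaultdict(list)
    for child_id, features in episodes:
        per_child[child_id].append(1.0 if feature in features else 0.0)
    # Step 1: mean frequency of the feature within each child.
    child_means = [mean(values) for values in per_child.values()]
    # Step 2: average of these child-level means.
    return mean(child_means)

# Hypothetical episodes: child c1 has four episodes, child c2 has one.
episodes = [
    ("c1", {"crossing arms"}),
    ("c1", {"turning away"}),
    ("c1", {"becoming silent"}),
    ("c1", {"turning away", "becoming silent"}),
    ("c2", {"crossing arms", "pouting lips"}),
]
print(feature_likelihood(episodes, "crossing arms"))
```

In this toy example the child-level means are .25 and 1.0, so the estimate is .625, whereas pooling all five episodes would give .40.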
Patterns of event sequences of sulking behavior. Following up on the idea that the defining feature of sulking is a withdrawal from an ongoing interaction (Eibl-Eibesfeldt, 1980), we hypothesized that children's distancing would unfold and intensify over the sulking episode. Dynamic systems approaches have used state space grids to represent and describe dynamic time-course phenomena in a two-dimensional grid (Lewis et al., 1999). Although the diary episodes varied in length and did not entail specific time and duration codes, the diary protocols were prestructured in sequential order of discrete behavioral units. This structure allowed us to describe the succession of sulking behaviors with time as an ordinal variable.
Episodes were composed of 1 to 4 segments (M = 2.1, SD = 0.76). We coded each segment along the following two dimensions of sulking at the behavioral level. The first was the spatial-geometrical dimension, which we defined as the physical alignment of two individuals in face-to-face interaction. It relates to two questions: Are the heads and bodies of two interaction partners directed toward each other? How close to each other are the interaction partners positioned? We coded the following levels: (1) interaction aligned face-to-face, (2) vertical cutoff: lowering the head and/or lowering the gaze, (3) horizontal cutoff: turning away, and (4) global cutoff: going away. We regarded levels with a higher number as more distanced than levels with a lower number. The second was the communicative dimension, which included two levels: (A) uttering sulky sentences and (B) becoming silent. Here, B was defined as more distanced than A. One sequence with three segments was then coded, for example, as A1 (uttering sulky sentences, looking at the other person), B2 (becoming silent, lowering head), and B3 (becoming silent and turning away). Intercoder agreement was high (κ = .86). Figure 1 represents the frequencies of transitions that occurred more than once.
To test our hypothesis that sulking comes along with distancing, we scored each episode by adding up the distance changes. The communicative distance from A to B was coded as 1; the spatial-geometrical distance was coded according to the natural numbers, for example, a change from 1 to 2 was coded as 1. The sequence A1 to B4 was thus, for example, coded as +4, the sequence A2 to A1 as −1. One-sample Wilcoxon signed-rank tests showed that distancing scores per episode were significantly higher than zero, both at the geometric dimension (V = 1,805, p < .001) and at the communicative dimension (V = 1,170, p < .001), as well as taken together (V = 2,361, p < .001). These findings support the hypothesis that sulking involves a tendency to withdraw, which unfolds over time.
Figure 1 note. x-axis: spatial-geometrical distance; y-axis: communicative distance. The size of points represents the frequency of this category across all episodes; the width of arrows represents the frequency of each transition across episodes. Solid arrows represent transitions that increase relational distance; dotted arrows represent transitions that decrease relational distance. Transitions that occurred only once are not depicted.
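The scoring rule for distancing can be made concrete with a short sketch. It assumes that each segment is written as a two-character code such as "A1" or "B4" (communicative level A or B, spatial-geometrical level 1 to 4) and that changes between consecutive segments are summed; the function names are hypothetical, and the significance test itself is not reproduced here.

```python
COMMUNICATIVE = {"A": 0, "B": 1}  # A = uttering sulky sentences, B = becoming silent

def parse_segment(code):
    """Split a code such as 'B3' into (communicative level, spatial level)."""
    letter, digit = code[0], int(code[1:])
    if letter not in COMMUNICATIVE or digit not in (1, 2, 3, 4):
        raise ValueError(f"Unknown segment code: {code}")
    return COMMUNICATIVE[letter], digit

def distancing_scores(sequence):
    """Return (spatial, communicative, total) distance change for one episode."""
    levels = [parse_segment(code) for code in sequence]
    spatial = communicative = 0
    # Add up the change between each pair of consecutive segments.
    for (comm_prev, spat_prev), (comm_next, spat_next) in zip(levels, levels[1:]):
        communicative += comm_next - comm_prev
        spatial += spat_next - spat_prev
    return spatial, communicative, spatial + communicative

print(distancing_scores(["A1", "B4"]))  # (3, 1, 4)
print(distancing_scores(["A2", "A1"]))  # (-1, 0, -1)
```

The two printed totals reproduce the examples given in the text (+4 and −1). An episode with a single segment receives a score of zero on both dimensions under this rule.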

Study 3: Applying the Coding System: A Validation Using YouTube Videos
To evaluate whether the candidate criteria for the coding scheme at this stage were reliably applicable to real sulking episodes and to specify our coding scheme in a way that allows for the identification of sulking, we conducted a YouTube video analysis. The latter task requires identifying "true" sulking behavior, however, without having criteria for this identification. It is the very goal of this coding system to allow for a reliable identification of sulking behavior based on explicable criteria. Thus, we identified "true" sulking by intuitive-holistic codings of the first author and naive coders.
We recognized that using unstandardized internet videos could potentially yield many problems: unclear selection bias, differences in video quality, or high variability of contexts. On the other hand, such videos have high ecological validity as well as high variability. Thus, they served as an initial critical test for our coding system. YouTube videos have previously been the object of scientific investigations (e.g., Lewinski, 2015), and several studies across scientific disciplines have incorporated YouTube as a data source (Giglietto et al., 2012; Packheiser et al., 2018).

Method
Video selection. The first author searched YouTube for the following emotion terms combined with "child" or "kid": "sulking," "pouting," "disappointed," "angry," "mad," "furious," and "ashamed," and selected those videos that showed real children in negative emotional states (n = 60). We stopped our YouTube search after 150 videos (three pages, each with 50 videos) when no further results and no relevant suggestions appeared. Ten videos were dropped due to low quality, short duration, the young age of the child (<2 years of age), or because the video involved a performance. The average duration of the videos was 52.76 s (SD = 30.13). The different languages spoken in the videos (62% English, 10% Asian languages, 6% European languages, and 22% unclassifiable) suggest at least some cultural diversity in the videos.
Coding. Researchers have often found that naive coders do well when classifying emotions based on their holistic impressions (e.g., Camras et al., 1988). Thus, we used the headings given by the YouTube users as one naive rating, and two naive research assistants, who were unaware of the aims of the study, coded the material holistically as well. The first author did another holistic coding, which was considered an expert rating based on his general training in psychological research and his expertise in theories of sulking and emotional development. The categories of the holistic coding scheme included "sulking," "anger," "sulk-anger blend," "sadness/disappointment," "desperation," and "no emotion." The holistic coding was necessary to determine which videos were "truly" sulking and which were not. In the naive consensus, a video was classified as sulking when at least two of the three naive ratings were sulking or a sulk-anger blend. We refer to this result as the naive rating and to the holistic coding of the expert alone as the expert rating. Apart from the holistic coding (the naive and expert ratings), which determined the "true" sulking scenes, we performed an analytic coding using the features investigated in Studies 1 and 2. We summarized several items (e.g., distances himself or herself, leaves the area, withdraws, gaze avoidance, and lowering of gaze) into categories and included more specific descriptions of the categories. Gaze avoidance, for example, was only coded if it lasted a minimum of 2 s; it was not coded when a child turned away or left altogether. Going away/distancing was only coded when it was not shown in an energetic way. Those adaptations were made, in part, after a first round of coding done by the first author and a research assistant, during which they used differences between their codings to disentangle and clarify the categories.
Afterward, the first author and another research assistant coded every video for the presence (yes/no) of each feature.
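The naive consensus rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (at least two of the three naive ratings are "sulking" or "sulk-anger blend"); the function name, the label set, and the example ratings are hypothetical and are not taken from the study materials.

```python
# Labels that count toward a "sulking" consensus, following the rule in the text.
SULKING_LABELS = {"sulking", "sulk-anger blend"}


def naive_consensus(ratings, min_votes=2):
    """Return True if at least `min_votes` of the naive ratings indicate sulking.

    `ratings` holds one holistic label per naive rating (hypothetical input
    format: the YouTube heading plus two research assistants).
    """
    votes = sum(1 for label in ratings if label in SULKING_LABELS)
    return votes >= min_votes


# Hypothetical examples, not data from the study.
print(naive_consensus(["sulking", "anger", "sulk-anger blend"]))        # two votes
print(naive_consensus(["sulking", "anger", "sadness/disappointment"]))  # one vote
```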

Interrater reliability
Holistic coding: Consensus rating and expert rating. Across all four coders, the interrater agreement for sulking versus non-sulking was moderate (Fleiss' κ = .57). Notably, if the agreement had been substantial, the need for an analytic tool for the identification of sulking behavior would have been weakened. We concluded that sulking is not always apparent to naive coders and is thus in need of a clear behavioral definition. Therefore, a coding system that goes beyond holistic coding is a necessary tool for scientific investigations of sulking. Interrater reliability across all of the emotion categories was similarly moderate (Fleiss' κ = .56). The naive raters classified 15 videos as sulking, 13 videos as angry, 15 videos as disappointed, two as a mix of sulking and anger, two as desperation, one as no emotion, and two as nonagreement. The expert classified 24 videos as sulking, nine videos as disappointed, nine videos as angry, three videos as a mix of sulking and anger, one video as desperation, and one video as no emotion. For the subsequent analysis, we merged the sulking category and the sulking-angry blend category.
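For readers unfamiliar with Fleiss' κ, the following Python sketch shows how the statistic is computed from a table of rating counts. It is a generic illustration of the standard formula, not the authors' analysis script, and the toy counts (six videos, four coders, two categories) are invented for the example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of
    raters (here: the row sums)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Overall proportion of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item: share of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy table: columns are [sulking, non-sulking], four coders per video.
toy_counts = [[4, 0], [3, 1], [2, 2], [0, 4], [1, 3], [4, 0]]
print(round(fleiss_kappa(toy_counts), 3))
```

The function is undefined when all assignments fall into a single category, because the expected agreement is then 1 and the denominator becomes zero.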
Analytic coding. For the analytic coding, we first eliminated categories that came up fewer than four times in the videos (inward roll of lips, slouched shoulders, utterances of illegitimate devaluation, or relational distancing) or that we could not reliably observe (Cohen's κ < .40: strained voice, sulking-related prosody, or hunched shoulders). We could observe all other features with moderate to high interrater reliabilities (Cohen's κ > .59). Table 1 lists all interrater reliabilities.
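Cohen's κ for a single feature coded as present or absent by two coders can be computed as in the sketch below. The code illustrates the standard formula only; the two coding vectors are invented and do not correspond to any feature in Table 1.

```python
def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)

    # Observed agreement: share of videos on which both coders agree.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement from each coder's marginal label frequencies.
    p_expected = sum(
        (coder_a.count(label) / n) * (coder_b.count(label) / n)
        for label in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Invented presence (1) / absence (0) codes for one feature across ten videos.
coder_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
coder_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(coder_a, coder_b)

# Exclusion rule from the text: features with kappa below .40 were dropped.
print(round(kappa, 2), "retain" if kappa >= 0.40 else "drop")
```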

Sulking features.
To determine which features corresponded with sulking, we calculated the odds ratio for each item using Fisher's exact test for small sample sizes, based on the consensus rating as well as on the expert rating. An odds ratio of 1 meant that a video was equally likely to be rated as sulking whether or not a particular feature was present. We, therefore, included the items whose odds ratios were significantly greater than 1 in the naive or the expert rating. Note that these items were the same for both ratings except for "glaring at" (cf. Table 1). To check for the independence of features, we calculated a correlation matrix and looked for highly correlating features. Non-responsivity correlated highly with other features (e.g., becoming silent, gaze avoidance, and distancing). Therefore, we determined that non-responsivity might be a higher order concept that has becoming silent as its main feature. Thus, we excluded non-responsivity from the following statistical analyses.
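The logic of the odds ratio and Fisher's exact test can be illustrated with a small Python sketch. Several simplifications are assumed here: the code uses the sample (cross-product) odds ratio, whereas statistical packages such as R's fisher.test report a conditional estimate; it computes a one-sided p value, although the source does not state whether the tests were one- or two-sided; and the 2 × 2 counts are invented.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for a 2 x 2 table:
    a = sulking, feature present      b = sulking, feature absent
    c = not sulking, feature present  d = not sulking, feature absent
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p value: probability of observing `a` or more
    sulking videos with the feature, given fixed row and column totals."""
    n_sulking = a + b
    n_present = a + c
    n_total = a + b + c + d
    max_a = min(n_sulking, n_present)
    numerator = sum(
        comb(n_sulking, k) * comb(n_total - n_sulking, n_present - k)
        for k in range(a, max_a + 1)
    )
    return numerator / comb(n_total, n_present)


# Invented counts for one feature, not taken from Table 1.
table = (3, 1, 1, 3)
print(odds_ratio(*table), round(fisher_one_sided(*table), 3))
```

An odds ratio above 1 with a small p value would indicate that the feature occurs disproportionately often in videos rated as sulking.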
Classification threshold. Ultimately, the coding system is intended to classify episodes of behavior as "sulking" or "not-sulking." The framework of signal detection theory is concerned with measuring the performance of differentiation and allows for the calculation of thresholds for such binary classifications (Stanislaw & Todorov, 1999). Usually, there is no optimal threshold, so the best choice is a trade-off that keeps both the false positives and the false negatives low. Such trade-offs can be visualized using receiver operating characteristic (ROC) graphs (Fawcett, 2006). More specifically, we drew sensitivity/specificity plots in R (Version 3.3.2, R Core Team, 2018) using the package ROCR (Sing et al., 2005), once with the expert rating and once with the naive rating as the criterion. The ideal point is defined as specificity = 1.0 and sensitivity = 1.0, and the optimal cutoff is the one with the shortest distance to this point. As can be seen in Figure 2, the optimal cutoff in the expert rating was four features, with specificity = 0.91 and sensitivity = 0.79. In the naive rating, the optimal cutoff was five features, with specificity = 0.85 and sensitivity = 0.76. According to these results, the set of videos classified as sulking by the expert can be better predicted based on the sulking features than the set of videos classified as sulking by the naive raters.
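The selection of a cutoff by its distance to the ideal point can be sketched as follows. This Python code is not the ROCR-based analysis reported above; it assumes that each video is summarized by the number of sulking features it shows, and the per-video data are invented. Only the two sensitivity/specificity pairs at the end are taken from the text.

```python
from math import hypot


def sens_spec(n_features, is_sulking, cutoff):
    """Sensitivity and specificity when a video counts as 'sulking' if it
    shows at least `cutoff` features."""
    pairs = list(zip(n_features, is_sulking))
    tp = sum(1 for n, y in pairs if y and n >= cutoff)
    fn = sum(1 for n, y in pairs if y and n < cutoff)
    tn = sum(1 for n, y in pairs if not y and n < cutoff)
    fp = sum(1 for n, y in pairs if not y and n >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def distance_to_ideal(sens, spec):
    """Euclidean distance to the ideal point (sensitivity = 1, specificity = 1)."""
    return hypot(1 - sens, 1 - spec)


def best_cutoff(n_features, is_sulking, cutoffs):
    """Cutoff whose (sensitivity, specificity) lies closest to the ideal point."""
    return min(
        cutoffs,
        key=lambda k: distance_to_ideal(*sens_spec(n_features, is_sulking, k)),
    )


# Invented data: feature counts for six sulking and six non-sulking videos.
n_features = [6, 5, 4, 4, 4, 2, 3, 2, 1, 1, 0, 5]
is_sulking = [True] * 6 + [False] * 6
print(best_cutoff(n_features, is_sulking, cutoffs=range(1, 7)))

# Distances for the values reported in the text (expert vs. naive rating).
print(distance_to_ideal(sens=0.79, spec=0.91))
print(distance_to_ideal(sens=0.76, spec=0.85))
```

With the reported values, the expert-rating cutoff lies closer to the ideal point (a distance of about 0.23) than the naive-rating cutoff (about 0.28), which matches the conclusion that the expert classification is better predicted by the features.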
To test whether combinations of fewer than four features have high diagnostic value, we performed a multi-model comparison in which we compared models with different numbers of sulking features as predictors. More specifically, we ran generalized linear models with a binomial error structure and logit link function (McCullagh & Nelder, 1989). For all combinations of features that had an odds ratio significantly greater than 1, we built models that incorporated the features as categorical predictors (distancing, turning away, gaze avoidance, lowering head, becoming silent, crossing arms, pouting, lowered eyebrows, and glaring at) and sulking versus not sulking as the dichotomous response (as determined by the expert rating). The simplest nine models had only one feature as a predictor. For all combinations of two predictors, we built 36 models (two out of nine features), and so on. Thus, we fitted a total of 512 binomial models in R (Version 3.4.4; R Core Team, 2018). Figure 3 plots the prediction accuracy against the number of predictors for all 512 models using the expert rating. It shows that including more than four features (in any combination) as predictors does not substantially improve the prediction accuracy. Among the models containing fewer than four predictors, there are already some models that perform quite well. This supports further exploration of whether more standardized videos would allow researchers to classify an episode as sulking with only two or three features, as well as more detailed investigation of the relative importance of each feature.
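The model space of this comparison can be enumerated with a short Python sketch. It only lists the predictor sets and does not fit any models. The nine feature names come from the text; the reading that the total of 512 models includes an intercept-only model (2^9 = 512, versus 511 non-empty combinations) is an assumption, because the source does not spell this out.

```python
from itertools import combinations

# The nine features named in the text as predictors.
FEATURES = [
    "distancing", "turning away", "gaze avoidance", "lowering head",
    "becoming silent", "crossing arms", "pouting", "lowered eyebrows",
    "glaring at",
]


def feature_subsets(features, include_empty=True):
    """Yield every combination of features, ordered by the number of predictors.

    include_empty=True adds the empty set, i.e. an intercept-only model
    (assumed here in order to arrive at 512 models)."""
    start = 0 if include_empty else 1
    for k in range(start, len(features) + 1):
        yield from combinations(features, k)


subsets = list(feature_subsets(FEATURES))

# Number of candidate models for each number of predictors.
models_per_size = {}
for subset in subsets:
    models_per_size[len(subset)] = models_per_size.get(len(subset), 0) + 1

print(len(subsets))
print(models_per_size)
```

Each tuple in `subsets` would then serve as the predictor set of one binomial model, and the prediction accuracies could be grouped by `len(subset)` to reproduce a plot in the style of Figure 3.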

Discussion
For our particular purpose, YouTube videos were a good source for the initial validation of our coding system. First, without extensive training, we learned which features can be observed reliably and which cannot (e.g., strained voice, movements of the shoulders). Again, failing to observe certain items reliably confirmed that this study was essential. Second, we were able to develop suggestions for a classification threshold: Four features are required according to the expert rating and five according to the naive rating. One might argue, however, that YouTube videos are likely to be rather extreme and thus that we underestimated the classification threshold. Yet this assumption appears to conflict with the moderate consensus between holistic ratings. If the videos had been more extreme, we would have expected a higher consensus between the ratings.
We argue that better video material could allow one to infer sulking based on even fewer features, as is the case for other emotion expressions (Holodynski & Friedlmeier, 2006). It must be kept in mind that the sample is likely to be biased and that we may, for example, have failed to observe certain facial expressions when children were not facing the video recording device or because of low-quality videos. Moreover, despite the quality restrictions, several videos were not clear enough to code facial features. Thus, it is likely that "pouting lips" and "lowered eyebrows" were present more frequently than we could observe here and, based on the strong results of previous studies, we included "lowered eyebrows" although the odds ratio did not achieve significance. Furthermore, we could not observe verbal features that appeared to be central in Studies 1 and 2 (utterances of illegitimate devaluation, utterances of relational distancing). Thus, their role in identifying sulking needs to be further explored. Due to these limitations, future studies should also validate the coding system using standardized videos from naturalistic settings or laboratory experiments.

General Discussion
These studies provide a new analytic coding system that can be used to assess children's sulking behavior systematically. This system can be found in the online appendix. In summary, our final coding system includes the following nonverbal features: gaze avoidance, distancing, turning away, arms crossed, silencing, head lowered, lowered eyebrows, pouting lips (lower lip pushed forward, both lips pushed forward), and glaring at. The commonality of the many features tested seems to be an accentuated withdrawal from interaction (Eibl-Eibesfeldt, 1980): features that lead to the breaking of eye contact such as turning away, turning the head sideways, or lowering the head. Most behavioral features, such as physically distancing, turning away, speaking less, and gaze avoidance, can be seen as reducing physical-geometrical or psychological-communicative contact, which is in line with theories of sulking (Eibl-Eibesfeldt, 1980, 1989; Mendell, 2002). Utterances of illegitimate devaluation and relational distancing are also in line with these theories but still need to be validated.

Limitations
One apparent limitation of all three studies concerns the sample sizes. Study 1 involved a total of 124 subjects, Study 2 was based on the sulking episodes of 30 children who were observed by 21 parents, and the third study involved 50 YouTube videos for validating the coding system. Apart from the sample sizes, at least the sample of the diary study was not representative as most of the parents were highly educated.
However, Study 1 drew on two different groups of participants (parents, teachers) and found a substantial overlap between both groups. Interestingly, teachers reported recognizing sulking based on more features than parents did. Teachers might have a broader concept that differs qualitatively from the concept of parents, or they might infer sulking based on more features. Based on the substantial overlap between parents and teachers, the latter interpretation seems promising. As teachers observe more children and thus have a richer database than parents, the features that were significant for the teachers only should be investigated further. Study 2 relied on the detailed observations made by parents, mostly mothers, and one might argue that these observations were likely to be biased. Although we cannot rule out observational biases, we trained parents intensively and closely supervised their observations.
Despite the limitations of our studies, the data themselves speak a fairly clear language: In Study 1, several of the sulking features were significant at p < .001 (see Table 1), and the results of Studies 2 and 3 also correspond with the results of Study 1. Thus, taken together, the three studies provide us with a certain level of confidence in our coding system. Moreover, they are consistent with the theory of sulking behavior as withdrawal from ongoing interaction. Nonetheless, future studies need to examine cultural variations of sulking and the specific issues discussed in the following sections.

The Cutoff Value of the Coding System
The expert rating in the YouTube study suggests that we need any four sulking features to classify a behavioral episode as sulking. According to the naive rating, we need five sulking features. Fewer features would be desirable, and we are optimistic that we can set up a lower threshold in future studies because YouTube videos turned out to be difficult material and served as a rather conservative first test of the coding system. We could not always adequately identify facial expressions, and some features likely have a higher predictive value than those found here. Whether one indeed needs four features to classify a behavioral episode as sulking is likely to depend on the specificity of each feature.
However, the appropriateness of the classification threshold might depend on the research context and the relative importance of sensitivity versus specificity. Consider a study that investigates the ontogenetic beginnings of sulking behavior. Such a study might aim either to show the presence of sulking at a given age with high certainty or to detect early subtle forms of sulking. In the first case, a stricter classification threshold might be appropriate (five features) and, in the second case, a more relaxed one (four features or even fewer).

The Specificity of the Features
Some features are certainly shared with other emotions, whereas others seem to be highly specific for sulking. Gaze avoidance, as well as head lowering, can relate to shame and disappointment and are thus not very specific. In contrast, glaring at and looking while the head is lowered could be specific sulking-related gaze behaviors. Unfortunately, "glaring at" was not included in Study 1 and did not frequently occur in Study 2, but it reached significance in Study 3. Thus, evidence for this behavior is still weak. Furthermore, it would be essential to explore whether gaze avoidance and glaring at occur in the same sulking episodes, either sequentially or in oscillation, or whether they are mutually exclusive.
Highly specific features seem to be arms crossed and pouting. They were strongly agreed on in Study 1 (see Table 1) and appeared in more than 38% of episodes in the diary study. However, while arms crossed also had a high odds ratio in the YouTube study, this was not true for pouting lips. Nonetheless, pouting lips are generally seen as a means to communicate one's resentment in a confrontational way (Kottonau, 2010) and thus should be viewed as specific. From our perspective, pouting lips rarely occur as part of other emotions. Furthermore, lowered eyebrows were a strong feature in Studies 1 and 2 but had a nonsignificant odds ratio in Study 3. The latter might be because lowered eyebrows are also part of the anger display and thus might not be exclusively related to sulking.
The vocal aspects of sulking should be studied in more detail. As we found, parents and teachers agreed that there is such a specific vocal pattern (Study 1), although we could not reliably code for it (Study 3). The other general behaviors (including verbal utterances) have no counterparts in the descriptions of other emotional behaviors so far (cf. Holodynski & Friedlmeier, 2006; Witkower & Tracy, 2018); thus, we cannot compare them. Nonetheless, distancing, turning away, and becoming silent had high odds ratios in the YouTube video analysis, indicating that they are quite specific for sulking. Crying and energetic movement (such as stomping a foot or throwing things) seem to occur in sulking episodes frequently but seem rather unspecific. Because the videos did not involve shame, work is needed that compares shame and sulking. Finally, of the verbal utterances that frequently appeared in Studies 1 and 2, we could only validate utterances of autonomy in the YouTube analysis and found that they were not specifically related to sulking. Utterances of illegitimate devaluation and relational distancing might be more specific to sulking based on the theory. However, due to their rare occurrence in the YouTube analysis, they still need to be validated.

Inferring Specific Emotions
Future studies should also investigate whether specific emotions could be inferred based on more detailed codings of sulking features. For example, it seems promising to differentiate sulking that results from anger from sulking that results from feeling hurt. In general, anger is associated with a tendency to approach (Carver & Harmon-Jones, 2009) but with a similar communicative function: forcing someone to change to achieve a better outcome (Fischer & Roseman, 2007). As sulking involves the opposite of an approach tendency, it could accordingly only reflect inhibited anger (Lazarus, 1991). Angry sulking behavior could thus likely be inferred from the specific quality of the behavior, for example, based on tension and speed (Dael et al., 2012; Witkower & Tracy, 2018). Otherwise, we could potentially infer hurt feelings, humiliation, or person-related disappointment. (We argue elsewhere that these three emotions are largely identical (Hardecker, 2019).) The latter has also been associated with withdrawal and with a similar communicative function: to elicit guilt and reparation from the perpetrator (e.g., Hardecker & Haun, 2020; Vangelisti & Sprague, 1998; Lemay et al., 2012). Future studies should also investigate the gaze dynamics of sulking and whether more fine-grained details of the gaze behavior would be sufficient to infer the underlying emotion.

Conclusion
Across all three studies, we developed a coding system (see the online supplemental material), determined a set of sulking features at the behavioral, facial/postural, and verbal levels, and provided an overview of the relative importance of each feature (Studies 1 to 3). The common denominator of the features was withdrawal from an ongoing interaction in Study 1, and Study 2 showed that sulking sequences involve an increase in relational distance. Study 3 informed us that across YouTube videos, four features were needed to classify an episode as sulking. At this stage, the coding scheme provided here can help other researchers observe sulking behavior more reliably. Future studies should investigate certain features of sulking in more detail (gaze behavior, sulking prosody, eyebrows, and position of shoulders), should clarify the specificity of particular features (utterances of illegitimate devaluation and relational distancing, lowered eyebrows), and should integrate the temporal sequences of sulking into the coding scheme. Our coding system could potentially help researchers discover more about sulking behavior, as well as about hurt feelings, humiliation, and anger.