Linguistic Studies on Social Media: A Bibliometric Analysis

This study aimed to present the status quo of linguistic studies on social media over the past decade. Specifically, it used the tool CiteSpace to conduct a bibliometric analysis of articles in the field of linguistics indexed in the Web of Science Core Collection, in order to identify the general characteristics, major strands of linguistics, main research methods, and important research themes in linguistic studies on social media. The main findings are as follows. First, the study reported the publication trend, main publication venues, researched social media platforms, and languages used in researched social media. Second, sociolinguistics and pragmatics were found to be the major strands of linguistics in relevant studies. Third, the study identified seven main research methods: discourse analysis, critical discourse analysis, conversation analysis, multimodal analysis, narrative analysis, ethnographic analysis, and corpus analysis. Fourth, important research themes were extracted and classified according to the four dimensions of the genre framework of social media studies: the participatory nature and technological affordances of social media (compositional level); the topics of education, (language) policy, and politics (thematic orientations); the discursive practices of (im)politeness, humor, indexicality, and multilingualism (stylistic traits); and the communicative functions of constructing identity, communicating (language) ideology, and expressing attitude (pragmatic dimension). Moreover, linguistic studies on social media tended to be characterized by cross-disciplinary and mixed-method approaches.


Introduction
Social media "usually refers to any application or technology through which users participate in, create, and share media resources and practices with other users by means of digital networking" (Reinhardt, 2019, p. 3). Although social media is conceptually related to social networking sites (SNSs), and the two terms are sometimes used interchangeably, it is widely acknowledged that social media is the broader term, and SNSs, "specially associated with the use of sites such as Facebook and MySpace," are regarded as one type of social media (McCay-Peet & Quan-Haase, 2017, p. 15). Other forms of social media include message boards and discussion forums, blogging platforms, microblogging services (e.g., Twitter), media-sharing sites (e.g., YouTube), and instant messaging services (e.g., SMS). Social media has become a platform for social interaction through which individuals communicate, entertain, and share content, and which helps promote social bonds or ambient affiliation (Zappavigna, 2012). It has also become a communication tool through which corporations engage with the public, allowing organizations "to contribute to the creation and maintenance of both their identities and their reputations" (Huang-Horowitz & Freberg, 2016, p. 196).
Social media has become one of the most discussed topics in various fields, such as communication, economics, and information technology (Hjorth & Hinton, 2019). Communication studies of social media have focused on impression and relationship management (Bazarova et al., 2013; Benthaus et al., 2016) and privacy protection (Lankton et al., 2017). In economics, studies have examined the use of social media to report financial results (Alexander & Gentry, 2014) and the impact of information disclosure on corporate performance (Cade, 2018; Schniederjans et al., 2013). Research from the perspective of information technology has addressed information and emotion detection using computer-mediated technologies (Gründer-Fahrer et al., 2018; Misopoulos et al., 2014). Since "activities on social media primarily consist of language use" (Gnach, 2018, p. 195), social media has also become a research focus in linguistics.
Although some focused reviews have been conducted, such as of socioculturally informed studies on language and new media (Akkaya, 2014) and of the application of conversation analysis to online talk on social media (Paulus et al., 2016), a comprehensive review of linguistic studies on social media (henceforth LSSM) still appears to be needed. This article therefore conducts a bibliometric analysis of articles published in the field of linguistics in the Web of Science Core Collection, with the aid of CiteSpace, to present the status quo of LSSM. More specifically, the present study addresses the following research questions.
1. What are the general characteristics (the publication trend, publication venues, researched social media platforms, and languages used in researched social media) in the area of LSSM?
2. What are the major strands of linguistics in the area of LSSM?
3. What are the main research methods in the area of LSSM?
4. What are the important research themes in the area of LSSM?

Methodology
Bibliometric analysis "provides a quantitative method for reviewing and investigating extant literature in a given field" (Mou et al., 2019, p. 221). CiteSpace, the tool used in this study and one of the most popular bibliometric tools, is "a Java application for analyzing and visualizing co-citation networks" (Chen, 2004, p. 363). It offers various analyses, such as reference, journal, and keyword analyses, that help researchers identify current and future research trends in a field (Mou et al., 2019). For example, Zhang et al. (2015) used CiteSpace to generate keywords and identify the research foci of social media studies, supported by visualizations of references and topics.
First, the Web of Science Core Collection was searched on December 11, 2020, with the queries presented in Table 1 to extract bibliometric information. Specifically, the retrieval targeted articles related to social media studies in the area of linguistics that were published in English between 2009 and 2020. The queries identified a total of 794 articles, and their bibliometric information, including article titles, journal titles, publication years, keywords, abstracts, and citations, was downloaded for the subsequent analyses.
Second, the general characteristics of LSSM were identified from the bibliometric information using the "analyze results" function of the Web of Science Core Collection. These include the publication trend, main publication venues, researched social media platforms, and languages used in researched social media.
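The retrieval and year-by-year counting above were done through the Web of Science interface. As a rough illustration of what that counting involves, the Python sketch below parses records saved in Web of Science's plain-text export format and tallies them by publication year. The layout it assumes (two-letter field tags such as PY for year and DE for author keywords, continuation lines indented by three spaces, and ER closing each record) reflects our understanding of that format and should be checked against an actual export; the sample records are hypothetical.

```python
from collections import Counter

def parse_wos_plaintext(text: str) -> list[dict[str, list[str]]]:
    """Split a Web of Science plain-text export into records.

    Assumed layout: a field starts with a two-character tag and a space
    (e.g. "TI", "SO", "PY", "DE"); continuation lines start with three
    spaces; a line containing only "ER" ends a record.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    tag = None
    for line in text.splitlines():
        if line.rstrip() == "ER":              # end of one record
            if current:
                records.append(current)
            current, tag = {}, None
        elif line.startswith("   ") and tag is not None:
            current[tag].append(line.strip())  # continuation of the previous field
        elif len(line) >= 3 and line[2] == " " and line[:2].isalnum():
            tag = line[:2]                     # new field, e.g. "PY 2019"
            current.setdefault(tag, []).append(line[3:].strip())
    return records

def articles_per_year(records: list[dict[str, list[str]]]) -> Counter:
    """Count records by publication year (PY field) for a publication trend."""
    return Counter(int(r["PY"][0]) for r in records if "PY" in r)

# Two hypothetical records; titles and journals are placeholders, not real articles.
SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
TI Hypothetical title
   spanning two lines
SO HYPOTHETICAL JOURNAL
PY 2015
DE hashtag; Twitter
ER

PT J
TI Another hypothetical title
PY 2019
ER

EF"""

records = parse_wos_plaintext(SAMPLE)
print(articles_per_year(records))
```

File header lines (FN, VR) simply end up in the first record, which is harmless for counting by year.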
Third, CiteSpace was used to process the imported bibliometric information and generate the co-citation network and co-citation frequency list of journals, in order to identify the major strands of linguistics used in researching social media.
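CiteSpace builds the journal co-citation network internally. The sketch below is not CiteSpace's implementation; it only shows the underlying counting under two simple assumptions: a journal's frequency is the number of citing articles that reference it, and two journals are co-cited when the same citing article references both. It also assumes that cited references follow the comma-separated "author, year, source, volume, page" layout of the Web of Science CR field. All reference strings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Optional

def cited_source(reference: str) -> Optional[str]:
    """Return the cited source from a reference such as
    'Author A, 2015, J PRAGMATICS, V1, P1' (assumed: author, year, source, ...)."""
    parts = [p.strip() for p in reference.split(",")]
    return parts[2] if len(parts) > 2 else None

def journal_cocitation(reference_lists: list[list[str]]):
    """Count citing articles per journal (frequency) and per journal pair (co-citation)."""
    frequency: Counter = Counter()
    cocitation: Counter = Counter()
    for references in reference_lists:
        # Each journal counts once per citing article, however often it is cited there.
        journals = sorted({s for s in map(cited_source, references) if s})
        frequency.update(journals)
        cocitation.update(combinations(journals, 2))  # unordered pairs, sorted by name
    return frequency, cocitation

# Hypothetical reference lists of three citing articles.
articles = [
    ["Author A, 2015, J PRAGMATICS, V1, P1", "Author B, 2016, J SOCIOLING, V2, P2",
     "Author C, 2017, J PRAGMATICS, V3, P3"],
    ["Author B, 2016, J SOCIOLING, V2, P2", "Author D, 2014, LANG SOC, V4, P4"],
    ["Author E, 2018, J PRAGMATICS, V5, P5", "Author F, 2019, J SOCIOLING, V6, P6"],
]
frequency, cocitation = journal_cocitation(articles)
print(frequency.most_common())
print(cocitation.most_common(1))
```

Sorting each article's journals before forming pairs ensures that the same two journals are always counted under one key, whatever order they appear in.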
Fourth, CiteSpace was also used to produce a list of 394 keywords (Table 2 lists those occurring at least 10 times) in order to pinpoint the main research methods of LSSM. The keyword list was searched for terms denoting research methods in linguistics and applied linguistics listed in Litosseliti (2010) and McKinley and Rose (2020), such as discourse analysis and corpus analysis, to identify the main research methods used in LSSM.
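The article describes this step as a search of the keyword list for method terms but does not specify how the matching was done. A minimal sketch, assuming simple case-insensitive substring rules, is given below. The rule lists are hypothetical and were chosen so that they reproduce the groupings shown in Table 7; for example, "multimodal critical discourse analysis" is assigned to both multimodal analysis and critical discourse analysis but not to discourse analysis.

```python
# Hypothetical matching rules: a keyword belongs to a method if it contains one of
# the include terms and none of the exclude terms (our assumption, based on Table 7).
METHOD_TERMS = {
    "discourse analysis": (["discourse"], ["critical discourse"]),
    "critical discourse analysis": (["critical discourse"], []),
    "multimodal analysis": (["multimodal"], []),
    "conversation analysis": (["conversation"], []),
    "narrative analysis": (["narrative", "small story"], []),
    "ethnographic analysis": (["ethnograph", "ethnomethodology"], []),
    "corpus analysis": (["corpus"], []),
}

def group_by_method(keyword_freqs: dict[str, int]) -> dict[str, dict[str, int]]:
    """Assign each keyword to every method whose rule it satisfies."""
    groups: dict[str, dict[str, int]] = {method: {} for method in METHOD_TERMS}
    for keyword, freq in keyword_freqs.items():
        text = keyword.lower()
        for method, (include, exclude) in METHOD_TERMS.items():
            if any(t in text for t in include) and not any(t in text for t in exclude):
                groups[method][keyword] = freq
    return groups

# Frequencies from Table 7, plus "humor" (freq = 11) as a keyword that is not a method.
keywords = {
    "discourse analysis": 11, "media discourse": 10, "critical discourse analysis": 17,
    "multimodal critical discourse analysis": 1, "multimodality": 22,
    "conversation analysis": 7, "small story": 2, "corpus linguistics": 3, "humor": 11,
}
for method, members in group_by_method(keywords).items():
    print(f"{method}: {members}")
```

Substring rules like these are brittle; in practice each match would still need to be checked by reading the keyword in context.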
Fifth, the keyword list was further examined to reveal important research themes. Keywords that occurred at least 10 times (freq ≥ 10) were retained (40 of the 394 keywords), as they are likely to best index important research themes. Next, we excluded three types of keywords: (1) those whose meanings are too general to count as research themes, such as "communication" and "people"; (2) those whose meanings relate to the research target of this study, such as "language" and "social media"; and (3) those whose meanings relate to the other research questions, such as "Twitter," "English," and "critical discourse analysis." The remaining 18 keywords (in bold in Table 2) were kept for further analysis, together with their semantically related keywords (freq < 10); for example, "humor" (freq = 11) was analyzed together with "sexist humor" (freq = 2).
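The filtering in this step can be expressed compactly. The sketch below is only an approximation of the procedure: it assumes that "semantically related" lower-frequency keywords can be found by substring containment (so "sexist humor" attaches to "humor"), whereas the article presumably made these judgments manually. The exclusion sets contain only the examples named in the text. In the toy data, the counts for "humor," "sexist humor," "Twitter," "English," and "critical discourse analysis" come from the article; all other counts are invented for illustration.

```python
MIN_FREQ = 10

# Exclusion categories from the procedure above; only the examples named in the text.
EXCLUDED = {
    "too general": {"communication", "people"},
    "research target": {"language", "social media"},
    "other research questions": {"twitter", "english", "critical discourse analysis"},
}

def select_themes(freqs: dict[str, int], min_freq: int = MIN_FREQ) -> dict[str, list[str]]:
    """Keep frequent, non-excluded keywords, each with its related low-frequency
    keywords (approximated here by substring containment)."""
    excluded = set().union(*EXCLUDED.values())
    themes = sorted(k for k, n in freqs.items()
                    if n >= min_freq and k.lower() not in excluded)
    return {
        theme: sorted(k for k, n in freqs.items()
                      if n < min_freq and theme.lower() in k.lower())
        for theme in themes
    }

# Toy counts (partly from the article, partly invented; see the note above).
toy = {
    "humor": 11, "sexist humor": 2, "identity": 25, "national identity": 3,
    "communication": 20, "people": 12, "language": 40, "social media": 90,
    "Twitter": 39, "English": 48, "critical discourse analysis": 17, "emoji": 4,
}
print(select_themes(toy))
```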
The 18 keywords were then categorized according to the four dimensions of the genre framework of social media studies (Lomborg, 2014). The framework was adopted because it entails a relatively comprehensive linguistic analysis of social media as emerging communicative genres, with "a focus on studying the communicative purpose as manifest in genre conventions of form, style, and content, in recurrent communicative situations or texts" (Lomborg, 2014, p. 26). The first dimension, compositional level, examines "the network composition, structures of participation, and activity levels in concrete instances of social media" and analyzes "the social organization of communicative practices on social media" (Lomborg, 2014, p. 31). The second dimension, thematic orientations, explores "the predominant topics of communication and the associated relevance structures as negotiated and regulated by patterns of responsiveness" (Lomborg, 2014, p. 31). The third dimension, stylistic traits, addresses "the specific tone that participants must master in order to enact the genre competently, and be recognized and validated by fellow participants as relevant peers and members, in the ongoing negotiations of genre" (Lomborg, 2014, p. 31). These tones are indicated by "choices about a set of related but distinct conversational practices" (Tracy & Robles, 2013, p. 173), such as speech acts, politeness, and humor. The fourth, pragmatic dimension identifies "the communicative functionalities and social practices that characterize the uses of a specific genre" (Lomborg, 2014, p. 31).

General Characteristics
This section presents the publication trend, main publication venues, researched social media platforms, and languages used in researched social media. Table 3 displays the number of peer-reviewed journal articles published each year from 2009 to 2020. A linear regression model was fitted, and the results showed a significant increase in the number of articles published over the examined period (F = 163.711, p < .001, R² = .942, adjusted R² = .937). The publication trend (Figure 1) shows that social media has been receiving increasing academic attention in linguistics, especially after 2017. Table 4 lists the journals that published at least ten relevant articles. Most are leading journals in linguistics, such as Journal of Pragmatics and Journal of Sociolinguistics. Others, as their titles suggest, publish cross-disciplinary studies linking linguistics with politics, education, psychology, and anthropology, among others, such as Journal of Language and Politics and Journal of Language and Social Psychology. Table 5 lists the frequency counts of keywords naming social media platforms and the languages used on them. The most frequently researched platforms, Twitter, Facebook, and blogs, are mainly text-based, which is consistent with the focus of LSSM on language use, despite the growing influence of video- and photo-sharing applications such as YouTube and Instagram. Among the researched languages, it is not surprising that English predominates (with a frequency count of 48), not only because it is the lingua franca of social media, particularly on US-based platforms such as Twitter and Facebook, but also because nearly half of the 794 articles were published in English-speaking countries, including the US, Great Britain, and Australia.

Table 3. Number of articles published per year
Year    Articles
2009    22
2010    38
2011    36
2012    37
2013    42
2014    55
2015    54
2016    73
2017    70
2018    92
2019    138
2020    137
Total   794
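The article reports a linear regression of yearly publication counts on year but does not state whether the counts were transformed. The Python sketch below recomputes the fit from the Table 3 counts with ordinary least squares written out by hand, to keep it dependency-free. Regressing the raw counts on year gives R² ≈ .851 and F ≈ 57.1, whereas regressing the natural logarithm of the counts gives F ≈ 163.71, R² ≈ .942, and adjusted R² ≈ .937, matching the reported values. This suggests, but does not confirm, that a log-linear (exponential growth) model was used.

```python
import math

# Yearly article counts from Table 3 (2009 to 2020, total 794).
YEARS = list(range(2009, 2021))
COUNTS = [22, 38, 36, 37, 42, 55, 54, 73, 70, 92, 138, 137]

def ols(x: list[float], y: list[float]) -> dict[str, float]:
    """Simple least-squares regression of y on x with its usual fit statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    sst = sum((b - mean_y) ** 2 for b in y)   # total sum of squares
    ssr = slope * sxy                          # regression sum of squares (= sxy**2 / sxx)
    sse = sst - ssr                            # residual sum of squares
    r2 = ssr / sst
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - 2),
        "F": ssr / (sse / (n - 2)),            # F statistic with (1, n - 2) df
    }

raw_fit = ols(YEARS, COUNTS)
log_fit = ols(YEARS, [math.log(c) for c in COUNTS])
for name, fit in (("raw counts", raw_fit), ("log counts", log_fit)):
    print(f"{name}: F = {fit['F']:.3f}, R2 = {fit['r2']:.3f}, adj. R2 = {fit['adj_r2']:.3f}")
print(f"implied growth per year (log model): {math.exp(log_fit['slope']) - 1:.1%}")
```

Under the log-linear model, the slope of about 0.151 per year corresponds to growth of roughly 16% in the number of articles each year.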

Major Strands of Linguistics
"Journal co-citation analysis can be used as an operational indicator for the discipline organization of the sciences" (Hu et al., 2011, p. 658). In a visual analysis of research on social media and government trust, for instance, Tong and Song (2021) used journal co-citation analysis to identify three sub-disciplines (management, environmental science, and political science). The journal co-citation network and journal co-citation frequency list generated by CiteSpace are shown in Figure 2 and Table 6. Except for Thesis (see Figure 2), whose items do not fall within the domain of linguistic research, the other top six co-cited journals (see Table 6) can be broadly categorized into two strands: sociolinguistics and pragmatics. This division is supported not only by the journal titles, namely the sociolinguistics-oriented Journal of Sociolinguistics, Language in Society, and Discourse & Society and the pragmatics-oriented Journal of Pragmatics, but also by keywords in Table 2 such as "identity," "gender," "ideology," and "humor," which are topics frequently explored in sociolinguistics and pragmatics. A more in-depth analysis based on this categorization reveals that sociolinguistic studies have mainly investigated language change induced by social media and the relationship between language variation and society in the digital world. For example, De Decker and Vandekerckhove (2017) investigated the extent to which the social variables of gender and age correlate with four linguistic features in social media communication: flooding, leetspeak, abbreviations, and cluster reduction.
Pragmatics, as our analysis reveals, has also played an increasingly important role in social media studies. For instance, Carr et al. (2012) examined the use of speech acts in Facebook status messages and found that expressive speech acts were the most frequent, followed by assertives, and that humor was integrated into almost 20% of the status messages.

Table 5. Most frequently researched social media platforms and languages (keyword frequency)
Rank   Platform    Frequency   Language   Frequency
1      Twitter     39          English    48
2      Facebook    30          French     10
3      Blog        10          Chinese    9
4      Instagram   6           Spanish    5
5      YouTube     5

Main Research Methods
Table 7 provides an overview of the main research methods used in LSSM and their classification based on Litosseliti (2010) and McKinley and Rose (2020). Among them, the qualitative methods are discourse analysis, multimodal analysis, critical discourse analysis, conversation analysis, narrative analysis, and ethnographic analysis. Discourse analysis, for example, was adopted by Zappavigna (2015) to investigate the use of hashtags on Twitter; the findings showed that hashtags can serve experiential and interpersonal linguistic functions at the level of lexicogrammar and enact metacommentary at the level of discourse semantics. Kreis (2017) used critical discourse analysis (the discourse-historical approach) to explore the meaning and function of Trump's discursive strategies on Twitter and found that Trump used an informal, direct, and provoking communication style and employed positive self-presentation and negative other-presentation to construct right-wing populist discourse. Some studies go beyond linguistic symbols and use multimodal analysis to examine non-textual modes as well. Hunt (2015), for instance, examined texts and photos on the Facebook page of a diabetes organization. The results showed that the textual mode, such as personal pronouns, enhanced social relationships among the organization, community members, and the audience, while the visual mode of photos represented successful self-management, charitable volunteering, and diabetic individuals' need for emotional support. Themistocleous (2015) applied conversation analysis to explore the structure of digital code-switching between Cypriot and Standard Greek and how it was influenced by the medium- and social-specific characteristics of internet relay chat. Narrative analysis and ethnographic analysis tend to investigate broader issues in social media studies.
Page's (2012) longitudinal narrative analysis of storytelling styles in Facebook status updates revealed that self-reports were the dominant type of activity, accounting for approximately 70%, followed by projections and shared stories. Back's (2013) ethnographic analysis of the use of Portuguese on Facebook and the Portuguese level of Brazil students before and after studying abroad found that social media tools such as Facebook play an important role in improving their Portuguese and in helping them acquire Portuguese social media terms.
The quantitative method of corpus analysis was used, for example, by Hardaker and McGlashan (2016) to investigate how sexual aggression is enacted and spread on Twitter, focusing on frequency, collocation, and keywords.

Table 7. Main research methods in LSSM and related keywords (frequency)
Qualitative
  Discourse analysis: … (57), discourse analysis (11), media discourse (10), political discourse (3), news discourse (2), academic discourse (1), computer mediated discourse (1)
  Multimodal analysis: multimodality (22), multimodal analysis (3), multimodal critical discourse analysis (1)
  Critical discourse analysis: critical discourse analysis (17), multimodal critical discourse analysis (1)
  Conversation analysis: conversation (8), conversation analysis (7)
  Narrative analysis: narrative (8), small story (2)
  Ethnographic analysis: ethnography (6), ethnomethodology (2)
Quantitative
  Corpus analysis: corpus linguistics (3), corpus (2)

Important Research Themes
Table 8 presents the results of grouping keywords according to the four dimensions of the genre framework of social media studies (Lomborg, 2014) described above; these groupings reflect the important research themes in LSSM.
Compositional level. In the dimension of compositional level, two important themes are the participatory nature and the technological affordances of social media. "The potential for users to participate in interaction and to contribute content is perhaps the main defining characteristic of social media" (Landert, 2017, p. 31). In the circulation of a visual small story (the iconic image of Alan Kurdi) on Twitter, users were found to participate in sharing and spreading the story by commenting, resharing, and replying (Giaxoglou & Spilioti, 2020).
The technology of social media provides affordances for users, such as Twitter's 140-character limit (Bucher & Helmond, 2018), the hashtag (#), addressing with @, emoji, and emoticons, among others. For instance, Matley (2018) examined how Instagram users strategically employed hashtags as a non-apology marker in a balancing act of (im)politeness that allowed for a level of sanctioned face attack in potentially inappropriate posts. Politicians have been found to use @username to refer to themselves and others for identity construction (Coesemans & De Cock, 2017). Matulewska and Gwiazdowicz (2020) investigated the use of emojis in Facebook and blog comments to emphasize emotions toward the community of hunters, portrayed as ruthless and bloodthirsty animal killers. They found that emojis usually accompany verbal messages to express both positive (support, happiness, and cheerfulness) and negative (anger, hatred, and disgust) emotions.
Thematic orientations. In the dimension of thematic orientations, the frequently explored topics are education, (language) policy, and politics. The theme of education often involves the use of social media in language teaching and learning. Social media can support various developments in language teaching and learning, including digital literacies and self-directed learning, among others (Reinhardt, 2019). For example, Prichard (2013) used Facebook as a platform to design several activities (e.g., creating profiles, joining groups, and making posts) to help Japanese EFL learners reach the three goals set by the TESOL technology standards. The results suggested that the training was successful in developing the learners' digital literacies. In another example, Aloraini and Cardoso (2020) investigated students' use of social media platforms in self-directed EFL learning. The findings revealed that the learners strategically chose social media platforms according to their language purposes and the skills to be learned (e.g., they preferred WhatsApp for communicating with family and friends, Twitter for reading, and Snapchat for learning aural skills).
Language policy, often triggered by the tension between an official language and a lesser-used language (Lee, 2017), concerns the production and enforcement of linguistic norms (Vessey, 2018) on social media. For instance, despite the government's top-down language policy of standardization, indigenous youth endeavored to revitalize Maya, an indigenous language of Mexico, by actively using it on Facebook (Cru, 2015).
Another central topic emerging in LSSM is politics, as indicated by the keyword "politics" (freq = 6) and other related keywords (e.g., political discourse, ideology, Trump, Europe). For example, McDonnell (2020) examined the presidential candidates' political discourse on Twitter in the 2016 US election and found that both Donald Trump and Hillary Clinton conformed to, and defied, gendered linguistic stereotypes.
Next, indexicality links (non)linguistic signs with sociocultural contexts and meanings in online communication (Varis, 2016). Indexical devices, such as first-person pronouns and third-person self-references, were found in politicians' Twitter posts to construct politician identities (Coesemans & De Cock, 2017).
Last, research on multilingualism on social media focuses on language diversity and multilingual practice. Studies of language diversity often concern the protection or revitalization of minority languages through users' use of indigenous languages on social media (Leppänen & Peuronen, 2012). For example, Stern (2017) investigated a Balinese-language Facebook group and found that its members persisted in using Balinese to revitalize it despite encroachment by the official state language and an international language. Multilingual practice refers to users' deployment of multilingual resources and repertoires in social media communication. For instance, Themistocleous (2015) explored how users of internet relay chat switched between Cypriot and Standard Greek, the two varieties of Greek spoken in Cyprus. Users were found to bring into play the various languages in their linguistic repertoire and, consequently, to switch between them.
Pragmatic dimension. In the pragmatic dimension, the frequently researched communicative functions are constructing identity, communicating ideology, and expressing attitudes. The most prominent communicative function of language use on social media is the construction of identities at both the individual and collective levels. Individual identities examined in social media studies include gender, age, and politician identities, among others. For example, Jing-Schmidt and Peng (2018) found that male users employed the morpheme biǎo (slut) as a gendered personal suffix in the Chinese cyber lexicon on Weibo to construct a masculine identity of power. Focusing on explicit and implicit references to age and aging by a Greek female Facebook user, Georgalou (2015) argued that age identity is an interactive and collaborative process both facilitated and hindered by certain Facebook configurations. In Coesemans and De Cock's (2017) work, politicians were found to use self-reference to construct politician identities on Twitter during an election.
Collective identities constructed on social media include national, group, and corporate identities. Mexican immigrants living in the US, for instance, have been found to co-construct an imagined experience on Twitter to forge their ethnic identity and display their sense of belonging to Mexican culture by participating in specific cultural practices using hashtags, memes, and multimodal resources (Christiansen, 2019). Tagg and Seargeant (2012) observed that Thai-English bilinguals use letter repetitions on Facebook and MSN to index a group identity. Bloggers use different forms of compounds to construct their identities as members of a distinctive and cohesive social community (Crawford Camiciottoli, 2016). Companies also interact with stakeholders on social media, constructing corporate identities in the process. For example, Feng and Wu (2016) examined the construction of corporate identities on Weibo across different ownership types in China and identified five corporate identities: authentic, specialist, companion, journalist, and CSR identity. Ozdora-Aksak and Atakan-Duman (2014) investigated the Facebook and Twitter accounts of Turkish banks and found that they tend to emphasize the softer, especially socially responsible, side of their organizational identities.
The second communicative function is communicating ideologies, in both explicit and implicit ways. Politicized texts appear to be the most explicitly ideologically loaded discourse on social media. For example, Trump used an informal, direct, and provoking communication style and employed positive self-representation, negative other-representation, and top-down use of Twitter to communicate his right-wing populist ideology (Kreis, 2017). Ideologies are more implicit in users' responses to online interlocutors on social media (Pihlaja & Musolff, 2017). For instance, Hardaker and McGlashan (2016) investigated how sexual aggression is enacted and spread on Twitter as online communities respond to and participate in forms of extreme online misogyny. In addition, Zhao and Liu (2020) analyzed Weibo posts to examine the language ideologies surrounding two regional varieties of Putonghua (the Chinese standard language) and how laypeople perceive them.
Closely related to the ideological function of social media use is its attitudinal function. Users express attitudes toward social and political life, among other matters, on social media, which has become a major forum for public opinion. For example, Zhang and Zhao (2020) drew on theories of narrative and stance-taking to analyze how Chinese diaspora YouTube vloggers relate their experiences during the COVID-19 pandemic. The results show that the vloggers display both universal (e.g., fears) and culturally specific (e.g., mask-wearing) feelings and invite their viewers to co-construe the emotional experience through narrative devices (e.g., shared stories) and linguistic choices, such as the pronoun ni (you) and the address term dajia (everyone).

Cross-Disciplinary and Mixed-Method Study
First, the above findings show that LSSM have become increasingly cross-disciplinary, drawing on business, politics, and computer science, among others. For instance, Page (2014) combined pragmatic research on speech acts with the theory of image repair to examine corporate apologies posted on Twitter. Kreis (2017) integrated critical discourse analysis with research on right-wing populism to explore the meaning and function of Trump's discursive strategies on Twitter. Ceron and D'Adda's (2016) use of sentiment analysis reflects a cross-disciplinary approach combining linguistics and computer science, as they investigated the messages posted on the Twitter accounts of Italian parties and the voting intentions that voters expressed on Twitter.
Second, mixed methods are not uncommon in LSSM despite the classification shown in Table 7. In mixed-method designs, quantitative approaches can capture large-scale patterns of language use that provide context, while qualitative approaches can focus on specific details (Georgakopoulou & Spilioti, 2016; Page et al., 2014; Zeller, 2017). For example, Page (2014) used corpus-linguistic methods to quantify the frequency of corporate apologies posted on Twitter and then identified image repair strategies through qualitative analysis of example posts from the dataset. Similarly, Terhune (2016) combined a qualitative survey of students' opinions on Skype-assisted learning with a quantitative analysis of the types of lessons students chose, the number of lessons taken per week, and the number of CMC (computer-mediated communication) lessons per student in a university classroom, revealing the important role of social media in English learning.
In addition, studies of social media in fields other than linguistics tend to apply research methods based on non-textual data generated on social media, such as geospatial analysis and network centrality analysis. Geospatial analysis uses geographical and spatial data from social media to gain better insight into the phenomena under investigation (Buchel & Pennington, 2017). For instance, Sobkowicz et al. (2012) argued that geospatial analysis could be used to understand, in real time, the topics and opinions of stakeholders in different regions and thus to forecast political opinions on social media by modeling opinion formation. Network centrality analysis, a structural analysis of social networks, aims to clarify the effect of networked influence (Gruzd & Wellman, 2014) on the behaviors and connections of online users (Ghajar-Khosravi & Chignell, 2017). For example, Ghajar-Khosravi (2015) applied network centrality analysis to social networks on the web to investigate the topological features of networks with different types of connections, finding, for instance, that the average distance between nodes is much higher in collaboration-based networks than in friendship-based networks.
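To make the two network measures mentioned above concrete, the Python sketch below computes degree centrality (the share of other nodes each node is directly connected to) and the average shortest-path distance between nodes, using breadth-first search on an undirected graph. These are two standard structural measures; the sketch is not Ghajar-Khosravi's (2015) procedure, and both four-person networks are hypothetical toy examples chosen only to show how a sparser structure raises the average distance.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from an iterable of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

# Two hypothetical four-person networks: everyone connected vs. a chain.
dense = build_graph(combinations("abcd", 2))
chain = build_graph([("a", "b"), ("b", "c"), ("c", "d")])
print(average_distance(dense), average_distance(chain))
print(degree_centrality(chain))
```

In the chain, the middle nodes have the highest degree centrality, and the average distance rises from 1 (every pair directly linked) to about 1.67.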

Conclusion
This paper has attempted to present the status quo of linguistic studies on social media through bibliometric analysis. First, our findings showed that such studies are increasing in number and are published in leading linguistics journals as well as in cross-disciplinary journals involving linguistics and politics, education, psychology, and anthropology, among others. Twitter ranked first among the researched social media platforms, followed by Facebook, blogs, Instagram, and YouTube. English was the most frequently researched language on social media, followed by French, Chinese, Spanish, and African languages. Second, sociolinguistics and pragmatics were found to be the major strands of linguistics. Third, the study identified seven research methods: discourse analysis, critical discourse analysis, conversation analysis, multimodal analysis, narrative analysis, ethnographic analysis, and corpus analysis. Fourth, important research themes were extracted and classified according to the four dimensions of the genre framework of social media studies: the participatory nature and technological affordances of social media (compositional level); the topics of education, (language) policy, and politics (thematic orientations); the discursive practices of (im)politeness, humor, indexicality, and multilingualism (stylistic traits); and the communicative functions of constructing identity, communicating (language) ideology, and expressing attitude (pragmatic dimension). Moreover, linguistic studies on social media tended to be characterized by cross-disciplinary and mixed-method approaches.
The present study also has limitations. For instance, it selected only articles published in the area of linguistics in order to focus on linguistic perspectives, although articles in communication studies may also adopt linguistic approaches. Future research may include more data types (articles published in edited books and proceedings, and monographs) in linguistics as well as in other disciplines (communication, politics, education). In addition, the reviewed aspects of linguistic strands, research methods, and research themes merit more in-depth exploration in future research.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Humanities and Social Sciences Project of the Ministry of Education of the People's Republic of China (grant number: 21YJA740035).