Bibliometric Analysis of Statistics Journals Indexed in Web of Science Under Emerging Source Citation Index

Over the last few decades, statistics and probability have become an integral part of major research areas, with continuously increasing demand. This article presents a bibliometric analysis of journals indexed in the category of “Statistics and Probability” under the Emerging Sources Citation Index (ESCI) of Web of Science (WoS) for 2015–2019. After data extraction from WoS, bibliometric analysis at the source, author, and document levels was performed using the “Bibliometrix” R package. Of the 38 journals indexed in ESCI, the 32 that fulfilled the criteria were selected. In total, 4,294 documents were retrieved and analyzed using various bibliometric metrics, and appropriate data presentation tools were planned. Sixteen countries were found to be origins of the sources, mainly the USA and India, while 10 countries had one journal each. Among journals, Korean Journal of Applied Statistics, Journal of Statistics & Management Systems, and Advances and Applications in Statistics contributed the most. Wiley Interdisciplinary Reviews-Computational Statistics, Journal of Statistics & Management Systems, and Pakistan Journal of Statistics and Operation Research showed the leading impact indices. The total number of authors was 6,374. Only one document was cited more than 100 times. The country collaboration network showed three main groups with minor overlap. Two clusters of keywords, namely survey sampling and distribution theory, were found. Overall, a consistent publication trend was observed. The countries with relatively higher impact journals were the USA, India, and Pakistan. Total citations and average citations showed the dominance of developed countries. The study findings can help researchers and other stakeholders in the subject to make better informed plans and decisions.


Introduction
Statistics and probability have expanded their scope over the last few decades and have become an integral part of many fields (Donoho, 2017; Drummond & Tom, 2011). In particular, they are in increasing demand in, and contribute to, major areas of research such as Arts & Humanities, Life Sciences & Biomedicine, Physical Sciences, Social Sciences, and Technology. Despite this known significance, the number of journals in this specific subject is assumed to be far lower than the number of journals in other subjects. However, a positive trend has been observed over the last few years.
Statistics and Probability is one of the most important subjects, providing methods to find structure in data and gain insight from it. Big data poses new challenges to statisticians: the volume, velocity, and variability of unstructured data will require new theories, methods, and tools (Secchi, 2018). Recent advances in computational methods have drawn wide attention to this subject from researchers and readers in many disciplines. Nevertheless, there is limited literature that explores the development and evolution of the subject itself.
Numerous databases are available to researchers and academicians, such as Scopus, EBSCO, Science Direct, ProQuest, and PubMed. However, Web of Science (WoS), a platform maintained by Clarivate Analytics (formerly Thomson Reuters), is considered the most precise and comprehensive source for scientific exploration and appraisal, with the highest quality indexing. It is also assumed to be more appropriate for evaluating the research output of different regions, authors, or organizations (Jelercic et al., 2010; Ronda-Pupo et al., 2015). It supports searching across major databases, disciplines, and document types, along with more than one billion searchable cited references (WoS). It is uncommon for a journal to belong to only one subject category, as most journals overlap in their coverage. Nevertheless, WoS has defined specific subject categories, and each published document inherits all subject categories assigned to its parent journal. "Statistics and Probability" is one such category in WoS, with around 163 journals (Master Journal List [MJL], Web of Science Group).
In addition, in late 2015, WoS launched the "Emerging Sources Citation Index" (ESCI), with more than 7000 journals covering scientific, social science, and humanities literature (ESCI, ISI Web of Knowledge by Clarivate Analytics [formerly known as Thomson Reuters], 2015). Journals indexed in the ESCI do not obtain Impact Factors. However, Journal Citation Reports (JCR) citation counts include ESCI citations, which consequently contribute to other journals' Impact Factors. Moreover, in a continuously growing, dynamic, and diverse literature, ESCI provides WoS users with extended possibilities to explore emerging research areas.
Bibliometrics is a gateway to evaluate such developments and fill the knowledge gap (Abramo & D'Angelo, 2011; Moed, 2006). In the field of statistics, various bibliometric studies have explored different aspects, such as citation patterns in the journals of statistics and probability (Stigler, 1994), communication between statistical methodology and applied statistics (Eto, 2000), the most-cited statistical papers (Ryan & Woodall, 2005), a decade of research in statistics (De Battisti et al., 2015), statistical modeling of citation exchange between statistics journals (Varin et al., 2016), and the importance of being clustered: uncluttering the trends of statistics (Anderlucci et al., 2019).
In addition, over the last few decades, WoS has been one of the most widely used sources for bibliometric analysis in various other scientific fields (Hossain, 2020; Merigó & Yang, 2017; Shukla et al., 2020; Yu & He, 2020; Yu et al., 2017). For ESCI, no metrics or performance evaluation support is yet provided. Moreover, to the best of our knowledge, no study has explored the performance and trends of statistics journals in the ESCI category of WoS.
Therefore, this study aimed to present a bibliometric analysis of all documents published during 2015-2019 in journals in the most relevant subject category, "Statistics and Probability", under the Emerging Sources Citation Index (ESCI) of WoS.

Method
All journals in the category of "Statistics and Probability" under ESCI of WoS during 2015-2019 were identified from the MJL, Web of Science Group, and found to number 38 (Figure 1). All 38 identified journals (sources) were then verified individually against the actual list provided by WoS in the study category and added in "advanced search" through the field tag SO = Publication Name [Index] in ESCI.
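The journal-by-journal search described above can be sketched in Python as follows. This is only an illustration and not the authors' procedure: the helper name `build_source_query` is hypothetical, and the exact query string the authors entered is not given in the text. The sketch assumes only that titles are combined under the SO field tag with the OR operator.

```python
def build_source_query(journal_titles):
    """Join journal titles into one SO=(...) clause for an advanced search.

    Hypothetical helper: the quoting and OR-joining are assumptions,
    not the authors' documented query.
    """
    cleaned = [title.strip() for title in journal_titles if title.strip()]
    if not cleaned:
        raise ValueError("at least one journal title is required")
    quoted = ['"{}"'.format(title) for title in cleaned]
    return "SO=(" + " OR ".join(quoted) + ")"


# Two of the journals named in this study, used purely as an example.
titles = [
    "Korean Journal of Applied Statistics",
    "Advances and Applications in Statistics",
]
query = build_source_query(titles)
print(query)
```

Building the clause programmatically makes it easier to keep the searched list identical to the verified journal list.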
For further analysis, we selected 32 of the 38 journals in the same category, as shown in the figure. All types of publications (total = 4,294) from the selected 32 journals were included. Data were extracted from WoS as plain text files, and bibliometric analysis at the source, author, and document levels was then performed using the R "Bibliometrix" package (Aria & Cuccurullo, 2017), which is considered to offer relatively comprehensive and extensive techniques compared with other bibliometric analysis tools (Moral-Muñoz et al., 2020). Tableau Desktop (version 2018.2) was used for geographic mapping of productivity and citations (Tableau).
The search was conducted on February 11, 2020, and two researchers (N.S.B. and A.A.M.) independently searched and abstracted the required information to verify the process. The retrieved documents were analyzed using various bibliometric metrics covering journals, publication year, authors, indices, citation reports, institutions, and countries/regions, and data presentation tools were planned accordingly.
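The study itself used the R Bibliometrix package to read the exported files. As a rough illustration of what such an import involves, the following Python sketch parses a tagged plain-text export into records. It assumes the usual layout of WoS plain-text exports (two-character field tags such as AU, SO, PY, and TC; continuation lines indented with spaces; each record closed by ER). The function name `parse_wos_plaintext` and the sample record contents are hypothetical.

```python
from collections import defaultdict


def parse_wos_plaintext(text):
    """Parse tagged plain-text records into a list of dicts (tag -> list of values)."""
    records = []
    current = defaultdict(list)
    tag = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        head = line[:2]
        if head == "ER":                      # end of one record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            tag = None
        elif head in ("FN", "VR", "EF"):      # file header and footer lines
            continue
        elif head.strip():                    # a new two-character field tag
            tag = head
            current[tag].append(line[3:].strip())
        elif tag is not None:                 # continuation of the previous field
            current[tag].append(line.strip())
    return records


# Invented sample data, shaped like a plain-text export.
sample = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Doe, J
   Roe, R
TI A hypothetical title
SO HYPOTHETICAL JOURNAL OF STATISTICS
PY 2018
TC 4
ER

PT J
AU Poe, E
TI Another hypothetical title
SO HYPOTHETICAL JOURNAL OF STATISTICS
PY 2019
TC 0
ER

EF
"""

records = parse_wos_plaintext(sample)
citations = [int(record["TC"][0]) for record in records]
print(len(records), citations)
```

Keeping every field as a list of values preserves multi-valued fields such as authors, which the author-level analysis relies on.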
At the source level, impact was assessed by the h-index and g-index. The h-index, originally proposed as an author-level metric, measures both citation impact and publication productivity (Hirsch, 2005; Vílchez-Román, 2014): a source has index h if h of its documents have each received at least h citations. The g-index is another index used for quantifying productivity and is based on the distribution of citations received by a researcher's publications (Egghe, 2006). Average citations per document is an indicator used to evaluate the impact of authors, countries, and journals (Yi et al., 2008). It is calculated as the total number of citations received divided by the total number of documents published by a journal, and it provides a fairer evaluation of author and journal activity (Harzing, 2010). A dendrogram was planned to evaluate keywords. A collaboration network was planned to evaluate collaborative activities among countries (Newman, 2004).
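The three source-level indicators above can be illustrated with a minimal Python sketch. The citation counts are invented for illustration and are not taken from the study data. The g-index here is capped at the number of documents, which is one common convention and an assumption about how it was computed in this study.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g documents together have at least g*g citations.

    Capped at the number of documents (an assumed convention).
    """
    ranked = sorted(citations, reverse=True)
    cumulative = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        cumulative += count
        if cumulative >= rank * rank:
            g = rank
    return g


def average_citations(citations):
    """Total citations divided by the number of documents."""
    return sum(citations) / len(citations) if citations else 0.0


# Invented citation counts for six documents of one hypothetical journal.
example = [10, 8, 5, 4, 3, 0]
print(h_index(example), g_index(example), average_citations(example))
```

For these counts the g-index exceeds the h-index because it gives extra weight to the most highly cited documents.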

Results
Source-related information, including the number of items published, h-index, g-index, total citations (TC), and average citations per document, is shown in Table 1. The total number of documents published across all 32 ESCI journals in the WoS category "Statistics & Probability" was 4,294, with 839, 835, 850, 900, and 842 documents published in 2015, 2016, 2017, 2018, and 2019, respectively. Articles were the most common document type, representing around 94% (4,042), followed by editorial material (119; 3%) and review papers (103; 2.4%).
The total number of authors was 6,374. The average number of authors per article was 2.37. There were 947 single-authored documents. Table 2 shows the most productive authors. Only three authors (Hamedani GG, Kim J, and Cordeiro GM) contributed 30 or more publications in 5 years. Six authors showed an h-index of ≥5, as shown in Table 2. King Abdulaziz University, Cairo University, and Korea University were the most frequent author affiliations.
In total, 16 countries were found to be origins of the sources. Among those, 10 countries had only one journal in the study category, whereas the USA and India together contributed the most, around 37.5% (12 of 32 journals), with seven and five journals, respectively. The top corresponding-author countries were India, Korea, and the USA, with 660, 573, and 463 documents, respectively.
Figure 2(A) shows country-level publication productivity as well as the total number of citations received. In the country collaboration network shown in Figure 2(B), three main groups were identified, with minor overlap.
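A country collaboration network of this kind is typically built from the co-occurrence of countries in the author affiliations of each document. The following Python sketch shows the idea under that assumption. The per-document country lists are invented, and the function name `collaboration_edges` is hypothetical. The actual network in this study was produced with the Bibliometrix package.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(doc_countries):
    """Count, for each unordered country pair, the documents they share."""
    edges = Counter()
    for countries in doc_countries:
        unique = sorted(set(countries))       # each country counted once per document
        for first, second in combinations(unique, 2):
            edges[(first, second)] += 1
    return edges


# Invented affiliation countries for four documents.
docs = [
    ["USA", "India"],
    ["India", "USA", "Korea"],
    ["Korea"],
    ["USA", "USA"],
]
edges = collaboration_edges(docs)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Edge weights of this kind can then be passed to a clustering or layout routine to reveal groups of countries that collaborate more often with each other than with the rest.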
Only one document, "KIM S, 2015, COMMUN STAT APPL MET", was cited more than 100 times, and the source "COMMUN STAT APPL MET" was the leading source among the top 10 most highly cited documents, as shown in Table 3. In terms of corresponding authors, India, Korea, and the USA were the leading contributors, with the USA showing relatively more multiple country publications (MCP) than India and Korea, as shown in Table 3. Figure 3 describes the occurrences, trends, and clustering of author keywords. Order statistics, bias, and maximum likelihood were the most frequently used author keywords in the selected study publications. Figure 3(A) shows the trend of keywords by year. Simulation, moments, Monte Carlo simulation, expectation-maximization algorithm, and variable selection were the most trending keywords in 2018. Figure 3(B) shows a dendrogram with two main clusters of keywords, namely survey sampling and distribution theory. The distribution theory cluster can be further subdivided into Regression, Bayesians, and Parameter Estimation keyword clusters.

Discussion
This article presents the bibliometric analysis of all 4,294 documents published in 32 ESCI journals in the WoS category "Statistics and Probability" between 2015 and 2019. It showed a uniform trend in the number of publications each year, with the majority being articles. The findings suggest that three countries, the USA, India, and Korea, had relatively more journals among the total of 16 countries in this category, while 10 countries had only one journal. In addition, the journals Korean Journal of Applied Statistics, Journal of Statistics & Management Systems, and Advances and Applications in Statistics contributed the most and were from the same three countries: Korea, India, and the USA, respectively. In fact, the USA and India together contributed the most, with around 37.5% of all journals. On the other hand, the three least-contributing journals were from the USA. Of the four journals that showed an h-index of >5, two were from the USA and one each from India and Pakistan. One journal each from the USA, Korea, and Pakistan showed a g-index of >8. These findings suggest the dominance of three countries, the USA, India, and Korea, in contribution and of the USA and India in impact. The Korean journal did not show impact indices proportionate to its output, whereas a journal from Pakistan, Pakistan Journal of Statistics and Operation Research, showed consistency in its impact indices. Eight journals showed >100 TC and were mostly from developed countries, except for one journal each from Korea, India, and Pakistan. These findings also suggest consistent and quality input in the field from less developed regions. Three journals had average citations per document of >2, and all were from the USA.
The majority of the published documents were multi-author documents. Only one document was cited more than 100 times, and it was from a Korean journal. In terms of corresponding authors, India, Korea, and the USA were the leading contributors, with the USA showing relatively more MCP than India and Korea. Only three authors contributed 30 or more publications, and they were from the USA, Korea, and Brazil. Six authors showed an h-index of ≥5. Interestingly, one university each from the Kingdom of Saudi Arabia, Egypt, and Korea accounted for the most frequent author affiliations. This finding may also suggest that relatively more of the institutes contributing publications were from regions outside the journals' countries of origin. The country collaboration network showed three main groups with minor overlap. One group mainly represents the European region, along with the UK, South Africa, and Russia. The second group shows mainly USA-centered collaboration with other parts of the world. The third group represents Asia and the Middle East, with India as an exception: India was found to be more collaborative within the second group, along with the USA. Increased and more diverse collaboration may therefore need to be considered, with efforts to improve the involvement of the least represented regions, particularly low- and middle-income countries. Two main clusters of keywords, namely survey sampling and distribution theory, were found. On further sub-clustering of the distribution theory cluster, Regression, Bayesians, and Parameter Estimation keyword clusters were observed. These findings suggest diverse coverage of topics. Although the limited literature and data available for comparison was a limitation, it also points to the need for further and continuous exploration of trends and relevant analysis.
In terms of other limitations, the analysis was conducted only on WoS-ESCI journals in the "Statistics & Probability" category within the limited timeframe of 2015-2019, which may limit the generalizability of the findings to the category in general. Second, the WoS database may have some unidentified issues; however, the findings shared here for the leading contributors were manually verified. In addition, because of continuous changes and updates, the publication data available for analysis may differ depending on the date of search and the timeframe. Metadata from other sources might be beneficial to complement this study and provide comprehensive context on the subject.

Conclusion
Considering the scarcity of literature on "Statistics & Probability" publication trends, despite the subject's significance in research and academia, this article helps fill the gap by providing an overview and salient trends in the WoS-ESCI "Statistics & Probability" category (2015-2019). A consistent publication trend was observed over the 5-year time span, with articles as the major document type. Overall, 16 countries contributed to the 32 selected journals, with major contributions from the USA and India. The countries with journals showing relatively higher impact were the USA, India, and Pakistan. Most of the documents were multi-authored. Total citations and average citations showed the dominance of developed countries. The country collaboration network showed three main groups with minor overlap. Two clusters of keywords, namely survey sampling and distribution theory, were found. Although the limited literature and data available for comparison was a limitation, it also points to the need for further and continuous exploration of trends and relevant analysis. In conclusion, the bibliometric findings of this study can help relevant stakeholders, particularly researchers, to better understand the performance and trends of the study subject and to make better informed plans and decisions.