When text mining meets science mapping in the bibliometric analysis: A review and future opportunities

The increasing volume of scientific publications has created a need for more efficient and effective literature review processes. Bibliometric analysis is a quantitative approach to analysing bibliographic data extracted from research studies to identify publishing patterns and trends within specific knowledge domains. Science mapping is a widespread technique in bibliometric analysis that enables researchers to reveal the structure of their respective fields and identify dominant themes. However, there is still a lack of clarity and transparency in describing the science mapping process, which can hinder continued refinement and improvement of this critical field of research. This study provides a comprehensive overview of science mapping in bibliometric analysis based on published review studies from prestigious international journals. It outlines the science mapping mechanism and explores challenges and opportunities, focusing on incorporating text mining approaches to support the analytical literature review process. The study sheds light on previously unexplored mechanisms in the literature of bibliometric analysis, revealing gaps in existing research. The study contributes to the growing body of research on bibliometric analysis by highlighting the need for continuous improvement and the deployment of text mining techniques to support the analysis of scientific publication data. This study offers valuable insights for researchers, policymakers, and practitioners seeking to enhance their understanding of science mapping and bibliometric analysis.


Introduction
As science continues to evolve, research fields have entered the era of massive data and are growing rapidly.Data collected until the end of 2021 in the Scopus database reveal that scholar categories are classified into 22 scientific fields and 174 sub-fields within 2022, while more than 10 million researchers are included for the evaluation. 1 In other words, the number of scientific research publications per year is substantially increasing, which drives the need to revamp the existing literature review process.Owing to the progressive expansion of scientific advancement in numerous domains, there are more intersections across research themes, resulting in the rapid evolution of academic research.From handwriting or typewriting to electronic copies, we are now at the point of creating publications' fingerprints in digital scientific platforms, such as the Web of Science (WoS).In addition to the research content, the related publication data, for example, sources, organisations, cited references and citation frequencies, are electronically retained.The era of 'Research 4.0' is thus characterised by enabling rich scientific data for investigation.Still, the systematic approach to analysing the scientific publication data for supporting the literature review and survey is much more challenging.
To tackle the above challenge, knowledge management (KM) in scientific research has emerged to use statistical methods and processes to coordinate and leverage the cumulative knowledge generated by research activities to promote sustainability and respond to changes in the research environment. 2,3Knowledge management (KM) is pivotal in scientific research, serving as the foundation for the systematic process of capturing, distributing, and effectively using knowledge to foster innovation and collaboration.The essence of KM lies in its ability to transform raw data and empirical observations into accessible and actionable insights, which is critical in the fast-paced and ever-evolving landscape of scientific inquiry.Within this context, bibliometric analysis emerges as a powerful tool, offering a quantitative approach to examining scientific literature.And it has significantly contributed to research advancement across various fields.Bibliometrics is a quantitative approach involving analysing bibliographic data extracted from research studies to study publishing patterns and trends of a specific knowledge domain. 4Using bibliometric analysis enables researchers to reveal the structure of their respective fields and identify dominant themes. 5cience mapping methods in bibliometric analysis has gained popularity over the past decade.However, there is still a lack of a comprehensive overview of the current status quo of these methods.While several software packages and algorithms are available for conducting science mapping, there is a need for greater clarity and transparency in describing the scientific mapping process to allow for continued refinement and improvement of this critical field of research.Text mining has emerged as a knowledge-intensive approach that utilises various analytical techniques, including information extraction, summarisation, topic modeling, document clustering, and sentiment analysis, to achieve a more comprehensive and in-depth analysis of a particular field of research.Text mining is a sophisticated analytical process applied in various fields, incorporating advanced concepts such as machine learning.In this context, incorporating text mining approaches into the science mapping should be considered to strengthen objectivity when interpreting the science mapping results and enable the analytical literature review process.
This study aims to provide an overview of the science mapping in bibliometric analysis based on the published review studies from prestigious international journals.Once the literature survey of the bibliometric analysis is completed, the science mapping mechanism is outlined with the exploration of challenges and opportunities.Lastly, the possibility of incorporating text mining approaches in the science mapping mechanism is discovered to drive the nextgeneration science mapping process.
This study presents a comprehensive overview of the use of science mapping in bibliometric analysis across various disciplines, highlighting the field's current state, the challenges faced, and the opportunities for future research.The study sheds light on previously unexplored mechanisms in the literature of bibliometric analysis, revealing gaps in existing research.In light of the challenges and opportunities identified, the deployment of text mining techniques has been discussed to support the bibliometric analysis, resulting in a more nuanced understanding of science mapping.Therefore, this study serves as a valuable contribution to the growing body of research on bibliometric analysis and highlights the need for continuous improvements in this research field.
The remainder of this paper is organised as follows.The second section describes the review methodology to outline the systematic approach to identifying high-quality articles.The third section summarises the theoretical and technical background of the bibliometric analysis, while the recent advances in text mining for the bibliometric analysis are discussed in the fourth section.The fifth section presents future research directions in the context of bibliometric analysis when the text mining algorithms are considered.Finally, the last section summarises the conclusions of this study.

Review methodology
This section delineates the methodology employed for conducting a literature survey, including the definition of terms and resources used in the literature search and the criteria for selecting and assimilating articles from the initial search result.The literature review is typically conducted in a stepwise cycle to obtain a holistic understanding of the bibliometric analysis, which includes the identification of relevant search keywords, article search, and analysis.The Web of Science (WoS) Core Collection contains curated and consistently indexed material that is uniquely selective and reliable for bibliometric analysis, owing to its unparalleled data structure and independent and rigorous auditing process, which has been in place for over five decades. 6Besides, WoS covers a wide range of disciplines and includes a variety of document types, such as articles, conference proceedings, and book chapters, which can ultimately provide a comprehensive view of the scholarly landscape.The study establishes a rigorous review protocol tailored to identify high-calibre bibliometric research, utilising a precise combination of databases, keywords, publication types, and ranking criteria, resulting in a distilled collection of 37 seminal papers from an initial set of 7404.This methodology underscores the widespread adoption of bibliometric analysis across prestigious journals, with an uptrend in application noted in recent years, particularly in leading publications ranked in JCR Q1, A or above in ABDC and 3* or above in AJG.A systematic review protocol to collect high-quality bibliometric research studies is thus developed in the following.

Review protocol
To search for high-quality studies related to the bibliometric analysis, the review protocol with five areas, including scientific database, keywords, publication type, publication year, and suitability criteria, is proposed in Table 1.Only the article and review articles are considered once the keywords are applied in the WoS Core Collection.In contrast, the extracted articles should be relevant to using bibliometric analysis in a specific research domain.From the initial search process, a total of 7404 studies were found in the scientific database.Due to the vast number of publications and the purpose of high-quality concentration, the initial search is refined by 'Highly Cited Papers,' which narrows the 7404 papers to 131.Leading journals are selected based on their proven capability and expertise in absorbing prominent articles in their respective fields to present a more representative and mainstream picture.A combination of three inclusion and exclusion criteria were applied to select 19 journals for review, namely Q1 in the Journal Citation Report (JCR) Impact Factor 2021, 7 A/A* in the Australian Business Deans Council (ABDC) 2019 list of journals, 8 and 3/4/4* in the Academic Journal Guide (AJG) of the Association of Business Schools (ABS) 2021. 9nsequently, 37 relevant studies are extracted for the detailed investigation in the following sections.

Distribution of the selected papers
In the past decades, 37 research and review articles, as in Appendix A, applied the bibliometric analysis to analyse and summarise the intellectual development of the specific research domains.At the same time, 19 prestige journals are considered as in Figure 1.It is found that 7 studies related to the use of bibliometric analysis are published in Technological Forecasting and Social Change.In comparison, 12 studies are evenly published in the International Journal of Production Research and the International Journal of Production Economics and Research Policy.In other words, the methodology of the bibliometric analysis is widely accepted and applied to summarise and visualise the development of intellectual cores.In addition, the number of selected papers over the years is shown in Figure 2, where the applications of bibliometric analysis become more popularised in recent years.On average, five high-quality articles have been published in the above prestige journals over the last 5 years.

Theoretical and technical background of bibliometric analysis
Bibliometric analysis typically encompasses two categories: (i) selection of bibliometric data and (ii) science mapping.The selection of bibliometric data involves the extraction of publication fingerprints about research constituents, such as information on authors, countries and citation counts.At the same time, science mapping examines the contribution and relationships between research constituents based on the analysis of bibliometric data. 5Science mapping pertains to intellectual interactions and structural connections among research constituents.The methods for science mapping generally consist of co-citation analysis, bibliographic coupling, co-word analysis, and co-authorship analysis. 10,11n addition to the typical science mapping process, some enrichments are employed, such as the PageRank algorithm and betweenness and closeness centrality.These approaches aid in further delineating science mapping results by analysing knowledge clusters in each field.

Bibliometric data
In brief, bibliometric data is regarded as a statistical form of data derived from academic publications, providing a quantitative perspective on the academic literature. 12,13It sheds light on fundamental information of academic literature, depicts various attributes, and identifies the flow of knowledge within a specific field over time through the analysis of data obtained from scientific databases such as publication year, 14 journals, 15 citation count, 16 keywords, authors 17 and location of institutions, leading to a comprehensive understanding of the knowledge development. 18s in Table 2, most bibliometric research studies (accounting for 70.2% of 37 selected papers) prefer analysing publication years, citation counts and keywords.Firstly, publication years provide insight into the historical development of a knowledge domain, allowing readers to identify key milestones, trends, and changes over time.This information can help researchers understand how the research  area has evolved and identify important gaps or emerging research areas.Secondly, citation counts indicate the impact and influence of individual papers and authors in a specific research area.Citation analysis can be applied to identify the most influential papers and authors and assess the level of agreement or controversy within the field.High citation counts may also indicate areas where further research is needed to clarify or extend existing findings.Thirdly, keywords are important for identifying and categorising papers within a specific research area.Subsequently, common themes and gaps in the extant literature can be identified.Some hints of the future development of the research domain can be discovered by combining with the citation analysis.

Science mapping analysis
Science mapping can essentially be divided into four phases: selection of bibliometric methods, construction of matrices, measurement of similarities, and post-hoc analysis, as shown in Figure 3. Regarding the science mapping methods, four methods, co-citation, bibliographic coupling, co-authorship, and co-word analyses, are applied in the selected research studies, as shown in Table 3.Among the above four methods, co-citation was the most commonly used, accounting for 59.4% of the pool.This method is useful for identifying publications that have received considerable attention in the past and for identifying publications frequently cited in a particular research area.It is also worth noting that co-word analysis was used extensively in 11 articles.On the other hand, bibliographic coupling and co-authorship were only utilised to a limited extent in the selected publications.Moreover, various software, including VOSviewer, Citespace, Bibexcel, Bibliometrix-R and Sitkis, were applied to perform bibliometric analysis on data extracted from scientific databases. 19To fully discover the mechanism behind the bibliometric analysis, the generic process to achieve the above science mapping methods is illustrated as follows: In science mapping, a central task is to investigate correlation patterns among bibliometric objects, such as authors, references and keywords.To this end, in terms of publications and bibliometric objects taken together, an m×n occurrence matrix O is expressed as in equation ( 1), 20 where o ij is the Boolean value, and the indexes i 2 ½1, m and j 2 ½1, n denotes the research documents and evaluation objects considered in the bibliometric analysis.Therefore, the occurrence of the selected m documents in terms of a specific evaluation dimension can be formulated.
Specifically, in the co-citation analysis, the evaluation objects (i.e. the index j 2 ½1, n) refer to the union set of reference items, that is cited papers, of the considered research documents.In contrast, the transpose of a correlation matrix for the co-citation analysis is considered for the bibliographic coupling analysis.In other words, while co-citation analysis measures the similarity between two cited reference papers, bibliographic coupling measures similarity based on the references' overlap.In the co-word analysis, the evaluation objects refer to the union set of authors' keywords in the publication dataset.Lastly, in the co-authorship analysis, the evaluation objects denote the union set of authors who contributed to the particular research domain.
Based on the above occurrence matrix O, a co-occurrence matrix C is constructed, which is a symmetric non-negative n×n matrix to indicate the sum of the occurrence between evaluation objects, as in equation (2).Also, e p and e q are two elementary vectors, where pth and qth positions in e p and e q are zero.When p¼q, the total frequencies of each evaluation object from all the publication documents can be determined.
Firstly, the association strength between two evaluation objects can be quantified by the ratio of the observed co- occurrence with the total occurrences, assuming the statistical independence as in equation (3).
Secondly, cosine similarity is another measure for the bibliometric analysis, which is calculated as the ratio of the observed co-occurrence with the geometric mean of total occurrences as in equation ( 4).
Thirdly, the inclusion similarity is calculated by the ratio between the observed co-occurrence and the minimum value of the occurrences in p and q as in equation (5).
Lastly, the Jaccard index is computed by the ratio between the observed co-occurrence and the total occurrences in p and q without the co-occurrence scenarios as in equation ( 6).
Furthermore, the generalised similarity measure was proposed as in equation (7) for evaluating the similarity between two evaluation objects p and q, where α2Rnf0g denotes the parameter to drive different similarity measures.As a result, the co-occurrence behaviour can be analysed by selecting the most appropriate similarity measures, while the science mapping results can be comprehensively compared and validated.

Post-hoc analysis
The process of science mapping is considered unified and complete once the similarity evaluation of scientific articles is conducted.The science mapping results can be visualised to identify the relationship between articles and the topics of the intellectual structure.According to the selected 37 relevant publications, five additional techniques are summarised to facilitate the post-hoc analysis of science mapping, namely (i) data clustering, (ii) PageRank algorithm, (iii) evolution analysis, (iv) strategic graph analysis and (v) network centrality.Firstly, data clustering has become a critical research field in science mapping analysis, receiving increasing attention from scholars and research organisations. 24Most top publications in our dataset refer to this post-hoc technique, which commonly becomes the source of science mapping visualisation.Clustering in science mapping refers to grouping similar objects together, which is a helpful technique in science mapping to identify clusters of related research areas, topics and themes within a scientific domain, as well as potential collaborations and partnerships among researchers.The mechanism of the data clustering is to minimise a weighted sum of squared Euclidean distances between all pairs of items through an optimisation process.In J18, 13 the analysis reveals the disruption risks within the supply chain.It illustrates the co-occurrence network of the author's indexed keywords about the supply chain disruption (SCD) field from 1310 publications.In J13, 11 four network clusters based on co-citation are formulated, strictly related to these three phases of blockchain development from 2233 selected publications.
Secondly, PageRank is a ranking algorithm originally developed and used in search engines to rank web pages to provide a better search engine experience. 25Using the PageRank algorithm, a score corresponding to the principal eigenvector of the normalised matrix is assigned based on the number and quality of incoming links.Science mapping implies that a document frequently co-mentioned with other important documents in the network receives a higher PageRank score, signifying its greater centrality and importance in the field.Prestige studies can be extracted within the research field. 26In J9, 10 the ten most prestigious articles from each cluster were examined after co-citation analysis, and the primary insights and objectives of each reviewed article were thus derived.
Thirdly, evolution analysis is a dynamic analysis technique used in science mapping to investigate the changes and trends of a specific research domain over time. 24Researchers commonly use this technique to identify trends and provide academic insights for deriving future agendas.Moreover, evolution analysis can assist researchers in making informed decisions about funding, collaboration, and publication strategies based on a comprehensive understanding of the current state and trajectory of the research domain.In J29, 27 the evolution of co-citation in quantitative models for managing supply chain risks was conducted, which indicates that the supply chain risk management at the design, tactical and operational planning levels, as well as demand and supply forecasting, appeared early and evolved quickly over the last two decades.Sourcing and supply uncertainty modelling with related case studies emerged and have evolved quite rapidly.
Fourthly, the thematic map was developed to analyse further the science mapping results, which were obtained and plotted according to centrality and density indices.Some researchers have proposed a framework for categorising scientific publications based on their centrality and density in the science mapping network. 28Thematic maps commonly identify articles and classify them into four main categories: (i) motor themes (high centrality and high density), (ii) basic themes (high centrality and low density), (iii) emerging/disappearing themes (low centrality and low density), and (iv) specialised themes (low centrality and high density).Research trends, clusters, and gaps can be effectively identified using thematic maps.In J13, 11 eight clusters were identified and classified into four main categories related to co-occurring keywords about distributed ledgers and decentralisation technologies of blockchain.At the same time, potential collaborators and funders could also be identified to facilitate future research planning and development.
Lastly, regarding network centrality, betweenness centrality is a measure of the degree to which other nodes depend on a specific node, hence, a measure of potential control.Simply speaking, betweenness centrality measures the extent to which a particular node lies on the shortest route connecting other nodes, with a higher betweenness centrality value indicating a node's greater influence. 29loseness centrality, on the other hand, can be interpreted as a measure of access efficiency or independence from potential control by intermediaries.Essentially, a node's closeness centrality is defined as the inverse of the average length of the shortest routes to/from all other nodes in the network, with greater closeness centrality indicating more relatedness to other nodes. 30,31Both of these indicators assist researchers in identifying additional details and relationships based on the science mapping results.In J11, 12 co-occurrence analysis was used to apply betweenness centrality and closeness centrality to uncover additional relationships between author-defined keywords network analysis.

Recent advances in text mining for the bibliometric analysis
Recent advances in text mining and natural language process (NLP) techniques enable further possibilities to enhance bibliometric analysis.Relationship extraction, on the other hand, aims to identify the relationships between the entities defined in the text.Event extraction identifies events or actions described in the text, including their participants, locations, and time frames.In contrast, attribute extraction involves extracting specific attributes or properties of objects, such as their age, gender, occupation, or other relevant attributes.Each type of information extraction involves characteristic identification processes, such as part-of-speech (POS) tagging for entity extraction, relationship dependency parsing for relationship extraction, event trigger identification, argument identification, and attribute identification for event and attribute extraction.Several techniques and approaches can be used for information extraction, including rule-based systems, statistical methods, and machine learning algorithms. 33ule-based systems use a set of handcrafted rules to identify patterns in text and extract relevant information, while statistical methods rely on probabilistic models to identify patterns in the data.

Overview of text mining techniques
Text/document clustering.Text/document clustering is a well-established method for systematically grouping similar objects into various clusters, and it is also applicable to textual data from documents.Considering the following characteristics of document representation. 34Representing text poses unique challenges due to its high dimensionality and sparsity.The lexicon used in a given dataset may contain many terms, while individual documents may only contain a small subset.Even with a large lexicon, words tend to be correlated, implying that the number of concepts in the data is considerably smaller than the feature space.This necessitates carefully designing clustering algorithms that can account for these correlations.
Moreover, given the wide variation in the number of words across different documents, it is essential to normalise the document representations appropriately during the clustering process.After selecting and extracting relevant features from the textual data, such as word frequency term frequency-inverse document frequency (TF-IDF), 35 effective clustering requires normalising these features in terms of their relative frequency of presence in the document and over the entire collection.Finally, different algorithms can be applied for text clusterings, such as kmeans clustering, 36 hierarchical clustering, and densitybased clustering, to supplement the science mapping results.
Topic modelling.Topic modelling is a technique used in text mining to automatically identify hidden topics and patterns from a collection of documents.The goal is to discover the hidden semantic structure in a text corpus by grouping words that frequently co-occur across the documents. 37ach document is assigned to the topics with different weights, which specify both the degree of membership and the coordinates of the document set, after learning and preparing the text data for analysis, removing stop words, stemming or lemmatising the words, and converting the text into a numerical format that can be analysed.Topic modelling algorithm is then applied to extract the hidden topics.Latent Dirichlet allocation (LDA) and Non-negative Matrix Factorization (NMF) are commonly used algorithms.In the LDA, it is assumed that each document is a mixture of several topics, and each topic is a distribution of words, in which Bayesian inference is applied to estimate the topic distributions and the word distributions for each topic. 38he use of NMF involves factorising a matrix of word frequencies into a product of two matrices, one representing the data's topic structure.The input matrix is typically a document-term frequency matrix, and the resulting factor matrices can then be used to represent each document as a linear combination of a small number of topics or latent features, which can be interpreted as clusters of related terms.Once the topic model in a field or cross-filed has been learned, it can be used for various applications such as document comprehension, recommendation, and summarisation. 39xt summarisation.Text summarisation generates a shorter version of a given text while retaining its most important information.The goal of text summarisation is to make the content of a text more easily and quickly accessible by reducing the amount of information that the reader needs to process. 40There are two main approaches to text summarisation: extraction-based and abstraction-based. 41xtraction-based summarisation involves identifying and extracting the most important sentences or phrases in a text to create a summary.This approach relies on statistical and linguistic techniques to identify and rank key sentences or phrases based on their importance.For the output of the extractive summarisation, target sentences are integrated as the output.
On the other hand, Abstraction-based summarisation involves creating a summary that is not simply a selection of existing text but rather a new text that conveys the same information as the original text but in a more concise form.This approach uses natural language generation techniques to create a more human-like summary.The generic processes of text summarisation involve text representation, sentence extraction, and sentence fusion.For the output of the abstractive summarisation, the selected sentences may need to be combined and fused to generate a coherent summary, which can be supported by natural language generation, encoder-decoder models, attention mechanisms, transformer models, and reinforcement learning.
Sentiment analysis.Sentiment analysis is the process of identifying and extracting subjective information from text data to determine the emotional tone of the text, whether it is positive, negative, or neutral, expressed by the writer towards a particular topic or entity. 42Sentiment analysis can be performed at various levels of granularity, ranging from individual words and phrases to entire documents.Feature extraction and sentiment classification are two core compositions of sentiment analysis.Feature extraction involves identifying and extracting relevant features from the pre-processed text data, like specific words or phrases, parts of publications, syntactic structures, and other linguistic features. 43Sentiment classification uses rule-based systems, machine learning algorithms, and deep learning models to classify text automatically as positive, negative, or neutral in sentiment.The rule-based approach to sentiment analysis involves using pre-defined rules or lists of words with associated sentiment scores to determine the sentiment of the text.This approach assumes that certain words or phrases are more likely to be associated with positive or negative sentiment; for example, words such as 'happy', 'joyful', and 'excited' indicate positive sentiment scores, while words such as 'sad', 'angry', and 'frustrated' indicate negative sentiment scores.In artificial intelligence, the machine learning approach entails utilising neural network algorithms to learn from data and detect sentiment patterns in textual data. 24Typically, these algorithms are trained on a labelled text data dataset with known sentiment scores.By applying statistical models, the algorithms classify newly encountered text data into positive, negative, or neutral sentiment categories.Although this approach is more intricate and necessitates greater computational resources and data, it generally yields higher accuracy. 44

Future research agenda
According to the extant literature, the research status quo related to the integration of text mining and bibliometric analysis is summarised in Table 4 to hint at future research opportunities.Consequently, the domain of bibliometric research can be further evolved to support the development of

Information extraction and science mapping
Information extraction (IE) aims to extract specific information from a large corpus of text data, such as named entities, relations between entities, and events.IE can identify entities such as people, organisations, concepts, or keywords from text data, 45 which can then be used as input for co-occurrence analysis.Important topics or themes in a corpus can be revealed by identifying frequent co-occurring entities.Another approach combines IE and co-occurrence analysis by comparing the results of both techniques.By doing so, similarities and gaps between important concepts and their relationships identified by IE and co-occurrence analysis can be identified.This comparison could validate the importance of the identified concepts and relationships and provide additional insights into the field of interest.For instance, if certain keywords extracted from the full text of publications using IE techniques are not frequently cooccurring in the science mapping analysis, this may suggest that they are less important in the field Figure 5.
On the other hand, if certain terms frequently co-occur in the science mapping analysis but are not identified as entities using IE, this may indicate that they are  underrepresented in the literature and require further investigation.Therefore, IE and co-occurrence analysis can be combined to understand a given corpus comprehensively.This approach provides a valuable tool for researchers to identify important concepts, their relationships, and knowledge gaps, aiding decision-making, hypothesis generation, and future research directions.

Text/document clustering and science mapping
In the context of bibliometric research, there may be variations in the approaches used in science mapping and text/ document clustering, depending on the analysis source and specific techniques applied to the data source, input and output representation, and dimensionality. 14,46Typically, science mapping clustering entails pre-processing metadata with multiple fields of publications, authors, or keywords, resulting in science mapping clusters that are organised into a network.On the other hand, text/document clustering involves pre-processing the text to create a vector representation of the documents with multiple features or terms, producing clusters of documents based on their similarity.Comparing these two approaches can offer valuable insights into the relationship between the topics discussed in specific fields and the language employed to discuss those topics.This comparison can identify fields where the terminology used may be inconsistent and ambiguous, where there may be disconnects between the key concepts identified by the two clustering methods, and where they may complement each other.For example, if science mapping and text/ document clustering yield similar results, it may indicate that the underlying intellectual structure of the field is well captured.Conversely, if the two methods yield different results, it may suggest that one or both methods may need refinement and re-evaluation.Thus, a latent research method is to compare the clustering results obtained from science mapping and text/document clustering to identify similarities and gaps between them and provide insights into potential areas for improvement in research methodologies.

Text summarisation and science mapping
Science mapping aims to examine a large amount of scientific literature to reveal trends, patterns, and connections between various research fields.However, these results can be intricate and challenging to interpret, especially for those not experts in the field. 5Text summarisation can be utilised in science mapping to concisely summarise key insights and findings relevant to co-citation analysis, bibliographic coupling, and cooccurrence analysis.As mentioned above, previous studies have classified text summarisation into two main approaches: extractive-based summarisation and abstraction-based summarisation. 47In the case of co-citation and bibliographic coupling analysis, information extraction could be applied by first extracting the abstracts, introductions, conclusions, or full texts of the most frequently cited publications from the science mapping dataset and reference list.Then, text summarisation algorithms could be employed to identify the most relevant sentences or phrases, which could be used to create summative paragraphs for science mapping analysis documents and objects.In the case of co-word analysis, the keywords or phrases from the results could be selected and combined to summarise the main themes and trends in the analysed field.Another approach to achieving summarisation is abstraction-based summarisation.In the case of co-citation and bibliographic coupling analysis, this approach could be implemented by using natural language processing techniques to recognise the most crucial concepts and relationships between all the cited publications in the science mapping dataset and reference list.A summary could then be generated based on these concepts and relationships.In the case of co-word analysis, a summary could be generated by synthesising new sentences that describe the co-occurrence analysis results, capturing the main themes and trends.
Moreover, the advent of generative AI, particularly large language models (LLMs) like GPT and Claude, has revolutionised text summarisation within science mapping.LLMs can process extensive corpora of scientific literature, identify salient points, and generate coherent, comprehensive summarisations.Unlike traditional extractive techniques, LLMs can generate novel text that captures the essence of the original content without merely extracting parts of it.This approach makes them especially suitable for abstractive summarisation, where understanding and conveying complex ideas in new ways is key.Moreover, LLMs can be fine-tuned to the domain-specific language of scientific discourse, thus enhancing their ability to interpret and articulate the nuances of academic research.For cocitation and bibliographic coupling, LLMs can discern the subtleties in the relationships between documents, offering a summary and a conceptual mapping of the scientific terrain.They can recognise emergent patterns, track the evolution of themes over time, and detect nascent areas of interest before they become prominent.LLMs take on an even more creative role in the realm of co-word analysis.They can weave keywords and phrases identified in co-occurrence analysis into a cohesive narrative revealing prevailing dialogues within a field.The AI does not merely list the terms but contextualises them within the broader research landscape, offering a narrative that can guide researchers through the thematic structures of their domain.

Topic modelling and science mapping
In contrast to text summarisation, which aims to reduce the content of documents while preserving important information, topic modelling aims to uncover the latent themes within a collection of documents. 48Once the science mapping results are obtained, topic modelling techniques can be deployed to identify the underlying topics or themes within the network.This approach involves identifying groups of publication datasets or reference lists that are frequently co-cited together and assigning them to a topic or theme based on the words used in those papers' titles, abstracts, or full texts.The resulting topic model can provide valuable insights into the key concepts and ideas driving research in publication datasets and the relationships between different research fields and sub-fields.In the case of co-word analysis, the input data would consist of co-word results, and applying topic modelling on this co-word matrix can help identify groups of keywords that frequently co-occur together, which may indicate underlying topics or themes within the documents' keywords.
Furthermore, topic modelling can identify words that act as "bridge words" or "pivot words" between different topics or themes in documents.These words connect multiple topics or themes and can be useful in identifying the relationships between different research areas based on the science mapping patterns of scientific publications.Science mapping can identify the key topics and research areas linked together by identifying frequently occurring bridge words from different research fields.

Text sentiment and science mapping
Sentiment analysis is an important aspect of text analysis that aims to determine the polarity of the sentiment expressed in a text, whether positive, negative, or neutral.Beyond the existing research, sentiment analysis can also be used to infer whether a particular topic is outdated or no longer relevant based on the sentiment expressed in recent texts. 49Combining sentiment analysis with science mapping can provide valuable insights into the relevance and popularity of different research fields and topics.By applying sentiment analysis to the science mapping result, researchers can determine the sentiment of the words or phrases used in the titles, abstracts, conclusions, or full text of the relevant fields and topics identified through cocitation, bibliographic coupling, and co-word analysis.This way can help identify shifts in sentiment over time, indicating changes in the relevance or popularity of certain topics.This information can be used to create indexes that guide future academic exploration and determine whether a particular field or topic is promising or outdated.Therefore, combining sentiment analysis with science mapping can provide a more comprehensive understanding of the trends, patterns, and relationships within a particular field of research.

Conclusion
This study discusses using bibliometrics and science mapping analysis in contemporary research.In order to investigate the status quo, 37 high-quality publications from 18 prestige journals were selected for analysis, in which the publication year and citation are the most commonly used bibliometric metadata.Moreover, co-citation is the most frequently used science mapping technique, followed by the co-word analysis.The workflow for science mapping involves data collection and pre-processing, construction of the occurrence and co-occurrence matrix for similarity calculation, and enhancements to extract insights based on the similarities.Furthermore, this study also introduces five text mining techniques, including text summarisation, sentiment analysis, topic modelling, information extraction, and text/document clustering, which can be combined with science mapping for greater insights.Last but not least, the future opportunities to integrate text mining and bibliometric analysis are outlined through mapping their recent development.

Figure 2 .
Figure 2. Distribution of the selected papers over prestige journals.

Figure 1 .
Figure 1.The inclusion-exclusion flowchart of the review protocol.

Figure 3 .
Figure 3.The framework of the science mapping process.
Remark: (a) Information extraction, (b) text/document clustering, (c) topic modelling, (d) text summarisation, (e) sentiment analysis;3: Complete exploration, ⵔ: No/limited exploration, 7: Not applicable future research agendas.The future research applications of text mining in bibliometric analysis are shown in Figure 4.

Figure 4 .
Figure 4. Number of selected papers over the publication years.

Figure 5 .
Figure 5.The applications of text mining in science mapping.

Table 1 .
Review protocol for the bibliometric analysis.

Table 2 .
Areas and usage of bibliometric data.

Table 3 .
Areas and usage of the science mapping methods.

Table 4 .
Mapping results of the text mining and bibliometric analysis.