Probability model of sensitive similarity measures in information retrieval

In today's Internet age, vast amounts of data are stored and used, and information retrieval technology is the principal means of organizing and finding them; in practice, however, retrieval results are often inaccurate. The information retrieval model is the framework and method that enables fast, complete, and accurate retrieval of user information. With the rapid development of information technology, great changes have taken place in people's production and life: information network technologies are used everywhere, the resulting flow of information shows explosive growth, and user requirements for retrieval keep rising. How to perform personalized information retrieval within a large amount of mixed information, so that retrieval technology helps us obtain effective results, has become a practical problem worth exploring. This article analyzes the application of a probability model based on a sensitive similarity measure in the information retrieval model and proposes a similarity measure algorithm based on spectral clustering. By improving the similarity measure, the algorithm overcomes the sensitivity to scale parameters and improves retrieval precision. To demonstrate its superiority, the proposed algorithm is compared with the Ng-Jordan-Weiss (NJW) and deep sparse subspace clustering (DSSC) algorithms. The experimental results show that the proposed algorithm outperforms NJW and DSSC on different data sets under both evaluation indicators (Rand index and F-measure).


Introduction
With the rapid development of the information age, data processing technology has been widely used in people's lives. As a common data processing technology,1-3 information retrieval is the main way for users to query and obtain information and the principal means of finding it. Narrowly defined, information retrieval refers only to the search process,4,5 in which the user adopts a specific method according to his or her needs and uses a search tool to find the required information in an information collection. Broadly defined, information retrieval is the process in which information is processed, organized, and stored in a certain way, and the relevant information is then accurately found according to the specific needs of the information user. Information retrieval is a field of research in library and computer science whose aim is to provide faster, more accurate, and more complete search methods. Information retrieval, especially text retrieval, has become one of the most influential search tools: on the Internet, it helps people around the world access a wide variety of information at almost no cost and has provided a powerful impetus for economic, cultural, and technological development. With the popularity of the Internet, digital cameras, and multimedia, people are increasingly searching online for information, and image queries are becoming indispensable.
Today's society is developing rapidly in the era of networking and informatization. Computer network technology6 has become one of the most influential technologies in the world. It covers almost all social and economic fields and has been applied successfully in civilian and military settings, so its development and application are crucial and will remain so for a long time. In countries around the world, "machine substitution" is the mainstream trend in manufacturing, and robots are also widely used in education, finance, medicine, transportation, security, electricity, and many other fields, reflecting huge application advantages and market potential. Although the robot industry should be a high-end manufacturing industry, China's robot industry has not yet escaped the predicament of occupying only the low end of a high-end industry, and its development still lacks top-level design, which undoubtedly holds it back. Information retrieval is inseparable from, and built on, network technology. However, the data often contain sensitive elements; if these are published or shared directly, user privacy will be leaked. We must therefore consider how to handle sensitive data7,8 accurately within large volumes of data.
A similarity measure comprehensively assesses the similarity between two things: the closer two things are, the larger their similarity measure, and the more distant they are, the smaller it is. There are many kinds of similarity measures,9,10 generally selected according to the actual problem. Commonly used measures are the correlation coefficient11-13 (measuring the proximity between variables) and the similarity coefficient14 (measuring the proximity between samples). If the samples give qualitative data, the proximity between samples can be measured with the matching coefficient, consistency, and so on. To classify things quantitatively, we must describe the degree of similarity between them with quantitative methods. A thing often needs to be characterized by multiple variables; for example, if a group of sample points described by p variables is classified, each sample point can be regarded as a point in p-dimensional space, and it is natural to use distance to measure the similarity between sample points. The information retrieval model (IRM) uses mathematical language and tools to translate and abstract information, and its processing in information retrieval, into mathematical formulas. It is determined in three aspects: (1) the perspective from which query formulas and documents are processed, (2) the theory of the relationship between query formulas and documents, and (3) the algorithm that matches query formulas with documents. The IRM thus provides a framework and method for representing the main elements of information retrieval and calculating the degree of matching between them. Experts and scholars in related fields have long studied more suitable retrieval models and methods, and since 2000 many have carried out extensive research on IRMs.
Information retrieval theorists have proposed a large number of IRMs. At present, models such as the Boolean model, the vector space model,15-17 and the traditional probability model are widely accepted.
We now have a preliminary understanding of sensitivity metrics and the relevant probability theory. Based on the above analysis, and according to the probability model and the sensitivity of the similarity measure, a spectral clustering18-20 algorithm with an improved similarity measure is proposed. It overcomes the sensitivity to scale parameters, improves clustering accuracy, and achieves a good clustering effect, thereby improving the probability model and the accuracy of information retrieval. In order to study the probabilistic model of the sensitive similarity measure in information retrieval, this article introduces information retrieval, the information retrieval probability model, and the similarity measure algorithm in detail.

Development of information retrieval technology
Information retrieval technology has developed from early information retrieval21,22 technology to modern computer retrieval technology. Before computer retrieval technology appeared, information retrieval generally passed through the stages of fully manual retrieval systems, then semi-mechanical retrieval systems, and then electromechanical and photoelectric retrieval systems. Before the 1940s, information retrieval was mainly manual, using search tools23,24 such as books, indexes, and abstracts, arranged by literature attributes such as classification, subject words, and authors, to find the required documents. Representative examples include library catalog cards and well-known search journals such as CA, BA, SCI, and IM, as well as China's Zhongmu and foreign-literature catalogs. Although manual retrieval is convenient, flexible, and easy to use and master, its speed is slow, its reliability is poor, its efficiency is susceptible to external influences, and multipath, multi-angle searches of the literature cannot be performed simultaneously, so the quality and quantity of manually retrieved services are inefficient. To eliminate these limitations, new retrieval methods and equipment had to be developed and a more complete retrieval system established. In this context, semi-mechanical retrieval methods and electromechanical and photoelectric retrieval methods were gradually developed in the 1950s and 1960s. The semi-mechanical method is represented by the edge-punched card method and later the peek-a-boo card retrieval method; in essence these are hand-checked punched card systems used as retrieval tools.
Although semi-mechanical methods could perform a certain degree of multivariate search and combination of subject concepts, improving search efficiency and retrieval time compared with fully manual methods, the actual retrieval speed was still not high and the retrieval process still relied mainly on manual operation. As technology advanced, various mechanical searches were developed. Things always move forward, and information retrieval methods evolve as the components of the retrieval system change. There is a contradiction between the vast literature and information resources and people's specific needs; this is the fundamental contradiction in information retrieval, and it drives the development of information retrieval theory, technology, and methods. On the one hand, there is the "explosion" and "pollution" caused by the huge increase in knowledge and information; on the other hand, people demand accurate and convenient ways to find the information useful to them. This has driven changes in information retrieval technology. In 1946, the United States successfully developed the world's first electronic computer, the ENIAC. In the late 1950s, second-generation transistor computers appeared. In 1964, IBM built a third-generation integrated circuit25 computer. In 1970, fourth-generation large-scale integrated circuit computers such as the IBM-370 came out, and in 1971 Intel made the world's first commercial microprocessor. Around 1970, computer networks emerged, and computing developed into a new category of technology, information technology, leading to the information revolution.26 It is in the context of this rapid development of computer technology that computer retrieval technology arose.
From the development of information retrieval, we can see that it has passed through multiple stages and that the trend is toward ever greater intelligence. Today, with the rapid development of science and technology, the objects of information retrieval are increasingly diverse, including not only text information such as documents and data but also media information27,28 such as graphic images, sounds, and videos; all of these fall within the scope of information retrieval research. Information retrieval has now developed from networked to intelligent. Its objects have evolved from closed to open, and from stable and consistent to dynamic and widely distributed. As the Internet becomes more popular, the amount of information resources we face keeps increasing, and obtaining the needed information in the shortest time poses great difficulties for computer information retrieval; with the development of technology, however, this is achievable. Figure 1 shows the framework of an intelligent information retrieval system.

Principles of information retrieval probability model
Application of the probability model rests on four related principles: the independence of document relevance judgments, the independence of terms, the relevance of documents, and the probability ranking principle. Based on probability theory,29,30 the model builds a probabilistic representation of documents and queries and calculates the similarity between them from that representation. The probability model is based on the distribution of query keywords in relevant and irrelevant documents and represents each keyword by a weight; query results are ranked by the sum of the weights of the keywords matching the query. The model is simple to implement and works well. It is assumed that both the document D and the user query Q can be represented by a binary term vector x = (x_1, x_2, ..., x_n), where x_i = 1 if term T_i ∈ D and x_i = 0 otherwise. Two mutually exclusive events are assumed: W_1, the document is relevant to the user query, and W_2, the document is not relevant. By calculating P(W_1|x) or P(W_2|x) for the document, its relevance to the user query can be determined. For discrete distributions, the Bayes formula can be used and simplified to obtain a scoring function between the document and the user query in which p_i = r_i/r and q_i = (f_i − r_i)/(f − r), where f denotes the total number of documents in the training document set, r the number of training documents relevant to the user query, f_i the number of training documents containing term T_i, and r_i the number of the r relevant documents containing T_i. In order to improve the estimated probabilities for the ideal result set, the system needs to interact with the user.
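The term weighting and ranking described above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes the standard log-odds weight built from p_i and q_i, adds 0.5 smoothing (an assumption, to avoid division by zero on sparse training sets), and the function names are hypothetical.

```python
import math

def bim_weight(f, r, f_i, r_i):
    """Weight of term T_i in the binary independence model.
    f: total training documents; r: relevant training documents;
    f_i: documents containing T_i; r_i: relevant documents containing T_i.
    With p_i = r_i/r and q_i = (f_i - r_i)/(f - r), the weight is
    log[p_i(1 - q_i) / (q_i(1 - p_i))]; 0.5 smoothing keeps it finite."""
    p = (r_i + 0.5) / (r + 1.0)
    q = (f_i - r_i + 0.5) / (f - r + 1.0)
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))

def score(doc_terms, query_terms, weight_of):
    """Rank documents by the sum of the weights of matched query terms."""
    return sum(weight_of[t] for t in query_terms if t in doc_terms)
```

A term concentrated in the relevant documents receives a positive weight and raises the score of documents containing it; a term concentrated in the non-relevant documents receives a negative weight.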

Similarity measurement algorithm
Similarity metrics classify things quantitatively, and quantitative methods must be used to describe the degree of similarity between things. A thing often needs to be characterized by multiple variables: if a group of sample points described by p variables is classified, each sample point can be regarded as a point in p-dimensional space. One analytical method often used with similarity metrics is cluster analysis, a method of grouping based on the principle that "like attracts like" and a far-reaching statistical technique for classifying samples and indicators. The traditional spectral clustering algorithm usually uses the Gaussian kernel function as the similarity function; because the algorithm is very sensitive to the kernel parameter, it is difficult to determine a suitable scale parameter. To solve this problem, a spectral clustering algorithm with an improved similarity function is given here. Spectral clustering is a relatively new type of clustering algorithm. Unlike traditional clustering algorithms, it obtains its result by solving for the optimal partition of a graph; its advantages are that it can be applied to sample spaces of arbitrary shape and converges to the global optimal solution. Spectral clustering is widely used in image processing, computer vision, text mining, machine learning, and other fields and is a hot spot in machine learning research, with the similarity function the focus of current work on improving it. The spectral clustering algorithm is based on spectral partitioning theory and treats data clustering as a graph partitioning problem, whose essence is the approximation of a graph partitioning criterion.
The optimal solution of graph partitioning is a non-deterministic polynomial (NP)-hard problem. All data samples are regarded as vertices V of an undirected weighted graph G = (V, E), connected by edges, where the weight of the edge between the ith and jth vertices is their similarity W_ij. The similarity matrix is defined as W_ij = exp(−d(x_i, x_j)²/σ²) if i ≠ j and W_ij = 0 otherwise. The similarity matrix W contains all the information needed for clustering. The graph formed by all the data points is segmented so that the weights of the edges between different subgraphs are as low as possible and the edge weights within each subgraph are as high as possible, achieving the purpose of clustering. The clustering problem is thus solved as a multiway partitioning problem on the undirected graph, and the original problem is transformed into the spectral decomposition of the similarity matrix or the Laplacian matrix.
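The traditional Gaussian similarity matrix defined above can be computed directly; a brief NumPy sketch (function name hypothetical):

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """W_ij = exp(-d(x_i, x_j)^2 / sigma^2) for i != j, and W_ii = 0,
    where d is the Euclidean distance between the rows of X.
    The result depends strongly on the scale parameter sigma,
    which is exactly the sensitivity discussed in the text."""
    # squared Euclidean distances between all pairs of rows of X
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    return W
```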

Improve similarity measure
Let G = (V, E) be an undirected graph with vertex set V and edge set E. Regard the data points as the vertices of G; the manifold distance between two vertices is then defined as follows. Definition 1. For any two vertices p_0 and p_l on graph G, a vertex sequence r = (p_0, p_1, ..., p_l), with p_k ∈ V (0 ≤ k ≤ l) and (p_k, p_{k+1}) ∈ E (0 ≤ k < l), denotes a path of length l connecting p_0 and p_l. Let R_ij denote the set of all such paths connecting the two data points p_i and p_j on graph G. The manifold distance between p_i and p_j is D(p_i, p_j) = min_{r ∈ R_ij} Σ_{k=0}^{l−1} (ρ^{dist(p_k, p_{k+1})} − 1), where dist(p_k, p_{k+1}) is the Euclidean distance between data points p_k and p_{k+1} and the scaling factor ρ (ρ > 1) is a tunable parameter.
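Definition 1 can be sketched as follows, assuming the standard density-sensitive construction: each edge is given length ρ^dist − 1 and the manifold distance is the shortest-path distance on the fully connected graph, here computed with Floyd-Warshall in O(n³) (function name hypothetical).

```python
import numpy as np

def manifold_distance(X, rho=2.0):
    """Manifold distance of Definition 1: edge (p_k, p_{k+1}) has
    length rho**dist(p_k, p_{k+1}) - 1 (dist = Euclidean distance),
    and D[i, j] is the minimum total length over all paths from
    p_i to p_j, computed with the Floyd-Warshall recurrence."""
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    L = rho ** dist - 1.0          # edge lengths; zero on the diagonal
    D = L.copy()
    for k in range(len(X)):        # shortest paths allowed through vertex k
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D
```

Because ρ > 1, many short hops through a dense region cost less than one long jump, so points on the same manifold end up close even when their Euclidean distance is large.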
Definition 2. To standardize the similarity value between 0 and 1, and to improve the density-sensitive distance used in the DSSC algorithm, an improved manifold distance measurement function is defined. Definition 3. A weighting factor is defined for each data point. Definition 4. The improved manifold distance similarity function is defined from Definitions 2 and 3. The similarity in equation (5) is nonnegative, reflexive, and symmetric, satisfies the triangle inequality, and satisfies both the global consistency and the local consistency clustering assumptions.

Algorithm steps
The time complexity of the algorithm is O(n³), where n is the number of data points in the data set. The algorithm steps are as follows:

Experimental results and analysis
In order to verify the effectiveness of the proposed algorithm, it was compared with the NJW and DSSC algorithms on artificial data sets and University of California Irvine (UCI) data sets, respectively. The performance of each clustering algorithm is measured with evaluation indicators.

Evaluation indicators
For the data sets in the UCI database, the number of clusters and the correct class of each data point are known, so only external clustering indicators are needed to evaluate the effectiveness of the clustering results on these data sets. The two clustering indicators used in this article are both external measures: they quantify how well the clustering result of a UCI data set matches its known structure and allow the clustering results of different algorithms on the same data set to be compared. Within clustering performance evaluation, a validity index can also identify the partition with the best number of clusters. To evaluate the correctness of the clustering results, this article compares the Rand index and the F-measure index; these two statistics measure how similar a clustering result is to the expected result.
Rand indicator. The Rand indicator is a commonly used evaluation indicator for clustering results. It measures the degree of agreement between a clustering result and an external standard classification of the data: each pair of samples is either placed in the same class or in different classes. In a good clustering result, data that belong to the same true class are still grouped together, and data that belong to different true classes remain separated. The accuracy is the ratio of the number of correctly matched pairs to the total number of pairs, that is, RI = number of correct pairs/total number of pairs. Let C denote the actual category information and K the clustering result, where r stands for the number of pairs that are in the same class in both C and K, and s denotes the number of pairs that are in different classes in both C and K; the sum of r and s is the number of correctly divided pairs. Further, q denotes the number of pairs that belong to the same category in C but not in K, and t denotes the number of pairs that belong to different categories in C but the same category in K; q + s + r + t is the total number of pairs that can be formed from the data set. The Rand index value lies between 0 and 1.
Input: data set X, number of clusters k, number of neighbors k′.
Output: a division of the data set into k clusters.
Step 1: For any two points x_i and x_j in the data set, calculate the Euclidean distance dist_ij = ||x_i − x_j||.
Step 2: Construct the Laplacian matrix L from the similarity matrix LS and the diagonal degree matrix D, where LS_ij is calculated by equation (3) and LS_ij = 0 when i = j.
Step 3: Eigendecomposition: calculate the eigenvectors v_1, v_2, ..., v_k corresponding to the k largest eigenvalues of the matrix L and construct the matrix V = [v_1, v_2, ..., v_k] ∈ R^{n×k}.
Step 4: Normalization: unitize each row vector of V to obtain the matrix Y.
Step 5: Regard each row y_i of the matrix Y as a point in R^k, cluster these points with the k-means algorithm (or another clustering algorithm), and obtain k clusters.
Step 6: If the ith row of Y belongs to the jth cluster, the original data point x_i is also assigned to the jth cluster.
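Steps 2-6 above can be sketched with NumPy. This is an illustrative NJW-style implementation under stated assumptions, not the paper's code: it takes a precomputed similarity matrix (e.g. built from equation (3)), uses the symmetric normalized Laplacian, and substitutes a minimal k-means for a library call; all names are hypothetical.

```python
import numpy as np

def spectral_clustering(W, k, n_init=10, seed=0):
    """Cluster from a similarity matrix W: normalized Laplacian,
    top-k eigenvectors, row normalization, then k-means."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ W @ D_inv_sqrt            # NJW normalized Laplacian
    vals, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    V = vecs[:, -k:]                           # eigenvectors of the k largest
    Y = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k, n_init, seed)

def kmeans(Y, k, n_init, seed):
    """Minimal k-means; keeps the best of n_init random restarts."""
    rng = np.random.default_rng(seed)
    best, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Y[rng.choice(len(Y), k, replace=False)]
        for _ in range(100):
            labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((Y - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best, best_inertia = labels, inertia
    return best
```

Restarting k-means several times (n_init) mirrors the experimental setup below, where the best of repeated k-means runs is kept.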
The larger the Rand index value, the greater the similarity of the data within clusters and the higher the degree of agreement between the two partitions; a Rand value of 1 indicates that the two divisions are identical. Equivalently, let a cluster structure of the data set be C = {C_1, C_2, ..., C_m} and let the known division of the data set be P = {P_1, P_2, ..., P_s}. Let a denote the number of pairs of points that belong to the same cluster in C and the same group in P; b the number of pairs in the same cluster in C but different groups in P; c the number of pairs in different clusters in C but the same group in P; and d the number of pairs in different clusters in C and different groups in P. Then a + b + c + d = M, the total number of pairs in the data set, that is, M = N(N − 1)/2, where N is the total number of points. The degree of similarity between C and P is then given by the Rand index R = (a + d)/M.
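The Rand index R = (a + d)/M can be computed directly by counting over all point pairs; a short sketch (function name hypothetical):

```python
from itertools import combinations

def rand_index(labels_c, labels_p):
    """Rand index over all M = N(N-1)/2 point pairs: a counts pairs
    grouped together in both partitions, d pairs separated in both."""
    a = d = 0
    for (c1, p1), (c2, p2) in combinations(zip(labels_c, labels_p), 2):
        same_c, same_p = c1 == c2, p1 == p2
        if same_c and same_p:
            a += 1
        elif not same_c and not same_p:
            d += 1
    n = len(labels_c)
    return (a + d) / (n * (n - 1) / 2)
```

Note that the index depends only on which points are grouped together, so relabeling the clusters does not change it.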
F-measure indicator. The F-measure combines two indicators, precision and recall. To describe the evaluation index precisely, the numbers of data points in the different cases are represented by variables (taking the classification of the Iris data set in the UCI database as an example), as shown in Table 1. The total F-measure is F = Σ_i (|i|/n) · max_j F(i, j), where |i| is the number of objects in the (actual) category i and n is the total number of objects. The clustering results of the affinity propagation (AP) algorithm on the Iris data set in the UCI database are shown in Table 2.
As can be seen from Table 2, in the clustering results of the AP algorithm on the Iris data set, all 50 data points actually belonging to the Setosa class are correctly clustered into the Setosa class. Of the 50 data points in the Versicolor class, 45 are correctly clustered into Versicolor and 5 are incorrectly clustered into Virginica. Of the 50 data points in the Virginica class, 43 are correctly clustered into Virginica and 7 are incorrectly clustered into Versicolor.
To calculate the F-measure indicator, first calculate the precision and recall as P(i, j) = n_ij/n_j and R(i, j) = n_ij/n_i, respectively, and then compute F(i, j) = 2 × P(i, j) × R(i, j)/(P(i, j) + R(i, j)). In these formulas, n_j is the number of data samples contained in cluster j of the clustering result, n_i is the number of data samples contained in class i, and n_ij is the number of samples of class i that appear in cluster j. The larger the F-measure value, the better the clustering performance.
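The per-pair F(i, j) and the class-size-weighted total F-measure can be sketched as follows, assuming the standard convention that each true class is scored against its best-matching cluster (function name hypothetical):

```python
def f_measure(labels_true, labels_pred):
    """Total F-measure: for each true class i, take the best F(i, j)
    over clusters j, weighted by class size, with P = n_ij/n_j and
    R = n_ij/n_i as in the text."""
    classes, clusters = set(labels_true), set(labels_pred)
    n = len(labels_true)
    total = 0.0
    for i in classes:
        in_i = [t == i for t in labels_true]
        n_i = sum(in_i)
        best = 0.0
        for j in clusters:
            in_j = [p == j for p in labels_pred]
            n_j = sum(in_j)
            n_ij = sum(a and b for a, b in zip(in_i, in_j))
            if n_ij == 0:
                continue
            P, R = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * P * R / (P + R))
        total += (n_i / n) * best
    return total
```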

Artificial data set
The experiment was first carried out on artificial data sets and evaluated with the Rand index. The k-means step in all spectral clustering algorithms takes the best result from 100 iterations. Previous experiments suggest that the optimal solution can be obtained in (1, 60); therefore, in the following comparisons the parameters of DSSC and the parameter ρ of the proposed algorithm are taken in the range [2, 60]. The data set information is shown in Table 3. To make the comparison of the algorithms clearer, the Rand evaluation index comparison chart of the three algorithms is given in Figure 2, and the F-measure evaluation index comparison in Figure 3.
From the comparisons in Figures 2 and 3, it can be seen that the performance of DSSC is slightly worse on the Threecircles data set, while NJW and the algorithm of this article both perform well. This shows that all spectral clustering algorithms have an obvious clustering effect on convex data sets, for which the choice of similarity measure has no direct influence on the clustering effect. The method of this article is slightly better than the NJW algorithm on the Size5 data set. DSSC performs best on the Square1 data set, but it is sensitive to parameter changes and the algorithm is unstable. The DSSC results on the Square4 data set are better than those of NJW and of the algorithm in this article. Taken as a whole, however, the analysis shows that the proposed algorithm is better than DSSC, because it can fully exploit the global characteristics of the data and handle outliers such as noise points well, so it applies better to both convex and manifold data sets. Therefore, the algorithm of this article is better than the NJW and DSSC algorithms.

UCI data set
To further verify the effectiveness of the proposed algorithm, the UCI data sets Glass, Wine, Iris, and Vehicle were selected, and the NJW algorithm, the DSSC algorithm, and the algorithm of this article were compared. These data sets have class labels, so the clustering effect can be contrasted clearly with the expected result. Table 4 lists the basic information of these four UCI data sets. Figure 4 shows a comparison of the Rand evaluation indicators for the three algorithms, and Figure 5 a comparison of their F-measure evaluation indicators; together they show which algorithm clusters each data set best. It can be seen from Figures 4 and 5 that the DSSC algorithm has the largest F-measure value on the Glass data set, but its Rand value is lower than that of the algorithm in this article, and on the Iris, Wine, and Vehicle data sets DSSC performs worse than the algorithm in this article. Overall, because the algorithm of this article improves the manifold distance measure, it fully exploits the intrinsic links between data points; it therefore finds the best solution under both evaluation indicators and is relatively stable, outperforming the NJW and DSSC algorithms.
The comparison of the experimental results on the artificial and UCI data sets shows that the improved manifold distance spectral clustering algorithm proposed in this article achieves a good clustering effect. By taking both global and local consistency into account, it fully reflects the spatial characteristics of the data, is robust and stable, and handles outliers such as noise points well, so that the computed similarities are more consistent with the real situation and the clustering performance is better.

Conclusion
Because the similarity measure is crucial to the clustering effect of spectral clustering, traditional spectral clustering suffers from parameter sensitivity and multi-scale problems and cannot achieve a good clustering effect, and the existing DSSC algorithm is not stable enough. The spectral clustering algorithm with an improved similarity metric proposed here overcomes the sensitivity to scale parameters, improves clustering accuracy, and clusters better than the DSSC algorithm. It can handle both convexly distributed and manifold-distributed data sets, with good robustness and better performance. The time complexity of the algorithm in this article is O(n³), the same order of magnitude as NJW. How to reduce the computational complexity and better apply the algorithm to big data is the focus of the next phase of research.
As the amount of information in various fields continues to increase, the demand for information retrieval grows, and traditional information retrieval methods are gradually being replaced by intelligent information retrieval systems. Intelligent information retrieval satisfies people's need for information diversity and helps improve retrieval efficiency, and intelligent retrieval technology based on the Semantic Web enhances the ability of computers to recognize natural language and accelerates knowledge representation and acquisition. However, in many computer information retrieval processes, because natural language is used for indexing and retrieval, inaccurate queries may occur; especially in the Internet information era, current search capabilities increasingly fall short of people's growing retrieval needs. The following problems remain. (1) Content: Network information resources are becoming richer and richer, and it is an open question whether the retrieved content is accurate and whether the queried network information resources can be displayed. When we search for information, it is common to retrieve content that does not meet our requirements; much work is therefore needed to increase retrieval coverage while keeping queries precise. (2) Objects: Different people have different information retrieval needs. How to classify these needs so as to personalize each user's experience while still guaranteeing accuracy is another area needing improvement.
In response to the above problems, we propose corresponding countermeasures. (1) Language intelligence: when we input keywords into the information retrieval system through natural language, the system should perform search processing and ambiguity analysis and assist the query at the knowledge or concept level; intelligent prompts from the system can help us obtain the best results. (2) Content specificity: an information retrieval system's ability to analyze content needs to be improved. Information unrelated to the search content should be screened out, and not only the title and the full text but also sound, images, and the like should become search points. (3)
Technology intelligence: some intelligent retrieval technologies have already emerged in China, including not only automatic indexing and automatic summarization but also intelligent techniques such as automatic tracking and automatic roaming, and these are gradually being improved and optimized. In recent years, concepts such as "smart browsers" and "knowledge-sharing agents" have been proposed. With in-depth study of the IRM, we find that each retrieval model has its own characteristics, advantages, and deficiencies; their development is not synchronous but complementary. Many models are still in active exploration and experimentation, and their development differs because their scopes of application differ. The general trend of modern network information retrieval technology is toward multifunctionality and intelligence, adapting to the transformation of information organization from structured to unstructured so as to meet people's requirements for information acquisition and utilization to the greatest extent. Although search technology has developed rapidly in all respects, many problems remain in information retrieval under the network environment, such as automatic extraction of object features and indexing, querying, and retrieval based on multiple similar features. Ontology theory, derived from knowledge engineering and artificial intelligence, can handle natural language understanding and language inference mechanisms well and is a hot issue for information retrieval in the current web environment.
As information service personnel, we should constantly track and master the latest developments in modern information technology, maintain a strong sense of technology promotion, and make full use of modern information technology in our work to provide information services for the whole society.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China under grant number 51605061 and the Science and Technology Research Program of Chongqing Municipal Education Commission under grant numbers KJ1706172 and KJ1706162.