Prediction of coal and gas outbursts by a novel model based on multisource information fusion

As one type of geologic disaster, coal and gas outbursts seriously threaten safe production in coal mines and restrict the sustainable development of the mining industry. However, coal and gas outbursts are difficult to forecast because of their uncertainty and the limited sample size, both of which reduce the accuracy of traditional prediction methods. Therefore, this study developed a novel model based on multisource information fusion to predict coal and gas outburst disasters. Through the application of Dempster-Shafer theory, a method of multisource information fusion, the proposed model combined the results of different forecasting approaches, including conventional techniques and an emerging method based on artificial intelligence. To enhance the performance of the established model, this study improved Dempster-Shafer theory and verified its effectiveness in dealing with highly conflicting information. We then applied this model to the No. 3 coal seam of the Xinjing coal mine, Shanxi Province, China. The fused prediction accurately reflected the outburst hazard and compensated well for false predictions by individual methods. An analysis of the results concluded that the model based on multisource information fusion increases the credibility of the forecast, which might provide technical support for safe coal mine production.


Introduction
Coal and gas outbursts (also referred to as outbursts) are geohazards that involve the sudden and violent ejection of pulverized coal and gas from a coal mine working face into the roadway, and they are a hazard in most coalbeds (Kissell and Iannacchione, 2014; Lama and Bodziony, 1998; Zhao et al., 2014). When the released gas mixes with air and meets an ignition source, serious outbursts can produce gas explosions and even fires, causing fatalities and considerable economic losses. In China, mines vulnerable to outburst disasters are widely distributed because of the country's unique and complex geological conditions, so specific and effective prevention and early warning measures are required to curb outburst disasters (Chen et al., 2019; Jia et al., 2017).
For almost half a century, seam-mining countries have focused on outburst prediction, and many prediction methods have been proposed. Díaz Aguado and González Nicieza (2007) investigated two different mines in the Asturias Central Basin, Spain, and combined several predictive indices to establish an assessment of gas outbursts. The stochastic model approach of the Monte Carlo method applied by Wold et al. (2008) suggested that geomechanical variables, such as strength fields, also influence coal and gas outbursts. Islam and Shinjo (2009) found that gas emissions resulted from the lack of mine ventilation and the low atmospheric pressure. These approaches are generally divided into two categories: direct and indirect forecasting methods (Tang et al., 2016). Direct forecasting methods, also called index methods, measure parameters that affect the occurrence of outbursts through direct contact with the coal mass, by drilling and exploring (Zhao et al., 2019). In terms of applicability in the field, these forecasting approaches are preferred in China. However, there are many complex nonlinear factors influencing coal and gas outbursts, and these indices are sensitive to different occurrences of dynamic phenomena during underground mining, thus possessing different critical values. Moreover, in some particular cases, coal and gas outbursts occur even if the measured index value does not exceed the critical value. Therefore, in addition to the direct methods, there is also extensive research on indirect forecasting methods, which provide a new direction for the development of outburst prediction. Mathematical means have been frequently adopted in recent years as an indirect method in newly explored areas. These methods learn and analyse the correlation between the outburst occurrence and outburst factors after being trained by historical data.
Currently, artificial intelligence models are being applied to the mathematical methods of indirect prediction. Zhang et al. (2008) ascertained the relationship between the factors and hazards of coal and gas outbursts by multifactor pattern recognition and then predicted the hazard of each prediction unit. Wang et al. (2010) established a predictive model based on Bayesian discriminant analysis and attained reasonable results that completely coincided with the actual situation. Zhang and Lowndes (2010) applied a coupled artificial neural network and fault tree analysis model to the prediction of coal and gas outbursts, and the results indicated that the combined solution method could explicitly recognize model relationships between the geological conditions and the potential risk of outbursts. Li et al. (2017) used the initial gas flow in a borehole to determine the critical values of coal and gas outbursts based on Fisher's criterion and predicted the risk of outbursts by a novel linear and continuous method, indicating that the proposed method performs better than the traditional prediction method.
These advanced single artificial intelligence methods have achieved valuable results in the field of outburst prediction; nevertheless, coal and gas outbursts are uncertain geological dynamic phenomena because of their complicated nonlinear systems. Consequently, there are still some inadequacies in practical application when adopting only single methods. For instance, the poor stability of individual models may lead to a high rate of false alarms in field monitoring applications. To address such shortcomings, researchers recommended information fusion methods for predictive systems. For example, Zhang and Yang (2015) established a risk assessment model for coalbed methane development based on multisource information fusion and then evaluated the gas emission risk of eight mining areas of the Luling mine according to the six-layer functional classification model of the fusion system.
Significantly, in multisource information fusion technology, the synthesis algorithm is the theoretical basis, and with the emergence of new methods based on statistical inference, artificial intelligence and information theory, a growing number of efficient methods are being applied to the synthesis algorithm. Among these newly emerging applications of mathematical methods, support vector machine (SVM) and Dempster-Shafer theory (D-S evidence theory) are also appropriate for prediction.
SVM, a promising machine learning technique that analyses data for classification and regression, performs very well on small-sample data, for which deep learning models may be unnecessarily complicated; therefore, it is very suitable for gas data. SVM has been successfully applied in many research fields, such as text categorization (Manochandar and Punniyamoorthy, 2018), image classification (Jac Fredo et al., 2019; Rahmani et al., 2019), health care (Cai et al., 2002; Hua and Sun, 2001), and other engineering fields (Liu et al., 2019; Zuo & Carranza, 2011). D-S evidence theory is a mathematical framework for analysing uncertain and partial information. D-S evidence theory has been used in a variety of studies and adapted in several fields, such as information fusion (Feizizadeh et al., 2014; Jiang et al., 2017), expert systems (Dymova et al., 2016; Qiu et al., 2018), image identification (Ghasemi et al., 2013; Haouas et al., 2019), and other engineering fields (Jiang et al., 2018).
Owing to the non-linearity and uncertainty of coal and gas outbursts, it is important to establish an accurate model based on particular geological features. Considering the aforementioned advantages and disadvantages of direct and indirect forecasting methods, we attempt to combine multiple methods and conduct the prediction of coal and gas outbursts based on multisource information fusion to improve the accuracy and reliability of the uncertain forecast. Hence, this paper studies the prediction of coal and gas outbursts in the Xinjing coal mine (in Shanxi Province, China) and proposes a new outburst prediction model, with a forecast in the Zhuxianzhuang coal mine as a validation in different geological conditions. The proposed model demonstrates its contributions in the following aspects.
(1) The model is capable of flexibly determining the degree of correlation between each factor and outburst occurrences according to the actual conditions of different mining areas by learning historical data, which greatly enhances the model adaptability.
(2) The improved D-S evidence theory resolves the failure of the traditional evidence theory when combining highly conflicting evidence, therefore improving the accuracy of convergence.
(3) The model succeeds in combining different forecasting methods as well as presenting a novel basic probability assignment from preliminary predictions.
(4) By applying the proposed model to research areas, it is indicated that predictions can be made in these areas to provide early warning for safe mine production.
The rest of this paper is organized as follows. In the next section, we briefly introduce the geological background of the Xinjing coal mine, and the prediction methodologies used in this paper (i.e., direct forecasting methods, SVM, and D-S evidence theory) appear in this section. Then, we validate the improved D-S evidence theory and present the application of the proposed fusion-based predictive model as well as the analysis of the converging results. Finally, brief conclusions are made in the last section.

Geological situation
The Xinjing coal mine, located in the middle of the North China plate (Figure 1), appears as a southwest-dipping monocline with a few secondary folds. The multiple-periodic, low-order fold structures of the mine provide favourable storage conditions for coalbed methane (Wang et al., 2017). This coal mine has been identified as a mine that is prone to high gas outbursts, and the No. 3 coalbed is an outburst coal seam, characterized by a simple structure, small fractures, and mostly normal faults.
In general, the coal and gas outbursts in the No. 3 coalbed are the result of the comprehensive action of gas pressure, the in situ stress and the physical and mechanical properties of coal. Therefore, research on outburst prediction there typically focuses on the parameters related to these three aspects and our present study adopts such factors to predict coal and gas outbursts: the burial depth of the coalbed (the depth from the coalbed to the bedrock surface), gas pressure, gas content, initial gas emission rate, Protodyakonov coefficient of coal strength, gas emission amount, coal seam thickness, geological structure, pillar collapses, and coal structure. The experiments in this research will rank these factors according to their relationship with the condition of outbursts and then extract them as input datasets for the prediction model.

Direct forecasting methods
The indices of the direct forecasting methods often consist of gas pressure ($P$), drilling gas inrush initial velocity ($q$), drill cutting desorption ($\Delta h_2$), drilling cutting weight ($s$), Protodyakonov strength index ($f$), initial gas emission rate ($\Delta p$), and comprehensive index ($D$ and $K$) (Zhou et al., 2019). According to the provisions of the prevention of coal and gas outbursts (China's State Administration of Work Safety, 2009), the identification of an outburst-prone coal seam should first be based on the field gas dynamic phenomenon. When there are few or no significant characteristics of the dynamic phenomenon, the identification is performed by the forecasting indices, as shown in Table 1.
In addition to using a single index, Wang (1994) proposed a forecasting method using a comprehensive index (including values of $D$ and $K$). The possibility of an outburst of coal seams can be estimated by $D$ or $K$

$$D = \left(\frac{0.0075H}{f} - 3\right)(P - 0.74), \qquad K = \frac{\Delta p}{f},$$

where $D$ and $K$ are comprehensive indices recognizing whether there will be an outburst, $H$ is the buried depth, $f$ is Protodyakonov's coefficient of coal mass, $P$ is the gas pressure, and $\Delta p$ is the initial velocity of gas emission. Generally, the critical values of $D$ and $K$ are determined by specific mines corresponding to their observation data, and when there are no reliable measured data, the risk of outburst can be determined by the referential value, as shown in Table 2. In Table 2, if the value calculated at a point is greater than the corresponding critical value, the point is considered a dangerous location at which outbursts may occur.
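The comprehensive indices are simple enough to evaluate by hand, but a small helper makes the comparison against critical values explicit. The sketch below assumes $D = (0.0075H/f - 3)(P - 0.74)$ and takes $K$ as $\Delta p / f$, the form commonly used for this index; the function names, the units noted in the comments, and the sample inputs are illustrative assumptions, and no critical values from Table 2 are hard-coded.

```python
def comprehensive_indices(burial_depth, f, gas_pressure, delta_p):
    """Return the comprehensive indices (D, K) for one measurement point.

    burial_depth: H, assumed to be in metres
    f:            Protodyakonov coefficient of the coal
    gas_pressure: P, assumed to be in MPa
    delta_p:      initial velocity of gas emission
    """
    if f <= 0:
        raise ValueError("Protodyakonov coefficient f must be positive")
    d_index = (0.0075 * burial_depth / f - 3.0) * (gas_pressure - 0.74)
    k_index = delta_p / f  # assumed form of K
    return d_index, k_index


def exceeds_critical(value, critical_value):
    """True when an index is greater than its critical value."""
    return value > critical_value


# Hypothetical measurement point, not taken from the paper's data.
d, k = comprehensive_indices(burial_depth=400.0, f=0.5,
                             gas_pressure=1.24, delta_p=10.0)
# D is approximately 1.5 and K is 20.0 for these inputs.
```

The critical values passed to `exceeds_critical` should come from mine-specific observations or the referential table, as the text describes.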
Least squares support vector machine: Indirect forecasting method
In practical applications of prediction, where the sample size is limited, we tend to pursue efficient inductive learning (learning by examples) methods to realize the automatic discovery of regularities in data and therefore apply these methods to new but similar data. Among these methods, SVM, developed from statistical learning theory, is a supervised machine learning algorithm categorizing the input dataset by constructing an optimal separating plane (Vapnik, 2000). This paper uses least squares support vector machine (LSSVM), a modified version of SVM proposed by Suykens and Vandewalle (1999), as one of the three preliminary forecasting methods. SVM was originally designed for binary classification and then gradually extended to multiclass classification, so the one-against-one method was one of the early and major implementations (Hsu and Lin, 2002), with its principle shown in Figure 2. Given two types of samples (represented by green circles and pink triangles in Figure 2), the task of SVM is to find the optimal classification hyperplane to separate the samples and create rules that will guide the classification of future samples. In Figure 2, hyperplanes $P_1$ and $P_2$ are both capable of segregating the classes well, but $P$ is the optimal classification hyperplane that maximizes the distance, called the margin, between the nearest data points of each class.
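Assume that there is a training set $x_i \in \mathbb{R}^n$ $(i = 1, \ldots, k)$, and $y_i \in \{-1, 1\}$ is the associated label for each vector $x_i$. SVM aims to find the hyperplane $P$ by the following linear equation

$$\omega \cdot x + b = 0,$$

where $\omega \in \mathbb{R}^n$ is the weight vector that determines the direction of $P$, and $b$ is a constant influencing the distance between $P$ and the origin. Suppose that the hyperplane $(\omega, b)$ is an optimal hyperplane, then

$$y_i(\omega \cdot x_i + b) \ge 1, \quad i = 1, \ldots, k. \tag{3}$$

As shown in Figure 2, the data points closest to the hyperplane satisfy the following equation

$$y_i(\omega \cdot x_i + b) = 1.$$

These points are called "support vectors", and the sum of the distances from the support vectors of the two classes to the hyperplane is $2/\|\omega\|$, also known as the margin. Hence, the work of finding $P$ is settled by the solution of $\omega$ and $b$ that meets equation (3) to maximize the margin, i.e., the objective is stated as the following optimization problem

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 \quad \text{s.t.} \quad y_i(\omega \cdot x_i + b) \ge 1, \quad i = 1, \ldots, k. \tag{4}$$

Then, the solution of equation (4) will deduce a model (equation (5)) corresponding to the optimal classification hyperplane $P$

$$f(x) = \operatorname{sgn}(\omega \cdot x + b). \tag{5}$$

To solve the convex quadratic programming of equation (4), the Lagrange function is introduced, and its dual problem (equation (6)) can be obtained

$$L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i=1}^{k} \alpha_i \left[ y_i(\omega \cdot x_i + b) - 1 \right], \tag{6}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ is the Lagrange multiplier. Therefore, an optimal saddle point can be computed by differentiating with respect to $\omega$ and $b$. By substitution and elimination of variables, the dual problem of equation (4) can be found by

$$\max_{\alpha} \ \sum_{i=1}^{k} \alpha_i - \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{k} \alpha_i y_i = 0, \quad \alpha_i \ge 0. \tag{7}$$

After solving for the variable $\alpha$ in equation (7), the variables $\omega$ and $b$ can be calculated, and the final hypothesis is consequently turned into a linear combination of the training points with its discrimination function expressed as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{k} \alpha_i y_i (x_i \cdot x) + b \right),$$

where $k$ is the number of support vectors.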
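The passage above summarizes the move from the Lagrange function to the dual problem as "substitution and elimination". A sketch of that standard step, written with $\omega$ for the weight vector and $\alpha_i$ for the multipliers (notation assumed to match the hard-margin problem described in this section), is:

```latex
\begin{aligned}
L(\omega, b, \alpha) &= \tfrac{1}{2}\|\omega\|^2
  - \sum_{i=1}^{k} \alpha_i \left[ y_i(\omega \cdot x_i + b) - 1 \right], \\
\frac{\partial L}{\partial \omega} = 0
  &\;\Rightarrow\; \omega = \sum_{i=1}^{k} \alpha_i y_i x_i, \\
\frac{\partial L}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i=1}^{k} \alpha_i y_i = 0, \\
L(\alpha) &= \sum_{i=1}^{k} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k}
    \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j).
\end{aligned}
```

The last line follows by inserting the two stationarity conditions into $L$: the term in $b$ vanishes, and the quadratic term in $\omega$ collapses to the double sum over the training pairs.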
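In linearly inseparable cases, a hyperplane with the smallest number of errors can be constructed by the concept of a soft margin. By introducing nonnegative slack variables $e_i$, the constrained optimization problem in equation (4) becomes

$$\min_{\omega, b, e} \ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{k} e_i \quad \text{s.t.} \quad y_i(\omega \cdot x_i + b) \ge 1 - e_i, \quad e_i \ge 0, \tag{9}$$

where $C$ is a cost parameter penalizing the slack variables $e_i$ in the objective function. Therefore, the optimization can be solved through the application of the Lagrange multiplier in the same way as in the separable case (equation (7)) by transforming equation (9) into its dual problem, subject to $0 \le \alpha_i \le C$, $i = 1, 2, \ldots, k$. However, in most classification problems, there may not exist a hyperplane that divides two samples correctly in the primal space. To address the problem, Vapnik introduced kernel functions built on a mapping $\varphi$, which can map the data from the original feature space to a higher dimensional feature space ($\varphi(x) = (\varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x))$) where an optimal hyperplane can be found, achieving the solution by a linear algorithm. The kernel function $K(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle$ realizes nonlinear transformation (equation (10)) by replacing the inner product

$$\max_{\alpha} \ \sum_{i=1}^{k} \alpha_i - \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{k} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C. \tag{10}$$

Similar to the method for the separable case, after solving equation (10), the optimization model (equation (11)) can be attained

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{k} \alpha_i y_i K(x_i, x) + b \right). \tag{11}$$

LSSVM, as a modified version of the conventional SVM, transforms quadratic programming into the solution of a set of linear equations and substitutes equality constraints for the inequalities of the classic SVM approach, thus simplifying the model. LSSVM introduces least squares to SVM by expressing the classification problem as

$$\min_{\omega, b, e} \ J(\omega, e) = \frac{1}{2}\|\omega\|^2 + \frac{\gamma}{2}\sum_{i=1}^{k} e_i^2 \quad \text{s.t.} \quad y_i\left[\omega \cdot \varphi(x_i) + b\right] = 1 - e_i, \quad i = 1, \ldots, k,$$

where $\gamma$ is a regularization factor, like the penalty factor $C$ in SVM, and $e_i$ is the slack variable. Similar to the optimization problem of SVM, Lagrange multipliers $\alpha_i$ are applied and the objective function (equation (13)) can be obtained

$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{k} \alpha_i \left\{ y_i\left[\omega \cdot \varphi(x_i) + b\right] - 1 + e_i \right\}. \tag{13}$$

Then the conditions for optimality are acquired by partially differentiating with respect to $\omega$, $b$, $e$, and $\alpha$

$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{k} \alpha_i y_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{k} \alpha_i y_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow y_i\left[\omega \cdot \varphi(x_i) + b\right] - 1 + e_i = 0.$$

After eliminating the intermediate variables $\omega$ and $e$, $\alpha_i$ and $b$ can be obtained by solving the linear equation

$$\begin{bmatrix} 0 & y^{T} \\ y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}, \qquad \Omega_{ij} = y_i y_j K(x_i, x_j).$$

The final LSSVM becomes

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{k} \alpha_i y_i K(x_i, x) + b \right). \tag{14}$$

$K(x_i, x)$ is also a kernel function in equation (14). General kernel functions include the sigmoid kernel, the radial basis function kernel (Gaussian RBF kernel) and the polynomial kernel. When prior knowledge is absent, the radial basis function kernel generally performs well and generalizes soundly across sample distributions; therefore, this paper adopts it as the kernel function.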
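To make the "set of linear equations" concrete, the following is a minimal pure-Python sketch of an LSSVM classifier in the Suykens and Vandewalle formulation: it builds the block system with $\Omega_{ij} = y_i y_j K(x_i, x_j)$, adds $1/\gamma$ on the diagonal, and solves for $b$ and $\alpha$. It is not the LSSVMlab implementation used in the paper; the kernel scaling $\exp(-\|u - v\|^2/\sigma^2)$, the helper names, and the toy data are assumptions made for illustration.

```python
import math


def rbf_kernel(u, v, sigma2):
    """Gaussian RBF kernel; the exp(-||u-v||^2 / sigma2) scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma2)


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lssvm_train(xs, ys, gamma, sigma2):
    """Return (b, alpha) from the system [[0, y^T], [y, Omega + I/gamma]]."""
    k = len(xs)
    system = [[0.0] + [float(y) for y in ys]]
    for i in range(k):
        row = [float(ys[i])]
        for j in range(k):
            value = ys[i] * ys[j] * rbf_kernel(xs[i], xs[j], sigma2)
            if i == j:
                value += 1.0 / gamma
            row.append(value)
        system.append(row)
    solution = solve_linear(system, [0.0] + [1.0] * k)
    return solution[0], solution[1:]


def lssvm_predict(x, xs, ys, b, alpha, sigma2):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = b + sum(a * y * rbf_kernel(xi, x, sigma2)
                    for a, y, xi in zip(alpha, ys, xs))
    return 1 if score >= 0 else -1


# Toy, clearly separated data (hypothetical, not field measurements).
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
train_y = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_train(train_x, train_y, gamma=10.0, sigma2=2.0)
label_low = lssvm_predict((0.5, 0.5), train_x, train_y, b, alpha, sigma2=2.0)
label_high = lssvm_predict((3.5, 3.5), train_x, train_y, b, alpha, sigma2=2.0)
```

Because every training point receives a nonzero multiplier, the solution is not sparse; that is the usual trade-off for replacing the quadratic program with one linear solve.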

Dempster-Shafer theory and its improvement
Both direct and indirect methods have their own advantages. However, the range of the identification results of the three forecasting methods, i.e., the direct forecasting methods and LSSVM, may vary widely; thus, a comprehensive consideration based on the uncertainty reasoning method may be required to increase the reliability of coal and gas outburst prediction.
As one of the uncertainty reasoning methods, D-S evidence theory is becoming increasingly attractive. Compared with the theories of statistical inference, such as probabilistic reasoning, D-S evidence theory is more flexible in measuring indeterminacy and more concise in reasoning mechanisms and is closer to the general convention of human thinking, especially when handling unknown information.
Basic concept. Let $\Theta$ be a finite and exhaustive set of $N$ mutually exclusive elements, indicated by

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_N\},$$

where $\Theta$ is called the frame of discernment, and $\theta_i$ is one of the events in frame $\Theta$ ($i = 1, 2, \ldots, N$; $N$ is the number of events) (Shafer, 1976). All the possible subsets of $\Theta$ constitute the power set of $\Theta$, represented by $2^{\Theta}$.
Definition 2.1. $\forall A \subseteq \Theta$, a basic probability assignment (BPA) is a mapping ($m$) from $2^{\Theta}$ to $[0, 1]$, formally defined by

$$m: 2^{\Theta} \to [0, 1],$$

that also satisfies the following condition

$$\sum_{A \subseteq \Theta} m(A) = 1.$$

In Shafer's original definition, the condition $m(\emptyset) = 0$ is required, which corresponds to the "closed-world assumption". However, in some recent papers, this condition is often omitted, corresponding to the "open-world assumption" (Jousselme et al., 2001). The rest of the paper considers the closed-world assumption.
We can regard a BPA as our belief at an observation time: $m(\cdot) = 0$ represents no belief in a hypothesis, $m(\cdot) = 1$ represents total belief, and $m(\cdot) \in (0, 1)$ represents partial belief. A BPA with $m(\emptyset) = 0$ is also known as a mass function (Liu, 2006), and an element of $2^{\Theta}$ having a non-zero mass is called a focal element.
Definition 2.2. Assuming that $m_1$ and $m_2$ are two BPAs defined on the frame of discernment $\Theta$, the combined BPA by Dempster's combination rule is thus defined by

$$m(A) = (m_1 \oplus m_2)(A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \mu(\emptyset)}, & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases} \qquad \mu(\emptyset) = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$

The case of $\mu(\emptyset) = 0$ implies no conflict between $m_1$ and $m_2$, while $\mu(\emptyset) = 1$ represents that $m_1$ completely contradicts $m_2$.
When there are $n$ ($n > 2$) pieces of evidence, i.e., $m_1, m_2, \ldots, m_n$ are BPAs defined on $\Theta$, the combined BPA $m$ is formulated as follows

$$m = m_1 \oplus m_2 \oplus \cdots \oplus m_n.$$

The final orthogonal sum $m$ represents our belief in the most likely object.
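Dempster's rule can be written compactly once a BPA is stored as a mapping from focal elements to masses. The sketch below represents each focal element as a `frozenset`, accumulates products of masses on non-empty intersections, collects the mass that falls on the empty set as the conflict, and renormalizes by one minus that conflict. The function names and the example BPAs are illustrative assumptions; the example is the classic highly conflicting pair often used to show the counterintuitive behaviour of the rule.

```python
from functools import reduce
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BPAs ({frozenset: mass}); return (combined BPA, conflict)."""
    raw = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            raw[intersection] = raw.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # mass that would land on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    combined = {a: v / (1.0 - conflict) for a, v in raw.items()}
    return combined, conflict


def dempster_combine_all(bpas):
    """Orthogonal sum of several BPAs, applied pairwise from left to right."""
    return reduce(lambda acc, m: dempster_combine(acc, m)[0], bpas)


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")

# Hypothetical, highly conflicting evidence.
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}
fused, conflict = dempster_combine(m1, m2)
# Almost all mass is in conflict, yet the rule assigns (nearly) full belief
# to B, the hypothesis that both sources considered very unlikely.
```

This behaviour under high conflict is the motivation for the modifications discussed in the next part of the text.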
Improvement of the D-S evidence theory. In the traditional D-S rule, a wrong convergence contradicting reality may be yielded when $\mu(\emptyset) \to 1$, i.e., when there is relatively high contradiction between some pieces of evidence. Thus, some sophisticated alternatives to the conventional D-S evidence theory have been proposed and are categorized into combination modification (Lefevre et al., 2002) and source modification (Murphy, 2000). Combination modification argues that the reason for the conflict is the improper disposal of evidence when empty sets are present. However, source modification states that a disturbed evidence source triggers conflicts; therefore, there is a need to pre-process the evidence. This paper considers the two ideas, presenting a novel conflict measurement and adopting an optimized combination to solve the paradox that may occur when conflicting evidence exists.
Step 1. Measurement of conflict. The degree of conflict between two pieces of evidence can be described by the distance between two BPAs. In the direct application of the D-S evidence theory, $\mu(\emptyset)$, explained in the previous section, is generally regarded as the measure of conflict. Normalization by $\mu(\emptyset)$ changes the value of BPAs in all the focal elements in both pieces of evidence to $(1 - \mu(\emptyset))^{-1}$ times the original amount. However, the conflict is not the result of all the focal elements. Therefore, many scholars have studied the correlation between evidence and proposed the interevidence distance to measure both conflict and similarity, which mainly includes the following three distances.
(1) Jousselme's distance. Assume that $m_1$ and $m_2$ are two BPAs defined on the frame of discernment $\Theta$. Then, the distance between $m_1$ and $m_2$ is denoted as (Jousselme et al., 2001)

$$d(m_1, m_2) = \sqrt{\frac{1}{2}(m_1 - m_2)^{T} D\, (m_1 - m_2)},$$

where $D$ is a $2^N \times 2^N$ positive definite matrix, with its elements

$$D(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad A, B \in 2^{\Theta}.$$

$|A|$ denotes the cardinality of $A$, and $D$ conveys the information that the distance $d(m_1, m_2)$ is a function concerning $|A|$, $|B|$, $|A \cap B|$, and $|A \cup B|$. The conflict between $A$ and $B$ can be measured by $|A \cap B|$: when $|A \cap B| = 0$, there is no common element in $A$ and $B$, and it is asserted that $A$ strongly conflicts with $B$.
(2) Pignistic probability transformation. The pignistic probability transformation (PPT) (Ristic and Smets, 2006) transforms a BPA into a probability. Let $m(A)$ be a BPA on the D-S frame $\Theta = \{x_1, x_2, \ldots, x_n\}$, and then its PPT on $\Theta$ is defined by

$$BetP_m(B) = \sum_{A \subseteq \Theta} \frac{|A \cap B|}{|A|}\, m(A),$$

where $B \subseteq \Theta$. In addition, the PPT for the singleton is

$$BetP_m(x_i) = \sum_{A \subseteq \Theta,\; x_i \in A} \frac{m(A)}{|A|}.$$

$BetP_m$ can also be generalized to power set $2^{\Theta}$

$$BetP_m(A) = \sum_{x_i \in A} BetP_m(x_i).$$

(3) Liu's distance. Liu (2006) presented an alternative quantitative measure to reveal the relationship among beliefs using the difference of the PPT among evidence to describe the degree of conflict. Let $m_1$ and $m_2$ be two BPAs on frame $\Theta$, and $BetP_{m_1}(A)$ and $BetP_{m_2}(A)$ be their respective PPTs. Then, the distance between betting commitments of the two BPAs can be given by

$$difBetP_{m_1}^{m_2} = \max_{A \subseteq \Theta} \left| BetP_{m_1}(A) - BetP_{m_2}(A) \right|,$$

where $difBetP_{m_1}^{m_2} = difBetP_{m_2}^{m_1}$, and $0 \le difBetP_{m_1}^{m_2} \le 1$.
When there are $n$ sources on $\Theta$ with their corresponding BPAs ($m_1, m_2, \ldots, m_i, \ldots, m_n$), the distance between betting commitments can be evaluated for every pair of sources, denoted $difBetP_{m_i}^{m_j}$.
The above three methods are commonly used as the measurement of the distance between evidence, which consider the distance not only from the perspective of dissimilarity but also from consistency. Inspired by Jousselme, who viewed a BPA as a special case of vectors, many researchers adopt this geometrical interpretation of BPA when studying the distance (Wen et al., 2008); hence, we can measure the conflict among evidence using the cosine of vectors. Assume that there are $n$ pieces of evidence on frame $\Theta$, and every BPA is a vector: $m_i = [m_i(A_1), m_i(A_2), \ldots, m_i(A_r)]$, $A_1, \ldots, A_r \in P(\Theta)$, $i = 1, \ldots, n$, where $m_i \in Y_{P(\Theta)}$, and $Y_{P(\Theta)}$ is the vector space generated by elements of $P(\Theta)$. Then, we can use $\cos(m_i, m_j) = \langle m_i, m_j \rangle / (\|m_i\| \|m_j\|)$ to quantify the distance between $m_i$ and $m_j$ to estimate the conflict.
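The distance measures discussed here can each be computed directly from a BPA stored as a mapping from focal elements to masses. The sketch below assumes the usual forms: Jousselme's distance $\sqrt{\tfrac{1}{2}(m_1 - m_2)^T D (m_1 - m_2)}$ with $D(A, B) = |A \cap B| / |A \cup B|$, the pignistic transformation that splits each mass equally among the elements of its focal set, Liu's distance as the largest difference in pignistic probability over all subsets, and the cosine of the two BPA vectors. Function names and the sample BPAs are illustrative assumptions, and the closed-world assumption (no mass on the empty set) is taken for granted.

```python
import math


def jousselme_distance(m1, m2):
    """sqrt(0.5 * diff^T D diff) with D(A, B) = |A & B| / |A | B|."""
    focal_sets = list(set(m1) | set(m2))
    diff = [m1.get(s, 0.0) - m2.get(s, 0.0) for s in focal_sets]
    total = 0.0
    for a, da in zip(focal_sets, diff):
        for b, db in zip(focal_sets, diff):
            total += da * db * len(a & b) / len(a | b)
    return math.sqrt(max(total, 0.0) / 2.0)


def pignistic(m):
    """Pignistic probability of every singleton that appears in a focal set."""
    betp = {}
    for focal, mass in m.items():
        share = mass / len(focal)
        for element in focal:
            betp[element] = betp.get(element, 0.0) + share
    return betp


def dif_betp(m1, m2):
    """max over subsets A of |BetP1(A) - BetP2(A)|.

    The maximum is reached by the subset of elements on which BetP1 exceeds
    BetP2, so it equals the sum of the positive singleton differences.
    """
    p1, p2 = pignistic(m1), pignistic(m2)
    elements = set(p1) | set(p2)
    return sum(max(p1.get(e, 0.0) - p2.get(e, 0.0), 0.0) for e in elements)


def bpa_cosine(m1, m2):
    """Cosine of the angle between two BPAs viewed as vectors."""
    shared = set(m1) | set(m2)
    dot = sum(m1.get(s, 0.0) * m2.get(s, 0.0) for s in shared)
    norm1 = math.sqrt(sum(v * v for v in m1.values()))
    norm2 = math.sqrt(sum(v * v for v in m2.values()))
    return dot / (norm1 * norm2)


a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")

# Hypothetical BPAs for illustration.
certain_a = {a: 1.0}
certain_b = {b: 1.0}
vague = {ab: 1.0}
```

For the two fully contradictory sources `certain_a` and `certain_b`, the first two measures reach their maximum of 1 and the cosine is 0, whereas `certain_a` against `vague` gives intermediate values for the two distances.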
Here, this study presents a new measurement to illustrate the conflict.
Definition 2.3. Let there be two BPAs ($m_i$ and $m_j$) in the frame of discernment $\Theta$. Then $conf(m_i, m_j)$, given by equation (26), is called the parameter of conflicting degree between $m_i$ and $m_j$.
In equation (26), $d(m_i, m_j)$ is the Jousselme distance, $difBetP_{m_i}^{m_j}$ is the Pignistic distance, and $\cos(m_i, m_j)$ is the cosine distance. Additionally, equation (26) satisfies the requirements for conflicting measurement (nonnegativity, symmetry, and boundedness).
Step 2. Calculation of the degree of support. Deng et al. (2004) proposed a method for expressing the weight of evidence according to the definition of the degree of support among the evidence; when the evidence is largely supported by other evidence, this evidence is more similar to other evidence and thus possesses a greater weight. In this part, we redefine the supporting degree of evidence with Deng's weight for reference:
Definition 2.4. Let $E_i$ be the evidence on the frame of discernment $\Theta$, given that the conflict between two BPAs is $conf(m_i, m_j)$, $\forall i, j \in [1, n]$. Then, the similarity between evidence $i$ and evidence $j$ is denoted by $sim(m_i, m_j)$. Considering the positive correlation between $sim(m_i, m_j)$ and the similarity of evidence, the support degree of evidence is denoted by $sup(m_i)$. On a frame $\Theta$, when a piece of evidence is notably against other evidence, it can be concluded that the information it provides is inadequate; consequently, the support degree of evidence $sup(m_i)$ should be allocated a relatively smaller value with the purpose of eliminating or reducing its impact on convergence.
The relative support degree of evidence, $norm(m_i)$, is then obtained by the normalization of the support degree, where $0 \le norm(m_i) \le 1$, and $norm(m_i)$ represents the reliability of a piece of evidence supported by the whole evidence source.
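The chain from pairwise conflict to a normalized weight can be illustrated with a short sketch. The specific forms used below are assumptions in the style of Deng et al. (2004) and are not the paper's own definitions: similarity is taken as one minus the pairwise conflict, the support of a piece of evidence is the sum of its similarities to all the others, and the relative support is that support divided by the total. The conflict matrix is hypothetical.

```python
def relative_support(conflict):
    """Turn a symmetric pairwise conflict matrix into normalized weights.

    Assumed forms (not taken from the paper):
      sim(i, j) = 1 - conflict[i][j]
      sup(i)    = sum of sim(i, j) over j != i
      norm(i)   = sup(i) / sum of all sup
    """
    n = len(conflict)
    support = []
    for i in range(n):
        support.append(sum(1.0 - conflict[i][j] for j in range(n) if j != i))
    total = sum(support)
    if total == 0.0:
        # Every pair is in total conflict: fall back to equal weights.
        return [1.0 / n] * n
    return [s / total for s in support]


# Hypothetical conflicts: evidence 0 and 1 agree, evidence 2 opposes both.
conflict_matrix = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
weights = relative_support(conflict_matrix)
```

With these inputs the dissenting third source receives the smallest weight, which is the qualitative behaviour the text asks of the support degree.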
Step 3. Modification of the evidence source.
Considering that highly conflicting evidence yields counterintuitive results, in this step, we rectify the evidence by their assistant measuring index. The modified BPAs are denoted by $m'_i$, where $m'_i(A_r)$ is the new BPA allocated to $A_r$.
Step 4. Clustering of the evidence. In order to improve the efficiency of fusion, we apply a clustering analysis to the modified BPAs, combining evidence with different degrees of conflict under different combination rules. Agglomerative hierarchical clustering is chosen here because it merges from singletons to a root cluster. At the first stage of this process, every BPA ($m'_i$) forms its own cluster ($P_i$), and the distances $d(m'_i, m'_j)$ can be calculated to measure the conflict. In addition, a threshold ($\delta$), determined by practical application, is adopted as the standard to assemble similar BPAs into a cluster: if $d(m'_i, m'_j) > \delta$, there is an extreme divergence among the evidence, which therefore must be assigned to different classes. When all the $d(m'_i, m'_j)$ between newly formed clusters ($P$) are greater than $\delta$, the process ends, producing $x$ clusters ($P_1, P_2, \ldots, P_x$). Similar to the definition of the relative support degree of evidence, the relative support degree of the cluster is deduced as follows.
Definition 2.5. Let ($P_1, P_2, \ldots, P_x$) be clusters after the merging of $n$ pieces of evidence, with each $P_i$ involving $q$ pieces of evidence ($q \ge 1$). The relative support degree of the cluster $P_i$ is then defined analogously to that of a single piece of evidence.
Step 5. Combination.
Evidence within a certain cluster $P_i$ will still be combined by Dempster's rule; therefore, $t$ clusters of evidence ($t \le n$) as well as their relative support degrees can be generated. The conflicting clusters should be converged by a new combination rule that requires a rectification factor (Yang et al., 2011):
Definition 2.6. A rectification factor of two clusters is $E_{ij}$, which measures the absolute difference between $P_i$ and $P_j$ and is positively correlated to the difference. The relative difference measurement is thus defined as $E'_{ij}(A_r)$, the relative difference on focal element $A_r$. Then, we can define a novel combination rule based on clustered evidence:
Definition 2.7. The novel combination rule of conflicting clusters combines the cluster BPAs using these rectification factors. Verifiably, the new combination satisfies the requirements for BPAs, i.e., $\sum_{A_r \in 2^{\Theta}} m(A_r) = 1$.
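The threshold-based agglomerative procedure can be sketched in a few lines. Each piece of evidence starts as its own cluster, the closest pair of clusters is merged repeatedly, and merging stops once every remaining inter-cluster distance exceeds the threshold. The text does not state how the distance between two multi-member clusters is computed, so average linkage over a precomputed pairwise distance matrix is assumed here; the function name, the matrix, and the threshold value are hypothetical.

```python
def threshold_clustering(dist, delta):
    """Agglomerative clustering that stops when all cluster distances > delta.

    dist:  symmetric matrix of pairwise distances between pieces of evidence
    delta: merge threshold
    Average linkage between clusters is an assumption of this sketch.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > delta:
            break  # every remaining pair of clusters is too far apart
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical distances: evidence 0-2 are mutually close, evidence 3 is not.
distances = [
    [0.00, 0.10, 0.20, 0.90],
    [0.10, 0.00, 0.15, 0.80],
    [0.20, 0.15, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
]
groups = threshold_clustering(distances, delta=0.5)
```

Here the three consistent sources end up in one cluster and the dissenting source stays alone, so only the two resulting clusters need the conflict-aware combination rule.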

Validation and evaluation of the improved D-S evidence theory
In order to validate the improvement of the modified D-S evidence theory, this paper fuses five pieces of evidence from different sources and compares the convergence results with the combination rules in other studies. Assume that the focal element $A$ is the correct object and that five different pieces of evidence are given by their BPAs on the frame of discernment $\Theta$. The combination results are listed in Table 3. From Table 3, it can be seen that the D-S combination rule (Dempster, 2008) consistently assigns no belief to focal element $A$ only because $m_2(A) = 0$; and with increasing evidence, beliefs increasingly gather on focal element $C$, which reflects that the merging result is influenced considerably by $m_2$. The Yager rule (Yager, 1987) attributes the conflicting information to the universal set, so this operation reduces the belief on element $B$; nevertheless, with increasing combination times, almost all the BPA concentrates on the frame $\Theta$, which is of little practical help for decision support. Sun's method (Sun et al., 2000) introduces reliability to the combination, whose accuracy is better than that of the Yager rule, but $m(\Theta)$ is still greater than 0.5, which is also not beneficial to the decision. The Murphy rule (Murphy, 2000) combines the weighted average evidence four times, reasonably assigns belief to hypothesis $A$, and gives less belief to $B$ and $C$. The proposed rule in this paper correctly provides the possibility of the occurrence of $A$ in a way that is quicker than Murphy's combination. Furthermore, when completing the convergence with only three pieces of evidence, $A$ emerges as a high probability event. Thus, the proposed method handles conflicting evidence in a more reasonable and rapid way.
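The qualitative differences between the classical rules can be reproduced with a small experiment. The sketch below implements Dempster's rule, Yager's rule (conflict mass moved to the whole frame), and Murphy's rule (average the BPAs, then combine the average with itself $n - 1$ times by Dempster's rule). The three BPAs are hypothetical and are not the evidence behind Table 3; the proposed rule of this paper is not implemented because its formulas are not reproduced here.

```python
from functools import reduce

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
FRAME = frozenset("ABC")


def conjunctive(m1, m2):
    """Unnormalized conjunctive combination; returns (masses, conflict)."""
    raw, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                raw[intersection] = raw.get(intersection, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    return raw, conflict


def dempster(m1, m2):
    """Dempster's rule: renormalize by 1 - conflict."""
    raw, conflict = conjunctive(m1, m2)
    return {a: v / (1.0 - conflict) for a, v in raw.items()}


def yager(m1, m2):
    """Yager's rule: assign the conflict mass to the whole frame."""
    raw, conflict = conjunctive(m1, m2)
    raw[FRAME] = raw.get(FRAME, 0.0) + conflict
    return raw


def murphy(bpas):
    """Murphy's rule: average the BPAs, then self-combine n - 1 times."""
    n = len(bpas)
    keys = set().union(*bpas)
    average = {k: sum(m.get(k, 0.0) for m in bpas) / n for k in keys}
    return reduce(dempster, [average] * n)


# Hypothetical evidence: the second source denies A outright.
evidence = [
    {A: 0.50, B: 0.20, C: 0.30},
    {A: 0.00, B: 0.90, C: 0.10},
    {A: 0.55, B: 0.10, C: 0.35},
]
by_dempster = reduce(dempster, evidence)
by_yager = reduce(yager, evidence)
by_murphy = murphy(evidence)
```

Under Dempster's rule the single zero assigned to `A` by the second source can never be overturned by later evidence, while Yager's rule keeps a share of mass on the whole frame and Murphy's averaging leaves `A` with positive belief.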

Empirical data preparation
Because there are many complicated factors that cause coal and gas outbursts, it is necessary to rank those elements and thereby determine the main controlling factors. The present research uses grey relational analysis (GRA) to analyse and investigate the key factors of outburst events. GRA is characterized by its convenient calculation, applicability to any data volume, and fewer errors compared with qualitative analysis (Yang et al., 2012). It regards the similarity or dissimilarity of the factor trend as the measurement of the relational degree among the elements (Xiong et al., 2018). Considering the outburst probability and the degree of coal damage in the research area, we reclassify the coal body structure in Table 1 into three types from a macro perspective: primary constructional coal and ruptured coal (coal I), mortar coal (coal II), and mylonitic coal (coal III), with numbers 1, 2, and 3 representing coal I, II, and III in the following paragraphs, respectively (Zhang et al., 2016). Coal III is the coal most prone to outburst. Based on the historical data in the study area, the ranking of the degrees of correlation is acquired, as shown in Table 4. The order of the contribution of each factor is as follows: coal structure > gas pressure > coalbed burial depth > gas content > initial gas emission rate > Protodyakonov coefficient of coal strength > coal seam thickness > gas emission amount > geological structure > pillar collapses. The first seven elements, whose degrees of correlation are greater than 0.55, are selected as the prediction indices of coal and gas outbursts.
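A grey relational ranking of this kind can be sketched as follows. The steps assumed here are the common ones for GRA: min-max normalization of the reference sequence and every factor sequence, the absolute difference between each factor and the reference, a grey relational coefficient $(\Delta_{\min} + \rho \Delta_{\max}) / (\Delta + \rho \Delta_{\max})$ with distinguishing coefficient $\rho = 0.5$, and the mean coefficient as the relational grade. The normalization choice, the value of $\rho$, and the sample sequences are assumptions; the text does not specify them.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Return {factor name: grey relational grade} against a reference series.

    reference: list of outcome values (e.g. an outburst intensity series)
    factors:   {name: list of factor values}, same length as reference
    rho:       distinguishing coefficient (0.5 is a conventional choice)
    """
    def minmax(seq):
        lo, hi = min(seq), max(seq)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in seq]

    ref = minmax(reference)
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, minmax(seq))]
        for name, seq in factors.items()
    }
    flat = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every factor matches the reference exactly after normalization.
        return {name: 1.0 for name in factors}

    grades = {}
    for name, seq in deltas.items():
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in seq]
        grades[name] = sum(coefficients) / len(coefficients)
    return grades


# Hypothetical sequences: one factor follows the reference, one opposes it.
grades = grey_relational_grades(
    reference=[1.0, 2.0, 3.0, 4.0],
    factors={"tracks": [10.0, 20.0, 30.0, 40.0], "opposes": [4.0, 3.0, 2.0, 1.0]},
)
ranking = sorted(grades, key=grades.get, reverse=True)
```

Sorting the grades in descending order gives the ranking, and a cut-off on the grade (0.55 in the study) selects the factors to keep.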

Performance of the predictive model
The proposed predictive model is applied to 57 sets of field geological data from the No. 3 coal seam of the Xinjing coal mine. Twenty samples are selected randomly for coal and gas outburst prediction, and the remaining data are used as the training set of the LSSVM. After the relational degrees of the factors are ranked, the initial gas emission rate is chosen for the single index method, the Protodyakonov coefficient of coal strength together with the initial gas emission rate is chosen for the composite index method (K), and the remaining five factors are chosen for the LSSVM. The field-derived geological data for the prediction of coal and gas outbursts at the Xinjing coal mine are shown in Table 5, where $R_1 = [1]$ represents the outburst event and $R_2 = [-1]$ signifies the non-outburst event. The results of the three prediction methods (single index, composite index, and LSSVM) are taken as the BPAs of the D-S evidence theory for the final prediction. The process of the predictive model based on multisource information fusion is shown in Figure 3.
Construction of the predictive model. Let the D-S frame of discernment be $\Theta = \{R_1, R_2\}$, where $R_1$ is the presence of an outburst hazard and $R_2$ is its absence. Each of the testing samples has three pieces of evidence (evidence A, B, and C) from the different forecasting methods.
(1) Evidence A (The first evidence from the LSSVM prediction model).
First, we use 37 field datasets collected from previously mined zones in the Xinjing coal mine for leave-one-out cross-validation, which is suitable for small samples, to verify the performance of the LSSVM model. In each round, the data are divided into a training set with 36 groups of data and a testing set with one group of data, and the rounds are repeated in succession until every group has served as the testing set.
The experiment is performed in a MATLAB R2016a environment, and the LSSVM model is built on the dataset with the help of the LSSVMlab Toolbox (version 1.8) (Pelckmans et al., 2002). Since the kernel function adopted in this paper is the Gaussian RBF kernel, two parameters need to be tuned: the regularization parameter ($\gamma$) and the squared bandwidth ($\sigma^2$). The LSSVMlab Toolbox has a built-in tuning function that applies a state-of-the-art global optimization technique, coupled simulated annealing (CSA), to find suitable parameters and then fine-tunes these parameters by grid search as a second optimization procedure. During the parameter tuning, the performance is estimated by 10-fold cross-validation.
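For readers without MATLAB, the workflow above can be approximated by a minimal NumPy sketch of an LSSVM classifier evaluated by leave-one-out cross-validation. This is not the LSSVMlab implementation: it solves the standard LSSVM linear system directly, fixes $\gamma$ and $\sigma^2$ by hand instead of tuning them by CSA and grid search, and runs on synthetic data. The kernel scaling $\exp(-\|x - z\|^2/\sigma^2)$, the function names, and all numerical values are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Z, sigma_sq):
    """Gaussian RBF kernel exp(-||x - z||^2 / sigma_sq) (scaling convention assumed)."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma_sq)

def lssvm_train(X, y, gamma, sigma_sq):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b, alpha]^T = [0, 1]^T."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma_sq)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = y
    system[1:, 0] = y
    system[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]             # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma_sq):
    """Class labels (+1 / -1) from the sign of the LSSVM decision function."""
    scores = rbf_kernel(X_new, X_train, sigma_sq) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)

def leave_one_out_accuracy(X, y, gamma, sigma_sq):
    """Train on all samples but one, test on the held-out sample, and repeat."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, alpha = lssvm_train(X[keep], y[keep], gamma, sigma_sq)
        pred = lssvm_predict(X[keep], y[keep], b, alpha, X[i:i + 1], sigma_sq)
        hits += int(pred[0] == y[i])
    return hits / len(y)

# Synthetic, already normalized two-class data (hypothetical, not mine data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.8, 0.05, (10, 2)),   # class +1 ("outburst")
               rng.normal(0.2, 0.05, (10, 2))])  # class -1 ("non-outburst")
y = np.array([1.0] * 10 + [-1.0] * 10)

loo_acc = leave_one_out_accuracy(X, y, gamma=10.0, sigma_sq=0.5)
print("leave-one-out accuracy:", loo_acc)
```

Because LSSVM training reduces to a single linear solve, repeating it once per held-out sample stays cheap for datasets of a few dozen samples.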
Before being input to the LSSVM model, the empirical datasets should be normalized to the range [0, 1] (given that the number of samples is n and the number of variables is m):

$$x_i(k)_{\mathrm{norm}} = \frac{x_i(k) - \min_k x_i(k)}{\max_k x_i(k) - \min_k x_i(k)}, \quad i = 1, 2, \ldots, m; \; k = 1, 2, \ldots, n \qquad (36)$$

where $x_i(k)$ is the value of sample k under factor i, and $x_i(k)_{\mathrm{norm}}$ is the normalized value of $x_i(k)$. The training dataset is denoted as $T = \{(x_i, y_i),\ i = 1, 2, \ldots, 37\}$, $x_i \in R^m$, $y_i \in \{-1, 1\}$, where $y_i$ is the level of actual outburst occurrence (non-outburst danger ($-1$), occurrence of an outburst ($1$)). After the pre-processing of the data and the search for suitable tuning parameters, we train the LSSVM model and then simulate and evaluate it at the test points. The classified results are described in Table 6 (the misclassified values are in bold), which shows that the LSSVM misjudges the situation at No. 18 and No. 33.
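The min-max normalization of equation (36) can be sketched in a few lines of Python. The array layout (rows are samples, columns are factors), the function name, and the sample values are assumptions for illustration only.

```python
import numpy as np

def min_max_normalize(data):
    """Scale every factor (column) to [0, 1] over the samples (rows).

    Assumes no factor is constant; a constant column would divide by zero.
    """
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)          # per-factor minimum over all samples
    hi = data.max(axis=0)          # per-factor maximum over all samples
    return (data - lo) / (hi - lo)

# Three hypothetical samples described by two factors.
raw = [[1.0, 10.0],
       [3.0, 20.0],
       [5.0, 40.0]]
normalized = min_max_normalize(raw)
print(normalized)
```

When new samples are predicted later, reusing the minima and maxima of the training data keeps both sets on the same scale.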
We measure the efficacy of the trained LSSVM model with popular performance metrics for this classification task (accuracy, precision, recall, and F1 score), where the occurrence of an outburst is treated as positive. Note that "positive" here denotes the presence of an event rather than its absence; it does not describe the actual effect on coal mines. Accuracy is the proportion of correct predictions in the total sample size, while precision concerns the predicted results and represents the proportion of actual positive samples (correctly predicted occurrences of outbursts) among all outburst predictions. Recall concerns the original samples and refers to the proportion of correctly predicted outbursts among all outburst events, and the F1 score is based on precision and recall: $F_1 = 2 \cdot (\mathrm{recall} \cdot \mathrm{precision}) / (\mathrm{recall} + \mathrm{precision})$. From the results, the accuracy of the predicted outbursts is 0.95, the precision is 1, and the recall is 0.89. Therefore, the F1 score is 0.94, showing that the trained LSSVM model fits the field data well. Since the validity of LSSVM in coal and gas outburst prediction is demonstrated, it can be applied as a preliminary analysis for the prediction model proposed in this paper. In this section, we adopt the Bayesian posterior class probability of each predicted sample as the constructed BPA (equation (37)) in each outburst prediction, where $p(R_1)$ is the posterior class probability of the occurrence of outbursts. The predictive results and BPAs of each point from the first evidence source are then obtained, as shown in Table 7, where $m_1(R_1)$ represents the probability of outburst occurrence, and $m_1(R_2)$ represents the probability of non-outburst forecasted by evidence A.
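The reported metrics can be checked with a few lines of Python. The confusion-matrix counts used below (16 true positives, 2 false negatives, 0 false positives, 19 true negatives) are not stated in the text; they are an assumption, chosen as one split of 37 samples with two misclassifications that reproduces the reported figures.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F1 with 'outburst' as the positive class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Assumed counts: 37 samples, two missed outbursts, no false alarms.
accuracy, precision, recall, f1 = classification_metrics(tp=16, fn=2, fp=0, tn=19)
print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
```

Under this assumed split, a precision of 1 together with a recall below 1 means that both misclassified samples are missed outbursts rather than false alarms.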
(2) Evidence B (The second evidence from the single index method). The initial gas emission rate ($\Delta p$) method is employed as the second evidence. Nonetheless, it is neither precise nor reasonable to determine whether coal and gas outbursts occur simply on the basis of a fixed critical value (Miao, 2009). In fact, the boundary is a fuzzy critical region, and the predictive results in the frame of discernment are fuzzy sets. In addition, determining the BPAs of D-S evidence theory is still an open issue. Therefore, we take $R_1$ and $R_2$ as two fuzzy sets and construct the membership functions of $R_1$ and $R_2$ according to the critical values of the single index method in this coal and gas outburst prediction (with the graph of fuzzy membership shown in Figure 4):

$$x_{R_1}(\Delta p) = \begin{cases} 1, & \Delta p \ge 10 \\ \exp\left[-\left(\dfrac{\Delta p - 10}{\sigma_2^2}\right)^2\right], & \Delta p < 10 \end{cases} \qquad (38)$$

where $\sigma_2^2 = 2.5$ is an approximate standard deviation of the initial gas emission rate of evidence B deduced from recorded data. The closer the degree of membership $x_{R_1}(\Delta p)$ is to 1, the higher the probability that $\Delta p$ belongs to the occurrence of an outburst; therefore, the fuzzy memberships of $R_1$ and $R_2$ for the measured data can be determined by equations (38) and (39) and regarded as their corresponding BPAs. Hence, the second evidence with its BPAs for D-S evidence is derived (shown in Table 8), where $m_2(R_1)$ represents the probability of outburst occurrence, and $m_2(R_2)$ represents the probability of non-outburst forecasted by evidence B.
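A hedged Python sketch of how a measured index value might be turned into a BPA through a fuzzy membership function is given below. The critical value 10 and the parameter 2.5 come from the text, but the exact form of the exponent, $-((\Delta p - 10)/\sigma^2)^2$, is one reading of equation (38), and taking the non-outburst mass as the complement $1 - m(R_1)$ is an assumption, since equation (39) is not reproduced here. The function names and input values are hypothetical.

```python
import math

def outburst_membership(value, critical=10.0, sigma_sq=2.5):
    """Membership of the 'outburst' fuzzy set R1 for a measured index value."""
    if value >= critical:
        return 1.0
    # Assumed Gaussian-type decay below the critical value.
    return math.exp(-((value - critical) / sigma_sq) ** 2)

def bpa_from_index(value, critical=10.0, sigma_sq=2.5):
    """BPA on {R1, R2}; the R2 mass is assumed to be the complement of R1."""
    m_r1 = outburst_membership(value, critical, sigma_sq)
    return {"R1": m_r1, "R2": 1.0 - m_r1}

above = bpa_from_index(12.0)   # at or above the critical value
below = bpa_from_index(9.0)    # slightly below the critical value
print(above, below)
```

Under these assumptions, a reading slightly below the critical value still carries most of its mass on $R_1$, which is how a point that a fixed threshold would label as safe can receive a BPA above 0.5.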
(3) Evidence C (The third evidence from the comprehensive index method). This study employs a comprehensive index K to execute the prediction of coal and gas outbursts. Similar to the acquisition method of the second evidence, it is necessary to construct the membership function, where $\sigma_2^2 = 2.25$ is an approximate standard deviation of K of evidence C deduced from recorded data. Thus, the fuzzy memberships and their corresponding BPAs are determined (with the graph of fuzzy membership shown in Figure 4). The third evidence with its BPAs for D-S evidence is shown in Table 8, where $m_3(R_1)$ represents the probability of outburst occurrence, and $m_3(R_2)$ represents the probability of non-outburst forecasted by evidence C.
Prediction by multisource information fusion. Finally, we use the improved combination rule proposed in the section "Dempster-Shafer theory and its improvement" to fuse the three predicted results and then analyse the probability of outburst risk. The predicted results are listed in Table 9, where $m(R_1)$ represents the probability of outburst occurrence, and $m(R_2)$ represents the probability of non-outburst obtained by the improved D-S evidence fusion, showing that the predictive model is capable of forecasting the coal and gas outbursts of the No. 3 coal seam in the study area. According to the field experience of coal and gas outbursts, the points with $m(R_1)$ greater than 0.6 are classified as outbursts, and the points with $m(R_2)$ greater than 0.6 are classified as non-outbursts. The fused prediction identifies 5 sites as safe and 15 sites as outburst-prone, and all of these predictions are consistent with the actual situation in the mine.
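The decision rule applied to the fused masses can be sketched as follows. The 0.6 threshold and the three probabilities used as inputs are quoted from the text; the function name, the "undetermined" label for points where neither mass exceeds the threshold, and the assumption that the two masses sum to 1 are additions made for illustration.

```python
def decide(bpa, threshold=0.6):
    """Classify a point from its masses on R1 (outburst) and R2 (non-outburst)."""
    if bpa["R1"] > threshold:
        return "outburst"
    if bpa["R2"] > threshold:
        return "non-outburst"
    return "undetermined"      # assumed label; the text does not name this case

def two_class_bpa(p_outburst):
    """Masses on R1 and R2, assuming no mass is left on the whole frame."""
    return {"R1": p_outburst, "R2": 1.0 - p_outburst}

fused_point_17 = decide(two_class_bpa(0.62858))       # fused outburst mass
fused_point_2 = decide(two_class_bpa(1.0 - 0.6957))   # fused non-outburst mass 0.6957
lssvm_point_17 = decide(two_class_bpa(0.51337))       # single-source LSSVM posterior
print(fused_point_17, fused_point_2, lssvm_point_17)
```

A posterior as close to 0.5 as 0.51337 clears neither threshold on its own, whereas the fused mass of 0.62858 does.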
From the comparison of the 20 samples, we can see that the improved D-S evidence theory assesses four positions as non-outburst where all three preliminary forecasting methods also predict that no outburst occurs, namely points No. 6, No. 10, No. 14, and No. 20. In particular, all three methods indicate that there is no outburst at point No. 2, but the closeness of the comprehensive index to the critical value could mislead the judgement. After the fusion, the probability of non-outburst here is 0.6957, which provides convincing evidence for decisions. The points where coal and gas outbursts will appear can also be divided into two cases. The predictive credibility of points that are difficult to identify, i.e., points No. 16 and No. 17, increases after the application of D-S evidence theory. In the remaining 13 sites, where the symptoms of outbursts are obvious, the probability of outburst is forecasted to be 1, which reflects the strong agreement among the three initial methods. It is worth mentioning that the prediction for points No. 16 and No. 17 in particular demonstrates the advantage of multisource information fusion by the improved D-S evidence theory. By combining the multisource evidence, the estimate is consistent with the real situation with great certainty, so the prediction can be significantly improved by "compensating" for an incorrect result. If only the single index method were adopted, point No. 16 would be judged safe because its initial gas emission rate is smaller than the critical value in the provisions, whereas its BPA is greater than 0.5 after the assignment of the membership function. Likewise, the LSSVM forecasts that there will be coal and gas outbursts at point No. 17, but its Bayesian posterior class probability is only 0.51337, causing slight ambiguity for judgement. After the application of D-S evidence theory, the prediction result supports the occurrence with a probability of 0.62858, which is more consistent with the actual geological situation.
Therefore, compared with the existing predictive methods, which ordinarily focus on individual models and have limited flexibility with respect to variable critical values, the model presented in this paper shows higher accuracy and better stability, providing a sound way to handle the uncertain information of coal and gas outbursts.
Furthermore, to test the performance of the proposed approach, the present work compares it with two other single machine learning algorithms, i.e., LSSVM and artificial neural networks (ANNs). We use the classic ANN model, that is, the feed-forward neural network, for coal and gas outburst prediction. A single hidden (intermediate) layer is used owing to its reasonable precision in approximating any continuous multivariate function. The number of hidden nodes is set to 4, 3, and 2 in turn to test the predictive performance. The results show that the numbers of correct predictions by the different ANN configurations are 20, 5, and 19, with respective accuracies of 100%, 25% and 75%, which is extremely unstable.
The predictions by the feed-forward neural network vary sharply because, in small-sample prediction, the optimization tends to fall into a local optimal solution (i.e., overfitting, which fits the training samples well but performs poorly on the test set), and this "trap" increasingly deviates from the true global optimum. Therefore, it is necessary to manually adjust the number of nodes. LSSVM performs better with small-sample datasets. More importantly, after the compensation by D-S evidence fusion, the situation of outbursts can be forecasted more convincingly.
Performance evaluation in another research area. To test the general applicability of this predictive method in Chinese coal mines with different geological conditions, we apply it to the outburst forecast in the Zhuxianzhuang coal mine, Anhui Province. This coal mine, located in northern China, is a large-scale high-gas-outburst mine, and its No. 8 coal seam has a high outburst frequency (Liu and Cheng, 2015). We conduct this predictive experiment with 45 datasets of field geological data, selecting 15 samples at random for validation and using the remaining data as the training sets of the LSSVM model for preliminary forecasting.
After the ranking of the key factors of outburst events in the Zhuxianzhuang coal mine by GRA, the indices of coal and gas outburst forecasting determined by the predictive model are as follows: the Protodyakonov coefficient of coal strength, gas content, coalbed burial depth, initial gas emission rate, coal structure, gas pressure, and gas emission amount.
As in the predictive procedure in the Xinjing coal mine, the initial gas emission rate and the Protodyakonov coefficient of coal strength are chosen for the index methods, and the remaining factors are chosen for the LSSVM model. In addition, the outburst event is represented by $R_1 = [1]$, and the non-outburst event is represented by $R_2 = [-1]$, constituting the D-S frame of discernment. For each of the 15 test datasets, results are derived from the three preliminary methods and are then taken as BPAs of the improved D-S evidence theory for the final prediction.
The predicted results are listed in Table 10, where $m(R_1)$ represents the probability of outburst occurrence, and $m(R_2)$ represents the probability of non-outburst occurrence forecasted in the Zhuxianzhuang coal mine, demonstrating that this prediction model can also be applied to the No. 8 coal seam in the Zhuxianzhuang coal mine. From the results, it can be observed that the proposed method corrects the mistakes of the single index method at points 2, 9, 10 and 12 and the misjudgement of LSSVM at point 14. Thus, the final prediction determines that there will be six sites at a safe level and nine sites likely to experience outbursts, all of which are in line with the actual situation in the mine. The case study in this research area shows that our proposed forecasting method can flexibly adjust the relevant factors and fuse the different predictive results, thereby enhancing the reliability of coal and gas outburst prediction.

The case studies conducted in both the Xinjing coal mine and the Zhuxianzhuang coal mine also revealed that the presented model outperforms the previously used models due to its flexibility in the selection of the main controlling factors and fusion by the improved D-S evidence theory.
Although we successfully demonstrate the potential of multisource information fusion technology in coal and gas outburst prediction, some limitations remain to be addressed in the future. Because mine data are difficult to obtain, we do not have enough data to carry out further experiments, so the method still needs to be analysed and applied under additional geological conditions. This is the major obstacle to the study. In addition, this paper simply uses recorded historical data to analyse and forecast outbursts. Many researchers now apply time series to coal and gas outburst prediction. Therefore, we will consider studying field data in chronological order in future work.