A novel divergence measure in Dempster–Shafer evidence theory based on pignistic probability transform and its application in multi-sensor data fusion

Dempster–Shafer (D–S) evidence theory is increasingly applied in multi-sensor data fusion. However, how to effectively combine highly conflicting evidence in D–S evidence theory is still an open issue. In this article, a novel divergence measure, called pignistic probability transformation divergence, is proposed to measure the difference between evidences. By introducing the pignistic probability transformation, the proposed divergence reflects the interaction between single-element and multi-element subsets, and it satisfies the properties of boundedness, non-degeneracy, and symmetry. Moreover, the pignistic probability transformation divergence degenerates into the Jensen–Shannon divergence when the mass function and the probability distribution are consistent. Based on the pignistic probability transformation divergence, a new multi-sensor data fusion method is presented. The proposed method uses the pignistic probability transformation divergence to measure the discrepancy between evidences and thereby obtain credibility weights, and belief entropy to measure the uncertainty of the evidences and thereby obtain information volume weights, so that the latent information among the evidences is fully exploited. The credibility weights and the information volume weights are then integrated to generate an appropriate weighted average evidence, which is combined using Dempster's combination rule. The results of two application cases illustrate that the proposed method outperforms other related methods in combining highly conflicting evidence.


Introduction
Multi-sensor data fusion is an information modeling process in which data from multiple sensors are comprehensively analyzed to support decision-making. The information collected by a single sensor cannot describe a system at multiple levels and from multiple perspectives, so conclusions drawn from it alone are not convincing. In contrast, multi-sensor data fusion processes data from multiple sensors jointly and obtains more reliable results than any single sensor. It is therefore widely used in real-world applications.
However, how to effectively combine the uncertain, inconsistent, or even conflicting data collected from different sensors is still an open issue. Many theories have been presented to address this problem, including Dempster-Shafer (D-S) evidence theory, 1,2 fuzzy set theory, 3,4 rough sets, 5,6 Z-numbers, 7,8 D-numbers, 9,10 R-numbers, 11,12 and so on.
D-S evidence theory is essentially a generalization of probability theory. Compared with traditional probability theory, it expresses uncertainty more flexibly. In addition, it provides Dempster's combination rule, 13 which satisfies associativity and commutativity and can fuse evidences without prior information. Owing to its flexibility and effectiveness in handling uncertainty, D-S evidence theory is widely applied in various fields of multi-sensor data fusion, such as target recognition, 14,15 fault diagnosis, 16,17 decision-making, 18,19 and risk analysis. 20,21 However, Dempster's combination rule leads to counter-intuitive conclusions when the evidences conflict with each other. 7 Many methods have been developed to overcome this defect, and they fall mainly into two types. 22 The first type modifies Dempster's combination rule. 23,24 It holds that the counter-intuitive conclusions stem from a defect of the rule itself: the normalization step discards the conflicting information completely. The key to modifying the rule is therefore how the conflict is redistributed. However, a modified combination rule cannot keep the desirable mathematical properties of Dempster's combination rule, such as commutativity and associativity. The second type modifies the original evidence. It holds that Dempster's combination rule itself is sound and that the conflicts are caused by unreliable evidence sources. Such methods first preprocess the conflicting evidence to obtain a weighted average evidence, which is then fused with Dempster's combination rule to obtain reasonable results. This article focuses on the second type.
Han et al. 25 introduce the concept of evidence support based on the Jousselme distance and take a weighted average of all the evidences. Xiao 26 generalizes the traditional Jousselme distance to a complex evidence distance that measures the conflict between complex basic probability assignment (BPA) functions, and uses it as a weighting factor to revise the original evidence. Xiao 27 uses the belief Jensen-Shannon (BJS) divergence to measure the distance between evidences and generate modified evidences. However, the BJS divergence measures the difference between evidences by treating multi-element subsets as single-element subsets, so it cannot reflect the interaction between single-element and multi-element subsets. Although the reinforced belief (RB) divergence proposed by Xiao 28 overcomes this shortcoming of the BJS divergence, the multi-sensor data fusion method based on the RB divergence still leaves room for improvement in the belief degree assigned to the correct target. The divergence proposed by Wang et al. 29 also addresses the above deficiency, but the multi-sensor data fusion method based on it has a high time complexity.
In this article, we propose a novel divergence measure, called the pignistic probability transformation (PPT) divergence, to measure the difference between evidences. By introducing the PPT, the proposed divergence reflects the interaction between single-element and multi-element subsets, and it satisfies the properties of boundedness, non-degeneracy, and symmetry. Moreover, the PPT divergence degenerates into the Jensen-Shannon (JS) divergence 30 when the mass function and the probability distribution are consistent. Based on the PPT divergence and the Deng entropy, a new multi-sensor data fusion method is presented. The method uses the PPT divergence to measure the discrepancy between evidences and obtain credibility weights, and the Deng entropy to measure the uncertainty of the evidences and obtain information volume weights, so that the latent information among the evidences is fully exploited. The credibility weights and the information volume weights are then integrated to generate an appropriate weighted average evidence, which is combined using Dempster's combination rule. Two application cases are provided to illustrate the superiority of the method.
The rest of the article is organized as follows. Section “Preliminaries” introduces the relevant basic theory. Section “The PPT divergence measure” proposes the PPT divergence in D-S evidence theory and conducts a comparative analysis. Section “The multi-sensor data fusion method based on the PPT divergence” presents the multi-sensor data fusion method based on the PPT divergence and the Deng entropy. Section “Application” presents two application cases of the proposed fusion method and analyzes the results. Finally, section “Conclusion” concludes this article.

Preliminaries
In this section, the preliminaries needed in the rest of this article are briefly introduced, including D-S evidence theory, the Deng entropy, the pignistic probability transform, and several typical divergences.
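D-S evidence theory
Definition 1 (frame of discernment). Let $\Theta$ be a finite nonempty set of $N$ mutually exclusive and collectively exhaustive elements:
$$\Theta = \{F_1, F_2, \ldots, F_N\}$$
where $\Theta$ is called the frame of discernment (FOD), and $F_i$ is called a single-element proposition or subset. The power set of $\Theta$, denoted $2^{\Theta}$, contains $2^N$ elements and can be described as 32
$$2^{\Theta} = \{\emptyset, \{F_1\}, \{F_2\}, \ldots, \{F_N\}, \{F_1, F_2\}, \ldots, \{F_1, F_2, \ldots, F_N\}\}$$
where $\emptyset$ is the empty set in equation (2).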
Definition 2 (BPA). On the FOD $\Theta$, the BPA function $m$, also called the mass function, is a mapping from the power set $2^{\Theta}$ to $[0, 1]$: 33
$$m: 2^{\Theta} \to [0, 1], \qquad \sum_{A \in 2^{\Theta}} m(A) = 1$$
where $m(A)$ represents the degree of support for $A$, and any $A$ with $m(A) > 0$ is called a focal element. In classical D-S evidence theory, $m(\emptyset) = 0$.
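As a concrete illustration of Definitions 1 and 2, the following minimal Python sketch enumerates the power set and checks the BPA conditions. Representing a BPA as a dictionary that maps `frozenset` focal elements to masses is an assumption made for illustration, and `power_set` and `is_valid_bpa` are hypothetical helper names, not part of any library.

```python
from itertools import combinations


def power_set(frame):
    """All 2^N subsets of the frame of discernment, including the empty set."""
    elems = sorted(frame)
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]


def is_valid_bpa(m, frame, tol=1e-9):
    """Check the BPA conditions in the classical (closed-world) setting:
    every focal set lies in the frame, masses are in [0, 1], m(empty) = 0,
    and the masses sum to 1 (up to a small tolerance)."""
    frame = frozenset(frame)
    if any(not a <= frame for a in m):
        return False
    if any(v < 0.0 or v > 1.0 for v in m.values()):
        return False
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    return abs(sum(m.values()) - 1.0) <= tol


theta = {"A", "B", "C"}
m = {frozenset({"A"}): 0.5, frozenset({"B", "C"}): 0.3, frozenset(theta): 0.2}
print(len(power_set(theta)))   # 8 subsets for N = 3
print(is_valid_bpa(m, theta))  # True
```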
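Definition 3 (belief function). Let $A$ be a hypothesis in the FOD $\Theta$. The belief function $\mathrm{Bel}: 2^{\Theta} \to [0, 1]$ is defined as
$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)$$
Definition 4 (plausibility function). Let $A$ be a hypothesis in the FOD $\Theta$. The plausibility function $\mathrm{Pl}: 2^{\Theta} \to [0, 1]$ is defined as
$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - \mathrm{Bel}(\bar{A})$$
Definition 5 (Dempster's combination rule). Assume $m_1$ and $m_2$ are two independent BPAs on $2^{\Theta}$. Dempster's combination rule, written $m = m_1 \oplus m_2$, is defined as follows: 13
$$m(A) = \begin{cases} \dfrac{1}{1 - K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset \\ 0, & A = \emptyset \end{cases}$$
in which
$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$
where $\oplus$ denotes Dempster's combination rule. $K$ is called the conflict coefficient and takes values between 0 and 1; the rule is undefined when $K = 1$. The larger $K$ is, the stronger the conflict between the two evidences.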
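The belief function, the plausibility function, and Dempster's combination rule can be sketched in Python as below, assuming a BPA is stored as a dictionary from `frozenset` focal elements to masses; the function names are hypothetical. The usage lines apply the rule to two hypothetical, highly conflicting reports in the style of Zadeh's well-known counterexample.

```python
def belief(m, a):
    """Bel(A): total mass of the non-empty focal elements contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def dempster(m1, m2):
    """Dempster's rule m1 (+) m2; returns the combined BPA and the conflict coefficient K."""
    joint, k = {}, 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                joint[inter] = joint.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2  # product mass that falls on the empty set
    if k >= 1.0:
        raise ValueError("K = 1: totally conflicting evidence, the rule is undefined")
    return {a: v / (1.0 - k) for a, v in joint.items()}, k


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
# Two hypothetical, highly conflicting sensor reports
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}
m, k = dempster(m1, m2)
print(round(k, 4), {tuple(sorted(s)): round(v, 4) for s, v in m.items()})
```

Both reports consider B very unlikely, yet the combination assigns almost all belief to B with K close to 1, which is exactly the counter-intuitive behavior that motivates modifying either the rule or the evidence.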

Entropy
Definition 6 (Deng entropy). The Shannon entropy 34 has contributed greatly to the measurement of uncertainty, but it has limitations when the information is given as a BPA, because it measures uncertainty on the basis of probability. To solve this problem, Deng proposes the Deng entropy 35 in the framework of D-S evidence theory. It is a generalization of the Shannon entropy and is defined as
$$H_d(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}$$
where $m$ is a mass function defined on the FOD $\Theta$, $A$ is a focal element of $m$, and $|A|$ is the cardinality of $A$.
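A minimal Python sketch of the Deng entropy follows, under the same assumed dictionary-of-`frozenset` BPA representation (the name `deng_entropy` is hypothetical). Only focal elements with positive mass are summed, which matches the closed-world convention $m(\emptyset) = 0$.

```python
import math


def deng_entropy(m):
    """Deng entropy H_d(m) = -sum_A m(A) * log2(m(A) / (2^|A| - 1)).
    Only focal elements with m(A) > 0 contribute; m(empty) = 0 is assumed."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0)


bayesian = {frozenset("A"): 0.5, frozenset("B"): 0.5}
vacuous = {frozenset("AB"): 1.0}
print(deng_entropy(bayesian))  # 1.0, same as the Shannon entropy
print(deng_entropy(vacuous))   # log2(3), about 1.585
```

For a Bayesian BPA every $|A| = 1$, so $2^{|A|} - 1 = 1$ and the Deng entropy reduces to the Shannon entropy; the vacuous BPA shows the extra uncertainty that a multi-element focal element carries.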

Probability transform
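Definition 7 (pignistic probability transform). Suppose $m$ is a BPA on the FOD $\Theta$. The pignistic probability transform is defined as follows: 36
$$\mathrm{BetP}_m(A) = \sum_{B \subseteq \Theta} \frac{|A \cap B|}{|B|} \, \frac{m(B)}{1 - m(\emptyset)}$$
where $|A \cap B|$ is the cardinality of the intersection of $A$ and $B$. In a closed world (i.e. in classical D-S evidence theory), $m(\emptyset) = 0$, so the transform simplifies to
$$\mathrm{BetP}_m(A) = \sum_{B \subseteq \Theta} \frac{|A \cap B|}{|B|} \, m(B)$$
The essence of the pignistic probability transform is to convert a BPA into a probability distribution by allocating the belief of each focal element equally among its elements. When $A$ is a single-element subset, equation (9) simplifies to
$$\mathrm{BetP}_m(A) = \sum_{A \in B \subseteq \Theta} \frac{m(B)}{|B|}$$

Divergence measures
Definition 8 (JS divergence). Given two probability distributions $P = \{p_1, p_2, \ldots, p_n\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$ with $\sum_i p_i = \sum_i q_i = 1$, the JS divergence between $P$ and $Q$ is 30
$$\mathrm{JS}(P, Q) = \frac{1}{2}\left[\sum_{i} p_i \log_2 \frac{2 p_i}{p_i + q_i} + \sum_{i} q_i \log_2 \frac{2 q_i}{p_i + q_i}\right]$$
Definition 9 (BJS divergence). Given two independent BPAs $m_1$ and $m_2$ defined on $\Theta$, the BJS divergence between $m_1$ and $m_2$ is 27
$$\mathrm{BJS}(m_1, m_2) = \frac{1}{2}\left[\sum_{i} m_1(A_i) \log_2 \frac{2 m_1(A_i)}{m_1(A_i) + m_2(A_i)} + \sum_{i} m_2(A_i) \log_2 \frac{2 m_2(A_i)}{m_1(A_i) + m_2(A_i)}\right]$$
where $A_i$ ranges over the subsets in $2^{\Theta}$. The essence of the BJS divergence is to replace the probability distributions in the JS divergence with BPAs, so that it can be applied to measure the discrepancy between evidences.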
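The closed-world pignistic transform of Definition 7 can be sketched in Python as follows. The dictionary-of-`frozenset` BPA representation and the helper names `pignistic` and `betp_subset` are assumptions for illustration: `pignistic` returns the singleton probabilities, and `betp_subset` evaluates the simplified form $\sum_{B} \frac{|A \cap B|}{|B|} m(B)$ for an arbitrary subset $A$.

```python
def pignistic(m, frame):
    """Closed-world pignistic transform (m(empty) = 0): the mass of every focal
    element is split equally among the singletons it contains."""
    betp = {x: 0.0 for x in frame}
    for a, v in m.items():
        for x in a:
            betp[x] += v / len(a)
    return betp


def betp_subset(m, a):
    """BetP_m(A) = sum_B |A & B| / |B| * m(B) for any subset A (closed world)."""
    return sum(v * len(a & b) / len(b) for b, v in m.items() if b)


m = {frozenset("A"): 0.6, frozenset("AB"): 0.4}
print(pignistic(m, "AB"))                 # approximately {'A': 0.8, 'B': 0.2}
print(betp_subset(m, frozenset("A")))     # approximately 0.8
```

For a singleton $A$ the two functions agree, and the singleton values always sum to 1, which is what makes $\mathrm{BetP}_m$ a probability distribution.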
Definition 10 (RB divergence). Let H be the FOD, which contains h mutually exclusive and collectively exhaustive events, and let $m_1$ and $m_2$ be two belief functions on H. The RB divergence between $m_1$ and $m_2$ is defined by Xiao. 28

The PPT divergence measure
Although the BJS divergence can measure the difference between evidences and achieve accurate fusion results, it treats multi-element subsets as single-element subsets when calculating the divergence, which ignores the influence of multi-element subsets on the divergence measurement. A counterexample is given in Example 1.
Example 1. Consider three BPAs $m_1$, $m_2$, and $m_3$ defined on $\Theta = \{A, B\}$, given as follows. The divergences calculated with the BJS divergence are as follows. Here, $m_1$ assigns the greatest belief to $A$, $m_2$ to $B$, and $m_3$ to $\{A, B\}$. Obviously, the difference between $m_1$ and $m_2$ should be greater than that between $m_1$ and $m_3$ or between $m_2$ and $m_3$. However, the BJS divergence yields the same value for all three pairs, which is contrary to common sense. To solve this problem, we propose a novel divergence measure.
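The behavior described in Example 1 can be reproduced with the Python sketch below, which implements the BJS divergence of Definition 9 and applies it to three hypothetical BPAs on $\{A, B\}$ chosen for illustration only (they are not the BPAs of Example 1). Each BPA puts 0.6 on a different focal element: $\{A\}$, $\{B\}$, and $\{A, B\}$. The function name `bjs` is hypothetical.

```python
import math


def bjs(m1, m2):
    """BJS divergence: the JS divergence applied directly to the two mass vectors,
    with every focal element, singleton or not, treated as a separate outcome."""
    keys = list(set(m1) | set(m2))
    p = [m1.get(k, 0.0) for k in keys]
    q = [m2.get(k, 0.0) for k in keys]

    def half(x, y):
        # sum of x_i * log2(2 x_i / (x_i + y_i)); zero masses contribute nothing
        return sum(xi * math.log2(2 * xi / (xi + yi)) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * (half(p, q) + half(q, p))


# Hypothetical BPAs on {A, B}, for illustration only
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m1 = {A: 0.6, B: 0.2, AB: 0.2}
m2 = {A: 0.2, B: 0.6, AB: 0.2}
m3 = {A: 0.2, B: 0.2, AB: 0.6}
for name, (x, y) in {"m1,m2": (m1, m2), "m1,m3": (m1, m3), "m2,m3": (m2, m3)}.items():
    print(name, round(bjs(x, y), 4))
```

All three pairs print the same value: because every focal element is treated as an unrelated outcome, the BJS divergence only sees a permutation of the masses (0.6, 0.2, 0.2) and cannot tell that $\{A, B\}$ overlaps both $\{A\}$ and $\{B\}$.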

The definition of the PPT divergence
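Let $m_1$ and $m_2$ be two BPAs on the FOD $\Theta$. The novel divergence between $m_1$ and $m_2$ is expressed as
$$\mathrm{PPT}(m_1, m_2) = \frac{1}{2}\left[ I\left(\mathrm{BetP}_{m_1}, \frac{\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2}}{2}\right) + I\left(\mathrm{BetP}_{m_2}, \frac{\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2}}{2}\right) \right]$$
where $\mathrm{BetP}_m(A) = \sum_{A \in B \subseteq \Theta} m(B)/|B|$ and $A$ is a single-element subset. $I(\cdot, \cdot)$ denotes the Kullback-Leibler (KL) divergence, 37 $I(P, Q) = \sum_i p_i \log_2 (p_i / q_i)$. Equivalently, $\mathrm{PPT}(m_1, m_2)$ can be written as
$$\mathrm{PPT}(m_1, m_2) = \frac{1}{2}\left[\sum_{A} \mathrm{BetP}_{m_1}(A) \log_2 \frac{2\,\mathrm{BetP}_{m_1}(A)}{\mathrm{BetP}_{m_1}(A) + \mathrm{BetP}_{m_2}(A)} + \sum_{A} \mathrm{BetP}_{m_2}(A) \log_2 \frac{2\,\mathrm{BetP}_{m_2}(A)}{\mathrm{BetP}_{m_1}(A) + \mathrm{BetP}_{m_2}(A)}\right]$$
where the sums run over the single-element subsets $A$ of $\Theta$. The pignistic probability transform keeps the belief of each single-element subset unchanged and evenly allocates the belief of each multi-element subset to the single-element subsets it contains. In essence, the function $\mathrm{BetP}_m$ transforms the BPA into a probability distribution. The proposed divergence, named the PPT divergence, therefore reflects the interaction between single-element and multi-element subsets and avoids treating multi-element subsets as single-element subsets when calculating the divergence between evidences. Furthermore, when the BPAs consist only of singleton subsets, the PPT divergence degenerates into the JS divergence.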
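A minimal Python sketch of the PPT divergence follows, assuming base-2 logarithms so that the values lie in $[0, 1]$. The function names are hypothetical, and the three BPAs are hypothetical ones on $\{A, B\}$ that place 0.6 on $\{A\}$, $\{B\}$, and $\{A, B\}$ respectively.

```python
import math


def pignistic(m, frame):
    """Closed-world pignistic transform: split each focal element's mass evenly over its singletons."""
    betp = {x: 0.0 for x in frame}
    for a, v in m.items():
        for x in a:
            betp[x] += v / len(a)
    return betp


def kl(p, q):
    """KL divergence I(P, Q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ppt(m1, m2, frame):
    """PPT divergence: JS divergence between the pignistic distributions of m1 and m2."""
    xs = sorted(frame)
    b1, b2 = pignistic(m1, frame), pignistic(m2, frame)
    p = [b1[x] for x in xs]
    q = [b2[x] for x in xs]
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # (BetP_m1 + BetP_m2) / 2
    return 0.5 * (kl(p, mid) + kl(q, mid))


# Hypothetical BPAs on {A, B}, for illustration only
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m1 = {A: 0.6, B: 0.2, AB: 0.2}
m2 = {A: 0.2, B: 0.6, AB: 0.2}
m3 = {A: 0.2, B: 0.2, AB: 0.6}
print(round(ppt(m1, m2, "AB"), 4))  # the largest of the three pairwise values
print(round(ppt(m1, m3, "AB"), 4), round(ppt(m2, m3, "AB"), 4))
```

Here the pignistic distributions are (0.7, 0.3), (0.3, 0.7), and (0.5, 0.5), so the PPT divergence ranks the pair $(m_1, m_2)$ as the most different and treats $(m_1, m_3)$ and $(m_2, m_3)$ symmetrically, in line with the intuition stated in Example 1.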

The properties of the PPT divergence
The PPT divergence has the following properties.
Boundedness: $0 \le \mathrm{PPT}(m_1, m_2) \le 1$.
Proof. Based on the Shannon inequality, 38 $I(P, Q) \ge 0$ for any two probability distributions $P$ and $Q$. In equation (15), $\mathrm{BetP}_m$ transforms a BPA into a probability distribution, so $\mathrm{BetP}_{m_1}$, $\mathrm{BetP}_{m_2}$, and $(\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2})/2$ are all probability distributions. Hence $I(\mathrm{BetP}_{m_1}, (\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2})/2) \ge 0$ and $I(\mathrm{BetP}_{m_2}, (\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2})/2) \ge 0$, so $\mathrm{PPT}(m_1, m_2) \ge 0$. An upper bound of 1 follows from equation (15), so the value of the PPT divergence lies in $[0, 1]$.
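The upper-bound step can be completed as follows, assuming base-2 logarithms (as in the JS divergence) and writing $M = (\mathrm{BetP}_{m_1} + \mathrm{BetP}_{m_2})/2$. The sums run over the singletons $A$ with $\mathrm{BetP}_{m_1}(A) > 0$; the bound on $I(\mathrm{BetP}_{m_2}, M)$ follows by the same argument with $m_1$ and $m_2$ exchanged.

```latex
\begin{aligned}
\frac{\mathrm{BetP}_{m_1}(A)}{M(A)}
  &= \frac{2\,\mathrm{BetP}_{m_1}(A)}{\mathrm{BetP}_{m_1}(A) + \mathrm{BetP}_{m_2}(A)} \le 2
  \;\Longrightarrow\; \log_2 \frac{\mathrm{BetP}_{m_1}(A)}{M(A)} \le 1, \\
I(\mathrm{BetP}_{m_1}, M)
  &= \sum_{A} \mathrm{BetP}_{m_1}(A) \log_2 \frac{\mathrm{BetP}_{m_1}(A)}{M(A)}
  \le \sum_{A} \mathrm{BetP}_{m_1}(A) = 1, \\
\mathrm{PPT}(m_1, m_2)
  &= \tfrac{1}{2}\left[ I(\mathrm{BetP}_{m_1}, M) + I(\mathrm{BetP}_{m_2}, M) \right]
  \le \tfrac{1}{2}(1 + 1) = 1 .
\end{aligned}
```

Equality holds when the two pignistic distributions have disjoint supports, for example when $m_1(\{A\}) = 1$ and $m_2(\{B\}) = 1$.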

Non-degeneracy. This equation holds only if
Thus, the PPT divergence satisfies non-degeneracy.

Comparative analysis
In this section, numerical examples (Examples 1-4 from Wang et al. 29) are used to illustrate the validity of the PPT divergence and to compare it with the BJS divergence.
Recalling Example 1, the divergence values calculated with the PPT divergence are as follows. The PPT divergence value varying with $|T|$ and $\alpha$ is shown in Figure 1. Figure 1(a) shows that the divergence value stays within $[0, 1]$ however $|T|$ and $\alpha$ change, which verifies the boundedness of the PPT divergence. When $|T| = 1$, namely, when the evidences $m_1$ and $m_2$ consist of singleton subsets, there is no intersection between the subsets, so the divergence value is larger than for the other values of $|T|$ in Figure 1(b). As $|T|$ increases from 2 to 8, the divergence value increases accordingly. This is intuitive, because extending the cardinality of a multi-element subset increases the uncertainty. As shown in Figure 1(c), when $|T|$ is fixed, the divergence value decreases monotonically as $\alpha$ increases from 0 to 0.9. This is also consistent with intuition, because the belief distributions of the two BPAs become more and more similar. When $\alpha = 0.9$, namely, $m_1 = m_2$, the divergence value is always 0 regardless of $|T|$, which again confirms the non-degeneracy of the PPT divergence. The divergence values varying with $T$ are shown in Figure 2. Intuitively, as the cardinality of the multi-element subsets increases, the divergence value between the two BPAs increases accordingly. However, the BJS divergence value remains the same, because the BJS divergence treats multi-element subsets as single-element subsets and therefore cannot reflect their influence on the divergence measurement. In contrast, the PPT divergence fully considers the interaction between single-element and multi-element subsets and obtains rational results, which illustrates its feasibility.
Example 4. Suppose there are two BPAs $m_1$ and $m_2$ on the FOD $\Theta = \{A, B\}$. The parameter $\alpha$ is changed step by step from 0 to 1, increasing by $\Delta = 0.02$ at each step. For every value of $\alpha$, the PPT divergence and the BJS divergence take the same value, as shown in Figure 3. This is reasonable: when the two BPAs consist only of singleton subsets, $\mathrm{BetP}_m(A) = \sum_{A \in B \subseteq \Theta} m(B)/|B|$ simplifies to $\mathrm{BetP}_m(A) = m(A)$, so the PPT divergence degenerates into the BJS divergence. In this case, the BJS divergence assigns each hypothesis of the mass function to a single element, so the BPA becomes a probability distribution; in fact, the BJS divergence also degenerates into the JS divergence.
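The degeneration argued in Example 4 can be checked numerically with the Python sketch below. The BPAs are hypothetical Bayesian BPAs on $\{A, B\}$, not those of Example 4: $m_1 = \{A: \alpha, B: 1 - \alpha\}$ with $\alpha$ stepped by 0.02 from 0 to 1, and a fixed $m_2$. All function names are hypothetical.

```python
import math


def _half(x, y):
    """Sum of x_i * log2(2 x_i / (x_i + y_i)) over the entries with x_i > 0."""
    return sum(xi * math.log2(2 * xi / (xi + yi)) for xi, yi in zip(x, y) if xi > 0)


def js(p, q):
    """JS divergence between two probability vectors, in bits."""
    return 0.5 * (_half(p, q) + _half(q, p))


def bjs(m1, m2):
    """BJS divergence: JS divergence on the raw mass vectors over all focal elements."""
    keys = list(set(m1) | set(m2))
    return js([m1.get(k, 0.0) for k in keys], [m2.get(k, 0.0) for k in keys])


def ppt(m1, m2, frame):
    """PPT divergence: JS divergence on the pignistic distributions."""
    def betp(m):
        out = {x: 0.0 for x in frame}
        for a, v in m.items():
            for x in a:
                out[x] += v / len(a)
        return [out[x] for x in sorted(frame)]
    return js(betp(m1), betp(m2))


# Hypothetical Bayesian BPAs on {A, B}: only singleton focal elements
A, B = frozenset("A"), frozenset("B")
m2 = {A: 0.3, B: 0.7}
gaps = []
for i in range(51):  # alpha = 0, 0.02, ..., 1.0
    alpha = i * 0.02
    m1 = {A: alpha, B: 1.0 - alpha}
    gaps.append(abs(ppt(m1, m2, "AB") - bjs(m1, m2)))
print(max(gaps))  # zero up to floating-point round-off
```

Because every focal element is a singleton, $\mathrm{BetP}_m(A) = m(A)$ and both divergences reduce to the same JS computation on $(\alpha, 1 - \alpha)$ and $m_2$.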
The above examples show that the PPT divergence measures the divergence between BPAs more reasonably than the BJS divergence.

The multi-sensor data fusion method based on the PPT divergence
To combine highly conflicting evidence effectively, a new multi-sensor data fusion method is proposed based on the PPT divergence and the Deng entropy. In the proposed method, the PPT divergence measures the discrepancy between evidences to obtain the credibility weight of each evidence, which is inversely proportional to its divergence value, and the Deng entropy measures the uncertainty of each evidence to obtain its information volume weight. The final weight of each evidence is thus determined from two aspects, difference and uncertainty, which fully exploits the latent information among the evidences so that conflicting evidence is handled more effectively. The flowchart of the proposed method is shown in Figure 4.
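The three stages of the method can be sketched end to end in Python as below. This is a hedged illustration rather than the article's exact formulas: the support degree (reciprocal of the average PPT divergence), the information volume ($e^{H_d}$), and the final weight (normalized product of $W_d$ and $W_{iv}$) are assumptions chosen to be consistent with the description above, and the sensor reports are hypothetical. All function names are hypothetical as well.

```python
import math
from functools import reduce


def pignistic(m, frame):
    """Closed-world pignistic transform, returned as a list ordered by sorted(frame)."""
    out = {x: 0.0 for x in frame}
    for a, v in m.items():
        for x in a:
            out[x] += v / len(a)
    return [out[x] for x in sorted(frame)]


def ppt(m1, m2, frame):
    """PPT divergence: JS divergence (base 2) between the two pignistic distributions."""
    p, q = pignistic(m1, frame), pignistic(m2, frame)

    def half(x, y):
        return sum(xi * math.log2(2 * xi / (xi + yi)) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * (half(p, q) + half(q, p))


def deng_entropy(m):
    """H_d(m) = -sum m(A) log2(m(A) / (2^|A| - 1)) over focal elements with m(A) > 0."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0)


def dempster(m1, m2):
    """Dempster's rule for two BPAs given as {frozenset: mass}."""
    joint, k = {}, 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                joint[inter] = joint.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2
    if k >= 1.0:
        raise ValueError("Dempster's rule is undefined when K = 1")
    return {a: v / (1.0 - k) for a, v in joint.items()}


def fuse(bpas, frame, eps=1e-12):
    """Hypothetical sketch of the three-stage fusion for k >= 2 BPAs; returns (fused BPA, weights)."""
    k = len(bpas)
    # Stage 1: average PPT divergence of each evidence from the other k - 1 evidences.
    avg_div = [sum(ppt(bpas[i], bpas[j], frame) for j in range(k) if j != i) / (k - 1)
               for i in range(k)]
    # ASSUMPTION: support = 1 / average divergence; eps avoids division by zero and
    # gives equal support when every divergence vanishes (e.g. k identical BPAs).
    support = [1.0 / max(d, eps) for d in avg_div]
    w_d = [s / sum(support) for s in support]
    # Stage 2 (ASSUMPTION): information volume = exp(Deng entropy) >= 1, never zero.
    iv = [math.exp(deng_entropy(m)) for m in bpas]
    w_iv = [v / sum(iv) for v in iv]
    # Stage 3 (ASSUMPTION): final weight = normalized product of the two weights.
    raw = [a * b for a, b in zip(w_d, w_iv)]
    w = [r / sum(raw) for r in raw]
    avg = {}
    for wi, m in zip(w, bpas):
        for a, v in m.items():
            avg[a] = avg.get(a, 0.0) + wi * v
    # Combine the weighted average evidence with itself k - 1 times (k copies in total).
    fused = reduce(lambda acc, _: dempster(acc, avg), range(k - 1), avg)
    return fused, w


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
reports = [  # hypothetical sensor reports on {A, B, C}; the second one conflicts
    {A: 0.7, B: 0.1, A | C: 0.2},
    {B: 0.8, C: 0.2},
    {A: 0.6, B: 0.1, A | C: 0.3},
    {A: 0.65, C: 0.15, A | B | C: 0.2},
]
fused, w = fuse(reports, "ABC")
print("weights:", [round(x, 3) for x in w])
print({"".join(sorted(s)): round(v, 4) for s, v in fused.items()})
```

The conflicting second report receives the smallest final weight, so the weighted average evidence leans towards $A$ and repeated Dempster combination sharpens that preference. Because the support values are normalized afterwards, any positive rescaling of the reciprocal divergence would give the same credibility weights.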

Obtain the credibility weight of evidence
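Step 1-1: assume there are $k$ independent BPAs on the same FOD containing $n$ elements, $\Theta = \{A_1, A_2, \ldots, A_n\}$. The divergence value between each pair of BPAs is calculated with equation (16). Then, a divergence matrix $D_{k \times k}$ is constructed as follows:
$$D_{k \times k} = \begin{bmatrix} 0 & D_{12} & \cdots & D_{1k} \\ D_{21} & 0 & \cdots & D_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ D_{k1} & D_{k2} & \cdots & 0 \end{bmatrix}$$
where $D_{ij} = \mathrm{PPT}(m_i, m_j)$.
Step 1-2: the support degree $S$ of evidence $m_i$ is calculated as
Note that, to keep the support degree $S$ of evidence $m_i$ meaningful, a replacement must be made if and only if there are $k$ identical pieces of evidence.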
Step 1-3: the credibility weight $W_d$ of evidence $m_i$ is defined as

Obtain the information volume weight of evidence
Step 2-1: the Deng entropy $H_d$ of evidence $m_i$ is calculated with equation (7):
$$H_d(m_i) = -\sum_{A \subseteq \Theta} m_i(A) \log_2 \frac{m_i(A)}{2^{|A|} - 1}$$
Step 2-2: to avoid assigning zero weight to evidence whose belief entropy is zero, the information volume is defined as
Step 2-3: the information volume weight $W_{iv}$ of evidence $m_i$ is calculated as

Generate and combine the weighted average evidence
Step 3-1: the final weight $w$ of evidence $m_i$ is calculated as
Step 3-2: the weighted average evidence $\bar{m}$ is calculated as
$$\bar{m}(A) = \sum_{i=1}^{k} w_i \, m_i(A)$$
Step 3-3: Dempster's combination rule is applied $k - 1$ times to combine the weighted average evidence according to equation (5). The final combination result is
$$m = \underbrace{\bar{m} \oplus \bar{m} \oplus \cdots \oplus \bar{m}}_{k \text{ copies}}$$

Application
In this section, the proposed multi-sensor data fusion method is applied to two application cases: target recognition and fault diagnosis.
Step 2-3: the information volume weight $W_{iv}$ of evidence $m_i$ is calculated as follows.
Step 3-1: the final weight $w$ of evidence $m_i$ is calculated as follows.
Step 3-2: the weighted average evidence $\bar{m}$ is calculated as follows.
Step 3-3: Dempster's combination rule is used to fuse the weighted average evidence four times, and the final combination results are shown in Table 3 and Figure 5.

Discussion
The results of combining the conflicting evidence with different methods are shown in Table 3, and the BPAs generated by the different methods are compared in Figure 5. As shown in Table 3, although all of the methods recognize A as the true target, the proposed method achieves a higher belief value (0.9892) than Xiao's 27 method and Xiao's 28 method. This shows that the proposed method overcomes the shortcomings of Xiao's 27 method and improves the belief value for the correct target more than Xiao's 28 method does. The reason is that the proposed method not only reflects the interaction between single-element and multi-element subsets by introducing the PPT, but also uses both the credibility weights and the information volume weights to determine the final weights of the evidences. In real applications, even a slight increase in the belief value for the correct target is significant for improving the performance of a target recognition system. The proposed method and Wang et al.'s 29 method both achieve the highest belief degree for the correct target, with similar values. According to the time complexity calculation method presented in Li and Xiao, 39 the proposed method calculates the divergence value between two evidences $k(k - 1)$ times.
Since … 1, 2, …, n), we need to process $n^2 \times 2^{n-1}$ elements to obtain $D(m_1, m_2)$. The time complexity of calculating the divergence matrix is $O(k^2 \times n^2 \times 2^{n-1})$. The weighted average evidence is combined $k - 1$ times, so the time complexity of combining the weighted average evidence is $O(k \times 2^{n-1})$. The overall time complexity of Wang et al.'s method is $O(k^2 \times n^2 \times 2^{n-1})$. The time complexity of the proposed method is therefore lower than that of Wang et al.'s method. Hence, the proposed method outperforms the other four methods in the target recognition application.
Step 2-3: the information volume weight $W_{iv}$ of evidence $m_i$ is calculated as follows.
Step 3-1: the final weight $w$ of evidence $m_i$ is calculated as follows.
Step 3-2: the weighted average evidence $\bar{m}$ is calculated as follows.
Step 3-3: Dempster's combination rule is used to fuse the weighted average evidence four times, and the final combination results are shown in Table 5 and Figure 6.

Discussion
As shown in Table 5 and Figure 6, when facing the conflicting sensor report $m_5$, Dempster's method fails to combine the highly conflicting evidence: it generates a counter-intuitive result and identifies $F_3$ as the true fault type, even though the other four evidences support $F_1$. In contrast, the proposed method, Xiao's 27 method, Xiao's 28 method, and Wang et al.'s 29 method handle the highly conflicting evidence and recognize $F_1$ correctly. Furthermore, compared with Xiao's 27 method, the proposed method manages conflicting evidence more effectively and assigns a higher belief value (0.9957) to the correct fault type, as shown in Table 5. The main reason is that the PPT divergence reflects the interaction between single-element and multi-element subsets by introducing the PPT. Meanwhile, the proposed method improves the belief value of the correct fault type more than Xiao's 28 method does, because it not only uses the PPT divergence to obtain the credibility weight but also considers the uncertainty of the evidences by adopting the Deng entropy to obtain the information volume weight; these two kinds of weights then jointly determine the final weight. The proposed method and Wang et al.'s 29 method both achieve the highest belief value for the correct fault type, but the proposed method has a lower time complexity than Wang et al.'s 29 method. The above analysis demonstrates the effectiveness and superiority of the proposed method in the fault diagnosis application.

Figure 6. The comparison of the BPAs generated by different methods for fault diagnosis.

Conclusion
In this article, a novel divergence measure, called the PPT divergence, is proposed to measure the difference between evidences. By introducing the PPT, the proposed divergence reflects the interaction between single-element and multi-element subsets and overcomes the shortcoming of the BJS divergence, which treats multi-element subsets as single-element subsets. Furthermore, the PPT divergence satisfies the properties of boundedness, non-degeneracy, and symmetry, and it degenerates into the JS divergence when the mass function and the probability distribution are consistent. Comparative analysis shows that the PPT divergence obtains more reasonable results than the BJS divergence in measuring the discrepancy between evidences. Based on the PPT divergence, a new multi-sensor data fusion method is presented. The method uses the PPT divergence to measure the discrepancy between evidences and obtain credibility weights, and the Deng entropy to measure the uncertainty of the evidences and obtain information volume weights, so that the latent information among the evidences is fully exploited. The credibility weights and the information volume weights are then integrated to generate the weighted average evidence, which is combined using Dempster's combination rule. In the two application cases, the proposed method achieves a higher belief degree for the correct target than Xiao's methods and a lower time complexity than Wang et al.'s method. Overall, the proposed method outperforms the other four methods in combining highly conflicting evidence.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.