1 Introduction
Modeling data and knowledge as uncertain information granules within a complete domain is a crucial foundation for symbolic reasoning and decision-making approaches [1,2]. Consider an uncertain variable $ {X} $, whose true value is assumed to lie within the frame Ω. The key issue in reasoning involves identifying appropriate constraints from various information sources and updating information to infer the most likely value [3]. Probability distributions, as the most common method for modeling uncertainty, can adequately represent randomness by assigning normalized weights to the singletons within the frame [4]. Probability distributions can granularize statistical information, distinctly representing the internal conflicts [5] among singletons. Additionally, possibility theory, which is developed from membership functions, offers a more reasonable approach for modeling semantic information. Unlike probability distributions, possibility distributions relax the normalization constraint, allowing all elements to take values within the interval [0, 1], effectively modeling the incompleteness of information [6,7]. Both probability and possibility measures represent uncertain information by assigning beliefs to singletons [8], requiring the information-generating model to accurately determine the belief for each element. However, in practical reasoning and decision-making, the accuracy of reasoning and decisions is often compromised by epistemic ignorance due to incomplete, unreliable, or imprecise knowledge [9]. Therefore, a key challenge for uncertainty representation methods based on probability and possibility measures is to incorporate contextual information into the granularization process of uncertain information distributions.
Dempster-Shafer theory, also known as evidence theory, was first proposed by Dempster through multi-valued probability mapping and later developed by Shafer into a reasoning method for uncertain environments based on random encoding semantics [10]. The basic unit of information in evidence theory is the basic probability assignment (BPA), which assigns beliefs to subsets of the frame, with the beliefs on multi-element subsets interpreted as unrefinable support. Compared to probability theory, evidence theory allows for ignorance regarding the granule itself [11]. In terms of information processing, Dempster proposed an orthogonal sum for combining independent information granules, known as Dempster’s rule [10]. This rule can be seen as a generalization of Bayes’ theorem, enabling Bayesian updates to be performed without prior information [12]. Due to its stronger uncertainty modeling capabilities and effective information updating methods, reasoning and decision-making methods under the evidence theory framework are widely applied in multi-source information fusion [13,14], fault diagnosis [15,16], multi-criteria decision making [17,18], network science [19–21], and machine learning in imprecise environments [22–24]. Smets proposed the transferable belief model (TBM) [25], a new semantic representation of evidential information [26]. TBM abandons the interpretation of evidence theory from the perspective of upper and lower probabilities and instead fully adopts the viewpoint of unreliable testimony, providing a complete framework for belief transfer and decision-making [27]. In subsequent research, because evidence theory and TBM share many application scenarios, scholars have integrated them into the theory of belief functions, treating them as a single approach unless the semantics of imprecise probabilities are under discussion. TBM consists of two levels: The credal level and the pignistic level. At the credal level, belief transfer is performed within the dimensions of the power set. When no further information is available to update the BPA, the pignistic level is applied, where the BPA is transferred to a distribution over singletons to facilitate decision-making.
The basic probability assignment (BPA) can be viewed as a random finite set, where the power set of the frame is modeled as the event space to capture the randomness of subsets. It is evident that the elements of a subset in the belief structure are unordered. In recent years, Deng et al. introduced the order of elements into subsets, extending the event space from the power set to the permutation event space (PES), and proposed the random permutation set theory (RPST) [28–31]. The permutation mass function (PerMF) is a normalized distribution assigned to random permutation sets. Compared to the power set, PES offers a broader and more comprehensive perspective for describing uncertainty in information, thus garnering significant attention and discussion [32]. As a typical representative of uncertainty modeling based on the power set, the interpretation and application of evidence theory within PES remain open issues.
This paper aims to develop RPST from the perspective of TBM by proposing the layer-2 TBM. By discussing its fundamental semantics, belief transfer methods, decision-making approaches, and data-driven generation methods, the paper seeks to simultaneously handle both quantitative and qualitative information in uncertain environments. This work further enriches the semantic interpretation of RPST and validates the effectiveness and superiority of uncertainty modeling within the PES framework.
The structure of this paper is as follows: Section 2 introduces the necessary concepts of Dempster-Shafer theory and RPST. Section 3 discusses the motivation behind the implementation of layer-2 TBM, including the belief transfer methods at the credal level and the decision-making approaches at the pignistic level. Section 4 explores methods for granularizing PerMF from real datasets and demonstrates the effectiveness of layer-2 TBM through application scenarios in multi-source information fusion. Finally, Section 5 concludes the paper and discusses future research directions.
2 Preliminaries
2.1 Dempster-Shafer theory
For a finite domain $ \varOmega =\{ \omega_{1}{\mathrm{,}}\;\omega_{2}{\mathrm{,}}\;\cdots {\mathrm{,}}\; \omega_{{n}}\} $, its power set is denoted as
$ 2^\varOmega=\left\{\emptyset,\;\left\{\omega_1\right\},\;\left\{\omega_2\right\},\;\left\{\omega_1,\;\omega_2\right\},\;\cdots,\;\left\{\omega_1,\;\omega_2,\;\cdots,\;\omega_n\right\}\right\} $ ()
Consider a mapping $ m:2^\varOmega\to\left[0,\;1\right] $ satisfying $ \displaystyle\sum_{F_i\subseteq\varOmega}m\left(F_i\right)=1 $, where $ m $ is called a BPA or mass function; $ F_i $ is the subset represented by the binary coding of $ i $, such as $ F_3=\left\{\omega_1,\;\omega_2\right\} $. When $ m\left(F_i\right) > 0 $, $ F_i $ is called a focal set. $ m $ represents the restriction of the target element in $ \varOmega $, and the belief on a focal set $ F_i $ represents the unrefinable support for the statement that the target element belongs to $ F_i $.
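As a quick illustration of the binary coding, the following sketch decodes an index $ i $ into its subset (elements are represented here by their indices):

```python
# Minimal sketch of the binary coding: bit t of i (least significant
# first) marks whether omega_{t+1} belongs to F_i, so F_3 = {w1, w2}.

def focal_set(i, n):
    """Return the subset of an n-element frame encoded by index i."""
    return frozenset(t + 1 for t in range(n) if (i >> t) & 1)

assert focal_set(3, 4) == frozenset({1, 2})   # F_3  = {omega_1, omega_2}
assert focal_set(12, 4) == frozenset({3, 4})  # F_12 = {omega_3, omega_4}
```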
The conjunctive combination rule (CCR) and the disjunctive combination rule (DCR) are both well-known approaches for updating beliefs. CCR is an effective tool for combining distinct and reliable bodies of evidence, while DCR, as its dual operation, can discount unreliable beliefs to a state of ignorance. For two mass functions $ {m}_{1} $ and $ {m}_{2} $, CCR and DCR are defined as
$ {m}_{1}\cap {m}_{2} \left({F}_{i}\right)=\sum_{{F}_{i}={F}_{j}\cap {F}_{k}}{m}_{1} \left({F}_{j}\right)\cdot {m}_{2} \left({F}_{k}\right) $ ()
$ {m}_{1}\cup {m}_{2} \left({F}_{i}\right)=\sum_{{F}_{i}={F}_{j}\cup {F}_{k}}{m}_{1} \left({F}_{j}\right)\cdot {m}_{2} \left({F}_{k}\right) $ ()
When no additional bodies of evidence are available to update the beliefs, the beliefs on multi-element focal sets are assigned to singletons for decision-making, a process known as probability transformation. Pignistic probability transformation (PPT) is an effective tool that satisfies the linearity and the maximum entropy principles, and it is defined as follows:
$ \rm{BetP}_m(\omega)=\sum_{\omega \in F_i} \frac{m \left(F_i\right)}{(1-m(\emptyset)) \cdot\left|F_i\right|} $ ()
where $ \left|\cdot \right| $ is the cardinality of the focal set.
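As a minimal sketch of these operations, the following Python fragment implements CCR, DCR, and PPT over mass functions stored as dictionaries keyed by frozensets; the element labels are illustrative:

```python
from itertools import product

# Minimal sketch of CCR, DCR, and PPT on mass functions stored as
# {frozenset: belief} dictionaries.

def combine(m1, m2, op):
    """op = frozenset.intersection gives CCR; frozenset.union gives DCR."""
    out = {}
    for (Fj, a), (Fk, b) in product(m1.items(), m2.items()):
        Fi = op(Fj, Fk)
        out[Fi] = out.get(Fi, 0.0) + a * b
    return out

def betp(m):
    """Pignistic probability transformation, renormalizing away m(empty)."""
    empty = m.get(frozenset(), 0.0)
    p = {}
    for Fi, belief in m.items():
        for w in Fi:  # the empty set contributes nothing
            p[w] = p.get(w, 0.0) + belief / ((1.0 - empty) * len(Fi))
    return p

m1 = {frozenset({'w1'}): 0.2, frozenset({'w1', 'w2'}): 0.8}
m2 = {frozenset({'w2'}): 0.5, frozenset({'w1', 'w2'}): 0.5}
print(betp(combine(m1, m2, frozenset.intersection)))  # {'w1': 1/3, 'w2': 2/3}
```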
In TBM, the combination rules that update belief within the same dimension are referred to as the credal level, while the probability transformation, which projects belief from the power set to singletons, is referred to as the pignistic level.
2.2 Random permutation set theory
When the element orders are introduced in the focal set, the event space is extended from the power set to PES, which is defined as
$ \mathcal{PES}\left(\varOmega\right)=\left\{\emptyset,\;\left\{\omega_1\right\},\;\left\{\omega_2\right\},\;\left(\omega_1\,\omega_2\right),\;\left(\omega_2\,\omega_1\right),\;\cdots,\;\left(\omega_1,\;\omega_2,\;\cdots,\;\omega_n\right),\;\cdots,\;\left(\omega_n,\;\omega_{n-1},\;\cdots,\;\omega_1\right)\right\} $ ()
where $ \left(\cdot\right) $ represents the ordered set. Similar to a mass function, consider a mapping $ \mathrm{Perm}:\mathcal{PES}\left(\varOmega\right)\to\left[0,\;1\right] $ satisfying $ \displaystyle\sum_{F_i^j\in\mathcal{PES}\left(\varOmega\right)}\mathrm{Perm}\left(F_i^j\right)=1 $, where Perm is called the PerMF. When $ \mathrm{Perm}\left(F_i^j\right) > 0 $, $ F_i^j $ is called an ordered focal set, where $ i $ retains the same meaning as in the focal set, and $ j=1,\;2,\;\cdots,\;\left|F_i\right|! $ is determined by lexicographic order, representing the order of the focal set $ F_i $. For the combination rules and probability transformation of PerMF, please refer to Refs. [24,25]. As they do not follow the semantics of TBM, their details are not provided in this paper.
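As an illustration of PES and the lexicographic order index $ j $, the following sketch enumerates the permutation events of a small frame; element names are arbitrary labels:

```python
from itertools import combinations, permutations

# Sketch: enumerating PES(Omega) and the lexicographic order index j
# of an ordered focal set F_i^j.

def pes(frame):
    """All ordered focal sets (as tuples) over a frame, plus the empty set."""
    events = [()]
    for size in range(1, len(frame) + 1):
        for subset in combinations(sorted(frame), size):
            events.extend(permutations(subset))
    return events

def order_index(ordered_set):
    """1-based lexicographic rank j of an ordered focal set."""
    ranks = sorted(permutations(sorted(ordered_set)))
    return ranks.index(tuple(ordered_set)) + 1

print(len(pes(['w1', 'w2', 'w3'])))     # 16 events: 1 + 3 + 6 + 6
print(order_index(('w3', 'w2', 'w1')))  # j = 6, the last 3-element order
```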
3 Layer-2 TBM
3.1 Motivation and overview
In TBM, BPA is generated through the combination and decomposition of simple mass functions (SMFs). When a BPA has only two focal sets, one of which is the frame $ \varOmega $, it is called an SMF. The semantics of SMF can transform unreliable testimonies into a BPA. For an uncertain variable $ X $ and a frame $ \varOmega $, when an agent receives testimony $ X\in F_i $ and the reliability of the source is $ \alpha $, the testimony can be represented as an SMF $ {F_i}^{1-\alpha}\equiv\{m\left(F_i\right)=\alpha,\;m\left(\varOmega\right)=1-\alpha\} $. In particular, when only combination is considered, the combination of any number of SMFs must result in a BPA. Therefore, SMFs can be viewed as effective information interfaces for unreliable testimonies. However, when the testimony provided by the agent is not a set but a sequence with preferences, which is more common in practical applications, an SMF cannot adequately encode this unreliable sequence. In Ref. [33], Zhou et al. provided an encoding method through random permutation sets (RPSs). When an agent receives a testimony $ F_i^j $ with reliability degree $ \alpha $, it can be written as a PerMF:
$ \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\equiv \left\{\mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{i}^{j}\right)={\alpha}{\mathrm{,}}\; \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{{2}^{\left|\varOmega\right|}-1}^{1}\right)= \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{{2}^{\left|\varOmega\right|}-1}^{2}\right)= \; \cdots \; =\mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{{2}^{\left|\varOmega\right|}-1}^{\left|\varOmega\right|!}\right)=\frac{1-{\alpha}}{\left|\varOmega\right|!}\right\} $ ()
where the order of elements reflects their propensity to be the target element. Consider three information distributions:
$ P\left(\omega_1\right)=0.6,\;P\left(\omega_2\right)=0.4 $ ()
$ m\left(\left\{\omega_1\right\}\right)=0.2,\;m\left(\left\{\omega_1,\;\omega_2\right\}\right)=0.8 $ ()
$ \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left(\left(\omega_{1}\omega_{2}\right)\right)=1 $ ()
Although they all indicate that the likelihood of $ \omega_1 $ is greater than that of $ \omega_2 $, they differ in the types of uncertainty they represent. The probability distribution indicates that the agent knows with certainty that the probability of $ \omega_1 $ is $ 0.6 $ and the probability of $ \omega_2 $ is $ 0.4 $. This information distribution only contains randomness, and if it is reliable and truthful, the conflicting belief will be transferred to the empty set in subsequent information updates. The mass function indicates that there is a $ 0.2 $ probability that $ \omega_1 $ is the target element and an additional $ 0.8 $ probability that it is unknown whether $ \omega_1 $ or $ \omega_2 $ is the target element. In this case, the types of uncertainty include both randomness and non-specificity. Similarly, the PerMF indicates that it is unknown whether $ \omega_1 $ or $ \omega_2 $ is the target element, but $ \omega_1 $ is considered more likely than $ \omega_2 $, although this propensity is not strong enough to be reflected numerically. In this scenario, the information distribution includes three types of uncertainty: Randomness, non-specificity, and propensity.
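The encoding of an unreliable ordered testimony above can be sketched in a few lines of Python; the witness statement and reliability below are illustrative values:

```python
from math import factorial
from itertools import permutations

# Sketch of the PerMF encoding: belief alpha goes to the reported
# sequence and 1 - alpha is spread uniformly over all |Omega|! orders
# of the whole frame.

def testimony_to_permf(sequence, alpha, frame):
    permf = {tuple(sequence): alpha}
    share = (1.0 - alpha) / factorial(len(frame))
    for order in permutations(sorted(frame)):
        permf[order] = permf.get(order, 0.0) + share
    return permf

# "w3 is more likely than w1", reliability 0.8, 4-element frame:
p = testimony_to_permf(('w3', 'w1'), 0.8, ['w1', 'w2', 'w3', 'w4'])
print(p[('w3', 'w1')], p[('w1', 'w2', 'w3', 'w4')])  # 0.8 and 0.2/24
```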
The concept of the layer-2 belief structure (layer-2 BS) was first introduced in the context of discussing the disjunction and conjunction rules of RPS [34]. However, how to extend TBM to layer-2 BS has not been explored. Similar to TBM, layer-2 TBM also consists of a credal level and a pignistic level. Given that PerMF introduces additional order information compared to BPA, layer-2 TBM must address not only the transfer of beliefs but also the updating of order information and its influence on those beliefs. For the ordered focal set $ \left(\omega_1\,\omega_2\right) $ in a frame that also contains $ \omega_3 $, it can be expressed as $ \omega_1\succ\omega_2\succ\succ\omega_3 $, which suggests that while there is a preference for $ \omega_1 $ over $ \omega_2 $, the support for both $ \omega_1 $ and $ \omega_2 $ is considerably greater than that for $ \omega_3 $. Thus, under the interpretation of layer-2 BS, we argue that the propensity represented through order is weaker than the numerical beliefs, a consideration that has been overlooked in previous research on RPS-based information processing. Therefore, at the credal level, the beliefs transferred between focal sets should not be influenced by their orders; only at the pignistic level do the orders affect the proportion of beliefs assigned to the singletons. Therefore, layer-2 TBM can be regarded as a refined version of TBM.
When discussing relevant measures of RPS [35,36], a common premise is that when order is disregarded, RPS-based approaches should degrade to evidential approaches. It is important to note that this degradation is not achieved by merely transforming an ordered set into an unordered one, but through the following transformation:
$ m_{\mathrm{Perm}}\left(F_i\right)=\displaystyle\sum_{j=1:\left|F_i\right|!}\mathrm{Perm}\left(F_i^j\right) $ (1a)
$ \mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{m} \left({F}_{i}^{j}\right)=\frac{m \left({F}_{i}\right)}{\left|{F}_{i}\right|!} $ (1b)
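A minimal sketch of transformations (1a) and (1b), using tuples for ordered focal sets and frozensets for focal sets:

```python
from math import factorial
from itertools import permutations

# Sketch of (1a) and (1b): a PerMF collapses to a mass function by
# summing over orders; a mass function lifts to a PerMF by splitting
# each focal belief uniformly over all orders of its focal set.

def permf_to_mass(permf):                         # Eq. (1a)
    m = {}
    for ordered, belief in permf.items():
        Fi = frozenset(ordered)
        m[Fi] = m.get(Fi, 0.0) + belief
    return m

def mass_to_permf(m):                             # Eq. (1b)
    permf = {}
    for Fi, belief in m.items():
        for order in permutations(sorted(Fi)):
            permf[order] = belief / factorial(len(Fi))
    return permf

m = {frozenset({'w1', 'w2'}): 0.8, frozenset({'w1'}): 0.2}
assert permf_to_mass(mass_to_permf(m)) == m  # lossless when orders are uniform
```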
In this paper, before discussing the approaches within the RPS framework, four requirements are proposed.
1) Requirement 1. [Degradability] When $ \mathrm{Perm} $ and $ m $ can be transformed into each other without information loss through (1), i.e., $ \mathrm{Perm} $ satisfies $ \mathrm{Perm}\left(F_i^{j_1}\right)=\mathrm{Perm}\left(F_i^{j_2}\right) $ for all $ F_i\subseteq\varOmega $ and $ j_1,\;j_2\in\{1,\;2,\;\cdots,\;\left|F_i\right|!\} $, the RPS-based operation performed on $ \mathrm{Perm} $ must be equivalent to the evidential operation performed on $ m_{\mathrm{Perm}} $.
2) Requirement 2. [Ordinal relatedness] For a focal set, different orders are not independent; there exists a dissimilarity measure that correlates them and guides the information processing methods in layer-2 TBM.
3) Requirement 3. [Co-dominant ability] The belief updating for the ordered focal set should be determined by quantitative and qualitative information collectively.
4) Requirement 4. [Weak propensity] Order information cannot influence the beliefs of focal sets at the credal level; it can only guide belief assignment at the pignistic level.
3.2 Credal level in layer-2 TBM
In an information processing system where both input and output information granules are normalized distributions on PES, the system can be regarded as the credal level of a layer-2 TBM. Building on the most well-known methods at the credal level of TBM, namely CCR and DCR, Zhou et al. [34] proposed conjunctive and disjunctive combination rules for the layer-2 BS, denoted as the layer-2 conjunctive combination rule (l2-CCR) and the layer-2 disjunctive combination rule (l2-DCR).
Definition 1. [l2-CCR & l2-DCR] Considering $ k $ PerMFs $ \mathrm{Perm}_i $, $ i=\{1,\;2,\;\cdots,\;k\} $, under the frame $ \varOmega $, when their sources are distinct and reliable, the l2-CCR is defined as
$ {m}_{\mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{{C}}}={m}_{\mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{1}}\cap {m}_{\mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{2}}\cap \cdots \cap {m}_{\mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{k}} $ ()
$ O\left(F_i^j\wedge F_p^q\right)=\sum_{\omega_x,\;\omega_y\in F_i\cap F_p}\frac{R\left[F_i^j,\;F_p^q\right]\left(\omega_x,\;\omega_y\right)}{\dbinom{\left|F_i\cap F_p\right|}{2}} $ ()
$ O_C\left(F_i^j\right)=\sum_{t=1:k}\left[\mathrm{Perm}_t\left(F_i^j\right)+\sum_{F_i\subset F_p}\sum_{q=1:\left|F_p\right|!}\mathrm{Perm}_t\left(F_p^q\right)O\left(F_i^j\wedge F_p^q\right)\right] $ ()
$ O_C^N\left(F_i^j\right)=\frac{O_C\left(F_i^j\right)}{\displaystyle\sum_{h=1}^{\left|F_i\right|!}O_C\left(F_i^h\right)} $ ()
$ \mathrm{Perm}_C\left(F_i^j\right)=O_C^N\left(F_i^j\right)\cdot m_{\mathrm{Perm}_C}\left(F_i\right) $ ()
where $ R\left[{F}_{i}^{j}{\mathrm{,}}\;{F}_{p}^{q}\right]\left(\omega_{x}{\mathrm{,}}\;\omega_{y}\right) $ is a logical function that returns $ 1 $ when the elements $ \omega_{x} $ and $ \omega_{y} $ have identical relative orders in both $ {F}_{i}^{j} $ and $ {F}_{p}^{q} $, and returns $ 0 $ otherwise. When their sources are distinct and at least one of them is reliable, the l2-DCR is defined as
$ m_{\mathrm{Perm}_D}=m_{\mathrm{Perm}_1}\cup m_{\mathrm{Perm}_2}\cup\cdots\cup m_{\mathrm{Perm}_k} $ ()
$ O\left(F_i^j\wedge F_p^q\right)=\sum_{\omega_x,\;\omega_y\in F_i\cap F_p}\frac{R\left[F_i^j,\;F_p^q\right]\left(\omega_x,\;\omega_y\right)}{\dbinom{\left|F_i\cap F_p\right|}{2}} $ ()
$ O_D\left(F_i^j\right)=\sum_{t=1:k}\left[\mathrm{Perm}_t\left(F_i^j\right)+\sum_{F_p\subset F_i}\sum_{q=1:\left|F_p\right|!}\mathrm{Perm}_t\left(F_p^q\right)O\left(F_i^j\wedge F_p^q\right)\right] $ ()
$ O_D^N\left(F_i^j\right)=\frac{O_D\left(F_i^j\right)}{\displaystyle\sum_{h=1}^{\left|F_i\right|!}O_D\left(F_i^h\right)} $ ()
$ \mathrm{Perm}_D\left(F_i^j\right)=O_D^N\left(F_i^j\right)\cdot m_{\mathrm{Perm}_D}\left(F_i\right) $ ()
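The order-consistency term $ O\left(F_i^j\wedge F_p^q\right) $ shared by both rules can be sketched as follows; by assumption, it returns 0 when the intersection holds fewer than two elements:

```python
from itertools import combinations

# Sketch of the order-consistency term: the fraction of element pairs
# in F_i ∩ F_p whose relative order agrees in both ordered focal sets
# (the indicator R[...]).

def order_consistency(seq1, seq2):
    common = set(seq1) & set(seq2)
    pairs = list(combinations(sorted(common), 2))
    if not pairs:
        return 0.0  # degenerate case: fewer than two shared elements
    agree = sum((seq1.index(x) < seq1.index(y)) ==
                (seq2.index(x) < seq2.index(y)) for x, y in pairs)
    return agree / len(pairs)

# (w1 w2 w3) vs (w2 w1 w3): pair (w1, w2) disagrees, the other two agree.
print(order_consistency(('w1', 'w2', 'w3'), ('w2', 'w1', 'w3')))  # 2/3
```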
In Ref. [34], their properties have been analyzed from the perspective of information fusion. Based on the four requirements proposed in this paper, we will evaluate whether these methods can handle uncertainty in a reasonable manner within layer-2 TBM.
Proposition 1. [Degradability] Both l2-CCR and l2-DCR satisfy degradability, i.e., $ \mathrm{Perm}_{m_1}\cap\mathrm{Perm}_{m_2}\cap\cdots\cap\mathrm{Perm}_{m_k}=\mathrm{Perm}_{m_1\cap m_2\cap\cdots\cap m_k} $, and analogously for $ \cup $.
Proof. Consider a focal set $ F_i $. Since $ \mathrm{Perm} $ is implemented from $ m $, all orders of $ F_i $ have identical beliefs. Suppose $ \mathrm{Perm}\left(F_i^1\right)=\mathrm{Perm}\left(F_i^2\right)=\cdots=\mathrm{Perm}\left(F_i^{\left|F_i\right|!}\right)=a $. In l2-CCR, its orders will influence the focal sets $ F_p $ satisfying $ F_p\subset F_i $. For an arbitrary order $ F_p^q $, we have $ O\left(F_i^1\wedge F_p^q\right)=O\left(F_i^2\wedge F_p^q\right)=\cdots=O\left(F_i^{\left|F_i\right|!}\wedge F_p^q\right) $; hence, the combination outcome satisfies $ \mathrm{Perm}_{m_1}\cap\mathrm{Perm}_{m_2}\cap\cdots\cap\mathrm{Perm}_{m_k}=\mathrm{Perm}_{m_1\cap m_2\cap\cdots\cap m_k} $.
Proposition 2. [Ordinal relatedness] Both l2-CCR and l2-DCR satisfy ordinal relatedness, i.e., different orders of a focal set are related rather than independent.
Proof. According to the definitions of l2-CCR and l2-DCR, the relationship between orders is quantified through $ O\left(F_i^j\wedge F_p^q\right) $, which means different orders may be dependent through a correlation coefficient in $ \left[0,\;1\right] $. Hence, they satisfy ordinal relatedness.
Proposition 3. [Co-dominant ability] The belief updating of ordered focal sets in l2-CCR and l2-DCR is determined by the symbolic orders and the numerical beliefs together.
Proof. Consider the ordered focal set $ {F}_{i}^{j} $. Its corresponding belief is updated through the CCR or DCR on the layer-1 belief structure. Once the belief of the focal set is determined, the proportion assigned to its orders is determined based on the orders from the sources. Therefore, the implementation of $ \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{i}^{j}\right) $ is first determined by $ m \left({F}_{i}\right) $ and then assigned to the order $ j $ according to a specific proportion.
Proposition 4. [Weak propensity] Both l2-CCR and l2-DCR treat the order of elements as a weak propensity, meaning that the order of elements does not affect the numerical beliefs of the focal set.
Proof. According to the definitions of l2-CCR and l2-DCR, the beliefs of the focal sets are determined in the layer-1 structure, and the beliefs of their orders are derived from a proportion of these, which cannot exceed the corresponding focal set beliefs. Thus, the order of elements functions as a weak propensity in l2-CCR and l2-DCR, influencing the numerical beliefs only after the focal sets have been updated.
Hence, l2-CCR and l2-DCR can be seen as the credal level in the layer-2 TBM, and more belief updating methods in layer-2 TBM can be developed from these two basic operations.
3.3 Pignistic level in layer-2 TBM
In TBM, PPT satisfies the maximum entropy principle, meaning that the beliefs of multi-element focal sets are distributed uniformly among their singletons. This uniformity arises because there is no propensity among the elements in the focal set, so no single element should be given more support than the others. However, in layer-2 TBM, the order introduces a propensity among the elements. Thus, the key issue at the pignistic level is how to reasonably utilize this order information to make decisions.
Zhou et al. [33] utilized the weights of the ordered weighted averaging (OWA) operator to offer a perspective for projecting distributions on PES onto the singletons. The singleton marginalization in Ref. [33] can be seen as an ordered version of PPT.
Definition 2. [MEOWA-based PT] Given a PerMF Perm under an $ n $-element frame $ \varOmega $, its maximum entropy OWA-based probability transformation (MEOWA-based PT) is defined as
$ \mathrm{M}\mathrm{E}{\mathrm{P}}_{\mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}}\left[\mathrm{O}\mathrm{r}\mathrm{n}\right]\left( \omega \right)=\sum _{{F}_{i}\ni \omega }\sum _{{j}=1:\left|{F}_{i}\right|!}{t}_{{F}_{i}^{j}\left( \omega \right)}^{\left|{F}_{i}\right|}\left[\mathrm{O}\mathrm{r}\mathrm{n}\right]\cdot \mathrm{P}\mathrm{e}\mathrm{r}\mathrm{m}\left({F}_{i}^{j}\right) $ ()
where $ {t}_{{F}_{i}^{j}\left( \omega \right)}^{\left|{F}_{i}\right|}\left[\mathrm{Orn}\right] $ represents the weight assigned to $ \omega $ according to its position in the ordered focal set $ F_i^j $, generated from the MEOWA weights of length $ \left|F_i\right| $ with an orness measure $ \mathrm{Orn} $. $ \mathrm{Orn} $ should be located in the range $ \left[0.5,\;1\right] $, which represents the propensity degree of the elements. When $ \mathrm{Orn}=0.5 $, MEOWA-based PT degrades to PPT, indicating that the decision maker does not consider the influence of order. When $ \mathrm{Orn}=1 $, the beliefs are assigned entirely to the first element of their corresponding ordered focal sets, signifying that the propensity difference between elements with successive orders is large.
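A numerical sketch of MEOWA-based PT follows; the MEOWA weights are obtained here by numerically maximizing entropy under the orness constraint, which is one possible way to generate them, and the toy PerMF is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximum-entropy OWA weights for a given orness, then used to
# split each ordered focal set's belief among its elements.

def meowa_weights(n, orness):
    if n == 1:
        return np.array([1.0])
    constraints = [
        {'type': 'eq', 'fun': lambda w: w.sum() - 1.0},
        {'type': 'eq', 'fun': lambda w:
            sum((n - 1 - i) * w[i] for i in range(n)) / (n - 1) - orness},
    ]
    res = minimize(lambda w: np.sum(w * np.log(w + 1e-12)),  # -entropy
                   np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
                   constraints=constraints)
    return res.x

def meowa_pt(permf, orness):
    """Project a PerMF (tuples -> beliefs) onto the singletons."""
    p = {}
    for ordered, belief in permf.items():
        if not ordered:
            continue
        w = meowa_weights(len(ordered), orness)
        for pos, elem in enumerate(ordered):
            p[elem] = p.get(elem, 0.0) + w[pos] * belief
    return p

permf = {('w1', 'w2'): 0.6, ('w2',): 0.4}
print(meowa_pt(permf, 0.5))  # ~{'w1': 0.3, 'w2': 0.7}: degrades to PPT
print(meowa_pt(permf, 1.0))  # ~{'w1': 0.6, 'w2': 0.4}: pair belief to w1
```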
According to the requirements in subsection 3.1, Requirement 2 is not applicable to probability transformation, as it does not involve comparisons between sequences. Furthermore, similar to the proofs at the credal level, MEOWA-based PT clearly satisfies the other requirements. As a parametric method, MEOWA-based PT requires the agent to provide an orness measure during the decision-making stage, thereby offering a more flexible reasoning process. Additionally, since PPT satisfies the maximum entropy principle, the maximum entropy OWA weights are a more logical choice than other OWA weight-generating methods.
3.4 Layer-2 TBM
Based on the previous discussions, the flow chart of layer-2 TBM is shown in Fig. 1. At the credal level, the testimonies and the contextual knowledge are combined through l2-DCR to construct the granules on PES as PerMFs, and the resulting PerMFs are then combined through l2-CCR. At the pignistic level, the beliefs of the fused PerMF are assigned to the singletons to make decisions.

Figure 1. Flow chart of layer-2 TBM.
Example 1. Assume that in a murder case, the perpetrator is one of four suspects. There are five independent witnesses who claim to have seen parts of the incident. The possible suspects can be represented by a 4-element frame, and the testimonies of the five witnesses along with their reliability are shown in Table 1.

Table 1. Testimonies of witnesses and their contextual knowledge.

Witness | Testimony | Reliability
1 | $ \omega_3 $ is more likely to be the perpetrator than $ \omega_1 $. | 0.80
2 | Likelihood of the perpetrator is $ \omega_3\succ\omega_2\succ\omega_1 $. | 0.85
3 | Perpetrator is $ \omega_3 $ or $ \omega_4 $. | 0.30
4 | Perpetrator is $ \omega_1 $. | 0.50
5 | $ \omega_3 $ is more likely to be the perpetrator than $ \omega_4 $. | 0.25
The testimonies and their contextual knowledge can be implemented as PerMFs in Table 2.

Table 2. PerMFs of testimonies and contextual knowledge.

Witness | Testimony | Contextual knowledge
1 | $ \mathrm{Perm}_{T_1}\left(F_5^2\right)=1.0 $ | $ \mathrm{Perm}_{R_1}(\emptyset)=0.80,\; \mathrm{Perm}_{R_1}\left(F_{15}^i\right)=\dfrac{0.20}{24},\; i=\{1,\;2,\;\cdots,\;24\} $
2 | $ \mathrm{Perm}_{T_2}\left(F_7^6\right)=1.0 $ | $ \mathrm{Perm}_{R_2}(\emptyset)=0.85,\; \mathrm{Perm}_{R_2}\left(F_{15}^i\right)=\dfrac{0.15}{24},\; i=\{1,\;2,\;\cdots,\;24\} $
3 | $ \mathrm{Perm}_{T_3}\left(F_{12}^1\right)=0.5,\; \mathrm{Perm}_{T_3}\left(F_{12}^2\right)=0.5 $ | $ \mathrm{Perm}_{R_3}(\emptyset)=0.30,\; \mathrm{Perm}_{R_3}\left(F_{15}^i\right)=\dfrac{0.70}{24},\; i=\{1,\;2,\;\cdots,\;24\} $
4 | $ \mathrm{Perm}_{T_4}\left(F_1\right)=1.0 $ | $ \mathrm{Perm}_{R_4}(\emptyset)=0.50,\; \mathrm{Perm}_{R_4}\left(F_{15}^i\right)=\dfrac{0.50}{24},\; i=\{1,\;2,\;\cdots,\;24\} $
5 | $ \mathrm{Perm}_{T_5}\left(F_{12}^1\right)=1.0 $ | $ \mathrm{Perm}_{R_5}(\emptyset)=0.25,\; \mathrm{Perm}_{R_5}\left(F_{15}^i\right)=\dfrac{0.75}{24},\; i=\{1,\;2,\;\cdots,\;24\} $
According to Fig. 1, the PerMF of the $ i $th witness is $ \mathrm{Perm}_i=\mathrm{Perm}_{T_i}\cup\mathrm{Perm}_{R_i} $, and the conjunction of the PerMFs is $ \mathrm{Perm}_C=\mathrm{Perm}_1\cap\mathrm{Perm}_2\cap\cdots\cap\mathrm{Perm}_5 $. Applying MEOWA-based PT to $ \mathrm{Perm}_C $ with different orness measures yields different decision results. In Example 1, $ \mathrm{Orn}=0.5 $ means that the effect of order information on decision making is disregarded, and $ \omega_1 $ is designated as the perpetrator. As the orness measure increases, the designated perpetrator changes to $ \omega_3 $. This suggests that the introduction of the orness measure can influence the decision outcome in certain cases, providing more flexibility in the decision-making process compared to TBM.
4 Data-driven application of layer-2 TBM
4.1 Classification through TBM
In TBM, determining the category to which an instance belongs by fusing the mass functions of its attributes, based on distributions fitted to the training set, is known as an attribute fusion-based classifier. Unlike probabilistic classifiers, evidential classifiers allow for reliability corrections to the mass functions generated from attribute distributions. Considering a multi-class classification task, the label domain is $ \varOmega=\{\omega_1,\;\omega_2,\;\cdots,\;\omega_n\} $, and the training set is written as
$ \mathcal{TS}=\left[\begin{array}{cccc} x_1^1 & x_1^2 & \cdots & x_1^k \\ x_2^1 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \ddots & \vdots \\ x_T^1 & x_T^2 & \cdots & x_T^k \end{array}\right]\left[\begin{array}{c} l_1 \\ l_2 \\ \vdots \\ l_T \end{array}\right] $ ()
where $ x_t^j $ represents the $ j $th attribute value of the $ t $th instance ($ t=1,\;2,\;\cdots,\;T $), and $ l_t\in\varOmega $ represents the label of the $ t $th instance. Attribute $ j $ and label $ \omega_i $ can be represented by fitting a corresponding normal membership function. When there are $ n_i $ instances in the training set with label $ \omega_i $, the mean ($ \mu_{i,j} $) and standard deviation ($ \sigma_{i,j} $) of the corresponding normal membership function $ f_{i,j} $ are
$ \mu_{i,j}=\frac{1}{n_i}\displaystyle\sum_{l_t=\omega_i}x_t^j,\qquad \sigma_{i,j}=\sqrt{\frac{1}{n_i}\displaystyle\sum_{l_t=\omega_i}\left(x_t^j-\mu_{i,j}\right)^2} $ ()
Consider an input instance $ x=\{x^1,\;x^2,\;\cdots,\;x^k\} $. Substitute the attribute value $ x^j $ into $ f_{i,j} $ for each $ i\in\{1,\;2,\;\cdots,\;n\} $, and compute the possibility distribution as $ \mathrm{Poss}_j\left(\omega_i\right)=\dfrac{f_{i,j}\left(x^j\right)}{\underset{o=1:n}{\mathrm{max}}\,f_{o,j}\left(x^j\right)} $. $ \mathrm{Poss}_j $ models the possibilities that the instance belongs to each label when the $ j $th attribute value is $ x^j $. Based on the canonical decomposition-based belief transformation method (CD-BFT) proposed in Ref. [37], $ \mathrm{Poss}_j $ can be represented on the belief structure as $ m_j $. Repeating the substitution for the other attribute values generates a set of mass functions $ m_1,\;m_2,\;\cdots,\;m_k $. These different attributes are then combined using CCR, and the final decision is made through PPT to perform the classification. The flow chart of the attribute fusion-based classifier through TBM is shown in Fig. 2.

Figure 2. Flow chart of attribute fusion-based classifier through TBM.
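The fitting and possibility steps for a single attribute column can be sketched as follows; the sample data and label names are illustrative:

```python
import numpy as np

# Sketch of the attribute-to-possibility step: fit one normal membership
# function per (label, attribute) pair, then read off the normalized
# possibility distribution for a test value.

def fit_membership(train_x, train_y):
    """Per label: mean and standard deviation of one attribute column."""
    return {label: (train_x[train_y == label].mean(),
                    train_x[train_y == label].std())
            for label in np.unique(train_y)}

def possibility(x, params):
    dens = {label: np.exp(-0.5 * ((x - mu) / sigma) ** 2)
            for label, (mu, sigma) in params.items()}
    top = max(dens.values())
    return {label: d / top for label, d in dens.items()}

x_col = np.array([4.9, 5.1, 6.3, 6.0, 7.2, 6.9])
y = np.array(['a', 'a', 'b', 'b', 'c', 'c'])
print(possibility(5.8, fit_membership(x_col, y)))  # peaks at label 'b'
```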
Since CD-BFT satisfies combination rule consistency, if no additional contextual knowledge is considered, the CCR of the mass functions is equivalent to conjunctively combining the possibility distributions through a t-norm. Consequently, the above uncertainty modeling method cannot leverage the structural advantages of belief structures, meaning that higher-order uncertainty representations have the same effect as singletons. Additionally, since the distribution of attribute values is typically not strictly Gaussian, modeling these values using Gaussian distributions may introduce some errors. Consider a three-label classification task, and suppose that one attribute of two instances yields two possibility distributions: $ \mathrm{Poss}_1=\left\{1.00,\;0.10,\;0.01\right\} $ and $ \mathrm{Poss}_2=\left\{1.00,\;0.95,\;0.05\right\} $. For $ \mathrm{Poss}_1 $, it is evident that $ \omega_1 $ is the most likely label for the instance. Given the significant gap between it and the second-highest possibility, the likelihood of misjudgment due to inaccurate distribution fitting is low. For $ \mathrm{Poss}_2 $, the possibilities of $ \omega_1 $ and $ \omega_2 $ are close, which increases the likelihood of error, and directly combining multiple similar distributions conjunctively can amplify this error, leading to incorrect categorization. Based on the above, this paper extends the classifier to layer-2 TBM to resolve these issues, developing a new construction method for PerMFs that fully leverages the structural advantages and uses two layers to weaken the effect of errors in distribution fitting.
4.2 Classification through layer-2 TBM
Layer-2 TBM can model two types of information: Quantitative and qualitative. The belief updating for the former is handled using a product-based method, while for the latter, a weighted averaging method is employed. As discussed earlier, for samples with large possibility gaps, fitting errors are usually minimal, and quantitative numerical modeling helps preserve their propensities. For samples with smaller possibility gaps, where fitting errors are more likely, qualitative order modeling helps mitigate the impact of these errors in subsequent fusion. Based on the above idea, we extend the attribute fusion-based classifier into layer-2 TBM.
The flow chart of the attribute fusion-based classifier using layer-2 TBM is shown in Fig. 3. After generating the possibility distributions $ \mathrm{P}\mathrm{o}\mathrm{s}\mathrm{s} $ in the TBM version, the subsequent process is as follows:

Figure 3. Flow chart of the attribute fusion-based classifier through layer-2 TBM.
1) Transform the possibility distribution into a probability distribution through normalization: $ \mathrm{Prob}_j\left(\omega_i\right)=\dfrac{\mathrm{Poss}_j\left(\omega_i\right)}{\displaystyle\sum_{\omega_n\in\varOmega}\mathrm{Poss}_j\left(\omega_n\right)} $.
2) Reorder the probabilities in descending order to obtain $ \mathrm{Prob}_j^S $: $ \mathrm{Prob}_j^S\left(t\right)=\mathrm{Prob}_j\left(\zeta\left(t\right)\right) $, where $ \zeta\left(t\right) $ indicates the element with the $ t $th largest probability.
3) Setting $ k=n $ initially, calculate the difference of orness measures $ \mathrm{OR} $, which represents the attitude characteristic of an ordered distribution,
$ {\Delta }\mathrm{O}\mathrm{r}\mathrm{n} =\mathrm{O}\mathrm{R}\left(\left[\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(1\right){\mathrm{,}}\;\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(2\right){\mathrm{,}}\;\cdots {\mathrm{,}}\;\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(k\right)\right]\right)-{\mathrm{OR}}\left(\left[\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(1\right){\mathrm{,}}\;\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(2\right){\mathrm{,}}\;\cdots {\mathrm{,}}\;\mathrm{P}\mathrm{r}\mathrm{o}{\mathrm{b}}_{j}^{S}\left(k-1\right){\mathrm{,}}\;0\right]\right) $ ()
where $ \mathrm{O}\mathrm{R}\left(w\right)=\dfrac{\displaystyle\sum _{{i}=1:\left|w\right|}\left(\left|w\right|-i\right)\cdot {w}_{i}}{\left|w\right|-1} $.
4) If $ \Delta\mathrm{Orn} < \beta $, delete the last probability $ \mathrm{Prob}_j^S\left(k\right) $ and normalize the remaining probabilities; decrement $ k $ by $ 1 $ and repeat Step 3. If $ \Delta\mathrm{Orn}\geq\beta $ or if there is only one probability left in $ \mathrm{Prob}_j^S $, proceed to Step 5 (a code sketch of Steps 1-4 is given after this list).
5) If there is only one probability left in $ \mathrm{Prob}_j^S $, construct $ m_j $ from $ \mathrm{Poss}_j $ using CD-BFT, then transform the mass function into the PerMF $ \mathrm{Perm}_j $. If $ \mathrm{Prob}_j^S $ has more than one probability, let $ \mathrm{Perm}_{j0}\equiv\{\mathrm{Perm}_{j0}\left(\left(\zeta\left(1\right),\;\zeta\left(2\right),\;\cdots,\;\zeta\left(k\right)\right)\right)=1\} $ and $ \mathrm{Perm}_j=\mathrm{Perm}_{j0}\cup\mathrm{Perm}_{j\varnothing} $, where
$ \mathrm{Perm}_{j\varnothing}\equiv\left\{\mathrm{Perm}_{j\varnothing}\left(\varnothing\right)=1-\alpha r,\;\mathrm{Perm}_{j\varnothing}\left(F_{2^n-1}^i\right)=\frac{\alpha r}{n!},\;i=\{1,\;2,\;\cdots,\;n!\}\right\} $ ()
$ r=1-\frac{-\displaystyle\sum_{\omega_i\in\varOmega}\mathrm{Prob}_j\left(\omega_i\right)\log\mathrm{Prob}_j\left(\omega_i\right)}{\log n} $ ()
6) Combine the implemented $ \mathrm{P}\mathrm{e}\mathrm{r}{\mathrm{m}}_{j} $ for $ j\in \{1{\mathrm{,}}\mathrm{ }2{\mathrm{,}}\; \cdots {\mathrm{,}}\mathrm{ }{k}\} $ using $ \mathrm{l}2 $-CCR, and identify the label of the instance using MEOWA-based PT.
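As referenced in Step 4, a minimal sketch of Steps 1-4 follows. Since the last position carries zero weight in $ \mathrm{OR} $, the sketch reads $ \Delta\mathrm{Orn} $ as the magnitude of the orness change after deleting the weakest label and renormalizing the rest; this reading, along with the sample values, is an assumption:

```python
import numpy as np

# Sketch of Steps 1-4: turn a possibility distribution into a pruned,
# descending label order by repeatedly dropping the weakest label while
# the orness gap stays below beta.

def orness(w):
    n = len(w)
    return sum((n - 1 - i) * w[i] for i in range(n)) / (n - 1)

def ordered_support(poss, beta):
    labels = sorted(poss, key=poss.get, reverse=True)     # Step 2
    prob = np.array([poss[l] for l in labels])
    prob = prob / prob.sum()                              # Step 1
    while len(prob) > 1:                                  # Steps 3-4
        reduced = prob[:-1] / prob[:-1].sum()
        delta = abs(orness(np.append(reduced, 0.0)) - orness(prob))
        if delta >= beta:
            break
        labels, prob = labels[:-1], reduced               # drop weakest
    return labels

poss1 = {'w1': 1.00, 'w2': 0.10, 'w3': 0.01}  # large gaps
poss2 = {'w1': 1.00, 'w2': 0.95, 'w3': 0.05}  # close top pair
print(ordered_support(poss1, 0.25))  # ['w1']: build m_j via CD-BFT
print(ordered_support(poss2, 0.25))  # ['w1', 'w2']: ordered focal set
```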
In the above steps, three hyper-parameters need to be determined: $ \beta $, $ \alpha $, and the orness measure in MEOWA-based PT. $ \beta $ is used to decide whether to remove the last probability from $ \mathrm{Prob}_j^S $. When $ \Delta\mathrm{Orn} $ is less than $ \beta $, it indicates that the smallest probability has little influence on the overall distribution, meaning the gap with the other elements is large and it can therefore be removed. When the deletion process leaves only one element, it indicates that most of the distribution’s support is concentrated on that particular element, similar to the situation where $ \mathrm{Poss}_1=\{1.00,\;0.10,\;0.01\} $. In this case, there is a low likelihood of fitting errors, making it preferable to use CD-BFT to construct the information distribution. When the number of remaining elements is greater than one, it suggests that multiple elements have similar support, akin to the case where $ \mathrm{Poss}_2=\{1.00,\;0.95,\;0.05\} $. In this case, the elements with similar possibilities are grouped into an ordered focal set, where order is used to represent the weak propensity. $ \alpha $ is used to adjust the degree of discounting of the PerMF; as the degree of discounting increases, the PerMF assigns a greater amount of belief to the vacuous set, up to a maximum of $ r $. $ \mathrm{Orn} $ in MEOWA-based PT is used to adjust the influence of order information in decision making.
4.3 Experiments on real datasets
To verify the advantages of the layer-2 TBM classifier over the TBM classifier, we compare their performance on real datasets. The parameters of the datasets are shown in Table 3.

Table 3. Parameters of the datasets.

Dataset | Instance | Label | Attribute
Iris | 150 | 3 | 4
Wine | 178 | 3 | 13
Seed | 270 | 3 | 7
Breast cancer | 569 | 2 | 30
Sonar | 208 | 2 | 60
Iono | 351 | 2 | 34
Using random seeds 1−20 from sklearn, we perform five-fold cross-validation and take the average accuracy as the classification accuracy (a sketch of this protocol is given after Table 4). The performance of TBM, layer-2 TBM, and random permutation set reasoning (RPSR) [29] is shown in Table 4. For layer-2 TBM, the hyper-parameters $ \beta $, $ \alpha $, and $ \mathrm{Orn} $ are indicated in parentheses, respectively.

Table 4. Comparison of experimental results of three classifiers.

Dataset | TBM | RPSR | Layer-2 TBM ($ \beta $, $ \alpha $, $ \mathrm{Orn} $)
Iris | 95.29% | 95.99% | 95.54% (0.25, 1.00, 0.80); 95.61% (0.30, 1.00, 0.80); 95.57% (0.35, 1.00, 0.80)
Wine | 97.20% | 98.04% | 97.55% (0.35, 0.00, 0.80); 97.84% (0.40, 0.00, 0.80); 97.55% (0.45, 0.00, 0.80)
Seed | 90.35% | 93.40% | 91.16% (0.33, 1.00, 0.80); 91.18% (0.33, 1.00, 1.00); 91.35% (0.33, 0.00, 1.00)
Breast cancer | 71.71% | 73.27% | 93.66% (0.20, 0.10, 0.80); 93.93% (0.20, 0.10, 0.50)
Sonar | 67.33% | 75.39% | 69.02% (0.01, 1.00, 0.80); 69.09% (0.01, 0.50, 0.80); 68.38% (0.01, 0.00, 0.80)
Iono | 80.86% | 87.97% | 84.72% (0.01, 0.00, 0.80); 81.55% (0.10, 0.00, 0.80); 82.08% (0.20, 0.00, 0.80)
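A sketch of the evaluation protocol follows; `classify_fold` is a hypothetical placeholder for the layer-2 TBM pipeline of subsection 4.2, and only the cross-validation scaffolding reflects the setup described above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

# Sketch of the evaluation protocol: five-fold cross-validation repeated
# with sklearn random seeds 1-20, averaging the fold accuracies.

def classify_fold(train_X, train_y, test_X):
    raise NotImplementedError  # hypothetical hook for the layer-2 TBM pipeline

def evaluate(X, y, seeds=range(1, 21)):
    scores = []
    for seed in seeds:
        kf = KFold(n_splits=5, shuffle=True, random_state=seed)
        for tr, te in kf.split(X):
            pred = classify_fold(X[tr], y[tr], X[te])
            scores.append(np.mean(pred == y[te]))
    return float(np.mean(scores))

X, y = load_iris(return_X_y=True)
# accuracy = evaluate(X, y)  # runs once classify_fold is implemented
```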
The experimental results in Table 4 show that layer-2 TBM performs significantly better than TBM, which proves that considering qualitative and quantitative information simultaneously can mitigate the previously posed issues to a certain extent; the advantage of the second layer is clearly reflected in the classification accuracy. Since RPSR involves many complicated hyper-parameter optimization processes, we could not reproduce its results; the RPSR results shown in Table 4 are therefore those obtained by the original authors in their own experimental environment. Additionally, the RPSR results correspond to optimal hyper-parameters, while layer-2 TBM is a rudimentary model that incorporates no contextual knowledge or optimization methods and uses only randomly selected hyper-parameters. Therefore, although the results of RPSR are better than those of layer-2 TBM, this does not indicate that layer-2 TBM is infeasible. We include the experimental results of RPSR in the comparison to provide the reader with a more objective understanding, demonstrating that the use of PES can offer additional advantages in uncertainty handling. The effectiveness of the method proposed in this paper is sufficiently demonstrated by its comparison with TBM.
5 Conclusion
This paper discusses RPST from the perspective of the TBM and develops a method for processing uncertain information using PerMF, termed the layer-2 TBM. Compared to the classical TBM, the proposed method leverages the higher-dimensional information representation of PES, interpreting the order of elements as a weak propensity for the values of uncertain variables. In the context of uncertain information processing, this paper first discusses the principles for handling information within the layer-2 TBM, followed by the implementation of the credal level and the pignistic level. A simple numerical example is provided to demonstrate the advantages of considering weak propensity in decision-making problems. More generally, the unique ordered focal sets in the layer-2 TBM are introduced into the TBM classifier, and a method is proposed for implementing PerMFs based on distribution fitting. This method is then used to perform classification tasks using the previously discussed information processing approach. The effectiveness of the proposed method is validated on real datasets.
In Ref. [29], Deng et al. also provided a perspective for extending the evidential attribute fusion-based classifier to random permutation sets. However, that work focuses on achieving accurate classification results; the interpretation of qualitative and quantitative uncertainty, as well as the link to TBM, is not discussed. In addition, it relies on many complex parameter optimization methods, in contrast to this paper’s emphasis on concise reasoning logic. A detailed comparison of the two from an information processing perspective is therefore unnecessary.
Future research directions for this work primarily encompass both theoretical and applied aspects. On the theoretical side, further exploration of information processing methods within the layer-2 TBM will be conducted, extending more classical methods and examining their feasibility and applicability. On the application side, the uncertainty modeling approach of the layer-2 TBM will be applied to a broader range of machine learning methods, offering a new perspective for information processing in uncertain environments.
Disclosures
The authors declare no conflicts of interest.