999精品在线视频,手机成人午夜在线视频,久久不卡国产精品无码,中日无码在线观看,成人av手机在线观看,日韩精品亚洲一区中文字幕,亚洲av无码人妻,四虎国产在线观看 ?

A clustering based method to complete frame of discernment

2023-05-19 03:40:46WenrnYANGXindeLIYongDENG
CHINESE JOURNAL OF AERONAUTICS 2023年4期

Wenrn YANG, Xinde LI, Yong DENG,c,d,e,*

aInstitute of Fundamental and Frontier Science, University of Electronic Science and Technology of China, Chengdu 610054, China

bKey Laboratory of Measurement and Control of CSE, School of Automation, Southeast University, Nanjing 210096, China

cSchool of Education, Shannxi Normal University, Xi’an 710062, China

dSchool of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1211, Japan

eDepartment of Management, Technology, and Economics, ETH Zurich, Zurich 8092, Switzerland

KEYWORDSDempster-Shafer evidence theory;Generalized evidence theory;Information fusion;K-means clustering;Open world assumption;Target recognition

AbstractWhen the existing information does not contain all categories,the Generalized Evidence Theory (GET) can deal with information fusion.However, the question of how to determine the number of categories through GET is still intriguing.To address this question, a modified kmeans clustering, named centers initialized clustering is proposed, filling the gap of identification and complement of the frame of discernment.Based on this clustering method, the number of categories is determined.The initialized centers selected by center density keep the cluster results constant, enhancing the stability of clustering results.Besides, constructing Generalized basic Probability Assignment (GBPA) modules in a conservative way improves the reliability of the results.The mass of empty set in combined GBPAs is the indicator of the number of categories.Experiments on real and artificial data sets are conducted to show the effectiveness.

1.Introduction

Dempster-Shafer evidence theory (DS theory)1,2has many advantages in dealing with uncertainty and unknown information, which requires less information than probability theory.3,4DS theory has been studied deeply through the past few years, such as probabilistic data modelling,5risk analysis,6,7evidential reasoning,8,9pattern recognition,10–12decision making13,14and the generalization of evidence.15,16

Although DS theory has been widely used, the conflict problem is still unsolved, that is, the combination results of highly conflict evidence are counter-intuitive.To address this problem, some new combination methods based on network,17,18gravity,19conflict measurement20and others21are proposed.However, these solutions above are based on the close world,which assumes that all kinds of targets are known,but the close world assumption conflicts with the open world reality.In some real world situations, target numbers are not constant.Take COVID-19 as an example, the number of species is changing all the time because of mutations, so the close world assumption is not reliable all the time.Therefore, when the existing information does not contain all categories, the incompleteness of the frame of discernment is one source of the conflict.

Based on the open world assumption, the Generalized Evidence Theory (GET) is developed.15The basic framework of mass function is extended by viewing the empty set as the representation of the unknown targets.

However, the determination of the exact number of categories is rarely studied.Recently, Liu and Deng found that the target number could be determined by Elbow method22which requires manual observation.Since the empty set in GET represents the unknown, the motivation of this work is to take advantage of GET and output the accurate number of categories.Furthermore, the process of dividing unlabeled samples into several classes is called unsupervised learning and is deeply studied in the field of machine learning.23Among them,typical algorithms are clustering algorithms,including kmeans clustering,24density peak clustering,25and adjacent propagation clustering.26Using clustering algorithms,it is possible to model the division of data into categories.

Therefore,the task of this paper is to complete the frame of discernment by clustering and generating GBPAs for data with incomplete frame.Through clustering, unlabeled data is separated into different clusters.Traverse all possible numbers of clusters and analyze the degree of data conflict through GET.Since the conflict is minimized when the number of types is correct, the correct number of types can be obtained.However,there is always a difference between the clustering results and the real classification and it is a challenge to make the clustering results as close to the real situation as possible.

In the following section, some necessary preliminaries are presented shortly.Then,the proposed clustering based method is demonstrated.To illustrate the effectiveness,experiments on Iris data set and simulated Gaussian data set are conducted.Finally, conclusions are stated.

2.Preliminaries

2.1.D-S evidence theory

To a given question, all hypotheses Hiconstruct the Frame of Discernment (FOD), denoted as Ω.

A Basic Probability Assignment(BPA)is to assign each element in power set P(Ω)a value,under the constrain of Eq.(3)and Eq.(4).

BPAs can be deemed as a generalized probability distribution.5,27,28Compared with the probability distribution, BPAs take the advantage of modeling uncertainty.29–31Many functions are proposed to handling BPAs, such as entropy function,32–34information volume35–37information quality measure,38complexity analysis39,40and uncertainty measure.41–44For two BPAs, the combination rule in DS theory is to combine them to a new BPA, which is represented in Eq.(5) and Eq.(6).

Information fusion is important and widely used.45,46In addition, the inverse process of fusion, the de-combination method, is also proposed.47

2.2.Generalized evidence theory

There are many mathematical models to explain evidence theory.48For example, a geometric approach is proposed to explore evidence theory.49Nevertheless, an intriguing and longtime unsolved problem is the conflict management.50,51A typical way is to consider the reliability of evidence.52–54On the other hand,the incomplete frame of discernment is also an important reason to cause conflict.To address this issue,an open world assumption, called transferable belief model, is developed.55,56In addition, another model to deal with the open world is GET.15In D-S evidence theory, m(?)should always be zero.However, in GET, m(?)is not necessarily zero,and is treated as other focal elements.GET combination rule is shown in Eqs.(7)–(10).

2.3.K-means clustering

K-means clustering24classifies unlabeled data into different categories.To divide n samples to k clusters, K-means clustering assigns each sample to the nearest cluster center.Then centers are updated with the mean of each cluster.The calculation steps are shown in Algorithm 1.

Algorithm 1 K-means clustering.

3.Proposed method

The proposed method is to make use of m(?)and estimate the number of targets.Traverse the number of targets from 2 to N,and calculate the corresponding m(?).N is the maximum number of possible species we believe.The proposed method can be divided into two main steps.Firstly,categorize samples into different groups based on distribution features.Secondly,construct GBPA models and take m(?)as an indicator to determine the possible number of targets.Fig.1(a) visualizes the above steps, and the diagram of centers initialized clustering is shown in Fig.1(b).

3.1.Centers initialized clustering

One popular method to classify samples is K-means clustering.However, this method has a drawback that the clustering results keep changing during the repeated experiences because of the randomly selected initial centers.Inspired by density peak clustering 25,our method initializes centers with samples having high density and large distances from other regions of higher density.

In our method, the density of each point is represented by Eq.(11), where dijdenotes the distance between point i and j.

Since the higher the density, the more likely the data is a cluster center.The point with the biggest q is firstly chosen as an initial center and added to center set Cse.Then, the remains (k-1)centers are chosen by the consideration of both the density of itself and the distribution of centers, that is,the newly added center should be away from existing centers and have high density.The distance between point i and the nearest center is denoted by γi, then the selected center is to maximum q ?γ.Since the choice of centers is based on the order of q ?γ, the set of initial centers just add a point when the cluster number k increases by one.Moreover, the clustering result is constant because of the constant initial centers.The details are shown in Algorithm 2.

With Algorithm 2, the randomly selected initial centers in Algorithm 2 are replaced by centers selected based on density.This change makes the output of clustering remain unchanged in the repeated experiences and make full use of the distribution features of data sets.

Fig.1 Diagram of the proposed method.

3.2.Generate GBPA

The most popular method to generate GBPA is shown in Ref.22.Since parameters of GBPA models totally come from cluster results, some misclassified points have a great influence on GBPA models.In our method,GBPAs are generated in a conservative way.The details are introduced in the following.

With centers initialized clustering, samples are divided into k clusters.Usually,each cluster is used to generate a fuzzy triangle57directly.In this article, the points, whose distance to the nearest center point is greater than the ninth quantile,are ignored in constructing fuzzy triangles as shown in Fig.1(b).The generated fuzzy triangles are supposed to represent the category information based on clustering.However, the clustering result is not all reliable since the boundary point between class A and B with closer distance to center A owns the possibility that it actually belongs to class B.The misclassified point makes the parameters of both fuzzy triangle A and B inaccurate.Therefore, in our method, the points away from centers, with high possibility to be misclassified, are discarded and interpreted as noise.

With k fuzzy triangles {T1;T2;???;Tk}, k intersections{x1;x2;???;xk}are available.{ω1;ω2;???;ωk}are intersections in descending order.Assuming there exists only one positive intersection ω, the degree of belonging to the proposition P is shown in Eq.(12).

Then, the generated GBPA is shown in Eq.(16) and Eq.(17).If the sum of the belonging degrees is greater than one,then normalize directly;if it is less than one,assign the part less than one to the empty set.

Algorithm 2 Density based center finding.

3.3.Combine GBPAs and produce m(?)

Given a data set with n samples and s attributes,each attribute corresponds to a GBPA model.Therefore, a sample with s attributes generates s GBPAs m1;m2;???;ms.Each GBPA represents the degree of membership to subsets from the perspective of each attribute.

Combine s GBPAs with Eq.(18),where ⊕denotes the combination rule illustrated in Eq.(7).Since m(?)is an indicator of conflict degree, if cluster number is larger than real kinds,the increased misclassified samples will increase the value of m(?).

3.4.Iteration

Using the above steps,when the number of clusters k is changing from 2 to a relatively large value, each k corresponds to a value of m(?)respectively.If k is smaller than true species,the empty set value m(?)of the generated GBPA will be larger than the correct clustering.If k is greater than true species,the empty set value m(?)of the generated GBPA will decrease but the m(?)of the combined GBPA will increase,because of the increased degree of conflict.Therefore, m(?)will be minimum if the number of species is correct.

4.Experiments

In the assumption of close world, evidence theory takes the advantage of efficient modeling uncertainty58–60and information fusion.61,62It has been widely used in classification,63–65decision-making systems,66,67autonomous driving,68human reliability analysis69,70and many other engineering applications.71–74However, the real application is often based on the assumption of open world.We construct experiments on Iris data set and simulated Gaussian data set to show the effectiveness under the assumption of incomplete discernment.

4.1.Experiments on Iris data

The experiments are constructed through the following steps.Firstly, centers initialized clustering is used to separate samples.GBPA generation and combination method are used to produce the final m(?).Different numbers of clusters result in different values of m(?).Finally, the number of types is predicted based on m(?).

Iris data set75consists of 150 samples with 4 attributes,sepal length,sepal width,petal length and petal width,shorten as SL,SW,PL and PW.There are 3 categories,setosa(a),versicolar (b), virginica (c), and 50 samples in each category.

First of all, all samples are normalized to [0;1] using

Assuming the known categories are a and b, the FOD is{a;b}.2 center points are selected by Algorithm 2.The randomly selected initial centers in classical k-means clustering are replaced by these two points.The clustering result is used to represent the distribution feature of data and construct GBPA models.However, since clustering results always contain some misclassified points,a relatively conservative method is proposed.The outer points, whose distance to the nearest center point is greater than the ninth quantile, are ignored in generating GBPA.

Fig.2 Cluster results shown in attribute SL, SW, PL and PW.

Table 1 Specific fuzzy triangle feature information of Iris data set.

Distribution features of Iris data are shown in Fig.2 and Fig.2(a) demonstrates data in the perspective of attribute SL and SW.Fig.2(b) is from the perspective of attribute PL and PW.The outer points, the gray points in figures, are far from centers and likely to be the misclassified points.Therefore,these gray points,whose distance to center points is larger than the ninth quantile, are discarded in constructing GBPA modules.

Then, the clustering result is used to construct fuzzy triangles.For each cluster, the maximum, average and minimum values of each attribute are used in Eq.(20).Table 1 shows the specific fuzzy triangle feature information after omitting outer points.

The generated GBPA is shown in Fig.3.Fig.s 3(a)–(d)correspond to attribute SL, SW, PL and PW respectively.

The m(?)of 4 attributes is calculated and combined in Table 2.

Fig.3 GBPA model for attributes SL, SW, PL and PW with 2 clusters.

Table 2 Sum of m().

Then, change the data into 3 clusters and repeat the above steps.The generated GBPAs are demonstrated in Fig.4.Compared with Fig.3, the increased number of clusters decreases the value of m(?).

Increase the number of clusters from 2 to 8 and repeat the generation and combination steps.The change of m(?)along with cluster numbers is illustrated in Fig.5.

As the cluster number k increases, the overall trend of the m(?)of each attribute decreases, since the increased number of triangles occupies more space and the space labeled as unknown decreases.However, the combined m(?)reaches the minimum when samples are clustered into 3 categories.The conflict degree is small when samples are classified correctly.If the number of categories is larger than real kinds,the increased number of misclassified samples will lead to an increased degree of conflict and manifested in the increase of the m(?)value.Since the experiment result reaches the minimum when the cluster number is three, the complete frame of discernment is supposed to contain three elements.

Fig.4 GBPA model for attributes SL, SW, PL and PW with 3 clusters.

Fig.5 Changing of m(?)with cluster numbers.

In order to filter out outliers while retaining the category features, the defined criterion for judging a point as an outlier is that its distance from the nearest center point is greater than the ninth quantile of the distances.The ninth quantile,denoted as q=0.9,corresponds to results shown in Fig.5.The increase of q brings the risk of using misclassified samples, while the decrease of q increases the possibility of missing correctly classified points and destroying category features.The experiment results corresponding to q changing from 0.85 to 0.95 are shown in Fig.6.

With different q, the overall trends of m(?)in combined GBPAs are all increasing.With the decrease of q, the number of outliers screened out increases, therefore, the established GBPA model loses some data features,resulting in the gradual increase of conflict degree after fusion.But only when q ?{0.90;0.91;0.92}, m(?)reaches minimum when cluster number is 3.When q takes other values, the minimum points all occur at the beginning.If q is larger than 0.92, undeleted outliers cause m(?)to keep increasing.If q is less than 0.90,the incorrectly removed points result in the loss of data feature, making the conflict degree large even when samples are separated into the right number of clusters.In conclusion,the choice of q is essential and it is reasonable to choose q ?[0.90;0.92] for Iris data set.

Fig.6 Changing of m(?)in combined GBPAs with cluster numbers.

4.2.Experiments on simulated Gaussian data set

The experiments on simulated Gaussian data set are constructed through the following steps.Firstly, generate data set with pre-set parameters.Then,centers initialized clustering is used to cluster data.GBPA generation and combination methods are used to produce the final m(?).Different numbers of clusters result in different values of m(?).Finally,the number of types is predicted based on m(?).

Since the degree of conflict between GBPAs is revealed through the combination process, the dimension of simulated data is set to 3, so that the generation of m(?)contains two combination processes.

The artificial Gaussian data set consists of 4 distributions with the following characteristics:

where I is identity matrix.Fig.7 demonstrates the generated data from × and y dimension.

The clustering result is shown in Fig.8.Each cluster along with a center denoted as a black point, some outer points labeled as gray points are discarded in generating GBPAs.

Fig.7 A Gaussian data set of 800 instances.

Fig.8 A Gaussian data set with centers and outer points.

Fig.9 GBPA model for attribute x.

Fig.10 Changing of m(?)with cluster numbers.

The generated GBPAs from the perspective of × is shown in Fig.9.

Iterate the process of clustering and generate the corresponding m(?).The results are shown in Fig.10.Just as the result of the Iris data set,the overall trend of m(?)each attribute is decreasing.The value of m(?)also reaches the minimum when cluster number k equals the real number of categories.

5.Conclusions

A clustering based method to determine the number of categories and complete FOD is presented above.Firstly, data samples are clustered into several categories with initialized centers chosen based on density peaks.Then,the cluster results are used to generate GBPAs, after omitting outer points.Finally, the FOD is determined by the value of m(?)in combined GBPAs.The effectiveness is illustrated by both real data set and simulated data set.However,the preset value to define outliers endows an impact on final results.Hence,how to give a data-driven threshold requires further research.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The work was partially supported by the National Natural Science Foundation of China(No.61973332),the JSPS Invitational Fellowships for Research in Japan (Short-term).

主站蜘蛛池模板: 欧美精品一区在线看| 最新痴汉在线无码AV| 免费人成网站在线高清| 国产在线一区二区视频| 欧类av怡春院| 一级毛片免费观看久| 久久天天躁狠狠躁夜夜2020一| 亚洲av日韩av制服丝袜| 亚洲一区二区三区香蕉| 无码电影在线观看| 国产精品 欧美激情 在线播放| 高清无码手机在线观看| a级免费视频| 亚洲AV免费一区二区三区| 国产超碰一区二区三区| 国产产在线精品亚洲aavv| 亚洲国产成熟视频在线多多| 国产凹凸视频在线观看| 国产区福利小视频在线观看尤物| 午夜福利视频一区| 中文精品久久久久国产网址| 97综合久久| 久久国产精品影院| 久久永久免费人妻精品| 欧美精品1区| 久久成人免费| 在线观看国产黄色| 国产精品成| 激情综合网激情综合| 激情成人综合网| 亚洲啪啪网| 国产99视频免费精品是看6| 国产凹凸一区在线观看视频| 国产精品精品视频| 国产在线精彩视频论坛| 欧美日韩一区二区在线播放 | 国产成人免费观看在线视频| 又粗又大又爽又紧免费视频| 91人人妻人人做人人爽男同 | 欧美日韩国产精品综合 | 日韩精品中文字幕一区三区| 欧美综合区自拍亚洲综合天堂| 午夜日本永久乱码免费播放片| 日本黄色a视频| 第九色区aⅴ天堂久久香| 一级在线毛片| 夜夜爽免费视频| 亚洲一区第一页| 97久久免费视频| 亚洲Av综合日韩精品久久久| 欧美亚洲中文精品三区| 国产午夜小视频| 人人艹人人爽| 国产精品乱偷免费视频| 99激情网| 国产精品对白刺激| 亚洲永久色| 精品一区二区无码av| 啪啪啪亚洲无码| 99热国产在线精品99| 91久久偷偷做嫩草影院精品| 在线亚洲小视频| 最新国产麻豆aⅴ精品无| 亚洲女人在线| 国产中文一区a级毛片视频| 久久窝窝国产精品午夜看片| 国产成人无码Av在线播放无广告| 欧美视频在线观看第一页| 蜜桃视频一区二区三区| 久草性视频| 国产在线观看91精品亚瑟| 久久综合久久鬼| 岛国精品一区免费视频在线观看| 国产高清不卡| 欧美日韩资源| 午夜无码一区二区三区| 国产精品尤物在线| 色天天综合| 精品国产成人av免费| 久久精品只有这里有| 天堂成人在线| 在线免费看黄的网站|