
Multi-Cluster Feature Selection Based on Isometric Mapping

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 3

Yadi Wang, Zefeng Zhang, and Yinghao Lin

Dear editor,

This letter presents an unsupervised feature selection method based on machine learning. Feature selection is an important component of artificial intelligence and machine learning, and it can effectively alleviate the curse of dimensionality. Since labeled data are often expensive to obtain, this letter focuses on unsupervised feature selection. Traditional unsupervised feature selection algorithms usually rely on the Euclidean distance, and mapping high-dimensional data into a low-dimensional space with the Euclidean distance may be unreasonable. Inspired by this, this letter combines manifold learning with multi-cluster unsupervised feature selection. Using the geodesic distance, we propose a multi-cluster feature selection based on isometric mapping (MCFS-I) algorithm that performs unsupervised feature selection adaptively for multiple clusters. Experimental results show that the proposed method consistently improves clustering performance compared with existing competing methods.

Related work: With the rapid development of data and knowledge management technologies, the amount of data collected in various application areas is growing exponentially. Feature selection [1] helps reduce dimensionality, remove irrelevant data, and improve the learning accuracy on high-dimensional data. In most tasks, labeled data are difficult to obtain, which increases the difficulty of feature selection. Unsupervised feature selection can process unlabeled data, which makes it well suited for distance-based clustering tasks.

Feature selection algorithms can be categorized into three types: filter [2],[3], wrapper [4], and embedded [5]. Filter methods first preprocess the data and then feed the processed data into the model for training. Wrapper methods continuously refine the selection using feedback from the subsequent model. Embedded methods select a feature subset during the learning stage. According to whether labels assist the selection process, feature selection algorithms can be divided into supervised feature selection [2]–[5] and unsupervised feature selection [6],[7]; heuristic-based feature selection is also an important direction in this area.

The unsupervised multi-cluster feature selection (MCFS) algorithm [7] preserves the multi-cluster structure of the data, which is beneficial for multi-cluster tasks. MCFS is an unsupervised feature selection algorithm based on manifold learning. It first uses the Laplacian Eigenmaps (LE) [8] algorithm to embed the high-dimensional manifold data into a low-dimensional space, and then processes the embedded feature matrix. The LE algorithm uses the Euclidean distance to build a p-nearest neighbor graph. The geodesic distance is the shortest distance traveled along the surface between two points on a hypersurface in a high-dimensional space. When the hypersurface is close to a plane, the distance between two points in the low-dimensional space approximates the Euclidean distance; when the hypersurface is strongly curved, the geodesic distance preserves the global structure much better [9]. By introducing the geodesic distance into MCFS, this letter proposes a novel algorithm called multi-cluster feature selection based on isometric mapping (MCFS-I).

Multi-cluster feature selection based on isometric mapping:

1) Manifold data embedding: Manifold data refers to hypersurfaces formed by data in a high-dimensional space. Since the high-dimensional space in which these data lie contains redundant information, manifold learning [10] unfolds the manifold data in the high-dimensional space and embeds it into a low-dimensional subspace. Most manifold learning algorithms use the Euclidean distance to build a p-nearest neighbor graph, which only preserves the local manifold structure of the data when embedding it into the low-dimensional subspace, whereas isometric mapping (IsoMap) [9], based on the geodesic distance, preserves the global structure of the data well.

First, IsoMap exploits the property that manifold data is locally homeomorphic to Euclidean space. By computing the Euclidean distances between all data points, the neighbors of each point are obtained. Each point is then connected to its neighbors, and the geodesic distance between two points is approximated by their shortest path in the p-nearest neighbor graph. In this way, the classical Dijkstra or Floyd shortest-path algorithms can be used to approximate the true geodesic distance, so the global structure of the manifold data is well preserved and retained to the maximum extent after the data is embedded into the low-dimensional space. Once the distance between any two points is obtained, multidimensional scaling (MDS) [11] can be used to compute the coordinates of each data point in the low-dimensional space. The process of IsoMap is shown in Algorithm 1.
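To make these steps concrete, the following is a minimal, hedged sketch of the IsoMap embedding in Python using NumPy, SciPy, and scikit-learn; the function and variable names (isomap_embedding, X, n_neighbors, n_emb) are illustrative and not taken from the authors' implementation.

```python
# Sketch of Algorithm 1 (IsoMap): p-nearest neighbor graph -> geodesic
# distances via shortest paths -> classical MDS coordinates.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_embedding(X, n_neighbors=5, n_emb=2):
    # 1) Nearest-neighbor graph weighted by Euclidean distances.
    knn = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    # 2) Geodesic distances approximated by shortest paths (Dijkstra);
    #    directed=False symmetrizes the graph (assumes it is connected).
    D = shortest_path(knn, method="D", directed=False)
    # 3) Classical MDS: double-center the squared distance matrix and take
    #    the top eigenvectors scaled by the square roots of the eigenvalues.
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_emb]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))
```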

2) Learn the sparse coefficient vectors: The MCFS-I algorithm combines the IsoMap algorithm with Lasso regression; the feature matrix is embedded into the low-dimensional space based on the geodesic distance and the MDS algorithm. Since the dimension of the low-dimensional space is usually equal to the number of clusters, each dimension of the low-dimensional space corresponds to one clustering structure. Based on this, Lasso regression is used to fit the embedding matrix.

When the embedding matrix Y ∈ R^{N×K} is obtained by the IsoMap algorithm, the feature matrix X ∈ R^{N×M} should be normalized so that the metrics of the various features are consistent. A sparse coefficient vector is obtained by fitting each column y_k of Y with Lasso regression [12]:

$$\min_{a_k} \; \|y_k - X a_k\|^2 + \beta \|a_k\|_1 \tag{1}$$

where a_k is the M-dimensional coefficient vector. Owing to the penalty imposed by the L1 regularization, when β is large enough some coefficients shrink exactly to 0. Because of this sparsity of Lasso regression, a sparse matrix is obtained automatically while solving the Lasso regression. Since the metrics of the features are unified before Lasso regression, the larger the coefficient of a feature is, the greater its contribution to resolving the cluster structure. Moreover, a combination of several features with relatively weak individual influence may better distinguish different clusters, a property that would be ignored if these features were evaluated individually. Lasso regression evaluates a combination of features rather than each feature independently; therefore, we choose Lasso regression instead of evaluating the contribution of each feature separately. The equivalent form of Lasso regression can be expressed as follows:

$$\min_{a_k} \; \|y_k - X a_k\|^2 \quad \text{s.t.} \quad \|a_k\|_1 \le \gamma \tag{2}$$

It is difficult to control the sparsity of the coefficient matrix precisely with Lasso regression, whereas the least angle regression (LARs) [13] algorithm can solve (2) effectively: by specifying the number of nonzero entries in a_k, the sparsity of the coefficient matrix can be controlled easily. Hence, we use the LARs algorithm to solve (2).
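As an illustration, the column-wise sparse fitting can be sketched with scikit-learn's LARs solver as follows; the names (sparse_coefficients, n_nonzero) are illustrative, and the exact solver settings in the original implementation may differ.

```python
# Sketch of the sparse regression step: each embedding column y_k is fit by
# least angle regression, keeping a fixed number of nonzero coefficients.
import numpy as np
from sklearn.linear_model import Lars
from sklearn.preprocessing import StandardScaler

def sparse_coefficients(X, Y, n_nonzero=20):
    X_norm = StandardScaler().fit_transform(X)   # unify the feature metrics
    K, M = Y.shape[1], X.shape[1]
    A = np.zeros((K, M))                         # rows are the vectors a_k
    for k in range(K):
        lars = Lars(n_nonzero_coefs=n_nonzero)
        lars.fit(X_norm, Y[:, k])
        A[k] = lars.coef_
    return A
```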

3) Unsupervised feature selection: By combining the IsoMap algorithm and Lasso regression, the MCFS-I algorithm embeds the feature matrix of the data into the low-dimensional space according to the geodesic distance and obtains a low-dimensional representation of the clustering structure. The importance of the features is measured comprehensively by Lasso regression, and the d features with the highest scores are selected by scoring the results of the Lasso regression.

K sparse coefficient vectors a_k ∈ R^M, k ∈ {1, 2, ..., K}, are obtained by the Lasso regression described above; each a_k corresponds to a cluster, and each entry of a_k corresponds to a feature. Since the data is normalized first, the larger a_{k,j} is, the greater the contribution of the j-th feature to the k-th cluster. Since each feature contributes differently to different clustering structures, it is natural to take its maximum contribution as the selection criterion for each feature, which is recorded as the MCFS score [7] and can be defined as follows:

$$\mathrm{MCFS}(j) = \max_{k} \, |a_{k,j}| \tag{3}$$

where a_{k,j} is the j-th element of a_k. After that, the MCFS scores of all features are ranked in descending order, and the d features with the largest MCFS scores are selected as the output of the MCFS-I algorithm. The detailed process of the MCFS-I algorithm is shown in Algorithm 2.
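The scoring and selection step then reduces to a few lines; the sketch below assumes the coefficient matrix A produced above (rows a_k) and an illustrative parameter d.

```python
# Sketch of Eq. (3) and the final selection: score each feature by its largest
# absolute coefficient over the K clusters and keep the d best features.
import numpy as np

def mcfs_select(A, d):
    scores = np.max(np.abs(A), axis=0)    # MCFS(j) = max_k |a_{k,j}|
    return np.argsort(scores)[::-1][:d]   # indices of the d top-scoring features
```

Chained together, isomap_embedding, sparse_coefficients, and mcfs_select reproduce the overall flow of Algorithm 2 under these assumptions.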

Experiments: In this section, we evaluate the performance of the MCFS-I algorithm on clustering tasks. We compare the following four algorithms:

● For the proposed MCFS-I algorithm, the nearest-neighbors parameter n_neighbor is set to 5, and the dimension of the embedding matrix n_emb is set to the number of clusters in the data set [14].

● The MCFS algorithm [7] uses the LE algorithm [8] to process the manifold data.

● The Laplacian score (Lap_score) algorithm [6] selects the features that best preserve the local manifold structure.

● The nonnegative discriminative feature selection (NDFS) algorithm [15] combines spectral clustering with the unsupervised feature selection process.

1) Data sets: We selected four commonly used benchmark data sets, taken from scikit-learn and scikit-feature, to test the algorithms. The detailed information of these data sets is shown in Table 1.

Table 1. Data Sets Used in the Experiment

2) Performance comparisons: In our experiments, we use normalized mutual information (NMI) [6] to evaluate the clustering results on the data processed by each feature selection algorithm. In each test, we repeat the experiment five times: each time we run the k-means clustering algorithm with a random starting point and evaluate the result with the NMI metric. The average NMI over the five runs is used as the comparison result.
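A minimal sketch of this evaluation protocol, assuming scikit-learn's k-means and NMI implementations and illustrative names (average_nmi, X_selected):

```python
# Run k-means with a random start five times on the selected features and
# average the NMI against the ground-truth labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def average_nmi(X_selected, labels, n_clusters, n_runs=5):
    scores = []
    for seed in range(n_runs):
        pred = KMeans(n_clusters=n_clusters, n_init=1,
                      random_state=seed).fit_predict(X_selected)
        scores.append(normalized_mutual_info_score(labels, pred))
    return float(np.mean(scores))
```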

The comparison of the feature selection results is shown in Fig. 1. It can be seen that the clustering performance of the MCFS-I algorithm on each data set is better than that of the MCFS algorithm, and the clustering result of the MCFS-I algorithm gradually reaches its optimum when the number of selected features is between 50 and 100 on the Lung_small data set.

Table 2 shows the clustering results of each feature selection algorithm when the number of selected features is 100 (30 for the Digits data set). The "All Features" column in the table indicates the performance of the k-means algorithm on the original data sets without feature selection. It can be seen that the MCFS-I algorithm achieves the best performance on all data sets, and on some of them it even outperforms clustering with all features. The average improvement of the MCFS-I algorithm over the MCFS algorithm on the four data sets is about 3.206%, the average improvement over the Lap_score algorithm is about 14.822%, and the average improvement over the NDFS algorithm is about 7.704%.

It can also be observed from Table 2 that the MCFS-I algorithm achieves a larger improvement over the MCFS algorithm on the biological data set, while the improvement on the other three image recognition data sets is smaller. This is because biological data sets rely more heavily on global information, whereas image data pays more attention to local information.

Table 2. The NMI of the Feature Selection Algorithms (%)

Conclusions: This letter proposes a multi-cluster feature selection based on isometric mapping (MCFS-I) algorithm by introducing the geodesic distance into MCFS. The proposed algorithm uses IsoMap to capture global information and embed the data into a low-dimensional space, which is highly beneficial for data in which global information matters. In the experimental section, we compare MCFS-I with MCFS and with two commonly used unsupervised feature selection algorithms, Laplacian score and NDFS, on four benchmark data sets, and the experimental results demonstrate the superiority of the proposed algorithm.

Acknowledgments: This work was supported by grants from the National Natural Science Foundation of China (62106066), the Key Research Projects of Henan Higher Education Institutions (22A520019), the Scientific and Technological Project of Henan Province (202102110121), and the Science and Technology Development Project of Kaifeng City (2002001).

Fig. 1. NMI of the different feature selection algorithms on four data sets.

