
Feature Selection with a Local Search Strategy Based on the Forest Optimization Algorithm


Tinghuai Ma, Honghao Zhou, Dongdong Jia, Abdullah Al-Dhelaan, Mohammed Al-Dhelaan and Yuan Tian

1 College of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, China.

2 College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

3 Nanjing Institute of Technology, Nanjing, China.

Abstract: Feature selection has been widely used in data mining and machine learning. Its objective is to select a minimal subset of features according to some reasonable criteria so as to solve the original task more quickly. In this article, a feature selection algorithm with a local search strategy based on the forest optimization algorithm, namely FSLSFOA, is proposed. The novel local search strategy in the local seeding process guarantees the quality of the feature subsets in the forest. Next, the fitness function is improved so that it considers not only the classification accuracy but also the size of the feature subset. To avoid falling into a local optimum, a novel global seeding method is attempted, which selects trees from the bottom of the candidate set and gives the algorithm more diversity. Finally, FSLSFOA is compared with four feature selection methods to verify its effectiveness. Most of the results are superior to these comparative methods.

Keywords: Feature selection, local search strategy, forest optimization, fitness function.

1 Introduction

Feature selection methods are widely used in data mining and machine learning to improve the accuracy of classifiers and to accelerate the training of models with large numbers of input features [Migdady (2013); Rong, Ma, Cao et al. (2019)]. From the original feature set, feature selection allows us to eliminate irrelevant and strongly correlated redundant features and thereby extract the valuable ones [Paisitkriangkrai, Shen and Hengel (2016); Ma, Shao, Hao et al. (2018)].

Feature selection is a difficult task mainly due to the large search space, where the total number of possible solutions is 2^n for a dataset with n features [Xue, Zhang, Browne et al. (2016)]. It aims to decide, for each feature in the dataset, whether to choose it or not. The task becomes more challenging as n increases in many areas with advances in data collection techniques [Tzanis (2017)]. As is well known, an exhaustive search for the best feature subset of a given dataset is practically impossible in most situations, even for a moderately sized dataset [Wipf and Nagarajan (2016)]. It cannot be excluded that quantum computing will eventually provide enough computing power to make a brute-force search over today's datasets possible [Vandersypen and Leeuwenhoek (2017); Lv, Ma, Tang et al. (2016)]. By then, there is no doubt that even bigger datasets will be available to scientists [Stuart (2016); Zhang, Ma, Cao et al. (2015)]. An effective way to find suitable features in huge datasets is to randomly set a search starting point first [Melian and Verdegay (2011)]. The question is then how to optimize the solution rapidly and directly, in the way gradient descent is used to optimize deep learning models [Keskar, Mudigere, Nocedal et al. (2016)]. According to existing knowledge, random search strategies that come up with approximately optimal solutions are still our best choice for the feature selection problem [Zhu, Xu, Chen et al. (2016)].

Our previous work on feature selection produced a method called FSFOACD [Ma, Jia, Zhou et al. (2018); Rong, Ma, Cao et al. (2019)]. That method used a contribution degree strategy to improve the efficiency of the forest optimization algorithm. In other words, it used measures of relevance and redundancy to guide the search direction of the forest optimization algorithm. This guided search direction, to a certain extent, improves the performance of the forest optimization algorithm. However, there are still some deficiencies in FSFOACD. For example, the number of features selected by each tree in the forest during the initialization phase is randomly generated. If the random numbers of selected features of most trees are close to the number of features in the original dataset, the search locations of the trees in the forest will be far away from the optimal solution, which results in low efficiency of the algorithm at the beginning of the search. On the other hand, during the local seeding process, some neighbor trees are added to the forest, and they are strictly limited by their parents. The limited number and the randomness also cost the algorithm a comparatively long time to reach a well-accepted feature subset. Selecting only one feature at a time cannot guarantee the quality of the feature subsets, and so the search efficiency is low.

In this paper, a novel feature selection method with a local search strategy based on the forest optimization algorithm, namely FSLSFOA, is proposed. Firstly, a feature subset size determination mechanism is used to initialize the forest optimization algorithm, which favors small feature subsets for the trees in the forest during the initialization phase. In this way, more trees start from ideal positions, since they are spread more evenly over the search space. This initialization strategy greatly improves the search efficiency of the algorithm.

Secondly, the search strategy of FSFOACD is adopted and improved. Each time a neighbor node is added, the algorithm no longer selects a single feature for evaluation. Instead, it selects a probabilistic number of high-quality features and an empirical number of low-quality features at a time. The new search strategy not only maximizes the quality of the feature subsets in the forest, but also improves the search efficiency of the algorithm to a great extent.

Thirdly, a new fitness function is proposed to balance the classification accuracy and the dimensionality reduction ratio, since the core purpose of feature selection is to increase the dimensionality reduction ratio on the premise of ensuring the classification accuracy. To verify the effectiveness of our algorithm, 10 widely used datasets from UCI are selected, together with three classifiers: support vector machine (SVM) [Aburomman and Reaz (2016)], decision tree (DT) [Shirani, Habibi, Besalatpour et al. (2015)] and k-nearest neighbor (KNN) [Benmahamed, Teguar and Boubakeur (2018)]. Four feature selection methods are selected for comparison, including UPFS [Dadaneh, Markid and Zakerolhosseini (2016)], ANFS [Luo, Nie, Chang et al. (2018)], PSO(4-2) [Tran, Xue and Zhang (2014b)] and FSFOACD [Ma, Jia, Zhou et al. (2018)]. The experiments show that most of the results are superior to the previous feature selection algorithms we selected.

The rest of this paper is organized as follows. In Section 2, related work on feature selection is reviewed. Details of the proposed method are introduced in Section 3. In Section 4, the proposed algorithm is compared with other existing feature selection methods. Finally, Section 5 summarizes the present study.

2 Related work

Feature selection involves two conflicting objectives, namely maximizing the classification accuracy and minimizing the number of features [Hancer, Xue, Zhang et al. (2017)]. Therefore, feature selection can be treated as a multi-objective problem of finding a set of trade-off solutions between these two objectives [Selvam, Kumar and Siripuram (2017)]. An important issue in feature selection is feature interaction (or epistasis) [Guyon (2003)]. There can be two-way, three-way, or complex multi-way interactions among features [Gorelick and Bertram (2010)]. A feature that is weakly relevant to the target concept can significantly improve the classification accuracy if it is used together with some complementary features [Anthimopoulos, Christodoulidis, Ebner et al. (2016)]. In contrast, an individually relevant feature may become redundant when used together with other features [Sun, Goodison, Li et al. (2007)].

Based on the evaluation criteria, feature selection algorithms are generally classified into two categories: 1) filter approaches and 2) wrapper approaches [Ambusaidi, He, Nanda et al. (2016); Zawbaa, Emary, Hassanien et al. (2016)]. Their main difference is that wrapper approaches include a classification/learning algorithm in the feature subset evaluation, while filter algorithms are independent of any classification algorithm [Tran, Zhang and Xue (2016)]. Filters ignore the performance of the selected features, whereas wrappers do not, which usually results in wrappers finding better solutions [Yang, Liu, Zhou et al. (2013)]. Choosing a filter or a wrapper depends on the requirements for accuracy and time efficiency [Rodriguezgaliano, Luqueespinar, Chicaolmo et al. (2018)].

Within the filter and wrapper categories, both unsupervised and supervised learning methods have been adopted for feature selection. The Forest Optimization Algorithm (FOA) and its improvements [Ma, Jia, Zhou et al. (2018)] are supervised methods, and so is the Particle Swarm Optimization (PSO) [Tran, Xue and Zhang (2014b)] method: while choosing features, they consider the classification performance. On the other hand, UPFS [Dadaneh, Markid and Zakerolhosseini (2016)] and ANFS [Luo, Nie, Chang et al. (2018)] belong to the unsupervised category; the labels of the dataset are masked while using these two methods. Unsupervised methods are the ultimate goal of feature selection, and so they occupy an important position in this field.

2.1 FOA

The Forest Optimization Algorithm (FOA) is a classic supervised learning method, which was proposed by Ghaemi et al. [Ghaemi and Feizi-Derakhshi (2014)]. Based on FOA, we previously proposed the feature contribution degree to improve its efficiency [Ma, Jia, Zhou et al. (2018)].

FOA is an evolutionary algorithm and is therefore suitable for optimization tasks. Ghaemi et al. found that FOA can improve the classification accuracy compared with other feature selection methods [Ghaemi and Feizi-Derakhshi (2016)]. Fernández-García et al. compared FOA with other feature selection methods on academic data to foresee whether students will finish their degree after completing their first year in college [Fernández-García, Iribarne, Corral et al. (2018)].

Although FOA obtained better results in the above experiments, its search efficiency is still low and its complexity is relatively high. At the beginning, the random initialization of the parameters cannot ensure the best results. Therefore, future work should address how to optimize the parameter initialization and the search strategies.

2.2 PSO

Feature selection is a combinatorial optimization problem, which can be solved by Particle Swarm Optimization (PSO) [Zhang, Wang, Sui et al. (2017)]. Xue et al. studied multi-objective particle swarm optimization (PSO) for feature selection and achieved results comparable with existing well-known multi-objective algorithms in most cases [Xue, Zhang and Browne (2013)]. In 2014, Tran et al. adapted PSO to classification problems with thousands or tens of thousands of features [Tran, Xue and Zhang (2014a)]. At the same time, Tran et al. proposed three new initialization strategies and three new personal-best and global-best updating mechanisms in PSO to develop feature selection approaches, named PSO(4-2) [Tran, Xue and Zhang (2014b)].

Since the particle swarm may fall into a local optimal solution during the search process, a mechanism is needed to make the particles escape from local optima and achieve the purpose of large-scale data optimization. In general, PSO cannot guarantee globally optimal results.

2.3 UPFS

Unsupervised feature selection only evaluates the similarity between features and removes the redundant features among them. Mitra et al. proposed an unsupervised feature selection algorithm suitable for datasets that are large in both dimension and size [Mitra, Murthy and Pal (2002)]. Hou et al. first combined embedding learning and feature ranking, and proposed joint embedding learning and sparse regression (JELSR) to perform feature selection [Hou, Nie, Li et al. (2014)].

Dadaneh et al. proposed unsupervised probabilistic feature selection using ant colony optimization (UPFS) [Dadaneh, Markid and Zakerolhosseini (2016)]. They utilized inter-feature information that reflects the similarity between features, which led the algorithm to decreased redundancy in the final set. As ant colony optimization is a heuristic algorithm like FOA, we choose this method as a reference to verify our proposed algorithm.

Luo et al. [Luo, Nie, Chang et al. (2018)] proposed an adaptive unsupervised feature selection with structure regularization, which characterizes the intrinsic local structure by an adaptive reconstruction graph and simultaneously considers its multi-connected-components (multi-cluster) structure.

3 Methods

To address some of the problems in the related work, a feature selection algorithm with a local search strategy based on the forest optimization algorithm, namely FSLSFOA, is proposed. Compared with FSFOACD, it involves quite a lot of improvements and modifications. FSLSFOA consists of five main stages: (1) initialize trees, (2) group features, (3) local seeding, (4) size limitation, (5) global seeding.

3.1 Initialization based on feature subset size determination mechanism

Since the main purpose of feature selection is to find the minimal feature subset that is necessary and sufficient to identify the target, the optimal feature subset generally contains few features. In the initialization phase, the starting locations of the trees in the forest should be close to the optimal solution. In other words, the difference between the optimal feature subset size and the initialized feature subset size should not be very large. A probabilistic random function is utilized to determine the initial size of the feature subset. The probability formula is defined as follows:

where f represents the number of original features of the dataset, sf represents the number of selected features and L indicates the distance between f and sf, that is, L = f - sf. P(sf) determines the initial probability of the number of selected features. According to Eq. (1), when sf is minimized, P(sf) reaches its peak. Next, the roulette gambling algorithm is utilized to determine the value of sf. That is to say, sf is randomly selected, and its range of values is [m, M]. The value of m is set to 3, since it is an efficient and comprehensive lower bound of sf for most datasets; necessary adjustments are allowed for some extremely simple datasets in practice. Besides, M = ε · f, where f depends entirely on the given dataset and ε is used to adjust the size of M. If ε is set close to 1, the value of M will be close to f. This makes the search space very large, which increases the computation cost and produces many invalid feature subsets. Our goal is to select a significant feature subset with a small number of features. The discussion on ε is given in the experiment parameter setting section. The roulette gambling algorithm is designed as follows:

Algorithm 1 Roulette Gambling Algorithm
1: randomly generate U in [0, 1]
2: sum = 0
3: for each sf ∈ [m, M] do
4:     sum = sum + P(sf)
5:     if U < sum then
6:         break
7:     end if
8: end for
9: return sf
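Below is a minimal Python sketch of this initialization step. Since Eq. (1) is not reproduced in the text above, the sketch assumes a simple normalization in which P(sf) is proportional to the distance L = f - sf, so that smaller subsets are more likely; the function names (selection_probabilities, roulette_select, init_tree) are ours and are introduced only for illustration.

import random

def selection_probabilities(f, m, M):
    # Assumed form of Eq. (1): P(sf) proportional to L = f - sf,
    # so smaller subset sizes get higher probability, normalized over [m, M].
    weights = {sf: f - sf for sf in range(m, M + 1)}
    total = sum(weights.values())
    return {sf: w / total for sf, w in weights.items()}

def roulette_select(prob):
    # Algorithm 1: roulette gambling over the candidate subset sizes.
    u = random.random()
    acc = 0.0
    for sf, p in sorted(prob.items()):
        acc += p
        if u < acc:
            return sf
    return max(prob)  # numerical safety net

def init_tree(f, epsilon=0.4, m=3):
    # One tree is a 0/1 vector of length f plus an age counter set to 0.
    # epsilon = 0.4 mirrors the value reported as best in the experiments.
    M = max(m, int(epsilon * f))
    sf = roulette_select(selection_probabilities(f, m, M))
    tree = [0] * f
    for idx in random.sample(range(f), sf):
        tree[idx] = 1
    return tree, 0  # (feature mask, age)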

3.2 Grouping strategy based on feature importance

After the initialization phase, all features are divided into a high-quality feature group and a low-quality feature group according to their importance. The purpose of this grouping is to provide a reference for the forest optimization algorithm when selecting features. High-quality features are fed into the forest optimization algorithm, thereby improving the quality of the trees (feature subsets) in the forest. The Pearson correlation coefficient is applied here to measure the correlation between different features [Mu, Liu and Wang (2017)]; it is chosen because of its simplicity and because it is easy to manipulate as a univariate measure [Coelho, Braga and Verleysen (2010)]. Building towards the subsequent importance measure, this part performs a first round of coarse-grained partitioning of the features. The definition is as follows:

where C(i,j) is the correlation coefficient of feature i and feature j, f is the number of samples, x_i(k) and x_j(k) represent the values of features i and j in the k-th sample, and x̄_i and x̄_j represent the mean values of x_i and x_j over all f samples, respectively. According to Eq. (2), the correlation coefficient between two features weighs the similarity between them. The higher the correlation coefficient, the higher the similarity between the two features; conversely, a lower correlation coefficient means that the similarity between the two features is comparatively low. After calculating the correlation coefficients for all possible pairs of features, the importance of feature i is calculated as follows:

where C(i,c) represents the correlation coefficient between feature i and the class attribute c. The larger the value of C(i,c), the greater the influence of feature i on the classification result; conversely, the smaller the value, the smaller the influence of the feature on the classification result. If the importance value of a feature is high, the feature has high correlation with the class attribute and low redundancy. On the contrary, if the importance value of a feature is low, the feature has low correlation with the class attribute and high redundancy. To allow the forest optimization algorithm to select higher-quality feature subsets while exploring the optimal solution, all features are sorted by their importance values in descending order and divided into two groups. The first half of the features is placed in a high-quality feature group (Group_high) and the second half in a low-quality feature group (Group_low). The importance of the features in Group_high is greater than that of those in Group_low, and the features of both groups are sorted in descending order of importance. Fig. 1 is a schematic diagram of the feature grouping strategy.

Figure 1: Feature grouping based on the importance of features
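As a sketch of this grouping stage, the snippet below computes Pearson correlations with NumPy and splits the features into Group_high and Group_low. Since Eq. (3) is not reproduced above, the importance score here is an assumption that rewards correlation with the class and penalizes average correlation with the other features; the helper names (feature_importance, group_features) are ours.

import numpy as np

def feature_importance(X, y):
    # X: (samples, features) matrix, y: class labels encoded as numbers.
    n_features = X.shape[1]
    # Pearson correlation between each feature and the class (Eq. (2) applied to (x_i, c)).
    rel = np.array([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(n_features)])
    # Average absolute correlation with the other features (redundancy).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    red = (corr.sum(axis=1) - 1.0) / (n_features - 1)
    # Assumed stand-in for Eq. (3): relevance to the class minus redundancy.
    return rel - red

def group_features(X, y):
    order = np.argsort(-feature_importance(X, y))  # descending importance
    half = len(order) // 2
    return list(order[:half]), list(order[half:])   # Group_high, Group_low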

3.3 Local search strategy

After Group_high and Group_low are produced by the feature grouping strategy, the local search strategy is utilized to select suitable features from the two groups. This strategy aims to provide better locations from which the forest optimization algorithm can further explore optimal solutions. The local search strategy mainly improves the optimal-solution search of the forest optimization algorithm through "add" and "delete" operations. In short, the "add" operation selects a desired number of features, and the "delete" operation removes a certain number of features from the current position of the tree. The "add" operation adds features with high class correlation and low redundancy to the tree, while the "delete" operation removes features with low class correlation and high redundancy from the tree. In the local seeding stage, a novel local search strategy is proposed to dynamically change the position of each neighbor tree by adding and deleting as many features as possible from Group_high and Group_low each time a neighbor tree is added.

The specific steps of the local search strategy are as follows:

Figure 2: An example of the local search strategy on a tree

Firstly, for each tree of age 0 in the forest, the features whose values equal 1 are put into a subset T. This subset T includes all the features selected at the beginning of the local search strategy. As shown in Fig. 2, if the current tree is Tree = [1,1,0,1,1,0,0,1,0,1, age = 0], a subset T with six features is obtained, T = {f1, f2, f4, f5, f8, f10}. In this example, it is assumed that Group_high = {f5, f3, f1, f7, f9} and Group_low = {f8, f4, f6, f10, f2}, which would be produced by the grouping strategy in Section 3.2.

Secondly, T is split into two subsets, Thigh and Tlow, according to Group_high and Group_low. Thigh contains the high-quality features of T, i.e., those also included in Group_high; by contrast, Tlow includes the low-quality features of T, i.e., those contained in Group_low. Thigh and Tlow are each sorted in descending order of importance. From Fig. 2, Thigh = {f5, f1} and Tlow = {f8, f4, f10, f2}, because f5, f1 ∈ Group_high and f8, f4, f10, f2 ∈ Group_low.

Thirdly, the "add" operation is used to select the desired features, and the "delete" operation removes a certain number of features from the current tree. How many features should empirically be kept in Thigh and Tlow depends on the parameters α and β, where α = λ · sf and β = (1 - λ) · sf. Here λ is specified by the user and sf is initialized by the feature subset size determination mechanism in Section 3.1. The parameter λ controls the proportion of features selected from Group_high and Group_low. The discussion on λ is given in the experiment parameter setting section.

If the number of high-quality features in the tree, |Thigh|, is smaller than α, then α - |Thigh| high-quality features selected from the set {Group_high - Thigh} (i.e., features included in Group_high but not in Thigh) are added to the current tree. Otherwise, if |Thigh| is larger than α, add_num high-quality features are added to the current tree from the set {Group_high - Thigh}. The parameter add_num is defined as follows:

where add_num and y are natural numbers in the range [1, |Group_high - Thigh|], and |Group_high - Thigh| denotes the number of features in the set {Group_high - Thigh}. Random(1) is a uniformly distributed random variable in (0, 1]. The function Sign(v) means that if v < 0, the value is set to 1, which announces that v is withdrawn from the competition of argmin(·); here v represents the value inside Sign(·). The argmin(·) yields the add_num that minimizes the result of Sign(·).

In addition, if the number of low-quality features in the tree, |Tlow|, is bigger than β, then |Tlow| - β low-quality features are deleted from Tlow. Otherwise, if |Tlow| is smaller than β, del_num features are deleted from Tlow. The parameter del_num is calculated as follows:

where del_num and z are natural numbers in the range [1, β - |Tlow|], and the definitions of Random(1) and Sign(x) are the same as in Eq. (4) and Eq. (5). According to this step, the sample in Fig. 2 finally becomes NewTree = [1,0,1,1,1,0,0,1,0,0, age = 0], and the old tree's age is increased to 1. Note that the "add" operation always selects the highest-quality remaining feature, and the "delete" operation always drops the lowest-quality one. Our strategy differs from the strategy proposed by Moradi et al. [Moradi and Gholampour (2016)], especially when dealing with the two cases |Thigh| > α and |Tlow| < β. In these two cases, the number of high-quality features is not pulled back to α, nor is the number of low-quality features forced to β. The calculation of add_num and del_num can be regarded as a probability, viewed from the perspective of feedback theory.
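The sketch below illustrates one application of the local search strategy to a single tree. Because Eqs. (4)-(6) are not reproduced above, add_num and del_num are approximated by uniformly random counts over their stated ranges; local_seed and its argument conventions are our own and serve only to show the add/delete mechanics.

import random

def local_seed(tree, group_high, group_low, sf, lam=0.6):
    # tree: 0/1 feature mask; group_high/group_low: feature indices sorted by
    # descending importance; sf: initial subset size; lam: proportion parameter λ.
    alpha = int(lam * sf)
    beta = sf - alpha
    new_tree = tree[:]
    t_high = [i for i in group_high if new_tree[i] == 1]
    t_low = [i for i in group_low if new_tree[i] == 1]

    # "add": take the best currently unused features from Group_high.
    candidates = [i for i in group_high if new_tree[i] == 0]
    if len(t_high) < alpha:
        add_num = alpha - len(t_high)
    else:
        # Stand-in for Eqs. (4)/(5): a random count in [1, |Group_high - Thigh|].
        add_num = random.randint(1, len(candidates)) if candidates else 0
    for i in candidates[:add_num]:
        new_tree[i] = 1

    # "delete": drop the worst selected features from Tlow.
    if len(t_low) > beta:
        del_num = len(t_low) - beta
    else:
        # Stand-in for Eq. (6): a random count in [1, β - |Tlow|].
        gap = beta - len(t_low)
        del_num = random.randint(1, gap) if gap > 0 else 0
    for i in t_low[::-1][:del_num]:   # lowest-importance features first
        new_tree[i] = 0
    return new_tree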

3.4 Fitness function

The definition of the novel fitness function is as follows:

where Accuracy(Tree) is the classification accuracy of the current tree (feature subset) on the classifier, |C| is the number of features selected in the current tree, and |f| is the total number of features in the dataset. The parameter μ describes the relative importance of the classification accuracy and the dimensionality reduction ratio of the current feature subset. Eq. (7) aims to find a balance point between the classification accuracy and the dimensionality reduction ratio, since the core purpose of feature selection is to increase the dimensionality reduction ratio on the premise of ensuring the classification accuracy. Generally, accuracy is more important than the dimensionality reduction ratio. In the parameter selection experiments, the best fitness value is achieved when μ is set to 0.8. Note that all trees in the forest are assessed by this novel fitness function instead of considering the accuracy only.
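Eq. (7) itself is not reproduced above. A plausible form consistent with the description (a μ-weighted combination of the classification accuracy and the dimensionality reduction ratio) would be

Fitness(Tree) = μ · Accuracy(Tree) + (1 - μ) · (|f| - |C|) / |f|,

with μ = 0.8 in the reported experiments. This is our reconstruction, not a quotation of the paper's formula.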

3.5 Size limitation

To keep the local seeding process fast, a "life time" and an "area limit" are used to control the number of trees in the forest. Trees that do not satisfy these restrictions are moved from the forest to a candidate set. In our algorithm, the candidate set works like a recycle bin: it collects trees with older ages or lower fitness values. However, the candidate trees are still useful in the next step, called global seeding. Note that if the tree with the highest fitness value reaches the "life time", its age is reset to 0.
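A minimal sketch of this size-limitation step is given below, assuming each tree is stored as a (mask, age, fitness) tuple; the function name limit_forest is ours.

def limit_forest(forest, life_time, area_limit, candidates):
    # Move over-aged trees to the candidate set.
    kept = []
    for mask, age, fit in forest:
        (kept if age <= life_time else candidates).append((mask, age, fit))
    # Keep only the "area limit" fittest trees; the rest also become candidates.
    kept.sort(key=lambda t: t[2], reverse=True)
    candidates.extend(kept[area_limit:])
    forest = kept[:area_limit]
    # The best tree's age is reset to 0 (cf. step 10 of Algorithm 2),
    # so it is seeded again in the next iteration.
    if forest:
        best_mask, _, best_fit = forest[0]
        forest[0] = (best_mask, 0, best_fit)
    return forest, candidates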

3.6 Global seeding

This step gives the candidate trees a chance of renascence. A "flood rate" of trees with the lowest fitness is selected from the bottom of the candidate set. After a big rainfall, these weakest trees are carried far away by the flood: their feature values are completely reversed and their ages are reset. For example, a tree such as [1,0,1,1,1,0,0,1,0,0, age = 10] is reversed into [0,1,0,0,0,1,1,0,1,1, age = 0]. The flood motivates this global seeding process, which produces many new trees that are quite different from those obtained by local seeding. These new trees participate in the next iteration and effectively prevent the algorithm from falling into a local optimal solution.
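A short sketch of the global seeding step, reusing the (mask, age, fitness) tuples of the previous sketch; global_seed is our own name, and the "flood rate" is treated here as a number of trees, matching the parameter setting in Section 4.4.

def global_seed(candidates, flood_rate):
    # Take the "flood rate" lowest-fitness trees from the bottom of the candidate set.
    candidates.sort(key=lambda t: t[2])          # ascending fitness
    reborn = []
    for mask, _, _ in candidates[:flood_rate]:
        # Reverse every feature bit and reset the age to 0; fitness is recomputed later.
        reborn.append(([1 - bit for bit in mask], 0, None))
    return reborn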

The complete pseudocode of FSLSFOA is as follows:

Algorithm 2 FSLSFOA
Input: life time, area limit, flood rate, ε, λ, μ
Output: the best feature set with the highest fitness value
STEP 1 Initialization
1: use the feature subset size determination mechanism to calculate sf, which is affected by ε
2: for each of the "area limit" trees created, sf randomly chosen features are set to 1 and the rest to 0
3: each tree is represented as a (D+1)-dimension vector (D features plus an age set to 0)
STEP 2 Feature Grouping
4: compute the importance of each feature and divide the feature set into Group_high and Group_low
5: while iterations < maximum iterations do
STEP 3 Local Seeding (use λ to empirically control the proportion of selected features)
6:     use the local search strategy to create new trees from trees aged 0 (the new trees of the last iteration)
7:     increase the age of the old trees by 1
STEP 4 Fitness Computation
8:     compute the fitness value, which is affected by μ
STEP 5 Size Limitation
9:     sort all trees according to fitness value and move low-fitness trees into the candidate set
10:    set the best tree's age to 0
STEP 6 Global Seeding
11:    select the "flood rate" trees with the lowest fitness from the bottom of the candidate set
12:    reverse the first D features of these tree vectors and set their age to 0
13: end while
14: return the best feature set with the highest fitness value

4 Experiment

Ten standard datasets along with three classical classifiers are selected to conduct the experiments. At the same time, our algorithm is compared with other feature selection methods: feature selection using the forest optimization algorithm based on contribution degree (FSFOACD) [Ma, Jia, Zhou et al. (2018)], unsupervised probabilistic feature selection based on ant colony optimization (UPFS) [Dadaneh, Markid and Zakerolhosseini (2016)], adaptive unsupervised feature selection with structure regularization (ANFS) [Luo, Nie, Chang et al. (2018)] and the particle swarm optimization feature selection with novel initialization and updating mechanisms (PSO(4-2)) [Tran, Xue and Zhang (2014b)].

4.1 Data sets

In the experiments, 10 standard datasets from the UCI machine learning repository are selected: Glass, Wine, Vehicle, Hepatitis, Parkinsons, Ionosphere, Dermatology, SpamBase, Sonar and Arrhythmia. These datasets cover small, medium and large sizes, and they have been widely used in many areas of machine learning research. Tab. 1 gives the details of these datasets, such as the number of features, the number of classes, and the number of samples.

In the experiments, missing values are filled with the average of the corresponding feature. In addition, in real cases, the numerical ranges of the features vary greatly, so features with large value ranges would dominate those with small value ranges. To overcome this problem, all features are normalized.
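A small preprocessing sketch in the spirit of this description, using plain NumPy; min-max scaling is assumed here, since the exact normalization is not specified in the text, and the function name preprocess is ours.

import numpy as np

def preprocess(X):
    # Fill missing values (NaN) with the column (feature) mean.
    col_mean = np.nanmean(X, axis=0)
    idx = np.where(np.isnan(X))
    X = X.copy()
    X[idx] = np.take(col_mean, idx[1])
    # Min-max normalize each feature to [0, 1] so no range dominates.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero
    return (X - mins) / span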

Table 1: Datasets

4.2 Classifiers

To verify the generality of the proposed algorithm, several classical classifiers are utilized to test the classification performance of the selected feature subsets: support vector machine (SVM), decision tree (DT) and k-nearest neighbor (KNN). These classifiers are widely used in the field of data mining. We use the SMO, IBk and J48 algorithms in WEKA to implement the SVM, KNN and DT classifiers, respectively.

4.3 Evaluation index

In the experiments, the classifiers above classify the datasets according to the features selected by a feature selection algorithm. The accuracy of a classifier on a dataset is the percentage of samples that the classifier labels correctly. To measure stability, each feature selection algorithm is run 10 times under each condition, and the standard deviation is calculated from the 10 resulting accuracies. Moreover, the fitness value, which includes the feature number penalty, is used inside the model to find a balance between the goals of feature selection and classification. In practice, Accuracy(Tree) in Eq. (7) can be calculated by feeding the classifier with the selected features and the labels of the dataset. The fitness value replaces accuracy as the internal evaluation index of the model, while the external experimental results still report the more common accuracy to help other researchers.
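As an illustration of this evaluation protocol (10 repeated runs, cross-validation accuracy, mean and standard deviation), the sketch below uses a scikit-learn classifier as a stand-in for the WEKA implementations actually used in the paper; run_selector is a hypothetical placeholder for any feature selection routine that returns a feature mask.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(run_selector, X, y, runs=10):
    # run_selector(X, y) -> 0/1 mask over the features (hypothetical interface).
    accs = []
    for _ in range(runs):
        mask = np.asarray(run_selector(X, y), dtype=bool)
        clf = KNeighborsClassifier()  # stand-in for WEKA's IBk
        scores = cross_val_score(clf, X[:, mask], y, cv=10)  # 10-fold CV accuracy
        accs.append(scores.mean())
    return float(np.mean(accs)), float(np.std(accs))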

4.4 Parameter discussion

In this section, the parameters of FSLSFOA are discussed. The number of trees in the forest is initialized to 50, which is satisfactory for most of the datasets used. The values of "life time", "flood rate" and "area limit" are set to 15, 5 and 50, respectively. The "life time" controls the vitality of the forest. The "flood rate" gives a chance to the trees at the bottom of the candidate set, but it may also bring confusion into the forest, so it is recommended to set the "flood rate" within 10% of the "area limit"; this parameter directly affects the time cost of our algorithm. These parameters are certainly affected by the size of the actual datasets, but a performance discussion on one specific dataset means little compared with the average accuracy over all the datasets we used.

Next, the baseline settings of ε, λ and μ are 0.5, 0.5 and 1, respectively. Each parameter is then discussed according to the average accuracy of all classifiers on all datasets. Each time, only one parameter is changed from the baseline (a variable-controlling approach), which ensures that the influence of each parameter on the average accuracy is reflected.

Table 2: Performance evaluation on ε according to average accuracy on all datasets using all classifiers

In this part, the evaluation indicator is the overall average of 30 accuracy rates (three classifiers on ten datasets). Firstly, Tab. 2 shows that ε performs best when it is 0.4. This result indicates that only about 40% of the features are worth selecting; in other words, feature selection is universally necessary for common datasets. Secondly, the average accuracy peaks at 87.58% when λ is set to 0.6 (Tab. 3). This indicates that the contribution of the feature grouping process is limited: selecting 60% from Group_high together with 40% from Group_low, rather than 100% from Group_high, achieves the best result. Thirdly, μ adjusts the composition of the fitness value. As shown in Tab. 4, the best value of μ is 0.8, which indicates that optimizing a fitness value containing a 20% feature-number penalty yields even better average accuracy than using accuracy alone as the evaluation index. What surprises us is that the feature-number penalty, whose aim is to reduce the number of features, brings better performance for our algorithm. We speculate that this is probably because the feature sets with the highest training accuracy have been overfitted.

Table 3: Performance evaluation on λ according to average accuracy on all datasets using all classifiers

Table 4: Performance evaluation on μ according to average accuracy on all datasets using all classifiers

4.5 Results and comparisons

To verify the generality of the proposed algorithm, different classifiers are utilized to evaluate its performance, including KNN, DT (decision tree) and SVM. To obtain steady results, the classification accuracy of each feature selection algorithm is measured by 10-fold cross-validation. Tabs. 5 to 7 show the mean value and standard deviation of the classification accuracy of the classifiers SMO, IBk and J48 with the feature selection algorithms UPFS, ANFS, PSO(4-2), FSFOACD and FSLSFOA over the 10 datasets. Note that the best parameter settings for FSLSFOA found in the parameter discussion above are used throughout this part.

From the results shown in Tab. 5 and Fig. 3, with the KNN classifier, the proposed FSLSFOA algorithm performs best on most datasets compared with the other feature selection algorithms. Only on the Vehicle dataset is FSLSFOA, with a classification accuracy of 81.21%, second to the PSO(4-2) algorithm, whose classification accuracy is 84.31%. On the other hand, the standard deviation of our algorithm is the lowest on all datasets, which shows that our algorithm is very stable. In particular, on the Glass, Hepatitis, Sonar and Arrhythmia datasets, FSLSFOA achieves a significant improvement in classification accuracy over the other feature selection algorithms.

Table 5: Performance of FSLSFOA and other feature selection algorithms using the KNN classifier

Figure 3: Performance of FSLSFOA and other feature selection algorithms using the KNN classifier

From the results shown in Tab. 6 and Fig. 4, with the SVM classifier, the proposed FSLSFOA algorithm has the highest average classification accuracy on almost all datasets compared with the other feature selection algorithms. The exception is the Wine dataset, where FSFOACD reaches 99.17% while FSLSFOA achieves 99.09%; we infer that this is because of the comparatively high standard deviation of FSLSFOA and the already high accuracy on this dataset. For the remaining datasets, our local search strategy and global seeding process play a major role in improving the feature selection.

Table 6: Performance of FSLSFOA and other feature selection algorithms using the SVM classifier

Figure 4: Performance of FSLSFOA and other feature selection algorithms using the SVM classifier

As can be seen from Tab. 7 and Fig. 5, with the DT classifier, FSLSFOA also has the highest classification accuracy on most datasets compared with the other algorithms. This benefit comes from the local search strategy. In the local seeding and global seeding stages, the local search strategy guides the forest optimization algorithm to select more features with high class correlation and low redundancy, while eliminating features with low class correlation and high redundancy, thereby guaranteeing the quality of the feature subsets in the forest. The probability model in our local search strategy allows more variance for each tree, which gives the limited number of trees more chances to find the optimal solution within a limited number of iterations.

Table 7: Performance of FSLSFOA and other feature selection algorithms using the DT classifier

Figure 5: Performance of FSLSFOA and other feature selection algorithms using the DT classifier

In addition, several experiments have been conducted to compare the classification accuracy of the proposed FSLSFOA with the other feature selection methods as a function of the number of selected features. Fig. 6 and Tab. 8 show the results on the Sonar and Arrhythmia datasets. In each graph, the X-axis represents the size of the selected feature subset, while the Y-axis represents the accuracy. In Figs. 6(a), 6(b) and Tabs. 8(a), 8(b), the accuracy of FSLSFOA with the SVM and DT classifiers is obviously better than that of the other feature selection algorithms.

Figs. 6(c), 6(d) and Tabs. 8(c), 8(d) show similar results. Tab. 8(c) shows that when the number of selected features on the Arrhythmia dataset is set to 70, the accuracy of FSLSFOA is 63.66%, while the accuracies of UPFS, ANFS, PSO(4-2) and FSFOACD are 54.12%, 55.83%, 48.14% and 61.41%, respectively. As shown in Fig. 6(d), FSLSFOA outperforms the other feature selection algorithms in the range of 10 to 70 features, but is beaten by PSO(4-2) when the feature number goes over 100. This deficiency is worth further research.

Figure 6: Comparison of classification accuracy of feature subsets of different sizes on the Sonar and Arrhythmia datasets

Table 8: Comparison of classification accuracy of feature subsets of different sizes on the Sonar and Arrhythmia datasets

Based on the average accuracies in Tabs. 5 to 7, the average classification accuracy is illustrated in Fig. 7. Clearly, the proposed FSLSFOA achieves the best results in combination with all three classifiers. Among the three classifiers, DT achieves the best performance in combination with most feature selection methods; however, FSLSFOA + KNN reaches an accuracy of 88.81%, which is better than FSLSFOA with SVM or DT.

To sum up the unsatisfactory results, FSLSFOA does not perform very well on the Vehicle and Parkinsons datasets. Besides, its standard deviation of accuracy is worse than that of FSFOACD, which is caused by the randomness in the local search strategy. Eqs. (4)-(6) not only bring this randomness but also improve the accuracy. After weighing the pros and cons, this mechanism is preserved.

In our future work, FOA-based feature selection methods will remain the main task. More deep learning methods can be applied to further improve the performance of our work. The proposed method should also be compared with more recent studies and methods on more datasets. To make the research results more convincing, the model should also be applied to some areas of urgent need. The work of Abdel-Basset M. and his team will guide our future work and is worth studying to improve our model.

Figure 7: Average classification accuracy of feature selection methods on all datasets

5 Conclusion

This article proposes a feature selection algorithm with a local search strategy based on the forest optimization algorithm, namely FSLSFOA. Our local search strategy guides the forest optimization algorithm to select features with higher class correlation and lower redundancy, and to remove features with lower class correlation and higher redundancy. While searching for the optimal solution, this strategy guarantees the quality of the feature subsets in the forest. Next, the fitness function is improved to consider not only the classification accuracy but also the size of the feature subset, so as to obtain an optimal solution with a smaller feature subset. To avoid falling into a local optimum, a new global seeding method is attempted, which picks up trees from the bottom of the candidate set and gives the algorithm more possibilities. Finally, 10 benchmark datasets with different attributes are selected to verify the effectiveness of the proposed algorithm. The experiments evaluate both the classification accuracy and the size of the selected feature subset. Most of the results are superior to the previous feature selection algorithms selected as comparative methods.

Acknowledgement: This work was supported in part by the National Science Foundation of China (Nos. U1736105, 61572259, 41942017). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RGP-VPP-264.
