
Multi-Label Learning Based on Transfer Learning and Label Correlation

2019-11-07

Computers, Materials & Continua, 2019, Issue 10

Kehua Yang, Chaowei She, Wei Zhang, Jiqing Yao and Shaosong Long

Abstract: In recent years, multi-label learning has received a lot of attention. However, most existing methods consider only global label correlation or only local label correlation. In fact, global and local label correlations can both appear in real-world situations at the same time. Moreover, we should not restrict ourselves to pairwise labels while ignoring high-order label correlation. In this paper, we propose a novel and effective method called GLLCBN for multi-label learning. First, we obtain the global label correlation by exploiting label semantic similarity. Then, we analyze the pairwise labels in the label space of the data set to acquire the local correlation. Next, we build the initial version of the label dependency model from the global and local label correlations. After that, we use graph theory, probability theory and Bayesian networks to eliminate redundant dependency structure in the initial model, so as to obtain the optimal label dependency model. Finally, we obtain a feature extraction model by adjusting the Inception V3 convolutional neural network model and combine it with the GLLCBN model to achieve multi-label learning. The experimental results show that our proposed model performs better than other multi-label learning methods on the evaluation metrics.

Keywords: Bayesian networks, multi-label learning, global and local label correlations, transfer learning.

1 Introduction

Nowadays, we live in an information age. An instance often cannot be described by a single label, so it is associated with more than one class label. For example, an image can be annotated with several labels [Su, Chou, Lin et al. (2011)], a piece of music can belong to many genres [Turnbull, Barrington, Torres et al. (2008)], and a text can reflect different themes [Ueda and Saito (2002)]. Therefore, multi-label classification has attracted more and more research attention.

Multi-label learning algorithms fall into two categories [Zhang and Zhou (2007)]: problem transformation and algorithm adaptation. Problem transformation is a straightforward approach whose main idea is to convert a multi-label problem into one or more traditional single-label problems. Algorithms include Binary Relevance (BR) [Boutell, Luo, Shen et al. (2004)], Pruned Problem Transformation (PPT) [Read, Pfahringer and Holmes (2009)] and so on. Algorithm adaptation instead adapts a single-label classification algorithm to multi-label classification. Classic algorithms include the C4.5 decision tree [Quinlan (1992)], Multi-label Dimensionality reduction via Dependence Maximization (MDDM) [Zhang and Zhou (2010)], Multi-label Informed Latent Semantic Indexing (MLSI) [Yu, Yu and Tresp (2005)] and so on.

Label correlation can provide important information for multi-label classification. For example, “blue sky” and “white cloud” are highly co-occurring labels, while “sunny” and “black clouds” are highly exclusive labels. If “ocean” and “sailboat” appear at the same time, it is highly likely that the “fish” label will also be present, while the “desert” label will not appear. However, most existing methods mainly focus on the shared characteristics of global labels and ignore the label correlations of local data sets. For example, “Jack Ma” is associated with “Alibaba” in an IT-company data set [Liu, Peng and Wang (2018)], but the two are only weakly related in terms of global label correlation. According to the above analysis, it is therefore more practical and comprehensive to consider both global and local label correlations in multi-label classification.

Each instance has multi-dimensional label characteristics in multi-label learning. If instances are annotated purely by manual labeling, humans may ignore labels that they do not know or have little interest in, or follow the guidance of some algorithm to reduce labeling cost [Huang, Chen and Zhou (2015)]. Some labels may then be missing from the training set, which is a kind of weakly supervised learning, and subjective factors in the labels are unavoidable. As a result, missing labels lead to label imbalance in the data set, which makes it more difficult to estimate label correlations and can negatively impact performance.

In this paper, we propose a novel and effective method called “Bayesian Networks with Global and Local Label Correlation” (GLLCBN). The main idea of GLLCBN is to use the global label semantic correlation and the local label correlation of the data set to balance label correlation and reduce the impact of label noise. First, the probability of each individual label is obtained by analyzing the data set. Similarly, we obtain the probability between pairwise labels from the data set. Then, the global label correlation matrix is constructed from label semantic similarity. After that, according to the probability information obtained in the first three steps, the initial Bayesian network topology is constructed to capture high-order label correlation. In addition, redundant edges (label correlations) in the network structure are removed using graph theory and probability theory, and the GLLCBN model is thereby constructed. Finally, an initial label prediction is obtained by using transfer learning to adjust and train the Inception V3 model, and this prediction is combined with GLLCBN to achieve multi-label classification.

The remainder of this paper is organized as follows. Section 2 introduces related work on multi-label learning. Section 3 presents our proposed algorithm in detail. Section 4 reports experiments verifying the performance of the proposed method. Finally, conclusions and future work are given in Section 5.

2 Related work

In recent years, multi-label learning has been extensively studied and many methods have been proposed [Zhang and Zhou (2007)]. In addition, the role of label correlation has gradually become a focus for researchers. Methods can be divided into three categories according to the order of label correlation they exploit [Zhang and Zhang (2010)].

First-order methods convert a multi-label classification problem into multiple independent single-label classifiers. For example, the classic BR [Boutell, Luo, Shen et al. (2004)] trains a separate classifier for each label. The advantage of this approach is its simplicity, but it ignores label correlation. Second-order methods model the correlation between pairs of labels. For example, CLR [Brinker (2008)] converts the multi-label classification problem by analyzing the correlations of pairwise labels and establishing a label ranking. The advantage of this kind of method is that it considers internal pairwise label correlations, which brings some efficiency gains; however, multi-label problems generally have high label dimensionality, and we should not limit ourselves to pairwise labels. Therefore, higher-order methods have been proposed. High-order methods analyze correlations among higher-dimensional combinations of labels and are not limited to pairwise labels. For example, MLLRC [Huang and Zhou (2012)] solves the multi-label classification problem by exploiting properties of the matrix rank. The advantage of high-order methods is that they extract the intrinsic connections among labels and strengthen label dependencies, but label correlation analysis is more difficult and the correlation structure is more complicated.

Labeling of an instance may result in label imbalance due to subjective factors. For example, the actual labels for the image in Fig. 1 should contain “bull”, “mountain” and “road”. With manual labeling, the picture on the left may be marked in the order “bull”, “mountain”, “road”, while the picture on the right may be marked as “mountain”, “road”, “bull”. Sometimes the “bull” label is even lost because of visual effects. GLOCAL [Zhu, Kwok and Zhou (2017)] indicated that missing labels and label order are influential factors for multi-label classification.

Figure 1: Image annotation

In summary, the study of multi-label classification should consider not only global label correlation but also local label correlation, so that a more balanced and comprehensive label correlation can be obtained.

3 The proposed approach

In this section, the proposed approach GLLCBN is presented in detail. First, we give the model preliminaries and analyze global and local label correlations to obtain the GLLCBN model. Second, we combine the Inception V3 model [Szegedy, Vanhoucke, Ioffe et al. (2016)], fine-tuned by transfer learning, with GLLCBN to achieve multi-label classification.

3.1 Preliminaries

Multi-label classification differs from single-label classification in its high-dimensional label space, so we introduce the following notation. Let D = R^d be the d-dimensional sample space and L = {l1, l2, ..., lm} the label set, where m is the number of labels in the data set. On the one hand, the correspondence between data set instances and sample labels is defined as Q = {(Ni, Mi) | i = 1, 2, ..., n}, where n is the total number of instances in the data set and Ni ∈ D is a d-dimensional feature vector, Ni = [ni1, ni2, ..., nid], representing a sample instance. On the other hand, we denote M = [M1, M2, ..., Mn] as the sample label matrix, where Mi ∈ {0, 1}^m is the label vector of the instance associated with Ni. Each element Mij = 1 if the i-th instance has the j-th label, and Mij = 0 otherwise.

3.2 Label correlation

Label correlation contains potentially important information for the multi-label classification problem, so it is an essential part of our analysis [Punera, Rajan and Ghosh (2005)]. However, analyzing label correlation presents certain difficulties, and how to do so has become a research direction in its own right. In order to analyze label correlation more reasonably and comprehensively, we handle both the local correlation within the data set and the global correlation of label semantics.

3.2.1 Local label correlation

We derive local label correlation from the data set. Since the data in the data set are random, the probabilities of different labels are not identical. Accordingly, we denote P = [p(l1), p(l2), ..., p(lm)] as the occurrence probabilities of the labels, where m is the total number of sample labels and p(li) is the probability of the i-th label in the data set. Since label correlation is at least second-order, we need to calculate the probability of pairwise labels. The local label correlation is defined as:

p(li|lj) = T(Nli|lj) / T(Nlj)    (1)

where i, j index single labels in the label set, with i, j ∈ {1, ..., m}. We denote T(Nlj) as the number of sample instances carrying the label lj. To avoid an anomalous expression, if T(Nlj) equals 0, then p(li|lj) is also defined to be 0. Similarly, T(Nli|lj) is the number of sample instances that have both labels li and lj, and p(li|lj) is the probability of the pairwise label correlation. It is important to note that pairwise label correlation is not a symmetric relationship, which is expressed as:

p(li|lj) ≠ p(lj|li)    (2)

For example, there is a data set as shown in Tab. 1:

Table 1: Data set

According to the above table, p(lA|lB) ≠ p(lB|lA), so Eq. (2) holds.
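The local pairwise correlation of Eq. (1) can be computed directly from a 0/1 label matrix. The small matrix below is a made-up stand-in for a data set like the one in Tab. 1 (whose actual values are not reproduced here), chosen only so that the asymmetry of Eq. (2) is visible:

```python
import numpy as np

# Hypothetical 0/1 label matrix (rows = instances, columns = labels A, B, C);
# the values are illustrative, not the real data set of Tab. 1.
M = np.array([
    [1, 1, 0],   # instance 1: labels A, B
    [1, 0, 1],   # instance 2: labels A, C
    [1, 1, 1],   # instance 3: labels A, B, C
    [0, 0, 1],   # instance 4: label C
])

def local_correlation(M):
    """p(l_i | l_j) = T(N_{l_i|l_j}) / T(N_{l_j}), defined as 0 when T(N_{l_j}) = 0."""
    m = M.shape[1]
    P = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            count_j = M[:, j].sum()           # T(N_{l_j}): instances with label j
            both = (M[:, i] * M[:, j]).sum()  # T(N_{l_i|l_j}): instances with both
            P[i, j] = both / count_j if count_j > 0 else 0.0
    return P

P = local_correlation(M)
# P[0, 1] = p(A|B) = 2/2 = 1.0, while P[1, 0] = p(B|A) = 2/3: asymmetric, as Eq. (2) states.
```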

3.2.2 Global label correlation

We obtain global correlation by analyzing word similarity. At present, correlation between words is mainly computed from their context semantics: word vectors are used to judge the correlation between two words, and Word2vec [Mikolov, Chen, Corrado et al. (2013)] is a classic algorithm for this. For example, the words “man” and “woman” are highly correlated because they are used in similar contexts; “woman” can often replace “man” at the same position in a sentence. Therefore, we define W = [W1, W2, ..., Wm]^T ∈ [0,1]^{m×m} as the word matrix, where W1 = [w(l1|l1), w(l1|l2), ..., w(l1|lm)] is a vector of pairwise word correlations and w(li|lj) is the word-correlation probability between labels i and j. The process is defined as:

w(li|lj) = 1 if i = j, and otherwise the word-vector similarity between the words for li and lj    (3)

As shown in Eq. (3), each label is perfectly correlated with itself, so the diagonal value is 1; a small value means that the label correlation is low, and a large value the opposite.
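As a minimal sketch of Eq. (3), the word matrix W can be built from label-word embeddings with cosine similarity off the diagonal. The three-dimensional vectors below are made-up stand-ins for real Word2vec embeddings of the label words:

```python
import numpy as np

# Toy embeddings standing in for Word2vec vectors of the label words;
# the numbers are illustrative only.
vecs = {
    "desert": np.array([0.9, 0.1, 0.0]),
    "ocean":  np.array([0.1, 0.9, 0.2]),
    "sunset": np.array([0.3, 0.4, 0.8]),
}

def global_correlation(vecs):
    """W[i, j] = 1 on the diagonal (Eq. (3)); cosine similarity of the
    label-word vectors otherwise."""
    labels = list(vecs)
    m = len(labels)
    W = np.ones((m, m))
    for i in range(m):
        for j in range(m):
            if i != j:
                a, b = vecs[labels[i]], vecs[labels[j]]
                W[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return W

W = global_correlation(vecs)
```

Unlike the local correlation of Eq. (1), this matrix is symmetric, since word similarity has no direction.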

3.3 GLLCBN model

According to the analysis in Section 3.2, we have obtained the global and local label correlations. The relationship between them is defined as Eq. (4):

E(li|lj) = λ1 · w(li|lj) + λ2 · p(li|lj)    (4)

where i, j are pairwise labels, and λ1, λ2 ∈ [0,1] with λ1 + λ2 = 1 are trade-off parameters controlling the weights of the global and local label correlations. E(li|lj) is the comprehensive label correlation.

Obtaining pairwise label correlations through E(li|lj) alone is not enough: since both directions are computed from the global and local label correlations, a cyclic relationship arises between pairwise labels, and labels li and lj are linked by both E(li|lj) and E(lj|li). When such a symmetric relationship occurs, the structure becomes ambiguous, because it is impossible to determine which of the two labels depends more strongly on the other. To solve this problem, the ambiguous dependency between pairwise labels must be eliminated, which is defined as Eq. (5):

E(lx1|lx2) = max{ E(li|lj), E(lj|li) }    (5)

where li, lj are pairwise labels, and lx1|lx2 is the finalized label dependency.

For example, consider the structure in Fig. 2, where a circle represents a label (e.g., A, B) and the edges between circles represent the probabilities of pairwise label correlations (e.g., E(lB|lA), E(lA|lB)).

Figure 2: Correlation between labels A and B

According to Eq. (5), since the correlation follows the maximum-value principle, the structure of Fig. 2 is optimized into the structure of Fig. 3, which shows that label A is more dependent on label B.

Figure 3: Optimized correlation between labels A and B
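The combination of Eq. (4) and the direction-resolution rule of Eq. (5) can be sketched as follows. The global matrix W, local matrix P and weights below are made-up illustrative values for three labels:

```python
import numpy as np

# Hypothetical global (W) and local (P) correlation matrices for three labels;
# lam1 + lam2 = 1 are the trade-off weights of Eq. (4). All values are made up.
W = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
P = np.array([[1.0, 0.8, 0.1],
              [0.5, 1.0, 0.7],
              [0.3, 0.2, 1.0]])
lam1, lam2 = 0.5, 0.5

E = lam1 * W + lam2 * P           # Eq. (4): comprehensive label correlation

def directed_edges(E):
    """For each label pair keep only the direction with the larger E (Eq. (5))."""
    m = E.shape[0]
    edges = {}
    for i in range(m):
        for j in range(i + 1, m):
            if E[i, j] >= E[j, i]:
                edges[(i, j)] = E[i, j]   # label i depends on label j
            else:
                edges[(j, i)] = E[j, i]   # label j depends on label i
    return edges

edges = directed_edges(E)
# Exactly one directed edge survives per pair, removing the ambiguous cycles.
```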

As shown in Fig. 4, it is worth noting that there may be multiple reachable paths from one label to another. Eq. (6) is used to determine the label dependencies when multiple reachable edges exist.

Figure 4: Multiple reachable paths between labels

Eq. (6) can be rewritten in the equivalent form shown in Eq. (7):

where k1, k2, ... are the intermediate label nodes on the path from label i to label j. The specific proof of Eq. (7) is as follows:

Proof. First, we refer to Fig. 4. On the one hand, N(A), N(B), N(C), N(A,B), N(A,C), N(B,C) and N(A,B,C) are numbers of sample instances. On the other hand, p(lC|lA), p(lB|lA), p(lC|lB) are the probabilities of pairwise local label correlations, and w(lB|lA), w(lC|lA), w(lC|lB) are the probabilities of label semantic correlation. Their values are known from the analysis of Eq. (1) to Eq. (5).

Second, according to graph theory and probability theory, we have the following definitions:

If we want to prove E(lC|lA) ≥ (E(lB|lA), E(lC|lB)), then according to Eq. (8) and Eq. (9) we only need to prove the inequality below.

We observe that the data set guarantees N(A,C) ≥ N(A,B,C), because the two stand in an inclusion relationship and the former has the larger scope. By the same logic, N(A) ≥ N(A,B). The values of λ1 and λ2 on both sides of the inequality are the same, so they require no additional consideration. In addition, w(lABC|lAB) = w(lB|lA) × w(lC|lB) × w(lC), where w(lC) equals 1; thus w(lABC|lAB) = w(lB|lA) × w(lC|lB).

According to the above analysis, E(lC|lA) ≥ (E(lB|lA), E(lC|lB)) holds only if the sample data set satisfies the stated condition; otherwise the opposite is true. Therefore, Eq. (7) is proved.

According to Eq. (7), the reachable paths can be processed to eliminate redundant label-correlation dependencies, but the following two cases require special handling.

Case 1. E(li|lj) = E(li|lk1|lk2|...|lj)

According to the principle of maximum label correlation, Fig. 4 is optimized as shown in Fig. 5.

Figure 5: Optimized GLLCBN model

Case 2. E(li|lj) ≠ E(li|lk1|lk2|...|lj)

In this case, Fig. 4 does not need to be changed. Because genuine pairwise label correlations may exist, the intermediate nodes cannot be eliminated, and the correlation structure of each intermediate node should be retained.

In summary, the directed graph model of GLLCBN can be constructed by analyzing label correlations and Bayesian networks [Friedman, Linial, Nachman et al. (2000)]. The GLLCBN model optimizes the label correlations, facilitates the extraction of potential association information between labels, and reduces the impact of label imbalance in the sample data set.
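The redundant-edge elimination of Case 1 can be sketched as follows. Since the exact form of Eq. (7) depends on the data set counts, the sketch makes one stated assumption: the strength of an indirect path is taken as the product of its edge weights, and a direct edge is dropped when an indirect path explains it exactly (Case 1), while everything else is kept (Case 2). The graph and weights are made-up illustrative values:

```python
def path_strength(edges, path):
    """Product of edge weights along a node path such as ('A', 'B', 'C').
    Returns None if some edge on the path is missing."""
    s = 1.0
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges:
            return None
        s *= edges[(a, b)]
    return s

def prune(edges, nodes):
    """Case 1: drop a direct edge whose weight equals the strength of an
    indirect two-hop path; Case 2: keep the structure unchanged otherwise."""
    pruned = dict(edges)
    for (i, j), w in edges.items():
        for k in nodes:
            if k in (i, j):
                continue
            s = path_strength(edges, (i, k, j))
            if s is not None and abs(s - w) < 1e-12:
                pruned.pop((i, j), None)   # redundant: the path explains it
    return pruned

# Illustrative graph: the direct edge A->C (0.4) equals 0.8 * 0.5 via B,
# so it is redundant and removed; the other two edges are retained.
edges = {("A", "B"): 0.8, ("B", "C"): 0.5, ("A", "C"): 0.4}
result = prune(edges, ["A", "B", "C"])
```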

3.4 Adjustment of Inception V3 model

Convolutional Neural Networks (CNNs) play a very important role in image classification research [Song, Hong, Mcloughlin et al. (2017)]. There are many excellent CNN models, such as AlexNet [Krizhevsky, Sutskever and Hinton (2012)], VGGNet [Simonyan and Zisserman (2014)], ResNet [He, Zhang, Ren et al. (2015)] and GoogLeNet [Szegedy, Liu, Jia et al. (2014)]. Among them, Inception V3 [Szegedy, Vanhoucke, Ioffe et al. (2016)], created by Google, is a very portable and highly usable model. Therefore, we use a transfer learning approach to adapt the Inception V3 model to the multi-label classification problem. First of all, since Inception V3 was originally trained for single-label classification while our images have multi-label attributes, we need to store the labels of the input data as multi-dimensional vectors rather than single labels. Secondly, to make the model applicable, it is generally necessary to remove the top-level structure and add new customized layers; we therefore add a fully connected layer of 1024 nodes connected to the last pooling layer. Finally, since the softmax layer in Inception V3 outputs 1000 nodes (the ImageNet [Deng, Dong, Socher et al. (2009)] data set has 1000 categories), we modify the last layer of the network so that its number of nodes equals the number of label types in the data set, so that label classification can be achieved through our model.
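The essential difference in the head replacement described above can be sketched numerically: a softmax output forces the class scores to compete and sum to 1, while independent per-label sigmoids allow several labels to be active at once. The shapes, random weights and the 0.5 threshold below are illustrative stand-ins, not the trained Inception V3 network:

```python
import numpy as np

# Illustrative stand-ins for the network head; not the real trained weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(1, 2048))        # pooled Inception-style features
W_fc = rng.normal(size=(2048, 1024)) * 0.01  # the added 1024-node dense layer
W_out = rng.normal(size=(1024, 5)) * 0.01    # 5 output nodes = 5 labels

hidden = np.maximum(features @ W_fc, 0.0)    # ReLU activation
logits = hidden @ W_out

softmax = np.exp(logits) / np.exp(logits).sum()  # single-label head: sums to 1
sigmoid = 1.0 / (1.0 + np.exp(-logits))          # multi-label head: independent

predicted = sigmoid > 0.5                    # each label thresholded separately
```

This is why the multi-dimensional label storage matters: each of the five outputs is trained and read independently, instead of picking the single highest-scoring class.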

4 Experiments

In this section, to evaluate the performance of GLLCBN, we describe the multi-label data set used in the experiments, the performance evaluation metrics for multi-label classification, and the algorithms compared with the GLLCBN model. Finally, the experimental results and analysis are presented.

4.1 Data sets

To verify the performance of GLLCBN, we chose the open-source data set collected by Nanjing University. The download link for the data set is http://lamda.nju.edu.cn/files/miml-image-data.rar. The data set contains 2,000 landscape images and five labels (desert, ocean, sunset, mountains, trees); each instance has two labels on average. The original archive contains a file called miml_data with a .mat suffix (Matlab file format), which holds three variables: bags, targets and class_name. The first can be ignored for our purposes. The second is the label definition for each image: a 5×2000 matrix in which each column represents the labels of one instance image, the value 1 indicates the presence of a label, −1 indicates its absence, and the label order matches class_name. In addition, we convert this label matrix into a .txt file to facilitate the follow-up training of the Inception V3 model. The last variable lists all possible label names for the data set.
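The conversion of the targets matrix into a .txt label file can be sketched as follows. The small matrix below is a made-up stand-in for the real 5×2000 variable, using the same 1/−1 convention and the data set's five label names:

```python
import numpy as np

# Toy stand-in for the 5 x 2000 targets matrix: rows are labels in
# class_name order, columns are instances; 1 = label present, -1 = absent.
class_names = ["desert", "ocean", "sunset", "mountains", "trees"]
targets = np.array([
    [ 1, -1],
    [-1,  1],
    [ 1,  1],
    [-1, -1],
    [-1,  1],
])  # 5 labels x 2 instances, for illustration only

def to_label_lines(targets, class_names):
    """One text line per instance, listing its positive labels
    comma-separated, ready to be written out as a .txt file."""
    lines = []
    for col in range(targets.shape[1]):
        labels = [class_names[r] for r in range(targets.shape[0])
                  if targets[r, col] == 1]
        lines.append(",".join(labels))
    return lines

lines = to_label_lines(targets, class_names)
# lines[0] == "desert,sunset"  (first instance has those two labels)
```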

4.2 Performance evaluation

Since an instance has multiple label attributes in multi-label classification, the predicted label set may be only a subset of the actual label set. To evaluate the performance of the GLLCBN model, we select five widely used evaluation metrics [Gibaja and Ventura (2015)].

Hamming Loss expresses the degree of inconsistency between the predicted labels and the actual labels, as in Eq. (10):

hloss = (1/n) Σ_{i=1..n} |h(Ni) Δ Yi| / m    (10)

where h(Ni) is the predicted label set of instance Ni, Yi its ground-truth label set, and Δ the symmetric difference of two sets.

Coverage evaluates how far, on average, one needs to go down the ranked list of labels in order to cover all ground-truth labels, as in Eq. (11):

coverage = (1/n) Σ_{i=1..n} [ max_{l∈Yi} rank_f(Ni, l) − 1 ]    (11)

where rank_f(Ni, l) is the rank of label l in the ranking induced by the predicted scores f(Ni, ·).

Ranking Loss evaluates the average fraction of mis-ordered label pairs, and is defined as follows:

rloss = (1/n) Σ_{i=1..n} |{(la, lb) ∈ Yi × Ȳi : f(Ni, la) ≤ f(Ni, lb)}| / (|Yi| |Ȳi|)    (12)

where Ȳi is the complement of the ground-truth label set Yi.

Average Precision represents the average precision of the predicted instance label sets, as in Eq. (13):

avgprec = (1/n) Σ_{i=1..n} (1/|Yi|) Σ_{l∈Yi} |{l′ ∈ Yi : rank_f(Ni, l′) ≤ rank_f(Ni, l)}| / rank_f(Ni, l)    (13)

Average Predicted Time expresses the average time to predict each instance; its time unit is the second, expressed as follows:

avgtime = Ttotal / n    (14)

where Ttotal is the total prediction time over the n test instances.

It is worth noting that among the above five evaluation metrics, smaller values of Hamming Loss, Coverage, Ranking Loss and Average Predicted Time mean better performance, while for Average Precision a larger value means better performance.
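Two of these metrics can be sketched in a few lines; the implementations below follow the standard definitions of Hamming Loss and Average Precision, and the toy labels and scores are made up for illustration:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are misclassified (Eq. (10)
    on 0/1 indicator matrices)."""
    return np.mean(Y_true != Y_pred)

def average_precision(Y_true, scores):
    """For each instance, average over its true labels of the precision at
    that label's rank position (higher score = better rank)."""
    n, m = Y_true.shape
    total = 0.0
    for i in range(n):
        order = np.argsort(-scores[i])            # best-ranked label first
        rank = np.empty(m, dtype=int)
        rank[order] = np.arange(1, m + 1)         # rank of each label, 1-based
        true = np.flatnonzero(Y_true[i])
        prec = [np.sum(rank[true] <= rank[l]) / rank[l] for l in true]
        total += np.mean(prec)
    return total / n

# Toy ground truth, predictions and ranking scores (2 instances, 3 labels).
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]])
# One of the six instance-label pairs is wrong, so the Hamming Loss is 1/6;
# every true label is ranked ahead of every false one, so Average Precision is 1.
```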

4.3 Comparative algorithms

To validate the effectiveness of the GLLCBN model, we compare it with the following state-of-the-art multi-label learning algorithms:

1. Binary Relevance (BR) [Boutell, Luo, Shen et al. (2004)] is a first-order method. The main idea is to train a binary linear SVM classifier independently for each label.

2. Calibrated Label Ranking (CLR) [Brinker (2008)] is a second-order method. The main idea is to establish a label ranking by analyzing pairwise labels.

3. Multi-label Learning Using Local Correlation (ML-LOC) [Huang and Zhou (2012)] is a high-order method. The main idea is to analyze local label correlation by encoding instance features.

4. Random k-Labelsets (RAKEL) [Tsoumakas, Katakis and Vlahavas (2011)] is a high-order method. The main idea is to transform the multi-label classification problem into several multi-class learning problems by exploiting high-order global label correlation.

All compared algorithms are summarized in Tab. 2.

Table 2: Compared methods

4.4 Experimental results

In our experiments, we randomly use 30%, 50% and 70% of the data set as the training set and the rest as the test set. Experimental results are shown in Tab. 3 to Tab. 5. All of the compared methods are run in a Python or Matlab environment.

Table 3: Performance evaluation of different algorithms when randomly selecting 30% of the data set as training data: mean ± std (rank)

Table 4: Performance evaluation of different algorithms when randomly selecting 50% of the data set as training data: mean ± std (rank)

Table 5: Performance evaluation of different algorithms when randomly selecting 70% of the data set as training data: mean ± std (rank)

4.5 Experimental analysis

According to the experimental results in Section 4.4, we draw the following conclusions:

1. When the training set is 30% of the data set, the BR algorithm has some advantages in Hamming Loss, Coverage, Ranking Loss, Average Precision and Average Prediction Time, and the global label correlation algorithm outperforms the local label correlation algorithms.

2. When the training set is 50% of the data set, the BR algorithm still achieves a better Hamming Loss than the label correlation algorithms. In terms of Coverage, Ranking Loss, Average Precision and Average Prediction Time, the CLR, ML-LOC and GLLCBN algorithms, which consider local label correlation, have an advantage over the RAKEL algorithm, which considers global label correlation.

3. When the training set is 70% of the data set, the advantages of BR are gradually overtaken by the other algorithms. The CLR algorithm performs better than the other algorithms on Hamming Loss. Among the label correlation algorithms, GLLCBN achieves better Coverage, Ranking Loss and Average Precision than the ML-LOC and RAKEL algorithms, but it needs a longer Average Prediction Time.

4. From the three points above, we can see that as the training set grows, the advantages of the label correlation algorithms in the evaluation metrics gradually emerge, indicating that label correlation has a definite influence on the multi-label classification problem.

5 Conclusion and future work

For the study of multi-label classification, how to mine potential label correlation information remains a worthwhile future direction. In this paper, we propose a novel and effective approach named GLLCBN for multi-label learning. In the GLLCBN model, nodes represent the label space and edges represent the comprehensive global and local label correlations. We first obtain an initial model by analyzing labels, global semantic relevance and the local label correlation of the data set (building the node association graph), and then use probability theory, Bayesian networks and graph theory to optimize the label dependency graph (eliminating redundant edges), thereby constructing a label-dependency network called the GLLCBN model. Finally, multi-label classification is achieved by combining the initial predictions of the Inception V3 model with the GLLCBN model. Experimental results show that our proposed approach is effective under the evaluation metrics.

In the future, we will consider optimizing the performance of the proposed method on data sets with large-scale label spaces and applying the approach to more multi-label data sets.

Acknowledgement: The authors gratefully acknowledge support from the National Key R&D Program of China (No. 2018YFC0831800) and the Innovation Base Project for Graduates (Research of Security Embedded System).
