
LEARNING RATES OF KERNEL-BASED ROBUST CLASSIFICATION*


Shuhua WANG (王淑華)

School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China

E-mail:w614sh@126.com

Baohuai SHENG (盛寶懷)?

Department of Finance, Zhejiang Yuexiu University, Shaoxing 312030, China; Department of Applied Statistics, Shaoxing University, Shaoxing 312000, China

E-mail:shengbaohuai@163.com

Abstract This paper considers a robust kernel regularized classification algorithm with a non-convex loss function, which is proposed to alleviate the performance deterioration caused by outliers. A comparison relationship between the excess misclassification error and the excess generalization error is provided; from this, along with convex analysis theory, a kind of learning rate is derived. The results show that the performance of the classifier is affected by the outliers, and that the extent of this impact can be controlled by choosing the homotopy parameters properly.

Key words Support vector machine; robust classification; quasiconvex loss function; learning rate; right-sided directional derivative

1 Introduction

The problem of classification is one of the most extensively studied problems in learning theory. The goal of classification is to construct a classifier with a small misclassification error. Among the many classical learning methods that have been proposed, the support vector machine (SVM) has been one of the most successful learning algorithms (see, for example, [1–6]).

Let X be a given compact set in the d-dimensional Euclidean space R^d (the input space), and let Y = {−1, 1} (representing the two classes). Let ρ be a fixed but unknown probability distribution on Z = X × Y which yields its marginal distribution ρ_X on X and its conditional distribution ρ(·|x) = P(·|x) at x ∈ X. We denote by z = {(x_i, y_i)}_{i=1}^m the samples drawn i.i.d. (independently and identically) according to the distribution ρ.

As we know, the goal of classification is to produce a binary classifier C: X → Y. The misclassification error of C, which is used to measure the prediction power of C, is defined by

R(C) = Prob{C(x) ≠ y} = ∫_X P(y ≠ C(x) | x) dρ_X.

The best classifier, called the Bayes rule, is given by

f_c(x) = 1 if P(y = 1 | x) ≥ P(y = −1 | x), and f_c(x) = −1 otherwise.

The classifiers considered in this paper are induced by real-valued functions f: X → R as C_f = sgn(f), which is defined by sgn(f)(x) = 1 if f(x) ≥ 0 and sgn(f)(x) = −1 otherwise. Recall that the regression function of ρ is

f_ρ(x) = ∫_Y y dρ(y | x) = P(y = 1 | x) − P(y = −1 | x), x ∈ X.

It is easy to see that f_c(x) = sgn(f_ρ)(x).
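As a quick numerical illustration (the value 0.7 below is hypothetical, not taken from the paper):

```latex
% Suppose at some x we have P(y=1\mid x)=0.7. Then
f_\rho(x) = P(y=1\mid x) - P(y=-1\mid x) = 0.7 - 0.3 = 0.4 \ge 0,
\qquad\text{so}\qquad f_c(x) = \operatorname{sgn}(f_\rho)(x) = 1 .
```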

The Tikhonov regularization technique is an effective method for solving ill-posed problems ([7–10]). In recent years, many classification algorithms generated from Tikhonov regularization schemes have been developed and their convergence rates have been investigated (see, for example, [11–17] and the references therein). We now briefly introduce the general framework of the kernel regularized classification algorithm.

Given a classifying loss function V(r): R → R_+, the generalization error of f is defined as

E^V(f) = ∫_Z V(yf(x)) dρ, f ∈ F,

where F is the set of all measurable functions on X.

We call H_K a reproducing kernel Hilbert space (RKHS) associated with a Mercer kernel K(x, y) (see, for example, [17–23]) if, for all f ∈ H_K and x ∈ X, it holds that

f(x) = ⟨f, K_x⟩_K, where K_x(·) = K(x, ·).

The general kernel regularized classification algorithm associated with a given reproducing kernel Hilbert space H_K takes the form

f_z = arg min_{f ∈ H_K} { (1/m) Σ_{i=1}^m V(y_i f(x_i)) + λ‖f‖_K² },    (1.1)

where λ > 0 is the regularization parameter.
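To make scheme (1.1) concrete, here is a minimal sketch (not from the paper; the Gaussian kernel, the subgradient descent routine, and all parameter values are illustrative assumptions). It takes V to be the hinge loss, writes f = Σ_j α_j K(x_j, ·), and iterates a subgradient step on the regularized empirical risk:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hinge_rkhs_fit(X, y, lam=0.1, steps=500, lr=0.05, sigma=1.0):
    """Sketch of (1.1) with V(r) = [1 - r]_+ : minimize
    (1/m) sum_i [1 - y_i f(x_i)]_+ + lam * ||f||_K^2 with f = sum_j alpha_j K(x_j, .)."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(m)
    for _ in range(steps):
        margins = y * (K @ alpha)                # y_i f(x_i)
        active = (margins < 1).astype(float)     # subgradient indicator of the hinge
        grad = -(K @ (active * y)) / m + 2 * lam * (K @ alpha)
        alpha -= lr * grad
    return alpha, K

# usage (illustrative): X of shape (m, d), y in {-1, +1}
# alpha, K = hinge_rkhs_fit(X, y); predictions = np.sign(K @ alpha)
```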

Based on model (1.1), a large number of scholars have studied kernel regularized classification algorithms with convex loss functions (see, for example, [11–14, 16, 17, 19]). However, most of these research results do not consider the impact of outliers on the performance of the algorithms. It is known that, in many practical applications, outliers often occur in real data sets. Outliers can be described as data points that are far away from the pattern set by the majority of the data (see [24–26]). Outliers usually have a large impact on non-robust methods and can even make the whole statistical analysis useless if a non-robust loss is used (see, for example, [27, 28]).

In the literature of learning theory, a great deal of effort has been made to improve the robustness of the SVM and of other similar learning algorithms. Robust loss functions for SV classification have been introduced and intensively studied (see, for example, [27–30]). Unfortunately, robustness comes at a cost: when a convex loss is minimized, even a single data point can dominate the result; that is, any convex loss function necessarily exhibits unbounded sensitivity to even a single outlier. The classical literature achieves bounded outlier sensitivity by considering redescending loss functions or bounded loss functions [27], both of which are inherently non-convex. As observed, non-convex loss functions are necessary to alleviate the effect of outliers. Along these lines, S. Suzumura, K. Ogawa, et al. [31] introduced a novel homotopy approach for non-convex robust SVM learning. The basic idea is to introduce parameters which bridge the standard SVM and the fully robust SVM so as to express the influence of outliers. The loss function V_{θ,s}(r): R → [0, ∞) that they defined is

V_{θ,s}(r) = [1 − r]_+ for r > s,   and   V_{θ,s}(r) = θ[1 − r]_+ + (1 − θ)[1 − s]_+ for r ≤ s,    (1.2)

where [1 − r]_+ := max{0, 1 − r}, θ ∈ [0, 1] and s ≤ 0.

It can be seen that V_{θ,s}(r) is characterized by a pair of parameters. We refer to θ and s as the tuning parameters which govern the influence of outliers. Figure 1 shows the loss function for several values of θ and s.

Figure 1 The robust classification loss function with different values of θ and s
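For concreteness, here is a minimal sketch of the loss (assuming the piecewise form written in (1.2); the parameter values are arbitrary):

```python
import numpy as np

def robust_loss(r, theta=0.25, s=-1.0):
    """V_{theta,s}(r): the hinge [1 - r]_+ for margins r > s; for r <= s the slope
    is damped to theta, which caps the influence of badly misclassified points."""
    r = np.asarray(r, dtype=float)
    hinge = np.maximum(0.0, 1.0 - r)
    damped = theta * np.maximum(0.0, 1.0 - r) + (1.0 - theta) * max(0.0, 1.0 - s)
    return np.where(r > s, hinge, damped)

# theta = 1, or s -> -infinity, recovers the plain hinge loss [1 - r]_+ .
print(robust_loss([-3.0, -1.0, 0.0, 0.5, 2.0], theta=0.25, s=-1.0))
```

Setting θ = 1 or letting s → −∞ recovers the plain hinge loss, which is the homotopy endpoint noted again after (2.3) below.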

The robust SV classification algorithm defined in [31] is

min_w (1/2)‖w‖² + C Σ_{i=1}^m V_{θ,s}(y_i w^⊤ φ(x_i)),

where f(x) = w^⊤ φ(x) (with φ(x) being the feature map implicitly defined by a kernel K), w is a vector in the feature space, ⊤ denotes the transpose of vectors and matrices, and C > 0 is the penalty parameter.

Algorithms and experiments in [31] illustrate the significance of this homotopy approach. The purpose of this paper is to analyze the convergence rate of the excess misclassification error associated with the non-convex loss function V_{θ,s}(r) defined in (1.2), and to understand how the choice of the homotopy parameters affects the error. The main difference between our work and reference [31] is that here the error bound of the algorithm is estimated theoretically, while reference [31] mainly studies the solution method of the model from the perspective of numerical computation.

The convex analysis method has become an effective error analysis tool in the theoretical study of machine learning (see, for example, [32–39]). Since the loss function V_{θ,s} is non-convex, the usual convex method cannot be used directly. In [39], the convergence rate of kernel regularized robust SV regression is obtained by using an improved convex analysis method. Unlike the regression problem, the performance of a classification algorithm is usually reflected by the convergence rate of the excess misclassification error rather than by the excess generalization error considered in [39]. Thus, the convergence analysis of classification algorithms is more difficult than that of regression algorithms. To overcome this difficulty, we improve the method of reference [39] by the technique of the Fenchel-Legendre conjugate, and an important comparison inequality is derived.

The rest of this paper is organized as follows: in Section 2, the kernel regularized robust SV classification model considered in this paper is given. Section 3 presents the main results of the paper, where a comparison inequality and the learning rate are provided. Proofs of the main results are given in Section 4.

2 The Kernel Regularized Robust SV Classification

In this section, a regularized robust SV classification model is provided.

With the loss function V_{θ,s}(r) given by (1.2), we define the generalization error of f: X → R as

E_{θ,s}(f) = ∫_Z V_{θ,s}(yf(x)) dρ.

Also, the general kernel regularized classification associated with the reproducing kernel Hilbert space H_K takes the form

In this paper, we consider a partition of the samples z into two disjoint sets I and O. The samples in I and O are called Inliers and Outliers, respectively. We impose the restriction that the margin y_i f(x_i) of an Inlier should be larger than or equal to s, while that of an Outlier should be smaller than s. In other words,

I := {i ∈ M : y_i f(x_i) ≥ s},   O := {i ∈ M : y_i f(x_i) < s},

where M := {1, ..., m}. Let P = {I, O} ∈ 2^M denote the partition, where 2^M is the power set of M.

With these notions in hand, the kernel regularized homotopy algorithm is defined as (see [31])

where |I|_s and |O|_s are the cardinalities of I and O, respectively, and |I|_s + |O|_s = m.
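As an illustration of the partition rule (only the margin condition y_i f(x_i) ≥ s versus y_i f(x_i) < s comes from the text above; the function name and data layout are hypothetical), a minimal sketch:

```python
import numpy as np

def partition_by_margin(y, f_values, s=-1.0):
    """Split the sample indices (0-based here) into inliers I (margin y_i f(x_i) >= s)
    and outliers O (margin y_i f(x_i) < s), i.e. the partition P = {I, O}."""
    margins = np.asarray(y, dtype=float) * np.asarray(f_values, dtype=float)
    I = np.where(margins >= s)[0]
    O = np.where(margins < s)[0]
    return I, O

# |I|_s + |O|_s = m always holds for such a split.
# Example: I, O = partition_by_margin(y, predictions, s=-1.0)
```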

Take Ω_z(f) to be the empirical functional appearing in (2.2).

We know, from the analysis process of [31] and from [31, Proposition 2.1], that Ω_z(f) is a convex function on H_K. Then, (2.2) can be rewritten as

Note that when θ = 1 or s = −∞, algorithm (2.3) is equivalent to the standard formulation of SV classification with the hinge loss V(r) = [1 − r]_+ (see [11]).

Our goal is to investigate the excess misclassification error for algorithm (2.3). In the present work, the sample error for algorithm (2.3) is estimated with an improved convex approach, and a capacity-independent error bound is provided. The excess misclassification error will be expressed in terms of a K-functional associated with model (2.1).

3 Error Analysis

3.1 The comparison inequality

We need to bound the excess misclassification error R(sgn(f_z)) − R(f_c) in order to measure the performance of algorithm (2.3). The algorithm is designed by minimizing a penalized empirical error associated with the loss function V_{θ,s}, so relations between the excess misclassification error and the excess generalization error become crucial. A systematic study of this problem was carried out in [37].

In [11], the following important results are derived:

Lemma 3.1 ([11]) Let V be a classifying loss satisfying V″(0) > 0. Then there exists a constant c_V > 0 such that, for any measurable function f: X → R, it holds that

R(sgn(f)) − R(f_c) ≤ c_V √(E^V(f) − inf_{g∈F} E^V(g)).

Let V(r) = [1 − r]_+. Then, for any measurable function f: X → R, it holds that

R(sgn(f)) − R(f_c) ≤ E^V(f) − inf_{g∈F} E^V(g).    (3.1)

In this subsection we shall give a comparison inequality for V_{θ,s}(r).

Denote η = η(x) = P(Y = 1 | X = x) and Φ_η(r) = ηV_{θ,s}(r) + (1 − η)V_{θ,s}(−r). Then it holds that

E_{θ,s}(f) = ∫_X Φ_{η(x)}(f(x)) dρ_X.

For η ∈ [0, 1], we define the optimal conditional V_{θ,s}-error as

H(η) = inf_{r∈R} Φ_η(r).

Then the optimal V_{θ,s}-error satisfies

inf_{f∈F} E_{θ,s}(f) = ∫_X H(η(x)) dρ_X.

We define r*: [0, 1] → R as

r*(η) = arg min_{r∈R} Φ_η(r),

and we define the constrained optimal conditional V_{θ,s}-error as

H^-(η) = inf_{r: r(2η−1)≤0} Φ_η(r).    (3.2)

It is easy to see that, for any nonnegative loss function V, H^-(η) ≥ H(η) holds.

Definition 3.2 ([37]) We say that the loss function V is classification-calibrated if, for any η ≠ 1/2, it holds that H^-(η) > H(η).

Definition 3.3 ([37]) Given a loss function V: R → [0, ∞), we define the Ψ-transform function Ψ: [−1, 1] → [0, ∞) by Ψ(τ) = Ψ̃**(τ), where

Ψ̃(τ) = H^-((1 + τ)/2) − H((1 + τ)/2),

and g**: [−1, 1] → R is the Fenchel-Legendre biconjugate of g: [−1, 1] → R, which is characterized by the property that epi g** is the closure of the convex hull of epi g. Here epi g is the epigraph of the function g, that is, the set {(x, t): x ∈ [−1, 1], g(x) ≤ t}.

Lemma 3.4 ([37]) For any nonnegative loss function V, classification-calibration implies that Ψ is invertible on [0, 1], and that the following statements are equivalent:

(a) V is classification-calibrated;

(b) Ψ(τ) > 0 for all τ ∈ (0, 1].
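For orientation, the standard worked instance is the hinge loss (the θ = 1 endpoint of (1.2)); the computation below is the classical one and is consistent with the comparison relation (3.1):

```latex
% Hinge loss V(r) = [1-r]_+ :
\Phi_\eta(r) = \eta\,[1-r]_+ + (1-\eta)\,[1+r]_+ ,\qquad
H(\eta) = \inf_{r\in\mathbb{R}}\Phi_\eta(r) = 1 - |2\eta - 1| ,\qquad
H^-(\eta) = \inf_{r(2\eta-1)\le 0}\Phi_\eta(r) = \Phi_\eta(0) = 1 .
```

Hence Ψ̃(τ) = H^-((1 + τ)/2) − H((1 + τ)/2) = |τ|, which is already convex, so Ψ(τ) = |τ| and Lemma 3.6 below recovers the hinge comparison inequality (3.1).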

For the loss function V_{θ,s}(r) considered in this paper, we have the following result:

Lemma 3.5 Let V_{θ,s}(r) be the loss function defined as in (1.2), and let τ ∈ [−1, 1]. For s ≤ −1, we have

Ψ(τ) = |τ|,

and for −1 < s ≤ 0, we have

where d_{θ,s} = 2θ + (1 − θ)(1 − s).

Proof (1) For s ≤ −1, we discuss H(η) and H^-(η). We consider the form of Φ_η(r), which is a convex combination of V_{θ,s}(r) and V_{θ,s}(−r).

For η = 0, Φ_η(r) = V_{θ,s}(−r), and each r ≤ −1 makes Φ_η(r) = 0. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r), and each r ≥ 1 makes Φ_η(r) = 0. For 0 < η < 1/2, Φ_η(r) decreases strictly on (−∞, −1] and increases on [−1, +∞). For 1/2 < η < 1, Φ_η(r) decreases strictly on (−∞, 1] and increases on [1, +∞). For η = 1/2, Φ_η(r) decreases strictly on (−∞, −1], increases on [1, +∞), and Φ_η(r) ≡ 1 on [−1, 1].

Therefore, by the definition of r*(η), we have

This implies that r*(η) = sgn(η − 1/2) for all η ∈ [0, 1], and

By the definition of H(η), we have that

On the other hand, we need to derive the concrete expression of H^-(η). According to (3.2), we combine the monotonicity of Φ_η discussed above with the constraint condition r(2η − 1) ≤ 0. The detailed derivation is given below.

For η = 0, Φ_η(r) = V_{θ,s}(−r). Since 2η − 1 = −1 < 0, the constraint condition implies that r ≥ 0, so Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = V_{θ,s}(0) = 1. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r). Since 2η − 1 = 1 > 0, the constraint condition implies that r ≤ 0, so Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = V_{θ,s}(0) = 1. For 0 < η ≤ 1/2, 2η − 1 ≤ 0, and the constraint condition implies that r ≥ 0. Since Φ_η(r) increases on [0, +∞), Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = ηV_{θ,s}(0) + (1 − η)V_{θ,s}(0) = 1. For 1/2 < η < 1, the constraint condition implies that r ≤ 0. Since Φ_η(r) decreases strictly on (−∞, 0], Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = ηV_{θ,s}(0) + (1 − η)V_{θ,s}(0) = 1.

Thus, for any 0 ≤ η ≤ 1, we have H^-(η) = 1.

(2) For −1 < s ≤ 0, we discuss H(η) and H^-(η).

For η = 0, Φ_η(r) = V_{θ,s}(−r), and it is easy to see that Φ_η(r) = 0 for each r ≤ −1. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r), and each r ≥ 1 makes Φ_η(r) = 0. For 0 < η < 1/2, Φ_η(r) decreases on (−∞, −1] ∪ [−s, 1] and increases on [−1, −s] ∪ [1, +∞).

By a simple calculation, we get that

Φ_η(−1) = η(2θ + (1 − θ)(1 − s)) and Φ_η(1) = (1 − η)(2θ + (1 − θ)(1 − s)).

Then it follows immediately from 0 < η < 1/2 that Φ_η(−1) < Φ_η(1). Therefore, the minimum value must be attained at −1.

According to the above analysis, we have that

Hence, for any 0 ≤ η ≤ 1,

Let d_{θ,s} = 2θ + (1 − θ)(1 − s). Then H(η) = d_{θ,s} min{η, 1 − η}.

Now we are in a position to discuss H^-(η).

For η = 0 or η = 1, in a fashion similar to the case of s ≤ −1, we have that Φ_η(r) attains the minimum value Φ_η(0) = V_{θ,s}(0) = 1. For 0 < η ≤ 1/2, 2η − 1 ≤ 0, and the constraint condition implies that r ≥ 0. Φ_η(r) increases on [0, −s] ∪ [1, +∞) and decreases on [−s, 1], so Φ_η(r) has two local minimum values: Φ_η(0) = 1 and Φ_η(1) = (1 − η)V_{θ,s}(−1) = d_{θ,s}(1 − η). If η ≤ 1 − 1/d_{θ,s}, then d_{θ,s}(1 − η) ≥ 1; otherwise d_{θ,s}(1 − η) < 1. It can be concluded that

Combining (3.10) and (3.11), for any η ∈ [0, 1], we have that

By (3.9) and (3.12), we get that

and by using the facts that min{a, b} = (a + b − |a − b|)/2 and max{a, b} = (a + b + |a − b|)/2 for a, b ∈ R, it follows that

This completes the proof. □

For the loss function V_{θ,s}(r) defined as in (1.2), Figures 2 and 3 show the graphs of Φ_η(r) for some values of the homotopy parameters, while the graphs of Ψ(τ) and H(η) are shown in Figure 4.

Figure 2 Φ_η(r) with θ = 0.25; s = −1.5 in the left subfigure, s = −1 in the middle subfigure, and s = −0.5 in the right subfigure

Figure 3 Φ_η(r) with θ = 1; s = −1.5 in the left subfigure, s = −1 in the middle subfigure, and s = −0.5 in the right subfigure

Figure 4 Ψ(τ) and H(η) for the loss function V_{θ,s} with θ = 0.25 and s ≤ −1 or −1 < s ≤ 0: Ψ(τ) with θ = 0.25 and s ≤ −1 or s = −0.5 in the left subfigure, H(η) with θ = 0.25 and s ≤ −1 or s = −0.5 in the right subfigure
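The case analysis in Lemma 3.5 can also be checked numerically. The following sketch is illustrative only: it assumes the piecewise form of V_{θ,s} written in (1.2), and the grid range and parameter values are arbitrary choices. It approximates H(η) and H^-(η) by a grid search over r and verifies that Ψ̃(τ) > 0 for τ ∈ (0, 1], in line with classification-calibration:

```python
import numpy as np

def conditional_errors(eta, theta=0.25, s=-0.5, grid=None):
    """Grid approximations of H(eta) = inf_r Phi_eta(r) and of
    H^-(eta) = inf over {r : r (2 eta - 1) <= 0} of Phi_eta(r)."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 20001)
    V = lambda r: np.where(r > s, np.maximum(0.0, 1.0 - r),
                           theta * np.maximum(0.0, 1.0 - r) + (1.0 - theta) * (1.0 - s))
    phi = eta * V(grid) + (1.0 - eta) * V(-grid)      # Phi_eta on the grid
    H = phi.min()
    H_minus = phi[grid * (2.0 * eta - 1.0) <= 0.0].min()
    return H, H_minus

# Psi-tilde(tau) = H^-((1+tau)/2) - H((1+tau)/2) should be positive on (0, 1].
for tau in (0.2, 0.5, 0.9):
    H, H_minus = conditional_errors((1.0 + tau) / 2.0, theta=0.25, s=-0.5)
    print(tau, H_minus - H)
```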

Letting Ψ be the function defined in Definition 3.3, and according to the analysis of [37], the following lemma holds:

Lemma 3.6 ([37]) (1) For any measurable function f: X → R, any nonnegative loss function V, and any probability distribution on X × {−1, +1}, it holds that

Ψ(R(sgn(f)) − R(f_c)) ≤ E^V(f) − inf_{g∈F} E^V(g).    (3.13)

Proposition 3.7 Let V_{θ,s}(r) be the loss function defined as in (1.2). Then, for any measurable function f: X → R, it holds that

Proof For any measurable function f: X → R, it is immediately apparent from (3.13) that

According to the results given in Lemma 3.5, for all τ ∈ (0, 1], we have that Ψ(τ) > 0. By Lemma 3.4, it follows that the loss function V_{θ,s}(r) is classification-calibrated and that Ψ is invertible on [0, 1]. We now analyze the comparison inequality between R(sgn(f)) − R(f_c) and the excess generalization error in two cases, according to the values of the parameters.

Case 1: s ≤ −1. Since Ψ̃(τ) = |τ| is a convex function, by Lemma 3.6, we have that

Taking τ = R(sgn(f)) − R(f_c), it follows that, for s ≤ −1, we have

It is easy to see that, for θ = 1 or s → −∞, the comparison relation given by Proposition 3.7 is in accordance with the comparison relation (3.1) associated with the hinge loss.

3.2 The learning rates

In this subsection, we give results about the excess misclassification error of algorithm (2.3). Let us begin with some concepts regarding convex analysis (see [40]).

Let (H, ‖·‖_H) be a Hilbert space, and let F(f) be a function defined on a convex set S ⊂ H. We say that F is a quasiconvex function on S if

F(αf_1 + (1 − α)f_2) ≤ max{F(f_1), F(f_2)}

holds for all f_1, f_2 ∈ S and any α ∈ [0, 1].

If, for all f_1, f_2 ∈ S and any α ∈ [0, 1], it holds that

F(αf_1 + (1 − α)f_2) ≤ αF(f_1) + (1 − α)F(f_2),

then we call F a convex function on S.

Letting B(f, ε) = {g ∈ H: ‖g − f‖_H < ε} be the ε-ball of the Hilbert space H, we call V: H → R locally Lipschitz near f ∈ H if, for some ε > 0, it holds that

|V(g_1) − V(g_2)| ≤ L‖g_1 − g_2‖_H for all g_1, g_2 ∈ B(f, ε),

where L is called the Lipschitz constant, which depends upon f and ε.

A nonnegative continuous function V: R → [0, ∞) is called a locally Lipschitz loss if, for all a ≥ 0, there exists a constant c_a > 0 such that

|V(r_1) − V(r_2)| ≤ c_a|r_1 − r_2| for all r_1, r_2 ∈ [−a, a].

Moreover, for a ≥ 0, the smallest such c_a is denoted by L_a. Finally, if we have sup_{a>0} L_a < ∞, then we say that V is Lipschitz continuous. If a = +∞, then we say that V(t) is a Lipschitz loss on R.

For the loss function V_{θ,s}(r) defined as in (1.2), it is easy to show that V_{θ,s}(r) is a quasiconvex function on R and that V_{θ,s}(r) is Lipschitz continuous with the Lipschitz constant L_{θ,s} = 1 for r > s and L_{θ,s} = θ for r ≤ s, i.e.,
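For instance, under the piecewise form assumed in (1.2), the slope of V_{θ,s} on each branch can be read off directly, which yields the stated Lipschitz constants:

```latex
% Branch slopes of V_{\theta,s} (assuming the piecewise form (1.2), with s \le 0):
V_{\theta,s}'(r) =
\begin{cases}
0, & r > 1,\\
-1, & s < r < 1,\\
-\theta, & r < s,
\end{cases}
\qquad\text{so}\qquad
L_{\theta,s} = 1 \ \ (r > s), \qquad L_{\theta,s} = \theta \ \ (r \le s).
```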

Definition 3.8 Let (H, ‖·‖_H) be a Hilbert space, and let F(f) be a quasiconvex function from H to R ∪ {±∞}. We call D_+F(f, h) the right-sided Gateaux directional derivative of F at f ∈ H with respect to h if there exists D_+F(f) ∈ H such that

D_+F(f, h) := lim_{t→0^+} (F(f + th) − F(f))/t

exists for every h ∈ H. Moreover, D_+F(f, h) can be written as

D_+F(f, h) = ⟨D_+F(f), h⟩_H.

It is easy to see that, if F is Lipschitz continuous on H, then, for any h ∈ H, we have that

|D_+F(f, h)| ≤ L‖h‖_H,

where L is the Lipschitz constant, i.e.,

|F(f_1) − F(f_2)| ≤ L‖f_1 − f_2‖_H for all f_1, f_2 ∈ H.

By the definition of D_+F(f), we know that if F(f) attains its minimal value at f_0 ∈ H, then D_+F(f_0) = 0.

We now state the main result of this paper.

Theorem 3.9 Let (H_K, ‖·‖_K) be the reproducing kernel Hilbert space associated with the kernel K_x(·) = K(x, ·), let V_{θ,s}(r) be the loss function defined as in (1.2), and let E_{θ,s}(f) and R(f) be defined as above. Then, for any δ ∈ (0, 1), with confidence 1 − δ, it holds that

Remark 3.10 The K-functional denotes the approximation error. By the Lipschitz continuity of V_{θ,s}, we know that there exists a constant C > 0 such that it satisfies

The decay rate of the right-hand side of (3.23) may be described by a modulus of smoothness ([33]), and its convergence is determined by the capacity of H_K (see [41, 42]).

Remark 3.11 If the parameters θ and λ are chosen suitably as m → +∞, then, for a given δ ∈ (0, 1), with confidence 1 − δ, we have that

and the convergence rate can be controlled by choosing the homotopy parameters properly.

Remark 3.12 When s → −∞, we have that |I|_s → m and |O|_s → 0. Therefore, the learning rates given above have the homotopy continuity property.

4 Proofs

Now we are in a position to prove Theorem 3.9. We need to prove the following lemmas first:

Lemma 4.1 Let F(f): H_K → R ∪ {±∞} be a convex function. Then, for any f, g ∈ H_K, it holds that F(f) − F(g) ≥ D_+F(g, f − g) = ⟨D_+F(g), f − g⟩_{H_K}.

Proof Since F(f) is a convex function, for any f, g ∈ H_K and α ∈ [0, 1], it holds that F(αf + (1 − α)g) ≤ αF(f) + (1 − α)F(g). This implies that F(g + α(f − g)) ≤ F(g) + α(F(f) − F(g)), so we have that

(F(g + α(f − g)) − F(g))/α ≤ F(f) − F(g).    (4.1)

Letting α → 0^+ on the left-hand side of inequality (4.1), our conclusion follows. □

Lemma 4.2 Let (H_K, ‖·‖_K) be the reproducing kernel Hilbert space associated with the kernel K_x(·) = K(x, ·), and let V_{θ,s}(r) be the loss function defined as in (1.2). Then, for any h ∈ H_K, it holds that

Proof By the definition of the right-sided directional derivative, we have that

The reproducing property of H_K yields that

Substituting the above formula into (4.4), we have that

This proves (4.2). Using the same argument as above, we obtain equation (4.3). The proof is complete. □

By the same token, we have that

Since Ω(f) and Ω_z(f) are also strictly convex functions, we have that

Lemma 4.3 Let f_λ and f_z be defined as in (4.7) and (4.8), respectively. Then,

Proof Combining the definitions of f_λ and f_z with Lemma 4.1, we have that

Combining (4.11) with (4.2) and (4.3), we have that

We have thus proven (4.9). □

Lemma 4.4 ([43]) Let ξ be a random variable taking values in a real separable Hilbert space H on a probability space (Ω, F, P). Assume that there is a positive constant L such that

Then, for all n ≥ 1 and 0 < δ < 1, it holds that

The next lemma gives a bound for the sample error.

Lemma 4.5 Let f_λ and f_z be defined as in (4.7) and (4.8), respectively. Then, for any δ ∈ (0, 1), it holds, with confidence 1 − δ, that

Proof According to Lemma 4.3, it follows that

Then (4.12) implies that

where we have used the fact that |D_+[1 − ·]_+| ≤ 1.

According to Lemma 4.4, for any δ ∈ (0, 1), with confidence 1 − δ/2, it holds that

By the same token, for any δ ∈ (0, 1), with confidence 1 − δ/2, we have that

(4.13) and (4.15), together with (4.16), yield our conclusion. □

A bound for the generalization error is provided in the following lemma:

Lemma 4.6 Let V_{θ,s}(r) be the loss function defined as in (1.2), and let f_λ and f_z be defined as above. Then, for any δ ∈ (0, 1), with confidence 1 − δ, it holds that

Proof Applying the Lipschitz continuity of the loss function V_{θ,s}, with the Lipschitz constant L_{θ,s} satisfying L_{θ,s} ≤ 1, and according to Lemma 4.5, we obtain that

It is now obvious that the lemma holds. □

For any δ ∈ (0, 1), by Lemma 4.6 and Proposition 3.7, with confidence 1 − δ, it holds that

Thus, we have completed the proof of Theorem 3.9. □
