
A Gradient Iteration Method for Functional Linear Regression in Reproducing Kernel Hilbert Spaces

Annals of Applied Mathematics, 2022, Issue 3

Hongzhi Tong and Michael Ng

1 School of Statistics, University of International Business and Economics, Beijing 100029, China

2 Institute of Data Science and Department of Mathematics, The University of Hong Kong, Pokfulam, Hong Kong, China

Abstract. We consider a gradient iteration algorithm for prediction in functional linear regression under the framework of reproducing kernel Hilbert spaces. In the algorithm, we use an early stopping technique, instead of the classical Tikhonov regularization, to prevent the iteration from converging to an overfitting function. Under mild conditions, we obtain upper bounds for the excess prediction risk that essentially match the known minimax lower bounds. Almost sure convergence is also established for the proposed algorithm.

Key words: Gradient iteration algorithm, functional linear regression, reproducing kernel Hilbert space, early stopping, convergence rates.

1 Introduction

Due to advances in technology, data are increasingly collected in the form of random functions or curves, as opposed to scalars or vectors. Functional data analysis (FDA), developed to handle this situation, has drawn considerable attention in recent decades. Various approaches for the analysis of functional data have been developed in the literature; see [6,8,12,13,17] for comprehensive overviews.

Among the many problems involving functional data, functional linear regression is widely used to predict a scalar response from a functional predictor. Consider the following functional linear regression model

Y = α_0 + ∫_I X(t) β_0(t) dt + ε,    (1.1)

where Y is a scalar response, X(·) is a square integrable random function (with respect to Lebesgue measure) on a bounded interval I, α_0 is the intercept, β_0(·) is an unknown slope function and ε is a centered noise random variable. Without much loss of generality, throughout the paper we assume E(X) = 0 and the intercept α_0 = 0, since the intercept can be easily estimated.
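To make the model concrete, the following Python sketch simulates a small data set from (1.1) on a grid over I = [0,1]; the slope function, the law of the predictor curves, the grid, and the noise level are all illustrative choices rather than specifications from the paper.

import numpy as np

# Simulate n observations from model (1.1) with alpha_0 = 0:
#   Y_i = integral_I X_i(t) beta_0(t) dt + eps_i,  taking I = [0, 1].
rng = np.random.default_rng(0)
m, n = 200, 100                              # grid size and sample size (illustrative)
t = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / m)                      # quadrature weights for the integral over I

beta0 = np.sin(2.0 * np.pi * t)              # hypothetical slope function beta_0

# Centered predictor curves from a truncated random series (illustrative choice).
J = 20
phi = np.array([np.sqrt(2.0) * np.sin((j + 0.5) * np.pi * t) for j in range(J)])
scores = rng.normal(size=(n, J)) / np.arange(1, J + 1)
X = scores @ phi                             # shape (n, m); row i holds X_i on the grid

eps = 0.1 * rng.normal(size=n)               # centered noise
Y = X @ (w * beta0) + eps                    # Riemann-sum approximation of the integral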

The goal of the prediction problem is to estimate the functional

η_0(X) = ∫_I X(t) β_0(t) dt

based on a set of training data {(X_i, Y_i) : i = 1,···,n} consisting of n independent copies of (X, Y). Define the risk of a prediction η as

E(η) = E*[(Y* − η(X*))²],

where (X*, Y*) is a copy of (X, Y) independent of the training data, and E* denotes expectation taken over X* and Y* only. Let η̂ be a prediction constructed from the training data. Then its accuracy can be naturally measured by the excess risk

E(η̂) − E(η_0).

In the context of functional linear regression, most existing methods are based upon functional principal component analysis (FPCA); see, e.g., [2,7,13,19]. The FPCA approach expands the unknown function β_0 using the eigenfunctions of the predictor covariance operator. For such a strategy to work well, it is usually necessary to assume that this basis provides a good representation of the slope function; however, the basis is determined by the predictor and need not represent the slope function accurately. A more general assumption on the slope function is one of smoothness, which makes a reproducing kernel Hilbert space (RKHS) [9] an interesting alternative; see [3,11,20]. In particular, it has already been shown in [3] that the RKHS-based approach performs better when the slope function does not align well with the eigenfunctions of the covariance kernel.

Motivated by these observations, in this paper we develop an iterative estimation procedure for the functional linear model (1.1) within the RKHS framework, under which the unknown slope function β_0 is assumed to reside in an RKHS H_K.

More specifically, we are concerned with an iteration path starting from the origin. To prevent the iteration from converging to an overfitting function, it is necessary to choose the moment at which the iteration stops. Through a detailed theoretical analysis of a form of gradient descent applied to the least-squares loss, we derive an early stopping rule, which plays the role of regularization. Under the early stopping strategy and some realistic conditions, we show that our algorithm attains the same minimax convergence rate as that of [3]. Unlike the result in [3], which is established in expectation, our convergence rate is in probability. As a consequence, we deduce almost sure convergence of the gradient iteration algorithm by applying the Borel-Cantelli lemma.

Although gradient iteration methods with early stopping have been well studied in the context of vector data (see, e.g., [14,18]), it is still challenging to deal with the more complicated functional data. To our knowledge, this is the first theoretical guarantee for the use of such algorithms in the functional setting, and we expect it to bring new insights that broaden the toolbox of FDA. The outline of the paper is as follows. In Section 2, we review the mathematical framework and formulate the algorithm. In Section 3, we present our main results. A brief discussion is given in Section 4.

2 Notation and algorithm

We begin by introducing some notation for RKHSs and kernel-induced integral operators, before turning to a precise formulation of the algorithm studied in this paper.

2.1 Integral operators

Recall that a reproducing kernel K : I×I → R is a real, symmetric, continuous, and nonnegative definite function. The RKHS H_K associated with the kernel K is the completion of the linear span of the functions {K_s := K(s,·) : s ∈ I} with respect to the inner product ⟨·,·⟩_K determined by ⟨K_s, K_u⟩_K = K(s,u). The so-called reproducing property of H_K is critical in both theoretical analysis and computation; it states that

f(s) = ⟨f, K_s⟩_K for all f ∈ H_K and all s ∈ I.

Denote by L² the Hilbert space of square integrable functions on I, endowed with the inner product ⟨·,·⟩ and corresponding norm ‖·‖ (i.e., ‖f‖² = ⟨f, f⟩). Let ‖·‖_op stand for the usual operator norm, that is, ‖U‖_op = sup_{‖f‖≤1} ‖Uf‖ for an operator U : L² → L². Define the integral operator L_K : L² → L² associated with the kernel K by

(L_K f)(s) = ∫_I K(s,t) f(t) dt, s ∈ I.

Here, for f, g ∈ L², the rank-one operator f⊗g : L² → L² is defined by (f⊗g)(h) = ⟨g, h⟩f for any h ∈ L². The empirical version T_n of the operator T is given by

These two operators play a key role in the formulation and analysis of our gradient iteration algorithm.
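As a numerical illustration of these objects, the sketch below discretizes the integral operator L_K and the rank-one operator f⊗g on a grid. The Gaussian kernel is a hypothetical choice, and the empirical operator at the end assumes the construction T_n = (1/n) Σ_i (L_K^{1/2}X_i) ⊗ (L_K^{1/2}X_i), which is a common choice in this setting and is stated here as an assumption rather than quoted from the paper.

import numpy as np

# Grid discretization of the operators in Section 2.1 (illustrative choices throughout).
m = 200
t = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / m)                          # quadrature weights on I = [0, 1]

Kmat = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.2 ** 2))   # K(t_i, t_j), Gaussian kernel
LK = Kmat * w[None, :]                           # (L_K f)(s) ~ sum_j K(s, t_j) f(t_j) w_j

def inner(f, g):                                 # L^2 inner product on the grid
    return float(np.sum(w * f * g))

def tensor(f, g):                                # matrix of f (x) g: (f (x) g) h = <g, h> f
    return np.outer(f, w * g)

# Square root of L_K with respect to the weighted (L^2) inner product.
Dh = np.sqrt(w)
M = Dh[:, None] * Kmat * Dh[None, :]             # symmetric form D^{1/2} K D^{1/2}
evals, evecs = np.linalg.eigh(M)
Mhalf = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
LK_half = (Mhalf * Dh[None, :]) / Dh[:, None]    # satisfies LK_half @ LK_half ~ LK

def empirical_T(X):
    # Assumed form: T_n = (1/n) sum_i (L_K^{1/2} X_i) (x) (L_K^{1/2} X_i).
    Z = X @ LK_half.T                            # row i holds L_K^{1/2} X_i on the grid
    return sum(tensor(z, z) for z in Z) / len(X)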

2.2 Gradient iteration algorithm

Consider the empirical risk minimization, based on the observations {(X_i, Y_i) : i = 1,···,n}, for the least squares scheme

for any f. In the following, we assume that H_K is dense in L², which ensures that f_0 and f are uniquely defined. This assumption is satisfied when K is a universal kernel, such as the Gaussian kernel; see [15]. Now (2.1) can be rewritten as

Our algorithm is then taken as a gradient descent method for (2.2), which can be stated iteratively as a sequence in L² by

where γ > 0 is the step size. Note that

it follows by induction, with the initial iterate equal to 0, that

where I is the identity operator and

It is not hard to see that

From the iteration relation (2.4), we obtain

In the next section, we estimate each of the terms in (2.5) so that a bound on the excess risk of the resulting prediction can be obtained.
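The following Python sketch illustrates the kind of procedure analyzed here: plain gradient descent from the origin on the discretized empirical least-squares risk, with the stopping time chosen by monitoring a validation split. The step size, the stopping rule, and the discretization are illustrative and should not be read as the paper's exact specification of (2.3).

import numpy as np

def fit_gradient_iteration(X, Y, w, gamma=0.5, max_iter=2000, val_frac=0.2, seed=0):
    """Gradient iteration from the origin with validation-based early stopping (a sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    n_val = max(1, int(val_frac * len(Y)))
    va, tr = idx[:n_val], idx[n_val:]
    Xtr, Ytr, Xva, Yva = X[tr], Y[tr], X[va], Y[va]

    beta = np.zeros(X.shape[1])                        # the iteration starts from the origin
    best_beta, best_err = beta.copy(), np.inf
    for _ in range(max_iter):
        residual = Xtr @ (w * beta) - Ytr              # <X_i, beta> - Y_i on the training part
        grad = (2.0 / len(Ytr)) * Xtr.T @ residual     # L^2 gradient of the empirical risk
        beta = beta - gamma * grad                     # gradient step with step size gamma
        val_err = np.mean((Xva @ (w * beta) - Yva) ** 2)   # proxy for the prediction risk
        if val_err < best_err:                         # keep the best early-stopped iterate
            best_err, best_beta = val_err, beta.copy()
    return best_beta

# Usage with the simulated data from the earlier sketch: beta_hat = fit_gradient_iteration(X, Y, w).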

3 Convergence analysis

This section presents our main results. Theorems 3.1 and 3.2 contain separate estimates of the first and second terms of (2.5), and they lead to Theorem 3.3, which gives an upper bound on the excess risk.

3.1 Assumptions and lemmas

To derive nontrivial rates of convergence, we need to impose some basic assumptions. The following boundedness condition was introduced in [16].

(A1) There exists a constant κ > 0 such that ‖L_K^{1/2} X‖² ≤ κ² almost surely.

It follows from (A1) that both T and T_n are compact trace-class operators.

(A2) We assume the noise ε in model (1.1) satisfies the Bernstein condition: E(|ε|^l | X) ≤ (1/2) l! M^{l−2} v holds almost surely, for every integer l ≥ 2 and some constants M, v > 0.

Assumption (A2) covers a broad class of noise models; it is satisfied when the noise is uniformly bounded, Gaussian, or sub-Gaussian.
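As a quick check of the first of these claims, using the moment form of (A2) stated above: suppose the noise is uniformly bounded, |ε| ≤ B almost surely. Then for every integer l ≥ 2,

\[
\mathbb{E}\bigl(|\epsilon|^{l}\mid X\bigr)\;\le\;B^{l}\;=\;B^{l-2}B^{2}\;\le\;\tfrac{1}{2}\,l!\,B^{l-2}B^{2},
\]

since l!/2 ≥ 1 for l ≥ 2, so (A2) holds with M = B and v = B². Analogous (slightly longer) moment calculations cover the Gaussian and sub-Gaussian cases.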

We also introduce a regularity condition on the operator T, which is measured in terms of the so-called effective dimensionality (see [4,22]). For λ > 0, the effective dimension of T is defined to be the trace of the operator (T+λI)^{−1}T, that is,

N(λ) = Tr((T+λI)^{−1}T).

(A3) There exist constants 0 < θ ≤ 1 and c_θ > 0 such that

N(λ) ≤ c_θ λ^{−θ} for all λ > 0.
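A small numerical illustration of (A3): for eigenvalues decaying polynomially, say s_k = k^{−1/θ}, the effective dimension grows like λ^{−θ} as λ decreases. The script below (an illustrative check with θ = 1/2, not tied to any operator from the paper) compares N(λ) = Σ_k s_k/(s_k+λ) with λ^{−θ}.

import numpy as np

# Effective dimension N(lambda) for polynomially decaying eigenvalues s_k = k^(-1/theta).
theta = 0.5
s = np.arange(1, 200001, dtype=float) ** (-1.0 / theta)        # here s_k = k^{-2}
for lam in (1e-2, 1e-3, 1e-4):
    N = np.sum(s / (s + lam))                                   # trace of (T + lam I)^{-1} T
    print(f"lambda = {lam:.0e}: N(lambda) = {N:8.2f}, lambda^(-theta) = {lam ** -theta:8.2f}")

The two printed quantities differ only by a bounded factor (roughly π/2 in this example), consistent with the bound N(λ) ≤ c_θ λ^{−θ}.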

In order to derive error bounds for the terms in (2.5), we need some preliminary lemmas. The following lemma, which plays a key role in our analysis, was proved in [16].

Lemma 3.1. Under Assumption (A1), for any 0 < δ < 1, with confidence at least 1−δ/2, there holds

The next lemma was established in [4], based on a Bernstein-type inequality for random variables taking values in a Hilbert space; see [10,21].

Lemma 3.2. Let H be a Hilbert space and let ξ be a random variable with values in H. Let {ξ_1, ξ_2,···, ξ_n} be a sample of n independent observations of ξ. If, for some constants u, v > 0, the bound

holds for every integer l ≥ 2, then for any 0 < δ < 1,

with confidence at least 1−δ.

3.2 Main results

We first bound the first term on the right-hand side of (2.5) by using Lemma 3.1.

Theorem 3.1. Under Assumption (A1) and for 0 < γ < κ^{−2}, for any 0 < δ < 1, with confidence at least 1−δ/2, there holds

Proof. Since

We use an operator norm inequality for self-adjoint positive operators A and B; see [1, Theorem X.1.1], where the result is stated for positive matrices, but the proof applies equally to positive operators on a Hilbert space. We then get from Lemma 3.1 that, with confidence at least 1−δ/2,

Hence,

This together with (3.2) proves the theorem.

Next we turn to bounding the second term of (2.5), using Lemmas 3.1 and 3.2.

Theorem 3.2. Under Assumptions (A1) and (A2) and for 0 < γ < κ^{−2}, for any 0 < δ < 1, with confidence at least 1−δ, there holds

Proof. Since

Here, in the last inequality, we have used (3.1) again. By Lemma 3.1, we have, on the same confidence set as in (3.2),

To estimate

and for l ≥ 2,

Note that

Hence we obtain from (A2) that

By applying Lemma 3.2 to ξ, we have, with confidence at least 1−δ/2,

This, together with (3.3) and (3.4), proves the theorem.

Now we can derive explicit learning rates for the gradient iteration algorithm (2.3) applied to the functional linear regression model (1.1).

where C is a constant independent of n and δ, and ⌈x⌉ denotes the smallest integer greater than or equal to x ∈ R.

Since Theorems 3.1 and 3.2 hold simultaneously with probability at least 1−δ, we get from (2.5) that

This proves the theorem with

3.3 Almost sure convergence

Based on the confidence-based error estimate in Theorem 3.3, we can deduce almost sure convergence of the gradient iteration algorithm (2.3) by applying the following Borel-Cantelli type lemma [5, p. 262].

Lemma 3.3. Let μ_i be a sequence of random variables on some probability space and let ν_i be a sequence of positive numbers satisfying lim_{i→∞} ν_i = 0. If

then μ_i converges to μ almost surely.

almost surely.

Proof. Take δ = δ_n = n^{−2} in Theorem 3.3. Then for any n and ε > 0, we get from Theorem 3.3,

We can write

it is easy to see

and lim_{n→∞} ν_n = 0. Then our conclusion follows from Lemma 3.3.
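To spell out the summability behind the application of Lemma 3.3 (with the choice δ_n = n^{−2} made above),

\[
\sum_{n=1}^{\infty}\delta_n=\sum_{n=1}^{\infty}\frac{1}{n^{2}}=\frac{\pi^{2}}{6}<\infty,
\]

so the probabilities of the exceptional events are summable, which is the type of summability condition required by Lemma 3.3.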

4 Discussion and summary

In this paper we have introduced a gradient iteration approach to the functional linear regression model. To prevent the iteration from converging to an overfitting function, we develop an early stopping rule. A rigorous theoretical analysis guarantees that the proposed algorithm converges to the target almost surely.

The main difference from the classical RKHS regression setting is that, for functional linear regression, the unknown slope function is assumed to lie in an RKHS while the predictor functions X_i are not in the RKHS. The minimization problem is written directly in L², and gradients are taken with respect to the L² inner product. In contrast, in most papers on classical nonparametric regression in an RKHS (see, e.g., [14,18]), the gradients are taken with respect to the inner product of the RKHS, and the feature maps belong to the RKHS.

It is interesting to ask what happens if the slope function lies outside the RKHS; a promising approach is to quantify the regularity of the slope function by means of the operator T. This involves the alignment of the reproducing kernel K with the covariance function of X, and will be a topic of future work.

We finally present a detailed comparison of our result with related results for model (1.1). In [3], the authors derive minimax optimal rates for the Tikhonov regularized least squares scheme

under two crucial assumptions: the eigenvalues s_1 ≥ s_2 ≥ ··· of T decay polynomially,

s_k ≍ k^{−2r},    (4.1)

with r > 1/2, and

E⟨X, f⟩⁴ ≤ c (E⟨X, f⟩²)²    (4.2)

for some constant c > 0 and any f ∈ L². The authors choose a regularization parameter of order n^{−2r/(2r+1)}, leading to the excess prediction risk

O(n^{−2r/(2r+1)})    (4.3)

being minimax optimal.

Inequality (4.2) may be very difficult to verify except for Gaussian data X; the boundedness assumption (A1) is used to replace it. Although condition (A1) is not satisfied for any non-degenerate Gaussian process, processes arising from real data are usually bounded. Hence, Assumption (A1) is more realistic and can at least be regarded as complementary to (4.2).

On the other hand, Assumption (A3) is more general than (4.1). In fact, if (4.1) is satisfied, it is easy to check that (A3) holds with θ = 1/(2r).
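A short verification sketch, assuming (4.1) is the polynomial decay s_k ≍ k^{−2r} displayed above: comparing the sum defining the effective dimension with an integral gives

\[
N(\lambda)=\sum_{k\ge 1}\frac{s_k}{s_k+\lambda}\;\lesssim\;\sum_{k\ge 1}\frac{1}{1+\lambda k^{2r}}\;\le\;\int_{0}^{\infty}\frac{dx}{1+\lambda x^{2r}}\;=\;\lambda^{-\frac{1}{2r}}\int_{0}^{\infty}\frac{du}{1+u^{2r}},
\]

and the last integral is finite precisely because r > 1/2, so (A3) holds with θ = 1/(2r) ∈ (0,1) and c_θ proportional to that integral.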

The resulting convergence rate coincides with the minimax rate (4.3) of [3], and is thus optimal.

Acknowledgements

H. Tong's research is supported in part by the National Natural Science Foundation of China (Grant No. 11871438). M. Ng's research is supported in part by HKRGC GRF Nos. 12300218, 12300519, 17201020, 17300021, C1013-21GF, C7004-21GF, and the Joint NSFC-RGC grant N-HKU76921.


