1. Robbins H, Monro S. A stochastic approximation method. Annals of Mathematical Statistics, 1951, 22(3): 400-407
doi: 10.1214/aoms/1177729586
2. Polyak B T, Juditsky A B. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 1992, 30(4): 838-855
doi: 10.1137/0330046
3. Bottou L, Bousquet O. The tradeoffs of large scale learning//Platt J C, Koller D, Singer Y, et al. Advances in Neural Information Processing Systems 20. Cambridge: MIT Press, 2008
4. Shalev-Shwartz S, Singer Y, Srebro N, Cotter A. Pegasos: primal estimated sub-gradient solver for SVM. Mathematical Programming, 2011, 127(1): 3-30
5. Nemirovski A, Juditsky A, Lan G, Shapiro A. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 2009, 19(4): 1574-1609
doi: 10.1137/070704277
6. Lan G, Monteiro R D C. Iteration-complexity of first-order penalty methods for convex programming. Mathematical Programming, 2013, 138(1/2): 115-139
doi: 10.1007/s10107-012-0588-x
7. Ghadimi S, Lan G. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 2016, 156(1/2): 59-99
8. Bach F, Moulines E. Non-asymptotic analysis of stochastic approximation algorithms for machine learning//Advances in Neural Information Processing Systems 24. 2011: 451-459
9. Bach F, Moulines E. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)//Advances in Neural Information Processing Systems 26. 2013
10. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011, 12: 2121-2159
11. Gasnikov A V, Nesterov Y E, Spokoiny V G. On the efficiency of a randomized mirror descent algorithm in online optimization problems. Computational Mathematics and Mathematical Physics, 2015, 55: 580-596
doi: 10.1134/S0965542515040041
12. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202
doi: 10.1137/080716542
13. Tseng P, Yun S. Incrementally updated gradient methods for constrained and regularized optimization. Journal of Optimization Theory and Applications, 2014, 160(3): 832-853
doi: 10.1007/s10957-013-0409-2
14. Nesterov Y. Smooth minimization of non-smooth functions. Mathematical Programming, 2005, 103(1): 127-152
doi: 10.1007/s10107-004-0552-5
15. Nesterov Y. Subgradient methods for huge-scale optimization problems. Mathematical Programming, 2014, 146(1/2): 275-297
16. Lan G. An optimal method for stochastic composite optimization. Mathematical Programming, 2012, 133(1/2): 365-397
17. Ghadimi S, Lan G. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM Journal on Optimization, 2012, 22(4): 1469-1492
doi: 10.1137/110848864
18. Ghadimi S, Lan G. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 2016, 156(1/2): 59-99
19. Ghadimi S, Lan G, Zhang H C. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 2016, 155(1/2): 267-305
20. Lan G. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 2015, 149(1/2): 1-45