数学物理学报, 2024, 44(1): 195-208

非凸多分块优化的 Bregman ADMM 的收敛率研究

陈建华,, 彭建文,*

重庆师范大学数学科学学院 重庆 401331

Research on the Convergence Rate of Bregman ADMM for Nonconvex Multiblock Optimization

Chen Jianhua,, Peng Jianwen,*

College of Mathematical Sciences; Chongqing Normal University, Chongqing 401331

通讯作者: 彭建文, E-mail:jwpeng168@hotmail.com

收稿日期: 2022-06-9   修回日期: 2023-08-16  

基金资助: 国家自然科学基金(12271071)
国家自然科学基金(11991024)
重庆英才·创新创业领军人才·创新创业示范团队项目(CQYC20210309536)
重庆英才计划"包干制"项目(cstc2022ycjh-bgzxm0147)
重庆市高校创新研究群体项目(CXQT20-014)
重庆市自然科学基金项目(cstc2021jcyj-msxmX0300)

Received: 2022-06-9   Revised: 2023-08-16  

Fund supported: National Natural Science Foundation of China(12271071)
National Natural Science Foundation of China(11991024)
Team Project of Innovation Leading Talent in Chongqing(CQYC20210309536)
Chongqing Talent Plan contract system project(cstc2022ycjh-bgzxm0147)
Chongqing University Innovation Research Group Project(CXQT20-014)
Basic and Advanced Research Project of Chongqing(cstc2021jcyj-msxmX0300)

作者简介 About authors

陈建华,E-mail:1224888402@qq.com

摘要

Wang 等提出了求解带线性约束的多块可分非凸优化问题的带 Bregman 距离的交替方向乘子法 (Bregman ADMM), 并证明了其收敛性. 该文将进一步研究求解带线性约束的多块可分非凸优化问题的 Bregman ADMM 的收敛率, 以及算法产生的迭代点列有界的充分条件. 在效益函数的 Kurdyka-Łojasiewicz (KŁ) 性质下, 该文建立了值和迭代的收敛速率, 证明了与目标函数相关的各种 KŁ 指数值可获得 Bregman ADMM 的三种不同收敛速度. 更确切地说, 该文证明了如下结果:如果效益函数的 KŁ 指数$\theta=0$, 那么由 Bregman ADMM 生成的序列经过有限次迭代后收敛; 如果$\theta \in \left (0, \frac{1}{2}\right ]$, 那么Bregman ADMM是线性收敛的;如果$\theta \in \left ( \frac{1}{2}, 1\right )$, 那么 Bregman ADMM 是次线性收敛的.

关键词: 非凸优化问题; 交替方向乘子法; Kurdyka-Łojasiewicz 性质; Bregman 距离; 收敛率; 有界性

Abstract

Wang et al proposed the alternating direction method of multipliers with Bregman distance (Bregman ADMM) for solving multi-block separable nonconvex optimization problems with linear constraints, and proved its convergence.In this paper, we will further study the convergence rate of Bregman ADMM for solving multi-block separable nonconvex optimization problems with linear constraints, and the sufficient conditions for the boundedness of the iterative point sequence generated by the algorithm.Under the Kurdyka-Łojasiewicz property of benefit function, this paper establish the convergence rates for the values and iterates, and we show that various values of KŁ-exponent associated with the objective function can obtain Bregman ADMM with three different convergence rates. More precisely, this paper proves the following results:if the(KŁ) exponent of the benefit function$\theta=0$, then the sequence generated by Bregman ADMM converges in a finite numbers of iterations; if$\theta \in \left (0, \frac{1}{2}\right ]$, then Bregman ADMM is linearly convergent; if$\theta \in \left ( \frac{1}{2}, 1\right )$, then Bregman ADMM is sublinear convergent.

Keywords: Nonconvex optimization problem; The alternating direction method of multipliers; Kurdyka-Łojasiewicz property; Bregman distance; Convergence rates; Boundedness

PDF (638KB) 元数据 多维度评价 相关文章 导出 EndNote| Ris| Bibtex  收藏本文

本文引用格式

陈建华, 彭建文. 非凸多分块优化的 Bregman ADMM 的收敛率研究[J]. 数学物理学报, 2024, 44(1): 195-208

Chen Jianhua, Peng Jianwen. Research on the Convergence Rate of Bregman ADMM for Nonconvex Multiblock Optimization[J]. Acta Mathematica Scientia, 2024, 44(1): 195-208

1 引言

本文考虑如下带线性约束的非凸优化问题

$\begin{aligned} \min \, \:f_{1}\left ( x_{1}\right )+f_{2}\left ( x_{2}\right )+\cdots +f_{N}\left ( x_{N}\right ) \\ {\rm s.t.} \:A_{1}x_{1}+A_{2}x_{2}+\cdots +A_{N}x_{N}=0{, } \end{aligned}$

其中$A_{i}\in \mathbb{R}^{m\times n_{i}}, f_{i}:\mathbb{R}^{n_{i}}\rightarrow\mathbb{R}, i=1, 2, \cdots, N-1$是正常下半连续函数,$f_{N}:\mathbb{R}^{n_{N}}\rightarrow \mathbb{R}$是一个连续可微函数. 问题 (1.1) 在信号处理, 图像处理, 稀疏优化, 机器学习等[1]领域中都有着广泛的应用.问题(1.1)的增广拉格朗日函数$L_{\alpha}:\mathbb{R}^{n_{1}}\times \mathbb{R}^{n_{2}}\times\cdots \times \mathbb{R}^{n_{N}}\times\mathbb{R}^{m}\rightarrow\mathbb{R}$定义为

$L_{\alpha}\left ( x_{1}, x_{2}, \cdots x_{N}, p\right )=\sum\limits_{i=1}^{N}f_{i}\left ( x_{i}\right )+\sum\limits_{i=1}^{N}<p, A_{i}x_{i}>+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}\right \|^{2}, $

这里$p\in \mathbb{R}^{m}$是拉格朗日乘子,$\alpha > 0$是惩罚参数.

交替方向乘子法 (ADMM) 是在 20 世纪 70 年代初由 Glowinski, Marrocco 和 Gabay 等[2,3] 提出, 是求解可分优化问题的一种分裂算法, 当$N=2$时, 其求解问题 (1.1) 的经典迭代格式如下

$\left\{\begin{array}{l} x_{1}^{k+1}\in \arg\min \left \{L_{\alpha }\left ( x_{1}, x_{2}^{k}, p^{k}\right )|x_{1}\in \mathbb{R}^{n_{1}}\right \}, \\ x_{2}^{k+1}\in \arg\min \left \{L_{\alpha }\left ( x_{1}^{k+1}, x_{2}, p^{k}\right )|x_{2}\in \mathbb{R}^{n_{2}}\right \}, \\ p^{k+1}=p^{k}+\alpha \left (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1} \right ). \end{array}\right.$

当$N=2$,$f_{1}$和$f_{2}$是凸函数时, ADMM(1.2) 及其变体的收敛性和收敛率的研究已较为完善与成熟, 如文献[4-11]. He[5] 讨论了 ADMM 在遍历意义下和点列意义下的$O\left ( \frac{1}{t}\right )$迭代复杂性, He 等[7]针对上面的问题提出带松弛系数$r$和$s$的对称 ADMM (PR 分裂算法), 算法如下

$ \left\{ \begin{array}{l} {x_{1}^{k+1}} = \arg \min \left\{ {{L_\alpha }\left( {x_{1}, x_{2}^{k}, {p ^k}} \right)|x \in \mathbb{R}^{n_{1}}} \right\}, {\rm{ }}\\ \notag {\rm{ }}{p ^{k + \frac{1}{2}}} = {p ^k} - r\alpha \left( {A{x_{1}^{k+1}} + Bx_{2}^{k} - b} \right), \\ \notag {x_{2}^{k+1}} = \arg \min \left\{ {{L_\alpha }\left( {{x_{1}^{k + 1}}, x_{2}, {p ^{k + \frac{1}{2}}}} \right)|y \in \mathbb{R}^{n_{2}}} \right\}, {\rm{ }}\\ \notag {p ^{k + 1}} = {p ^{k + \frac{1}{2}}} - s\alpha \left( {A{x_{1}^{k + 1}} + B{x_{2}^{k + 1}} - b} \right). \notag \end{array} \right.$

Fazel 等[8] 提出下面的带半定邻近项的 ADMM (sPADMM)

$ \left\{ \begin{array}{l} {x_{1}^{k + 1}} = \arg \min \left\{ {{L_\alpha }\left( {x_{1}, {x_{2}^k}, {p ^k}} \right) + \frac{1}{2}\left\| {x_{1} - {x_{1}^k}} \right\|_P^2|x_{1} \in \mathbb{R}^{n_{1}}} \right\}, {\rm{ }}\\ \notag {x_{2}^{k + 1}} = \arg \min \left\{ {{L_\alpha }\left( {{x_{1}^{k + 1}}, x_{2}, {p ^{k}}} \right) + \frac{1}{2}\left\| {x_{2} - {x_{2}^k}} \right\|_Q^2|x_{2} \in \mathbb{R}^{n_{2}}} \right\}, {\rm{ }}\\ \notag {\rm{ }}{p^{k + 1}} = {p ^{k }} - s\alpha \left( {A{x_{1}^{k + 1}} + B{x_{2}^{k + 1}} - b} \right), \notag \end{array} \right.$

其中$s\in \left ( 0, \left ( 1+\sqrt{5}\right )/2\right )$,$P, Q$是半正定矩阵. 邻近 ADMM 可以简化子问题的求解, 使算法更灵活.

Li 等[9]提出带不定邻近项的 majorized ADMM (iPADMM), 在一些温和的条件下得出了该算法的收敛性和次线性收敛率, 且通过数值实验得出 iPADMM 的数值效果优于 sPADMM.

Zhang 等[10]给出了 iPADMM 的线性收敛率, 并给出适用更广的条件, 简化了算法的收敛性证明过程, 还将该算法应用到三种广义的 Lasso 问题上, 数值效果表明 iPADMM 优于 sPADMM.

当$f_{1}$和$f_{2}$中至少有一个是非凸函数时, 非凸两分块ADMM 及其变体的收敛性和收敛率分析也不断涌现, 如文献[12-16].Wang 等[12] 对经典 ADMM 的两个子问题都加上 Bregman 距离函数, 在$A_{1}$为行满秩,$x_{2}$-子问题的 Bregman 距离函数强凸, 且$L_{\alpha }\left ( x_{1}, x_{2}, p\right )$关于$x_{1}$或$x_{1}$-子问题的 Bregman 距离函数强凸的情况下, 分析了相应算法收敛结果. Guo 等[13] 在增广拉格朗日函数满足 KŁ 性质且惩罚参数大于 2L 等假设下, 证明了由 ADMM 生成的迭代序列收敛到所考虑问题的临界点, 在更多条件下还给出了 ADMM 的收敛率, 其中 L 表示目标函数中光滑函数梯度的利普希茨常数. 最近, Jia 等[16] 证明了如果满足某些误差的上下界条件, 那么 ADMM 在非凸优化问题中生成的序列以线性速率局部收敛.

当$N\geqslant 3$, 且$f_{i}\left ( x\right )$都是凸函数时, 相应的 ADMM 或其他分裂算法的收敛性仍有待深入研究, 甚至无法收敛[17]. Han 和 Yuan[18] 首证明了当所有目标函数$f_{i}$均强凸, 罚参数小于某一阈值时, 由经典两分块 ADMM 直接推广的多分块 ADMM 版本具有全局收敛性. Lin 等[19] 在函数$f_{3}$是强凸函数且梯度利普希茨连续,$f_{1}$和$f_{2}$都是凸函数,$A_{1}$和$A_{2}$是列满秩,$A_{3}$是单位矩阵的假设下, 证明了对于问题 (1.1)$(N=3)$的三块 ADMM 的收敛性. 由于直接推广的多块 ADMM 的收敛性很难保证, 但其实际数值表现却很好, 所以近些年许多学者致力于设计收敛的多块ADMM, 这些算法不仅有收敛性的保证而且数值上优于直接推广的多块 ADMM. He, Tao 和 Yuan 通过修正并推广 ADMM, 获得多分块 ADMM, 如部分平行分裂 ALM 的预测校正法[20], 带高斯回代 ADMM[21] 等. 最近, Chen 等[22] 提出了一类非精确的基于块对称高斯赛德尔分解的 majorized 多块邻近 ADMM, 解决了一类高维的凸锥规划问题, 证明了该算法的收敛性和收敛率等结论, 并通过数值实验体现了该算法的优越性. 文献[23]提出了一类多块的雅可比 ADMM, 并证明了在以下两种情况的收敛性: (i) 所有的系数矩阵都相互接近正交, 且都是列满秩, (ii) 对子问题附以邻近项, 在此情况下还证明了该算法具有$o\left ( 1/k\right )$的收敛速度.

目前关于多分块优化分裂算法研究更多的工作是聚焦于凸问题. 对于非凸多分块优化的 ADMM 或其他分裂算法的研究相对较少. 在目标函数是光滑非凸的或是凸的非光滑的情形下, Hong 等[24] 在系数矩阵都是列满秩的假设下, 证明了多块 ADMM 的收敛性, 并用来解决一系列非凸共识和共享问题. Guo 等[25] 将非凸两分块的经典 ADMM 直接推广到多分块情形, 在$A_{i}$列满秩且罚参数局限于某一范围时, 分析了算法的全局收敛性. 此外, 若效益函数满足 KŁ 性质, 还获得了算法的强收敛性, 在更多假设下还给出了算法的收敛率分析. Jiang 等[26] 提出了一种不同于 (1.1) 式的结构化非凸非光滑优化方法, 证明了所提出的框架的收敛性, 还分析了其迭代的复杂性. Yashtinin [27]提出了一类多块非凸非光滑的邻近ADMM, 并在效益函数满足 KŁ 性质下证明了该算法的收敛性和收敛率. Jian 等[28]提出了一类非凸多分块优化部分对称正则化交替方向乘子法 (PSRADMM), 在最后一个目标函数相应的系数矩阵是单位矩阵等假设下证明了其收敛性和收敛率. Wang 等[29]提出了一个多块的 Bregman ADMM 用于解决问题 (1.1), 其求解问题 (1.1) 的迭代格式如下

$\left\{\begin{array}{l}x_{1}^{k+1}\in \arg\min \left \{L_{\alpha }\left ( x_{1}, x_{2}^{k}, \cdots, x_{N}^{k}, p^{k}\right )+\Delta _{\phi_{1}}\left (x_{1}, x_{1}^{k} \right )|x_{1}\in \mathbb{R}^{n_{1}}\right \}, \\x_{2}^{k+1}\in \arg\min \left \{L_{\alpha }\left ( x_{1}^{k+1}, x_{2}, \cdots, x_{N}^{k}, p^{k}\right )+\Delta _{\phi_{2}}\left (x_{2}, x_{2}^{k} \right )|x_{2}\in \mathbb{R}^{n_{2}}\right \}, \\ \vdots\\x_{N}^{k+1}\in \arg\min \left \{L_{\alpha }\left ( x_{1}^{k+1}, \cdots, x_{N-1}^{k+1}, x_{N}, p^{k}\right )+\Delta _{\phi_{N}}\left (x_{N}, x_{N}^{k} \right )|x_{N}\in \mathbb{R}^{n_{N}}\right \}, \\p^{k+1}=p^{k}+\alpha \left (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+\cdots +A_{N}x_{N}^{k+1} \right ),\end{array}\right.$

其中,$\Delta _{\phi_{i}}$是相对于函数$\phi_{i}, i=1, \cdots, N$的一个适当选择的 Bregman 距离. 多块 Bregman ADMM 是经典 ADMM 的一个重要变体, 它在有效性和效率方面都有许多优点. 首先是多块 Bregman ADMM 的全局收敛性不需要目标函数的任意强凸性, 其次是通过选取适当的 Bregman 距离, 可以简化子问题的求解, 提高算法的性能, 详情见文献[29]. 但遗憾的是, 没有对多块 Bregman ADMM 收敛率的研究, 且多块 Bregman ADMM 的收敛性证明中需假设由算法产生的迭代点列有界, 这是一个比较强的假设, 本文将对这一假设给出一个充分条件来保证该假设成立.

非凸多块 ADMM 的收敛性需要很强的假设, 且研究非凸 ADMM 及其变体的收敛率的研究也较少, 受以上文献的启发, 本文借助 KŁ 性质来研究非凸多块 Bregman ADMM 的收敛率. 在效益函数满足 KŁ 性质且指数为$\theta$的假设下, 本文得到了函数和序列的收敛率. KŁ 性质已经被广泛的用于研究各种一阶算法的收敛率[30]. KŁ 性质及其相关的 KŁ 指数起源于代数几何, 它们描述了一个合适的势函数的值 (取决于优化模型和所考虑的算法) 与势函数的一些一阶信息 (梯度或次梯度) 之间的定性关系. 我们将讨论基于 KŁ 指数的三种可能的收敛率.

2 预备知识

本文用$\mathbb{R}^{n_{i}}$表示${n_{i}}$维欧式空间,$\left \| \cdot\right \|$表示欧式范数. 对任意的$x, y\in R^{n_{i}}$, 规定它们的内积为$\left \langle x, y\right \rangle=x^{T}y$, 其中$T$代表转置,$x$的范数$\left \| x\right \|=\sqrt{\left \langle x, x\right \rangle}$.$\mathbb{R}$代表实数集,$\mathbb{Z}$代表整数集,$\mathbb{R_{+}}$代表正实数集,$\mathbb{Z_{+}}$代表正整数集.为了方便表示, 我们令

$u^{k}=\left ( x_{1}^{k}, x_{2}^{k}, \cdots, x_{N}^{k}\right ), w^{k}=\left ( x_{1}^{k}, x_{2}^{k}, \cdots, x_{N}^{k}, p^{k}\right ), \hat{w}^{k}=\left ( x_{1}^{k}, x_{2}^{k}, \cdots, x_{N}^{k}, p^{k}, x_{N}^{k-1}\right ).$

如果$n_{1}, \cdots, n_{N}\in \mathbb{Z_{+}}, N\in \mathbb{Z_{+}}$, 那么对任意的$u=\left ( x_{1}, x_{2}, \cdots, x_{N}\right )\in \mathbb{R}^{n_{1}}\times \mathbb{R}^{n_{2}}\times \cdots \times \mathbb{R}^{n_{N}}$和${u}'=\left ( {x_{1}}', {x_{2}}', \cdots, {x_{N}}'\right )\in \mathbb{R}^{n_{1}}\times \mathbb{R}^{n_{2}}\times \cdots \times \mathbb{R}^{n_{N}}$, 规定它们的笛卡尔积为:$\left \langle \left \langle u, {u}'\right \rangle\right \rangle=\sum\limits_{i=1}^{N}\left \langle x_{i}, {x_{i}}'\right \rangle$, 定义$u=\left ( x_{1}, x_{2}, \cdots, x_{N}\right )\in \mathbb{R}^{n_{1}}\times \mathbb{R}^{n_{2}}\times \cdots \times \mathbb{R}^{n_{N}}$的范数

$\frac{1}{\sqrt{N}}\sum\limits_{i=1}^{N}\left \| x_{i}\right \|\leqslant \left \| \left | u\right |\right \|=\sqrt{\sum\limits_{i=1}^{N}\left \|x_{i}\right \|^{2}}\leqslant \sum\limits_{i=1}^{N}\left \| x_{i}\right \|.$

对任意集合$S\subseteq \mathbb{R}^{n}$, 任意点$x\in \mathbb{R}^{n}$到$S$的距离定义如下

$\text{dist}\left ( x, S\right )=\left\{\begin{matrix}\underset{y\in S}{\inf}\left \| y-x\right \|,& S\neq \varnothing, \\+\infty, &S=\varnothing, \end{matrix}\right.$

对在$\mathbb{R}^{n}$上任意的实值函数$f$, 有:$\text{dist}\left ( 0, \partial f \left ( x\right )\right )=\inf\left \{\left \| s^{\ast}\right \|:s^{\ast }\in \partial f \left ( x\right ) \right \}.$

定义 2.1[31] 函数$f:\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$的有效域为: dom$f=\left \{x\in \mathbb{R}^{n}:f\left ( x\right )< +\infty\right \}$, 若 dom$f\neq\varnothing$, 则称$f$为正常函数.

定义 2.2[31] 若函数$f:\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$在$x_{0}$处满足$f\left ( x_{0}\right )\leqslant \underset{x\rightarrow x_{0}}{\lim\inf}\, f\left ( x\right )$, 则表示函数$f$在$x_{0}$处是下半连续的, 假设$f$在其有效域内的每一点处都满足下半连续性, 则称$f$为下半连续函数.

定义 2.3[31] 设函数$f:\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$正常下半连续,

(1) 函数$f$在$x\in$dom$f$处的 Fréchet 次微分, 定义为所有满足下面条件的$x^{\ast}$的集合, 记作$\hat{\partial}f\left ( x\right )$,

$\underset{y\neq x, y\rightarrow x}{\lim\inf}\frac{f\left ( y\right )-f\left ( x\right )-\left \langle x^{\ast}, y-x\right \rangle}{\left \| y-x\right \|}\geqslant 0, $

若$x\notin$dom$f$, 令$\hat{\partial}f\left ( x\right ): =\varnothing$.

(2) 函数$f$在$x\in$dom$f$处的极限次微分, 记为$\partial f\left ( x\right )$, 定义如下

$\partial f\left ( x\right ):=\left \{x^{\ast}\in \mathbb{R}^{n}:\exists x_{n}\rightarrow x, f\left ( x_{n}\right )\rightarrow f\left ( x\right ), x_{n}^{\ast}\in \hat{\partial}f\left ( x_{n}\right ), x_{n}^{\ast}\rightarrow x^{\ast}\right\}.$

推论 2.1[31] (1) 对$\forall x\in \mathbb{R}^{n}$,$\hat{\partial}f\left ( x\right )\subseteq \partial f\left ( x\right )$, 其中$\hat{\partial}f\left ( x\right )$为闭凸集, 而$\partial f\left ( x\right )$仅为闭集.

(2) 设$x_{k}^{\ast}\in \partial f\left ( x_{k}\right )$,$\underset{k\rightarrow +\infty}{\lim}\left ( x_{k}, x_{k}^{\ast}\right )=\left ( x, x^{\ast}\right )$, 且$\underset{k\rightarrow +\infty}{\lim}f\left ( x_{k}\right )=f\left ( x\right )$, 则$x^{\ast }\in \partial f\left ( x\right )$, 即$\partial f\left ( x\right )$具有闭性.

(3 )若$x\in\mathbb{R}^{n}$为$f$的极小值点, 则$0\in \partial f\left ( x\right )$. 若$0\in \partial f\left ( x\right )$, 则称$x$为$f$的稳定点 (临界点), 函数$f$稳定点的集合记为 crit$f$.

(4) 若函数$f:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup \left \{+\infty\right \}$为正常下半连续函数,$g:\mathbb{R}^{n}\rightarrow\mathbb{R}$连续可微, 则对于任意$x\in$dom$f$, 有$\partial \left ( f+g\right )\left ( x\right )=\partial f\left ( x\right )+\bigtriangledown g\left ( x\right ).$

定义 2.5[31] 一个可微函数$f$称为凸函数, 若对任意的$x, y\in$dom$f$, 都有

$f\left ( y\right )\geqslant f\left ( x\right )+\left \langle \bigtriangledown f\left ( x\right ), y-x\right \rangle, $

一个可微函数$f$称为$\mu$-强凸 ($\mu > 0$), 若对任意的$x, y\in$dom$f$, 都有

$\begin{equation}f\left ( y\right )\geqslant f\left ( x\right )+\left \langle \bigtriangledown f\left ( x\right ), y-x\right \rangle+\frac{\mu }{2}\left \| y-x\right \|^{2}.\end{equation}$

令$f :\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$为正常下半连续函数. 令$-\infty < \eta _{1}< \eta _{2}\leqslant +\infty$, 记$\left [ \eta _{1}< f < \eta _{2}\right ]=\left \{x\in \mathbb{R}^{n}:\eta _{1}< f\left ( x\right )< \eta _{2}\right \}$. 令$\eta \in \left ( 0, +\infty\right ]$, 令$\Psi _{\eta}$为满足下面三个条件的所有连续凹函数$\psi :\left [ 0, \eta\right ]\rightarrow \left [ 0, +\infty\right )$的集合:(1)$\psi \left ( 0\right )=0$;(2)$\psi$在$\left ( 0, \eta\right )$上是连续可微的, 并且$\psi$在 0 处也是连续的;(3) 对$\forall s\in \left ( 0, \eta\right ), {\psi }'\left ( s\right )> 0$.

定义 2.6[35] 令$f :\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$为正常下半连续函数, 称函数$f$在$x^{\ast}\in \text{dom}\, \partial f$有 Kurdyka-Łojasiewicz 性质, 若存在$\eta \in \left ( 0, +\infty\right ]$,$x^{\ast}$的某邻域$U$及连续的凹函数$\psi \in \Psi _{\eta}$使得

$\begin{equation}{\psi}'\left ( f\left ( x\right )-f\left ( x^{\ast}\right )\right )\text{dist}\left ( 0, \partial f\left ( x\right )\right )\geqslant 1, \forall x\in U.\end{equation}$

若$f$在有效域 dom$\partial f$的每一点处都满足 KŁ 性质, 则称$f$为 KŁ 函数.

引理 2.1[35] 假设$\Omega$为紧集, 函数$f :\mathbb{R}^{n}\rightarrow \mathbb{R}\cup \left \{+\infty\right \}$为正常下半连续函数. 若函数$f$在$\Omega$上为一个常值并且在$\Omega$的每一点都满足 KŁ 性质, 则存在$\varepsilon > 0, \eta > 0,$和$\psi \in \Psi _{\eta}$, 使得对任意$x^{\ast}\in\Omega$及任意$x$属于如下交集

$\left \{x\in \mathbb{R}^{n}:\text{dist}\left ( x, \Omega \right )< \varepsilon\right \}\cap \left [ f\left ( x^{\ast}\right )< f\left ( x\right )<f\left ( x^{\ast}\right )+\eta \right ], $

有 (2.2) 式成立.

引理 2.2[35] 令正常下半连续函数$f$在$x^{\ast }\in \text{dom}\, \partial f$满足 KŁ 性质, 且相应的函数$\psi \in \Psi _{\eta}$可以选择为$\psi \left (s \right )=\bar{c}s^{1-\theta}$, 其中$\bar{c}> 0, \theta \in \left [ 0, 1\right )$. 如果存在$c> 0$,$x^{\ast}$的一个邻域$U$和$\eta \in \left ( 0, +\infty\right ]$使得

$\begin{equation}\left ( f\left ( x\right )-f\left ( x^{\ast }\right )\right )^{\theta }\leqslant c\, \text{dist}\left ( 0, \partial f\left ( x\right )\right ), \; \forall x\in U,\end{equation}$

成立, 则称$f$在$x^{\ast}$满足 KŁ 性质且指数为$\theta$. 这表明$\left ( f\left ( x\right )-f\left ( x^{\ast }\right )\right )^{\theta }/\text{dist}\left ( 0, \partial f\left ( x\right )\right )$在$x^{\ast}$的邻域有界.

在一些实际应用中, 很多函数都满足 KŁ 性质, 比如: 如果$f$是一个正常闭半代数函数, 则$f$是一个 KŁ 函数且指数$\theta \in \left [ 0, 1\right )$[32]; 函数$l\left ( Ax\right )$是一个 KŁ 函数, 其中$l$是二次可微强凸函数,$A\in \mathbb{R}^{m\times n}$; 凸分片线性二次函数;$\left \| x\right \|_{1}$;$\left \| x\right \|_{0}$; 实解析函数; 次解析函数; 强凸函数等都是 KŁ 函数 (见文献[29,30,32]).

Bregman 距离在各种优化算法中起着重要的作用. 作为平方欧氏距离的推广, Bregman 距离与欧氏距离有许多相似的优良性质. 然而, Bregman 距离不是度量, 因为它既不满足三角不等式, 也不满足对称性. 对一个可微凸函数$\phi$, 与它相关的 Bregman 距离的定义为

$\Delta _{\phi}\left ( x, y\right )=\phi \left ( x\right )-\phi \left ( y\right )-\left \langle \bigtriangledown \phi \left ( y\right ), x-y\right \rangle.$

特别的, 如果令上式中的$\phi \left ( x\right )=\left \| x\right \|^{2}$, 那么上式简化为$\left \| x-y\right \|^{2}$, 即是经典的欧氏距离. 更多的: 如果$\phi$是$\mu$-强凸, 由 (2.1) 式可得

$\begin{equation}\Delta _{\phi }\left ( x, y\right )\geqslant \frac{\mu }{2}\left \| x-y\right \|^{2}.\end{equation}$

关于 Bregman 距离的更多相关知识, 请阅读文献[33].

引理 2.3$w^{\ast }:=\left ( x_{1}^{\ast}, x_{2}^{\ast}, \cdots, x_{N}^{\ast}, p^{\ast}\right )$为问题 (1.1) 的增广拉格朗日函数$L_{\alpha} ( x_{1}, x_{2}, \cdots,$$x_{N}, p )$的一个稳定点, 即$0\in \partial L_{\alpha }\left ( w^{\ast}\right )$,当且仅当

$\left\{\begin{array}{l}A_{1}^{T}p^{\ast}\in -\partial f_{1}\left ( x_{1}^{\ast}\right )\\ \vdots\\A_{N-1}^{T}p^{\ast}\in -\partial f_{N-1}\left ( x_{N-1}^{\ast}\right )\\A_{N}^{T}p^{\ast}= -\bigtriangledown f_{N}\left ( x_{N}^{\ast}\right )\\A_{1}x_{1}^{\ast}+A_{2}x_{2}^{\ast}+\cdots +A_{N}x_{N}^{\ast}=0.\end{array}\right.$

3 多块 Bregman ADMM 的收敛率证明

多块 Bregman ADMM 的收敛率证明, 需要如下的假设

(A1) 效益函数$\hat{L}$满足 KŁ 性质;

(A2) 存在$\sigma > 0$, 使得对任意的$x\in \mathbb{R}^{m}$都有$\sigma \left \| x\right \|^{2}\leqslant \left \| A_{N}^{T}x\right \|^{2}$;

(A3)$f_{N}$是连续可微的,$\triangledown f_{N}$是$L$-利普希茨连续的;

(A4)$\phi _{i}$是$\mu _{i}$-强凸的,$\triangledown\phi _{i}$是$L_{i}$-利普希茨连续的, 其中$i=1, 2, \cdots, N$;

(A5)$\alpha \mu \sigma > 6\left ( L^{2}+2L_{N}^{2}\right )$, 其中$\mu =\min\left \{\mu _{1}, \mu _{2}, \cdots, \mu_{N} \right \},$

其中效益函数$\hat{L}:\mathbb{R}^{n_{1}}\times \mathbb{R}^{n_{2}}\times\cdots \times \mathbb{R}^{n_{N}}\times\mathbb{R}^{m}\times\mathbb{R}^{n_{N}} \rightarrow\mathbb{R}$的定义为:

$\begin{equation}\hat{L}\left ( x_{1}, x_{2}, \cdots x_{N}, p, \hat{x}_{N}\right )=L_{\alpha}\left ( x_{1}, x_{2}, \cdots x_{N}, p\right )+\frac{\tau }{2}\left \| x_{N}-\hat{x}_{N}\right \|^{2},\end{equation}$

上述定义中的$\tau =6L_{N}^{2}\left ( \alpha \sigma \right )^{-1}.$

若假设 (A1)-(A5) 成立, 由文献[29]对多块 Bregman ADMM 收敛性的证明可得, 下面的引理 3.1-3.4 成立

引理 3.1[29] 对任意的$k\in\mathbb{N}$, 存在$a> 0$使得$\hat{L}\left ( \hat{w}^{k+1}\right )\leqslant \hat{L}\left ( \hat{w}^{k}\right )-a\left \| \hat{w}^{k+1}- \hat{w}^{k}\right \|^{2}$.

引理 3.2[29] 若序列$\left \{w^{k}\right \}$有界, 则$\sum\limits_{k=1}^{\infty }\left \| w^{k}-w^{k+1}\right \|^{2}<\infty$. 更多的, 序列$\left \{w^{k}\right \}$是渐进正则的, 即当$k\rightarrow \infty$时,$\left \| w^{k}-w^{k+1}\right \|\rightarrow 0$. 序列$\left \{w^{k}\right \}$的任意一个聚点都是增广拉格朗日函数$L_{\alpha}$的一个稳定点.

引理 3.3[29] 存在$b>0$, 使得对任意的$k\in\mathbb{N}$都有:$\text{dist}\left ( 0, \partial \hat{L}\left ( \hat{w}^{k+1}\right )\right )\leqslant b\left \| \hat{w}^{k}-\hat{w}^{k+1}\right \|$.

引理 3.4[29] Bregman ADMM 算法 (1.3) 产生的每个有界序列$\left \{w^{k}\right \}$都收敛到增广拉格朗日函数$L_{\alpha}$的一个稳定点. 更多的,$\sum\limits_{k=1}^{\infty }\left \| w^{k+1}-w^{k}\right \|<\infty$.

文献[12,29]对非凸 Bregman ADMM 的收敛性证明中, 均需要假设由算法产生的迭代序列有界, 即引理 3.2 及引理 3.4 需假设迭代序列$\left \{w^{k}\right \}$有界, 且证明算法的收敛率也需要这一假设, 但遗憾的是文献[12,29]中并没有给出这一假设成立的条件. 这一假设在非凸分裂算法收敛性分析中普遍被采用, 如文献[12-15,25,27-29]. 下面给出一个充分条件保证迭代序列$\left \{w^{k}\right \}$有界.

引理 3.5 若假设 (A1)-(A5) 及下述条件 (1), (2) 成立, 则由算法多块 Bregman ADMM 产生的点列$\left \{w^{k}\right \}$有界.

(1)$f_{i}\left ( i=1, 2, \cdots, N-1\right )$均为强制函数, 即$\liminf\limits_{\left \| x_{i}\right \|\rightarrow +\infty} f_{i}\left ( x_{i}\right )=+\infty$;

(2) 函数$\bar{f_{N}}\left ( x_{N}\right ):=f_{N}\left ( x_{N}\right )-\frac{3}{\alpha \sigma }\left \| \triangledown f_{N}\left ( x_{N}\right )\right \|^{2}$有下界且是强制的, 即

$\inf\, \bar{f_{N}}\left ( x_{N}\right )> -\infty, \underset{\left \| x_{N}\right \|\rightarrow +\infty }{\lim\inf}\bar{f_{N}}\left ( x_{N}\right )=+\infty.$

由多块 Bregman ADMM 算法 (1.3) 的最优性必要条件有

$\triangledown f_{N}\left ( x_{N}^{k+1}\right )+A_{N}^{T}\left [ p^{k}+\alpha \left ( \sum\limits_{i=1}^{N}A_{i}x_{i}^{k+1}\right )\right ]+\triangledown \phi _{N}\left (x_{N}^{k+1} \right )-\triangledown \phi _{N}\left (x_{N}^{k} \right )=0, $

结合乘子的迭代式$p^{k+1}=p^{k}+\alpha \left ( \sum\limits_{i=1}^{n}A_{i}x_{i}^{k+1}\right )$可得

$\triangledown f_{N}\left ( x_{N}^{k+1}\right )+A_{N}^{T}p^{k+1}+\triangledown \phi _{N}\left (x_{N}^{k+1} \right )-\triangledown \phi _{N}\left (x_{N}^{k} \right )=0.$

从而

$\begin{align}\left \| A_{N}^{T}p^{k}\right \|^{2}&=\left \| \triangledown f_{N}\left ( x_{N}^{k}\right )+\triangledown \phi _{N}\left (x_{N}^{k} \right )-\triangledown \phi _{N}\left (x_{N}^{k-1} \right )\right \|^{2} \notag \\&\leqslant \left ( \left \| \triangledown f_{N}\left ( x_{N}^{k}\right )\right \|+L_{N}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|\right )^{2} \notag \\&\leqslant 3\left ( \left \| \triangledown f_{N}\left ( x_{N}^{k}\right )\right \|^{2}+L_{N}^{2}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|^{2}\right ). \notag\end{align}$

结合假设 (A2) 可得:$\sigma\left \|p^{k}\right \|^{2}\leqslant 6\left ( \left \| \triangledown f_{N}\left ( x_{N}^{k}\right )\right \|^{2}+L_{N}^{2}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|^{2}\right ).$

因为$\alpha > 0, \sigma > 0,$所以

$\frac{1}{2\alpha}\left \|p^{k}\right \|^{2}\leqslant \frac{3}{\alpha \sigma}\left \| \triangledown f_{N}\left ( x_{N}^{k}\right )\right \|^{2}+\frac{\tau }{2}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|^{2},$

其中$\tau =6L_{N}^{2}\left ( \alpha \sigma \right )^{-1},$在效益函数$\hat{L}$的定义中给出.

由效益函数$\hat{L}$的定义 (3.1) 式可得

$\begin{align}\hat{L}\left ( \hat{w}^{k}\right )&=\sum\limits_{i=1}^{N}f_{i}\left ( x_{i}^{k}\right )+\frac{\tau }{2}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|^{2}+\sum\limits_{i=1}^{N}\left \langle p^{k}, A_{i}x_{i}^{k}\right \rangle+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}\right \|^{2} \notag \\&=\sum\limits_{i=1}^{N}f_{i}\left ( x_{i}^{k}\right )+\frac{\tau }{2}\left \| x_{N}^{k}-x_{N}^{k-1}\right \|^{2}-\frac{1}{2\alpha}\left \| p^{k}\right \|^{2}+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \|^{2} \notag \\&\geqslant\sum\limits_{i=1}^{N}f_{i}\left ( x_{i}^{k}\right )-\frac{3}{\alpha \sigma}\left \| \triangledown f_{N}\left ( x_{N}^{k}\right )\right \|^{2}+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \|^{2} \notag \\&=\sum\limits_{i=1}^{N-1}f_{i}\left ( x_{i}^{k}\right )+\bar{f_{N}}\left ( x_{N}^{k}\right )+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \|^{2}, \notag\end{align}$

又由引理 3.1 可知序列$\left \{\hat{L}\left ( \hat{w}^{k}\right )\right \}$单调递减, 结合上式可得

$\hat{L}\left ( \hat{w}^{1}\right )\geqslant \hat{L}\left ( \hat{w}^{k}\right )\geqslant \sum\limits_{i=1}^{N-1}f_{i}\left ( x_{i}^{k}\right )+\bar{f_{N}}\left ( x_{N}^{k}\right )+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \|^{2}.$

此外, 由$f_{i}$为正常下半连续的强制函数, 易知$\inf_{x_{i}}\, {f_{i}}\left ( x_{i}\right )> -\infty$$\left ( i=1, \cdots, N-1\right ).$因此有

$\begin{align}\left\{\begin{matrix} \sum\limits_{i=1}^{N-1}f_{i}\left ( x_{i}^{k}\right )+\bar{f_{N}}\left ( x_{N}^{k}\right )+\frac{\alpha }{2}\left \| \sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \|^{2}\leqslant \hat{L}\left ( \hat{w}^{1}\right )< +\infty, \\\notag\underset{x_{i}}{\inf}\, {f_{i}}\left ( x_{i}\right )> -\infty \left ( i=1, \cdots, N-1\right ), \underset{x_{N}}{\inf}\, \bar{f_{N}}\left ( x_{N}\right )> -\infty.\notag\end{matrix}\right.\end{align}$

由此不难依次推出,$\left \{x_{i}^{k}\right \}\, \left ( i=1, \cdots, N\right )$及$\left \{\sum\limits_{i=1}^{N}A_{i}x_{i}^{k}+\frac{p^{k}}{\alpha}\right \}$均有界, 从而$\left \{p^{k}\right \}$也有界. 因此,$\left \{w^{k}\right \}$的有界性得证. 证毕.

注 3.1 现在我们分析一些具体问题以验证引理 3.5 中的条件. 例如, 我们考虑$l_{\frac{1}{2}}$正则化问题[34]

$\min\left \{c\left \| x_{1}\right \|_{\frac{1}{2}}^{\frac{1}{2}}+\frac{1}{2}\left \| x_{2}\right \|^{2} \mid Dx_{1}-x_{2}-b=0\right \}, $

其中$c$是正则化参数,$\left \| x_{1}\right \|_{\frac{1}{2}}=\bigg ( \sum\limits_{j=1}^{n}\left | x_{1}^{j}\right |^{\frac{1}{2}}\bigg )^{2}.$

令$f_{1}\left ( x_{1}\right )=c\left \| x_{1}\right \|_{\frac{1}{2}}^{\frac{1}{2}}, f_{2}\left ( x_{2}\right )=\frac{1}{2}\left \| x_{2}\right \|^{2},$存在充分大的$\alpha > \frac{6}{\sigma},$有

$\bar{f_{2}}\left ( x_{2}\right ):=f_{2}\left ( x_{2}\right )-\frac{3}{\alpha \sigma}\left \| \triangledown f_{2}\left ( x_{2}\right )\right \|^{2}=\frac{1}{2}\left \| x_{2}\right \|^{2}-\frac{3}{\alpha \sigma}\left \| x_{2}\right \|^{2}=\frac{\alpha \sigma -6}{2\alpha \sigma}\left \| x_{2}\right \|^{2}> -\infty.$

结合$\liminf_{\left \| x_{1}\right \|\rightarrow +\infty}f_{1}\left ( x_{1}\right )=+\infty, \, \liminf_{\left \| x_{2}\right \|\rightarrow +\infty}f_{2}\left ( x_{2}\right )=+\infty,$从而引理 3.5 中的条件成立.

该节将证明基于由 (3.1) 式所定义的效益函数$\hat{L}$的 KŁ 性质的多块 Bregman ADMM 的收敛率. 令$w^{\ast }=\left ( x_{1}^{\ast}, x_{2}^{\ast}, \cdots, x_{N}^{\ast}, p^{\ast}\right )$是由算法 Bregman ADMM 产生的序列$\big \{w^{k}= ( x_{1}^{k}, x_{2}^{k}, \cdots,$$x_{N}^{k}, p^{k} )\big \}$的唯一极限点, 令

$\hat{w}^{k}=\left ( x_{1}^{k}, x_{2}^{k}, \cdots, x_{N}^{k}, p^{k}, x_{N}^{k-1}\right ), \hat{w}^{\ast}=\left ( x_{1}^{\ast}, x_{2}^{\ast}, \cdots, x_{N}^{\ast}, p^{\ast}, x_{N}^{\ast}\right ).$

由文献[29, 定理 1] 可知, 序列$\left \{\hat{L}_{k}\right \}$收敛, 设$\underset{k\rightarrow\infty}{\lim}\hat{L}_{k}=\hat{L}_{\ast}=\hat{L}\left ( \hat{w}^{\ast}\right )$. 设$e_{k}=\hat{L}_{k}-\hat{L}_{\ast}, k\geqslant1$, 则$e_{k}$是非负的, 且序列$\left \{e_{k}\right \}$单调递减收敛到 0.

引理 3.6 假设 (A1)-(A5) 成立,$\hat{L}$在$\hat{w}^{\ast}$满足 KŁ 性质, 且指数为$\theta$, 引理 3.1 中的$a> 0$, 则对任意的$k\geqslant k_{0}\geqslant 1$, 有

$\begin{equation}\bar{\alpha }e_{k}^{2\theta}\leqslant e_{k-1}-e_{k}, \bar{\alpha }=\frac{a}{b^{2}c^{2}}.\end{equation}$

固定$k\geqslant 2$, 则由引理 3.3 和引理 3.1 有

$\text{dist}^{2}\left ( 0, \partial \hat{L}\left ( \hat{w}^{k}\right )\right )\leqslant b^{2}\left \|\hat{w}^{k-1}-\hat{w}^{k} \right \|^{2}\leqslant \frac{b^{2}}{a}\left (\hat{L}_{k-1}-\hat{L}_{k} \right )=\frac{b^{2}}{a}\left (e_{k-1}-e_{k} \right ),$

令$\varepsilon >0$任意小, 因为$\hat{w}^{k}$收敛到$\hat{w}^{\ast}$, 所以存在一个正整数$k_{1}\geqslant 2$使得对任意的$k\geqslant k_{1}$有$\text{dist}\left (\hat{w}^{k}, \hat{w}^{\ast} \right )<\varepsilon$. 又因为$\hat{L}$在$\hat{w}^{\ast}$点满足 KŁ 性质且指数为$\theta$, 由引理 3.1 知序列$\left \{\hat{L}_{k}\right \}$单调递减, 且由文献[29]对 BADMM 算法收敛性的证明可知, 序列$\left \{\hat{L}_{k}\right \}$收敛到$\hat{L}_{\ast}$, 所以存在$\theta\in\left [0, 1\right ), c>0, k_{2}\geq 2$使得对$\forall k\geqslant k_{2}$都有

$\left (\hat{L}_{k}-\hat{L}_{\ast} \right )^{\theta}\leqslant c\,\text{dist}\left (0, \partial \hat{L}_{k} \right ), $

等价为:$e_{k}^{2\theta}\leqslant c^{2}\text{dist}^{2}\left (0, \partial \hat{L}_{k} \right )$, 结合 (3.3) 式有:$e_{k}^{2\theta}\leqslant c^{2}\cdot \frac{b^{2}}{a}\left (e_{k-1}-e_{k} \right )$, 即

$\bar{\alpha }e_{k}^{2\theta}\leqslant e_{k-1}-e_{k}, \bar{\alpha }=\frac{a}{b^{2}c^{2}}.$

定理 3.1 假设 (A1)-(A5) 成立,$\hat{L}$在$\hat{w}^{\ast}$满足 KŁ 性质且指数为$\theta$, 引理 3.1 中的$a> 0$. 则序列$\left \{e^{k}\right \}_{k\geqslant 1}$以下面的速率收敛到$0$.

(1) 如果$\theta =0$, 那么序列$\left \{e^{k}\right \}_{k\geqslant 1}$在有限次迭代后收敛到$0$;

(2) 如果$\theta \in \left (0, \frac{1}{2}\right ]$, 那么存在$\tau _{1} \in \left [ 0, 1\right ),$使得$\left \|e_{k} \right \|=O\left ( \tau _{1} ^{k}\right ),$即序列$\left \{e^{k}\right \}_{k\geqslant 1}$是线性收敛的;

(3) 如果$\theta \in \left ( \frac{1}{2}, 1\right )$, 那么$\left \| e^{k}\right \|=O\left ( k^{\frac{1 }{1-2\theta}}\right ),$即序列$\left \{e^{k}\right \}_{k\geqslant 1}$是次线性收敛的.

证 (1) 令$\theta =0$, 如果对$k\geqslant k_{0}$都有$e_{k}> 0$, 又由引理 3.6 可得:$\bar{\alpha}\leqslant e_{k-1}-e_{k}$, 令$k\rightarrow \infty$, 则有$0<\bar{\alpha }\leqslant 0$, 从而矛盾. 因此对所有的$k\geqslant k_{0}$, 都有$e_{k}=0$. 这表明一定会存在一个$\tilde{k}\leq k_{0}$, 使得对所有的$k\geq \tilde{k}$都有$e_{k}=0$, 即结论 (1) 得证.

(2) 令$\theta \in \left (0, \frac{1}{2}\right ]$, 序列$\left \{e^{k}\right \}_{k\geqslant 1}$是单调递减的, 且$2\theta -1\leqslant0$, 则由$\bar{\alpha}e_{k}^{2\theta}\leqslant e_{k-1}-e_{k}$有: 对所有的$k\geqslant k_{0}$,$\bar{\alpha }e_{k_{0}}^{2\theta-1}e_{k} \leqslant \bar{\alpha }e_{k}^{2\theta}\leqslant e_{k-1}-e_{k}$, 整理上式可得

$e_{k}\leqslant \frac{e_{k-1}}{\left ( 1+\bar{\alpha}e_{k_{0}}^{2\theta-1}\right )} \leqslant\cdots \leqslant \frac{e_{k_{0}}}{\left ( 1+\bar{\alpha}e_{k_{0}}^{2\theta-1}\right )^{k-k_{0}}}, $

$\begin{equation}e_{k}\leqslant e_{k_{0}}/{\left ( 1+\bar{\alpha}e_{k_{0}}^{2\theta-1}\right )^{k-k_{0}}}, k\geqslant k_{0}.\end{equation}$

所以存在$c_{1}> 0$及$\tau _{1} \in \left [ 0, 1\right ),$使得$\left \| e_{k}\right \|\leqslant c_{1}\tau _{1} ^{k},$从而$\left \|e_{k} \right \|=O\left ( \tau _{1} ^{k}\right ).$

(3) 令$\theta \in \left ( \frac{1}{2}, 1\right )$, 整理 (3.2) 式可得

$\begin{equation}\bar{\alpha }\leqslant\left ( e_{k-1}-e_{k}\right )e_{k}^{-2\theta}, \forall k\geqslant k_{0}.\end{equation}$

令$h:\Re _{+}\rightarrow\Re$, 定义为$h\left ( s\right )=s^{-2\theta}$对所有的$s\in \Re _{+}$. 因为${h}'\left ( s\right )=-2\theta s^{-\left ( 2\theta +1\right )}< 0$, 所以$h$是单调递减的, 即有$h\left ( e_{k-1}\right )\leqslant h\left ( e_{k}\right )$对所有的$k\geqslant 1$. (因为序列$\left \{e_{k}\right \})$单调递减).

考虑两种情况

情况一 令$r_{0}\in \left ( 1, +\infty\right )$使得$h\left ( e_{k}\right )\leqslant r_{0}h\left ( e_{k-1}\right ), \forall k\geqslant k_{0}$. 因此由 (3.5) 式可得

$\begin{align}\bar{\alpha }&\leqslant r_{0}\left ( e_{k-1}-e_{k}\right )h\left ( e_{k-1}\right )=r_{0}h\left ( e_{k-1}\right )\int_{e_{k}}^{e_{k-1}}1{\rm d}s\leqslant r_{0}\int_{e_{k}}^{e_{k-1}}h\left ( s\right ){\rm d}s \notag \\&=r_{0}\int_{e_{k}}^{e_{k-1}}s^{-2\theta}{\rm d}s=\frac{r_{0}}{1-2\theta}\left ( e_{k-1}^{1-2\theta}-e_{k}^{1-2\theta}\right ), \notag\end{align}$(3.6)

整理可得:$0< \frac{\bar{\alpha }\left ( 2\theta -1\right )}{r_{0}}\leqslant\left ( e_{k}^{1-2\theta}-e_{k-1}^{1-2\theta}\right )$,设$\hat{\mu }=\frac{\bar{\alpha}\left ( 2\theta -1\right )}{r_{0}}> 0, v=1-2\theta < 0$, 则可得

$\begin{equation}0<\hat{\mu }\leqslant \left ( e_{k}^{v}-e_{k-1}^{v}\right ), \forall k\geqslant k_{0}.\end{equation}$

情况二$h\left ( e_{k}\right )> r_{0}h\left ( e_{k-1}\right )$, 即有:$r_{0}^{-1}e_{k-1}^{2\theta }\geqslant e_{k}^{2\theta }$. 两边同时开$\frac{1}{2\theta}$次方, 设$q=r_{0}^{-\frac{1}{2\theta}}\in \left ( 0, 1\right )$, 则有$qe_{k-1}\geqslant e_{k}, \forall k\geqslant k_{0}$. 因为$v=1-2\theta < 0$, 则$q^{v}e_{k-1}^{v}\leqslant e_{k}^{v}$, 从而有

$\left ( q^{v}-1\right )e_{k-1}^{v}\leqslant e_{k}^{v}-e_{k-1}^{v}.$

又由$q^{v}-1> 0$, 且当$p\rightarrow \infty$时,$e_{p}\rightarrow 0^{+}$, 则存在$\bar{\mu}> 0$, 使得对$\forall k\geqslant k_{0}$, 都有$\left ( q^{v}-1\right )e_{k-1}^{v}> \bar{\mu}$.

因此有

$\begin{equation}0< \bar{\mu }\leqslant e_{k}^{v}-e_{k-1}^{v}, \forall k\geqslant k_{0}.\end{equation}$

选择$\mu =\min\left \{\hat{\mu}, \bar{\mu}\right \}> 0$, 结合 (3.6) 式和 (3.7) 式有

$0< \mu \leqslant e_{k}^{v}-e_{k-1}^{v}, \forall k\geqslant k_{0}.$

对上式从$k_{0}$到$k$求和, 有

$\mu \left ( k-k_{0}+1\right )+e_{k_{0}-1}^{v}\leqslant e_{k}^{v}.$

因此有

$e_{k}\leqslant\left ( \mu \left ( k-k_{0}+1\right )+e_{k_{0}-1}^{v}\right )^{\frac{1}{v}}, \left ( v=1-2\theta < 0\right ), $

$\begin{equation}e_{k}\leqslant\left ( \mu \left ( k-k_{0}+1\right )+e_{k_{0}-1}^{1-2\theta}\right )^{\frac{1}{1-2\theta}}, k\geqslant k_{0}.\end{equation}$

所以存在$c_{2}> 0,$使得$\left \| e^{k}\right \|\leqslant c_{2}k^{\frac{1}{1-2\theta}}.$从而$\left \| e^{k}\right \|=O\left ( k^{\frac{1 }{1-2\theta}}\right ).$证毕.

定理 3.2 假设 (A1)-(A5) 成立,$\hat{L}$在$\hat{w}^{\ast}$满足 KŁ 性质, 且$\psi \in \Psi _{\eta}$.

如果引理 3.1 中的$a>0$, 则存在$k_{0}\geqslant 2$, 使得对$\forall k\geqslant k_{0}$有

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{c}\max\left \{\psi \left ( e_{k-1}\right ), \sqrt{e_{k-1}}\right \}, \hat{c}=\max\left \{\hat{\delta }, \frac{\tilde{\delta }}{\sqrt{a}}\right \}.$

进一步, 如果$\psi :\left [ 0, \eta\right )\rightarrow \left [ 0, +\infty\right )$定义为$\psi \left ( s\right )=s^{1-\theta}$, 其中$\theta \in \left [ 0, 1 \right )$, 那么下面的渐近序列收敛率可以得到

(i) 如果$\theta =0$, 则序列$\left \{w^{k}\right \}$经过有限次迭代后收敛到$w^{\ast}$;

(ii) 如果$\theta \in \left ( 0, \frac{1}{2}\right ]$, 那么存在$\tau _{2} \in \left [ 0, 1\right ),$使得$\left \| \left | w^{k}-w^{\ast}\right |\right \|=O\left ( \tau _{2} ^{k}\right ),$即 Bregman ADMM 是线性收敛的;

(iii) 如果$\theta \in \left ( \frac{1}{2}, 1\right )$, 那么$\left \| \left | w^{k}-w^{\ast}\right |\right \|=O\left ( k^{\frac{1-\theta }{1-2\theta}}\right ),$即 Bregman ADMM 是次线性收敛的.

固定$k\geqslant 1$,$a> 0$, 由引理 3.1 知序列$\left \{\hat{L}_{k}\right \}_{k\geqslant 1}$是单调递减的, 且序列$\left \{e_{k}\right \}_{k\geqslant 1}$也是单调递减的.

因为由引理 3.1 可得

$\begin{equation}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|^{2}\leqslant \frac{1}{a}\left ( \hat{L}_{k}-\hat{L}_{k+1}\right )=\frac{1}{a}\left ( e_{k}-e_{k+1}\right ),\end{equation}$

令$\varepsilon > 0, \eta > 0$, 给定$\psi \in \Psi _{n}$, 又因为当$k\rightarrow \infty$时,$\hat{L}_{k}\rightarrow \hat{L}_{\ast}$, 且$\underset{k\rightarrow \infty}{\lim}\hat{w}^{k}=\hat{w}^{\ast}$, 所以存在一个$k_{0}\in \mathbb{Z}_{+}$, 使得对所有的$k\geqslant k_{0}$, 有

$\text{dist}\left (\hat{w}^{k}, \hat{w}^{\ast}\right )< \varepsilon, \hat{L}_{\ast}< \hat{L}_{k}< \hat{L}_{\ast}+\eta.$

因为$\hat{L}$在$\hat{w}^{\ast}$满足 KŁ 性质, 所以有

$\begin{equation}{\psi}'\left ( e_{k}\right )\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right )\geqslant 1,\end{equation}$

结合 (3.9) 式和 (3.10) 式, 有

$\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|^{2}\leqslant\frac{1}{a}\left ( e_{k}-e_{k+1}\right ){\psi}'\left ( e_{k}\right )\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right ), $

因为$\psi \in \Psi _{n}$是一个凹函数, 所以

$\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|^{2}\leqslant\frac{1}{a}\left ( \psi \left ( e_{k}\right )-\psi \left ( e_{k+1}\right )\right )\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right ).$

对任意的$\gamma > 0$, 由算术-几何平均不等式可得

$\begin{align*}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|&\leqslant\left [ \frac{1}{a}\left ( \psi \left ( e_{k}\right )-\psi \left ( e_{k+1}\right )\right )\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right )\right ]^{\frac{1}{2}}\\ &\leqslant \frac{\gamma }{2a}\left ( \psi \left ( e_{k}\right )-\psi \left ( e_{k+1}\right )\right )+\frac{1}{2\gamma }\text{dist}\left ( 0, \partial \hat{L}_{k}\right), \end{align*}$

则由引理 3.3 可得

$\begin{equation}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \| \leqslant \frac{\gamma }{2a}\left ( \psi \left ( e_{k}\right )-\psi \left ( e_{k+1}\right )\right )+\frac{b}{2\gamma }\left \| \hat{w}^{k}-\hat{w}^{k-1}\right \|,\end{equation}$

对任意的$K\geqslant k\geqslant k_{0}$, 有下面的等式恒成立

$\begin{equation}\sum\limits_{k=k_{0}}^{K}\left \| \hat{w}^{k}-\hat{w}^{k-1}\right \|=\sum\limits_{k=k_{0}}^{K}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|+\left \| \hat{w}^{k_{0}}-\hat{w}^{k_{0}-1}\right \|-\left \| \hat{w}^{K+1}-\hat{w}^{K}\right \|.\end{equation}$

令$\gamma > 0$足够大使得$\delta =1-\frac{b}{2\gamma }> 0$, 对 (3.11) 式从$k_{0}$到$K$求和, 并用上恒等式 (3.12), 可得

$\sum\limits_{k=k_{0}}^{K}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|\leqslant \hat{\delta }\psi \left ( e_{k_{0}}\right )+\tilde{\delta }\left \|\hat{w}^{k_{0}}-\hat{w}^{k_{0}-1} \right \|, $

其中$\hat{\delta }=\frac{\gamma}{2a\delta }, \tilde{\delta }=\frac{b}{2\gamma \delta}$, 令$K\rightarrow \infty$, 则有

$\begin{equation} \sum\limits_{k=k_{0}}^{\infty}\left \| \hat{w}^{k+1}-\hat{w}^{k}\right \|\leqslant \hat{\delta }\psi \left ( e_{k_{0}}\right )+\tilde{\delta }\left \|\hat{w}^{k_{0}}-\hat{w}^{k_{0}-1} \right \|,\end{equation}$

由三角不等式, 结合 (3.13) 式, 对$\forall k\geqslant k_{0}$, 有

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \sum\limits_{p=k}^{\infty }\left \| \left | w^{p+1}-w^{p}\right |\right \|\leqslant\sum\limits_{p=k}^{\infty }\left \| \left | \hat{w}^{p+1}- \hat{w}^{p}\right |\right \| \leqslant \hat{\delta}\psi \left ( e_{k}\right )+\tilde{\delta }\left \|\hat{w}^{k}-\hat{w}^{k-1} \right \|, $

由 (3.9) 式可得

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{\delta }\psi \left ( e_{k}\right )+\frac{\tilde{\delta }}{\sqrt{a}}\sqrt{e_{k-1}}\leqslant \hat{c} \max\left \{\psi \left ( e_{k}\right ), \sqrt{e_{k-1}}\right \}, $

其中$\hat{c}=\max\left \{\hat{\delta }, \frac{\tilde{\delta }}{\sqrt{a}}\right \}$. 因为${\psi}'\left ( s\right )> 0$, 序列$\left \{e_{k}\right \}_{k\geqslant 1}$是单调递减的, 所以有

$\begin{equation}\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{c}\, \max\left \{\psi \left ( e_{k-1}\right ), \sqrt{e_{k-1}}\right \}, \forall k\geqslant k_{0}.\end{equation}$

令$\psi \left ( s\right )=s^{1-\theta}$, 其中$\theta \in \left [ 0, 1\right )$, 则

$\begin{equation}\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{c}\, \max\left \{e_{k-1}^{1-\theta}, \sqrt{e_{k-1}}\right \}.\end{equation}$

因为${\psi }'\left ( s\right )=\left ( 1-\theta \right )s^{-\theta}$, 且$\hat{L}$在$\hat{w}^{\ast}$满足 KŁ 性质, 则有

${\psi}'\left ( e_{k}\right )\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right )\geqslant 1, $

即$\left ( 1-\theta\right )\left ( e_{k}\right )^{-\theta }\cdot \text{dist}\left ( 0, \partial \hat{L}_{k}\right )\geqslant 1$,从而$e_{k} ^{\theta }\leqslant \left ( 1-\theta\right )\text{dist}\left ( 0, \partial \hat{L}_{k}\right ), \forall k\geqslant k_{0}$,因此$e_{k}^{2\theta}\leqslant \left ( 1-\theta\right )^{2}\, \text{dist}^{2}\left ( 0, \partial \hat{L}_{k}\right ).$

又由引理 3.3 和引理 3.1 可得

$e_{k}^{2\theta}\leqslant \left ( 1-\theta\right )^{2}\cdot b^{2}\left \| \left | \hat{w}^{k}-\hat{w}^{k-1}\right |\right \|^{2}\leqslant \frac{b^{2}\left ( 1-\theta\right )^{2}}{a}\left ( e_{k-1}-e_{k}\right ), $

即$\hat{\alpha }e_{k}^{2\theta }\leqslant e_{k-1}-e_{k}$, 其中$\hat{\alpha }=\frac{a}{b^{2}\left ( 1-\theta \right )^{2}}.$

(i) 令$\theta =0$, 又由定理 3.1(i) 可得序列$\left \{e_{k}\right \}_{k\geqslant 1}$经过有限次迭代后收敛到 0, 因此, 由 (3.15) 式可得序列$\left \{w^{k}\right \}$在经过有限次迭代后收敛到 0;

(ii) 令$\theta \in \left ( 0, \frac{1}{2}\right ]$, 由于$\max\left \{e_{k-1}^{1-\theta}, \sqrt{e_{k-1}}\right \}=\sqrt{e_{k-1}}$, 则由 (3.4) 式可得

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{c}\sqrt{e_{k_{0}}}/\left ( \sqrt{1+\bar{\alpha }e_{k_{0}}^{2\theta -1}}\right )^{k-k_{0}-1}, \forall k\geqslant k_{0},$

所以存在$c_{3}> 0$及$\tau _{2} \in \left [ 0, 1\right )$, 使得

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant c_{3}\tau _{2} ^{k}.$从而$\left \| \left | w^{k}-w^{\ast}\right |\right \|=O\left ( \tau _{2} ^{k}\right );$

(iii) 令$\theta \in \left ( \frac{1}{2}, 1\right )$, 由于$\max\left \{e_{k-1}^{1-\theta}, \sqrt{e_{k-1}}\right \}=e_{k-1}^{1-\theta}$, 则由 (3.8) 式, 可得

$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant \hat{c}\left ( \mu \left (k-k_{0} \right )+e_{k_{0}-1}^{1-2\theta}\right )^{-\frac{1-\theta}{2\theta -1}}, \forall k\geqslant k_{0}.$

所以存在$c_{4}> 0$使得$\left \| \left | w^{k}-w^{\ast}\right |\right \|\leqslant c_{4}k^{\frac{1-\theta }{1-2\theta}}.$从而$\left \| \left | w^{k}-w^{\ast}\right |\right \|=O\left (k^{\frac{1-\theta }{1-2\theta}} \right ).$证毕.

4 结论与展望

本文考虑了一类多块带 Bregman 距离的交替方向乘子 (Bregman ADMM) 来求解一类带线性约束的可分非凸优化问题, 在效益函数满足 KŁ 性质下, 给出了多块 Bregman ADMM 的函数和序列的收敛速度, 并给出了保证算法产生点列有界的充分条件, 本文还发现收敛速率与效益函数的 KŁ 指数直接相关.

实际上, 在本文基础上还有许多问题值得进一步讨论, 比如: 开发一些微积分规则, 用于在非凸环境下推导效益函数的 KŁ 指数, 这将允许我们确定 Bregman ADMM 的收敛速度, 证明随机多块 Bregman ADMM 的收敛速率及其证明出多块 Bregman ADMM 的迭代复杂性.

参考文献

Boyd S, Parikh N, Chu E, et al.

Distributed optimization and statistical learning via the alternating direction method of multipliers

Found Trends Mach Learn, 2011, 3: 1-122

DOI:10.1561/2200000016      URL     [本文引用: 1]

Glowinski R, Marroco A.

Sur l'approximation, paréléments finis d'ordre un, et la résolution, par pénalisation-dualitéd'une classe de problèmes de Dirichlet non linéaires

Journal of Equine Veterinary Science, 1975, 2: 41-76

[本文引用: 1]

Gabay D, Mercier B.

A dual algorithm for the solution of nonlinear variational problems via finite element approximation

Computers and Mathematics with Applications, 1976, 2: 17-40

DOI:10.1016/0898-1221(76)90003-1      URL     [本文引用: 1]

Bertsekas D, Tsitsiklis J.

Numerical Methods for Nonlinear Variational Problems

Berlin: Springer Series in Computational Physics, Springer, 1984

[本文引用: 1]

何炳生.

乘子交替方向法的一些收敛性质

高等学校计算数学学报, 2017, 39: 81-96

[本文引用: 2]

He B S.

Some convergence properties of the alternating direction method of multipliers

Numerical Mathematics A Journal of Chinese Universities, 2017, 39: 81-96

[本文引用: 2]

Yang W H, Han D R.

Linear Convergence of the alternating direction method of multipliers for a class of convex optimization problems

SIAM Journal on Numerical Analysis, 2016, 54: 625-640

DOI:10.1137/140974237      URL     [本文引用: 1]

He B S, Ma F, Yuan X M.

Convergence study on the symmetric version of ADMM with larger step sizes

SIAM Journal on Imaging Sciences, 2016, 9: 1467-1501

DOI:10.1137/15M1044448      URL     [本文引用: 2]

Fazel M, Pong T K, Sun D F, et al.

Hankel matrix rank minimization with applications to system identification and realization

SIAM Journal on Matrix Analysis and Applications, 2013, 34: 946-977

DOI:10.1137/110853996      URL     [本文引用: 2]

Li M, Sun D F, Toh K C.

A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization

SIAM Journal on Optimization, 2016, 26: 922-950

DOI:10.1137/140999025      URL     [本文引用: 2]

Zhang L, Wu J, Zhang L W.

A linearly convergent majorized ADMM with indefinite proximal terms for convex composite programming and its applications

Mathematics of Computation, 2020, 89(324): 1867-1894

DOI:10.1090/mcom/2020-89-324      URL     [本文引用: 2]

Wang H H, Banerjee A.

Bregman alternating direction method of multipliers

Arxiv: 1306.3203v3, 2014

[本文引用: 1]

Wang F, Xu Z B, Xu H K.

Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems

https://arXiv preprint arXiv: 1410.8625, 2014

[本文引用: 5]

Guo K, Han D R, Wu T T.

Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints

International Journal of Computer Mathematics, 2017, 94: 1653-1669

DOI:10.1080/00207160.2016.1227432      URL     [本文引用: 3]

Wu Z M, Li M, Wang D, et al.

A symmetric alternating direction method of multipliers for separable nonconvex minimization problems

Asia Pacific Journal of Operation Research, 2017, 34(06): 1750030

DOI:10.1142/S0217595917500300      URL     [本文引用: 2]

In this paper, we propose a symmetric alternating method of multipliers for minimizing the sum of two nonconvex functions with linear constraints, which contains the classic alternating direction method of multipliers in the algorithm framework. Based on the powerful Kurdyka–Łojasiewicz property, and under some assumptions about the penalty parameter and objective function, we prove that each bounded sequence generated by the proposed method globally converges to a critical point of the augmented Lagrangian function associated with the given problem. Moreover, we report some preliminary numerical results on solving [Formula: see text] regularized sparsity optimization and nonconvex feasibility problems to indicate the feasibility and effectiveness of the proposed method.

Jian J B, Zhang Y, Chao M T.

A regularized alternating direction method of multiplier for a class of nonconvex problems

Journal of Inequalities and Applications, 2019, 2019: 1-16

DOI:10.1186/s13660-019-1955-4      [本文引用: 2]

Jia Z H, Gao X, Cai X J, et al.

Local linear convergence of the alternating direction method of multipliers for nonconvex separable optimization problems

Journal of Optimization Theory and Applications, 2021, 188: 1-25

DOI:10.1007/s10957-020-01782-y      [本文引用: 2]

Chen C H, He B S, Ye Y Y, et al.

The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent

Mathematical Programming, 2016, 155: 57-79

DOI:10.1007/s10107-014-0826-5      URL     [本文引用: 1]

Han D R, Yuan X M.

A note on the alternating direction method of multipliers

Journal of Optimization Theory and Applications, 2012, 155: 227-238

DOI:10.1007/s10957-012-0003-z      URL     [本文引用: 1]

Lin T Y, Ma S Q, Zhang S Z.

Global convergence of unmodified 3-block ADMM for a class of convex minimization problems

Journal of Scientific Computing, 2018, 76: 69-88

DOI:10.1007/s10915-017-0612-7      URL     [本文引用: 1]

He B S.

Parallel splitting augmented lagrangian methods for monotone structured variational inequalities

Computational Optimization and Applications, 2009, 42: 195-212

DOI:10.1007/s10589-007-9109-x      URL     [本文引用: 1]

He B S, Tao M, Yuan X M.

Alternating direction method with Gaussian back substitution for separable convex programming

SIAM Journal on Optimization, 2012, 22: 313-340

DOI:10.1137/110822347      URL     [本文引用: 1]

Chen L, Li X D, Sun D F, et al.

On the equivalence of inexact proximal ALM and ADMM for a class of convex composite programming

Mathematical Programming, 2021, 185: 111-161

DOI:10.1007/s10107-019-01423-x      [本文引用: 1]

Deng W, Lai M J, Peng Z M, et al.

Parallel multi-block ADMM with$o(1/k)$convergence

Journal of Scientific Computing, 2017, 71 : 712-736

DOI:10.1007/s10915-016-0318-2      [本文引用: 1]

Hong M Y, Luo Z Q, Razaviyayn M.

Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems

SIAM Journal on Optimization, 2016, 26: 337-364

DOI:10.1137/140990309      URL     [本文引用: 1]

Guo K, Han D R, David W W, et al.

Convergence of ADMM for multi-block nonconvex separable optimization mobels

Front. Math. China, 2017, 12: 1139-1162

DOI:10.1007/s11464-017-0631-6      URL     [本文引用: 2]

Jiang B, Lin T Y, Ma S Q, et al.

Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis

Computational Optimization and Applications, 2019, 72: 115-157

DOI:10.1007/s10589-018-0034-y      [本文引用: 1]

Nonconvex and nonsmooth optimization problems are frequently encountered in much of statistics, business, science and engineering, but they are not yet widely recognized as a technology in the sense of scalability. A reason for this relatively low degree of popularity is the lack of a well developed system of theory and algorithms to support the applications, as is the case for its convex counterpart. This paper aims to take one step in the direction of disciplined nonconvex and nonsmooth optimization. In particular, we consider in this paper some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the absence of coupled constraints, we show a sublinear rate of convergence to an E-stationary solution in the form of variational inequality for a generalized conditional gradient method, where the convergence rate is dependent on the Holderian continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding E-stationarity conditions, and apply two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization-minimization step is implemented. We show an iteration complexity bound of O(1/E2) to reach an E-stationary solution for both algorithms. Moreover, we show that the same iteration complexity of a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA and tensor sparse PCA problems.

Yashtinin M.

Multi-block nonconvex nonsmooth proximal ADMM: Convergence and rates under Kurdyka-Łojasiewicz property

Journal of Optimization Theory and Applications, 2021, 190: 966-998

DOI:10.1007/s10957-021-01919-7      [本文引用: 2]

简金宝, 刘鹏杰, 江羡珍.

非凸多分块优化部分对称正则化交替方向乘子法

数学学报, 2021, 64A(6): 1005-1026

[本文引用: 2]

Jian J B, Liu P J, Jiang X Z.

A partially symmetric regularized alternating direction method of multipliers for nonconvex Multi-block optimization

Acta Mathematica Sinica (Chinese Series), 2021, 64A(6): 1005-1026

[本文引用: 2]

Wang F H, Cao W F, Xu Z B.

Convergence of multi-block Bregman ADMM for nonconvex problems

Science China Information Sciences, 2018, 61: 1-12

[本文引用: 13]

Frankel P, Garrigos G, Peypouquet J.

Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates

Journal of Optimization Theory and Applications, 2015, 165: 874-900

DOI:10.1007/s10957-014-0642-3      URL     [本文引用: 2]

Rockafellar R T, Wets R J B. Variational Analysis. Berlin: Springer Science and Business Media, 2009

[本文引用: 5]

Attouch H, Bolte J, Svaiter B F.

Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods

Mathematical Programming, 2013, 137: 91-129

DOI:10.1007/s10107-011-0484-9      URL     [本文引用: 2]

Si S, Tao D C, Geng B.

Bregman divergence-based regularization for transfer subspace learning

IEEE Transactions on Knowledge and Data Engineering, 2010, 22: 929-942

DOI:10.1109/TKDE.2009.126      URL     [本文引用: 1]

Xu Z B, Chang X Y, Xu F M, et al.

$L_{1/2}$regularization: a thresholding representation theory and a fast solver

IEEE Transactions on Neural Networks and Learning systems, 2012, 23: 1013-1027

DOI:10.1109/TNNLS.2012.2197412      URL     [本文引用: 1]

Bolte J, Daniilidis A, Lewis A.

$L_{1/2}$The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems

SIAM Journal on Optimization, 2007, 17: 1205-1223

DOI:10.1137/050644641      URL     [本文引用: 3]

/