## Convergence Analysis of Bregman ADMM for Three-Block Nonconvex Indivisible Optimization Problems with Linearization Technique

Liu Fuqin, Peng Jianwen,*, Luo Honglin

School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331

 基金资助: 国家自然科学基金重大项目.  11991024国家自然科学基金面上项目.  12271071重庆英才$\cdot$ 创新创业领军人才$\cdot$ 创新创业示范团队项目.  CQYC20210309536重庆英才计划包干制项目.  cstc2022ycjh-bgzxm0147重庆市高校创新研究群体项目.  CXQT20014重庆市自然科学基金项目.  cstc2021jcyj-msxmX0300

 Fund supported: The NSFC.  11991024The NSFC.  12271071Team Project of Innovation Leading Talent in Chongqing.  CQYC20210309536Contract System Project of Chongqing Talent Plan.  cstc2022ycjh-bgzxm0147Chongqing University Innovation Research Group Project.  CXQT20014Chongqing Natural Science Foundation Project.  cstc2021jcyj-msxmX0300

Abstract

Alternating direction multiplier method is an effective method to solve two separable convex optimization problems, but the convergence of alternating direction multiplier method may not be guaranteed for three nonseparable nonconvex optimization problems. This paper mainly studies the convergence analysis of the linearized generalized Bregman alternating direction multiplier method (L-G-BADMM) for solving the nonconvex minimization problem whose objective function is three indivisible blocks. Under appropriate assumptions, we solve the subproblem of the algorithm and construct a benefit function satisfying Kurdyka-Lojasiewicz property. The convergence of the algorithm can be obtained through theoretical proof.

Keywords： Bregman divergence ; ADMM ; Kurdyka-Lojasiewicz property ; Linearization

Liu Fuqin, Peng Jianwen, Luo Honglin. Convergence Analysis of Bregman ADMM for Three-Block Nonconvex Indivisible Optimization Problems with Linearization Technique. Acta Mathematica Scientia[J], 2023, 43(1): 291-304 doi:

## 1 引言

$$$\min \limits_{x \in X,y \in Y} f(x) + g(y)~~{\rm s.t.~}Ax + By = c,$$$

$\begin{eqnarray*} {L_\beta }({x_1},{x_2},y,\lambda ) &=& {f_1}({x_1}) + {f_2}({x_2}) + g({x_1},{x_2},y) + \langle \lambda,{A_1}{x_1} + {A_2}{x_2} + By - c \rangle \\ &&+ \frac{\beta }{2}\left\| {{A_1}{x_1} + {A_2}{x_2} + By - c} \right\|_2^2. \end{eqnarray*}$

$\begin{eqnarray*}{\hat L_\beta }({x_1},{x_2},y,\lambda )& =& {f_1}({x_1}) + {f_2}({x_2}) + g({x_1},{x_2},y) + \langle \lambda,{A_1}{x_1} + {A_2}{x_2} + By - c \rangle \\ && + \beta {\Delta _\psi }({A_1}{x_1} + {A_2}{x_2} + By,c). \end{eqnarray*}$

$$$\left\{ \begin{array}{ll} x_1^{k + 1} =& \mathop {\arg \min }\limits_{{x_1} \in {X_1}} {f_1}({x_1}) + {({x_1} - x_1^k)^{\rm T}}{\nabla _{{x_1}}}g(x_1^k,x_2^k,{y^k}) + \langle{\lambda ^k},{A_1}{x_1} + {A_2}x_2^k + B{y^k} - c \rangle\\ & + \beta {\Delta _\psi }({A_1}{x_1} + {A_2}x_2^k + B{y^k},c) + {\Delta _{{\varphi _1}}}({x_1},x_1^k), \\ x_2^{k + 1} =&\mathop {\arg \min }\limits_{{x_2} \in {X_2}} {f_2}({x_2}) + {({x_2} - x_2^k)^{\rm T}}{\nabla _{{x_2}}}g(x_1^k,x_2^k,{y^k}) + \langle {\lambda ^k},{A_1}x_1^{k + 1} + {A_2}x_2^{} + B{y^k} - c \rangle \\ & + \beta {\Delta _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{} + B{y^k},c) + {\Delta _{{\varphi _2}}}({x_2},x_2^k) \\ {y^{k + 1}} =&\mathop {\arg \min }\limits_{y \in Y} {(y - {y^k})^{\rm T}}{\nabla _y}g(x_1^k,x_2^k,{y^k}) + \langle{\lambda ^k},{A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + By - c \rangle \\ & + \beta {\Delta _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + By,c) + {\Delta _\phi }(y,{y^k}) \\ {\lambda ^{k + 1}}= & {\lambda ^k} + \beta ({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}} - c). \end{array} \right.$$$

## 2 预备知识

(1)给定 $x \in {\rm dom}f$, $f$$x 处的 Frechet 次微分记做 \hat \partial f(x), 对所有的 \mu \in {\Bbb R^n} 都满足 \hat \partial f(x) =\bigg \{ \mu \in {\Bbb R^n}:\frac{{f(y) - f(x) - \langle \mu,y - x\rangle }}{{\parallel y - x\parallel }} \ge 0\bigg\}. (2)如果 \partial f(x) = \{ \mu \in {\Bbb R^n}:\exists {x^k} \to x,f({x^k}) \to f(x),{\mu ^k} \in \hat \partial f(x) \to \mu,k \to \infty \}, 则称 f$$x$ 处的极限次微分或者简单次微分, 记做 $\partial f(x)$.

(3)如果 ${x_0}$称为函数$f$的稳定点或者临界点, 则 $0 \in \partial {f(x_0)}$.

$\left\{ \begin{array}{ll} A{x^*} = B{x^*},\\ - \nabla f({x^*}) = {A^T}{\lambda ^*},\\ {A^T}{\lambda ^*} \in \partial g({y^*}). \end{array} \right.$

$x \in O({x_0},\delta ) \cap \left\{ {x:f({x_0}) < f(x) < f({x_0}) + \eta } \right\},$

$\varphi '(f(x) - f({x_0})){\rm dist}(0,\partial f(x)) \ge 1,$

(b) $\varphi $$\left[ {0,\eta } \right] 上光滑凹, (c) \varphi (0) = 0, \varphi '(x) > 0,\forall x \in (0,\eta ). 引理2.1[6] (一致K-L性质) 设 \Omega 是紧集, f:{\Bbb R^n} \to \Bbb R\cup \left\{ { + \infty } \right\}是真且下半连续函数, 若函数f 在集合上取常数f({x_0}), 并在\Omega 中任意一点满足K-L性质, 则存在\eta > 0, \delta > 0, \varphi \in {A_n}, 使得对所有 x \in O({x_0},\delta ) \cap \left\{ {x:f({x_0}) < f(x) < f({x_0}) + \eta } \right\}, 我们有 \varphi '(f(x) - f({x_0}))(0,\partial f(x)) \ge 1. 定义2.4[4] 对于一个凸可微的函数\Phi , 定义Bregman散度如下 {\Delta _\Phi }(x,y) = \Phi (x) - \Phi (y) - \left\langle {\nabla \Phi (y),x - y} \right\rangle. 特别地, 如果\Phi (x) = \left\| x \right\|_Q^2 = {x^{\rm T}}Qx, 则有{\Delta _\Phi }(x,y) = \left\| {x - y} \right\|_Q^2, 其中Q是对称半正定矩阵. 定义2.5 (强凸性) 设f:{\Bbb R^n} \to \Bbb R\cup \left\{ { + \infty } \right\}是真函数, 如果存在\delta > 0, 使得 f(\lambda x + (1 - \lambda )y) \le \lambda f(x) + (1 - \lambda )f(y) - \frac{{\delta \lambda (1 - \lambda )}}{2}\left\| {x - y} \right\|_{}^2,\forall \lambda \in (0,1). 引理2.2[1]{\Delta _\phi }是可微的凸函数, 则有 (i) (非负性) {\Delta _\Phi }(x,y) \ge 0,{\Delta _\Phi }(x,x) = 0. (ii) (凸性) {\Delta _\Phi }(x,y)$$x$具有凸性.

(iii) (强凸性) $\Phi$如果是一个强凸函数, 强凸系数为$\delta$, 则${\Delta _\Phi }(x,y) \ge \frac{\delta }{2}{\left\| {x - y} \right\|^2}$.

$\left| {g(y) - g(x) - \left\langle {\nabla g(x),y - x} \right\rangle } \right| \le \frac{{{l_g}}}{2}{\left\| {y - x} \right\|^2}.$

(ii) 存在$\mu > 0$, 使得${B^{\rm T}}B \succ \frac{1}{\mu }I$;

(iii) ${\nabla _\psi }, {\nabla _{{\varphi _1}}},{\nabla _{{\varphi _2}}},{\nabla _\phi }$满足利普希茨连续, 且利普希茨系数分别为${l_\psi },{l_{{\varphi _1}}},{l_{{\varphi _2}}},{l_\phi };$

(iv) $\psi, {\varphi _1}, {\varphi _2},\phi$是强凸函数, 所对应的强凸系数分别为${\delta _\psi }, {\delta _{{\varphi _1}}},{ \delta _{{\varphi _2}}}, {\delta _\phi }$, 且${\delta _{{\varphi _1}}},{\delta _{{\varphi _2}}},{\delta _\phi }$均大于${l_g}$;

(v) ${A_1},{A_2}$均为正交矩阵, 即${A_1}^{\rm T}{A_1} = I,{A_2}^{\rm T}{A_2} = I;$

(vi)

$\frac{{{\delta _{{\varphi _1}}} - l_g^{}}}{2} > \frac{{4l_g^2\mu }}{\beta } - 4l_\psi ^2{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu,\frac{{{\delta _{{\varphi _2}}} - l_g^{}}}{2} > \frac{{4l_g^2\mu }}{\beta } - 4l_\psi ^2{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu,$
$\frac{{{\delta _\phi } - l_g^{}}}{2} > \frac{{4l_g^2\mu + 8l_\phi ^2\mu }}{\beta } - 4l_\psi ^2{\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu.$

## 3 收敛性分析

$\begin{matrix} {\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|^2} & \le&4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu {\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\|^2} + 4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu {\left\| {x_2^{k + 2} - x_2^{k + 1}} \right\|^2} \\ & &+ (4l_\phi ^2\mu + 4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu ){\left\| {{y^{k + 2}} - {y^{k + 1}}} \right\|^2} + 4l_g^2\mu {\left\| {x_1^{k + 1} - x_1^k} \right\|^2} \\ &&+4l_g^2\mu {\left\| {x_2^{k + 1} - x_2^k} \right\|^2} + (4l_g^2\mu + 4l_\phi ^2\mu ){\left\| {{y^{k + 1}} - {y^k}} \right\|^2}. \end{matrix}$

$$${\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|^2} \le \mu {\left\| {B({\lambda ^{k + 1}} - {\lambda ^k})} \right\|^2}.$$$

$\begin{matrix} {B^{\rm T}}{\lambda ^k}&=& {\nabla _\phi }({y^k}) - {\nabla _\phi }({y^{k + 1}}) - {\nabla _y}g(x_1^k,x_2^k,{y^k}) \\ &&+ \beta {B^{\rm T}}({\nabla _\psi }(c) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}})), \end{matrix}$

$\begin{matrix} {B^{\rm T}}{\lambda ^{k + 1}} &= &{\nabla _\phi }({y^{k + 1}}) - {\nabla _\phi }({y^{k + 2}}) - {\nabla _y}g(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) \\ & &+ \beta {B^{\rm T}}({\nabla _\psi }(c) - {\nabla _\psi }({A_1}x_1^{k + 2} + {A_2}x_2^{k + 2} + B{y^{k + 2}})), \end{matrix}$

$\begin{matrix} && {\left\| {{B^{\rm T}}({\lambda ^{k + 1}} - {\lambda ^k})} \right\|^2} \\ &=&\left\| {{\nabla _y}g(x_1^k,x_2^k,{y^k}) - {\nabla _y}g(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}})} \right. + {\nabla _\phi }({y^{k + 1}}) - {\nabla _\phi }({y^{k + 2}})+ {\nabla _\phi }({y^{k + 1}}) - {\nabla _\phi }({y^k}) \\ &&{\left. { + \beta {B^{\rm T}}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }({A_1}x_1^{k + 2} + {A_2}x_2^{k + 2} + B{y^{k + 2}}))} \right\|^2} \\ &\le & 4{\left[ {\left\| {{\nabla _y}g(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) - {\nabla _y}g(x_1^k,x_2^k,{y^k})} \right\|} \right.^2} \\ &&+ {\left\| {{\nabla _\phi }({y^{k + 2}}) - {\nabla _\phi }({y^{k + 1}})} \right\|^2} + {\left\| {{\nabla _\phi }({y^{k + 1}}) - {\nabla _\phi }({y^k})} \right\|^2} \\ &&+ {\beta ^2}{{\left\| {{B^{\rm T}}} \right\|}^2}\left. {{{\left\| {({\nabla _\psi }({A_1}x_1^{k + 2} + {A_2}x_2^{k + 2} + B{y^{k + 2}}) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}})} \right\|}^2}} \right] \\ &\le & 4l_g^2{\left\| {(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) - (x_1^k,x_2^k,{y^k})} \right\|^2} + 4l_\phi ^2{\left\| {({y^{k + 2}} - {y^{k + 1}})} \right\|^2} + 4l_\phi ^2{\left\| {({y^{k + 1}} - {y^k})} \right\|^2} \\ && + 4{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}l_\psi ^2{\left\| {({A_1}x_1^{k + 2} + {A_2}x_2^{k + 2} + B{y^{k + 2}}) - ({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}})} \right\|^2}. \end{matrix}$

$\begin{matrix} {\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|^2} &\le & \mu {\left\| {B({\lambda ^{k + 1}} - {\lambda ^k})} \right\|^2} \\ &\le & 4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu {\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\|^2} + 4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu {\left\| {x_2^{k + 2} - x_2^{k + 1}} \right\|^2} \\ && + 4l_g^2\mu {\left\| {x_1^{k + 1} - x_1^k} \right\|^2} + (4l_\phi ^2\mu + 4l_\psi ^2{\beta ^2}{\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu ){\left\| {{y^{k + 2}} - {y^{k + 1}}} \right\|^2} \\ &&+ 4l_g^2\mu {\left\| {x_2^{k + 1} - x_2^k} \right\|^2} + (4l_g^2\mu + 4l_\phi ^2\mu ){\left\| {{y^{k + 1}} - {y^k}} \right\|^2}. \end{matrix}$

${\left\| {A + B} \right\|^2} \le {\left\| A \right\|^2} + {\left\| B \right\|^2} + 2\left\| A \right\|\left\| B \right\|.$

${\left\| {A + B} \right\|^2} \le 2({\left\| A \right\|^2} + {\left\| B \right\|^2}).$

$\left\{ {{w^k} = {{(x_1^k,x_2^k,{y^k},{\lambda ^k})}^{\rm T}}} \right\}$是L-G-BADMM算法产生的点列, 为了证明${\hat L_\beta }({w^k})$ 的单调性, 下面构造函数$\hat L({\hat w^k})$

$\hat L({x_1},{x_2},y,\lambda,{\hat x_1},{\hat x_2},\hat y) = {\hat L_\beta }({x_1},{x_2},y,\lambda ) + {\sigma _1}{\left\| {\hat x_1^{} - x_1^{}} \right\|^2} +{\sigma _2}{\left\| {\hat x_2^{} - x_2^{}} \right\|^2} + {\sigma _3}{\left\| {\hat y - y} \right\|^2},$

${\sigma _1} = - \left( {\frac{{{\delta _{{\varphi _1}}} - l_g^{}}}{2} - \frac{{4l_g^2\mu }}{\beta } - 4l_\psi ^2\beta{{\left\| {{B^{\rm T}}} \right\|}^2}{{\left\| {{A_1}} \right\|}^2}\mu } \right),$
${\sigma _2} = - \left( {\frac{{{\delta _{{\varphi _2}}} - l_g^{}}}{2} - \frac{{4l_g^2\mu }}{\beta } - 4l_\psi ^2\beta{{\left\| {{B^{\rm T}}} \right\|}^2}{{\left\| {{A_2}} \right\|}^2}\mu } \right),$
${\sigma _3} = - \left( {\frac{{{\delta _\phi } - l_g^{}}}{2} - \frac{{4l_g^2\mu + 8l_\phi ^2\mu }}{\beta } - 4l_\psi ^2\beta{{\left\| {{B^{\rm T}}} \right\|}^2}{{\left\| B \right\|}^2}\mu } \right).$

$\begin{matrix} &&{f_1}(x_1^{k + 1}) + {f_2}(x_2^{k + 1}) + g(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) + \langle{\lambda ^k},{A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}} - c\rangle \\ &&+ \beta {\Delta _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}},c) \\ & \le & {f_1}(x_1^k) + {f_2}(x_2^{k + 1}) + g(x_1^k,x_2^k,{y^k}) + \langle{\lambda ^k},{A_1}x_1^k + {A_2}x_2^k + B{y^k} - c\rangle \\ &&+ \beta {\Delta _\psi }({A_1}x_1^k + {A_2}x_2^k + B{y^k},c) + [g(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) - g(x_1^k,x_2^k,{y^k}) \\ &&- ({(x_1^{k + 1} - x_1^k)^{\rm T}}{\nabla _{{x_1}}}g(x_1^k,x_2^k,{y^k}) + {(x_2^{k + 1} - x_2^k)^{\rm T}}{\nabla _{{x_2}}}g(x_1^k,x_2^k,{y^k}) \\ & &+ {({y^{k + 1}} - {y^k})^{\rm T}}{\nabla _y}g(x_1^k,x_2^k,{y^k})) - \frac{{{\delta _{{\varphi _1}}}}}{2}{\left\| {x_1^{k + 1} - x_1^k} \right\|^2} - \frac{{{\delta _{{\varphi _2}}}}}{2}{\left\| {x_2^{k + 1} - x_2^k} \right\|^2} \\ & &- \frac{{{\delta _\phi }}}{2}{\left\| {{y^{k + 1}} - {y^k}} \right\|^2}, \end{matrix}$

$\begin{matrix} &&{{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^k}) \\ &\le &{{\hat L}_\beta }(x_1^k,x_2^k,{y^k},{\lambda ^k}) + \frac{{{l_g}}}{2}{\left\| {(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) - (x_1^k,x_2^k,{y^k})} \right\|^2} \\ && - \frac{{{\delta _{{\varphi _1}}}}}{2}{\left\| {x_1^{k + 1} - x_1^k} \right\|^2} - \frac{{{\delta _{{\varphi _2}}}}}{2}{\left\| {x_2^{k + 1} - x_2^k} \right\|^2} - \frac{{{\delta _\phi }}}{2}{\left\| {{y^{k + 1}} - {y^k}} \right\|^2}. \end{matrix}$

$\begin{matrix} &&{{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^{k + 1}}) \\ &\le & {{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^k}) + \langle{\lambda ^{k + 1}} - {\lambda ^k},{A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}} - c\rangle \\ &=& {{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^k}) + \frac{1}{\beta }{\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|^2}, \end{matrix}$

$\begin{matrix} &&{{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^{k + 1}}) \\ &\le & {{\hat L}_\beta }(x_1^k,x_2^k,{y^k},{\lambda ^k}) - \frac{{{\delta _{{\varphi _1}}} - {l_g}}}{2}{\left\| {x_1^{k + 1} - x_1^k} \right\|^2}- \frac{{{\delta _{{\varphi _2}}} - {l_g}}}{2}{\left\| {x_2^{k + 1} - x_2^k} \right\|^2} \\ &&- \frac{{{\delta _\phi } - {l_g}}}{2}{\left\| {{y^{k + 1}} - {y^k}} \right\|^2} + \frac{1}{\beta }{\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|^2}. \end{matrix}$

$\begin{matrix} &&{{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^{k + 1}}) - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu {\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\|^2} \\ &&- 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu {\left\| {x_2^{k + 2} - x_2^{k + 1}} \right\|^2}- (\frac{{4l_\phi ^2\mu }}{\beta } + 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu ){\left\| {{y^{k + 2}} - {y^{k + 1}}} \right\|^2} \\ &\le &{{\hat L}_\beta }(x_1^k,x_2^k,{y^k},{\lambda ^k}) - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu {\left\| {x_1^{k + 1} - x_1^k} \right\|^2} \\ && - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu {\left\| {x_2^{k + 1} - x_2^k} \right\|^2} - (\frac{{4l_\phi ^2\mu }}{\beta } + 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu ){\left\| {{y^{k + 1}} - {y^k}} \right\|^2} \\ & &- (\frac{{{\delta _{{\varphi _1}}} - {l_g}}}{2} - \frac{{4l_g ^2\mu }}{\beta } - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu ){\left\| {x_1^{k + 1} - x_1^k} \right\|^2} \\ & &- (\frac{{{\delta _{{\varphi _2}}} - {l_g}}}{2} - \frac{{4l_g ^2\mu }}{\beta } - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu ){\left\| {x_2^{k + 1} - x_2^k} \right\|^2} \\ & &- (\frac{{{\delta _\phi } - {l_g}}}{2} - \frac{{4l_g^2\mu }}{\beta } - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu - \frac{{8l_\phi ^2\mu }}{\beta }){\left\| {{y^{k + 1}} - {y^k}} \right\|^2}. \end{matrix}$

$\begin{matrix} \hat L({{\hat w}^{k + 1}}) &= &\hat L(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^{k + 1}},\hat x_1^{k + 1},\hat x_2^{k + 1},{{\hat y}^{k + 1}}) \\ & = &{{\hat L}_\beta }(x_1^{k + 1},x_2^{k + 1},{y^{k + 1}},{\lambda ^{k + 1}}) - 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_1}} \right\|^2}\mu {\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\|^2} \\ & &-4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| {{A_2}} \right\|^2}\mu {\left\| {x_2^{k + 2} - x_2^{k + 1}} \right\|^2} \\ && - (\frac{{4l_\phi ^2\mu }}{\beta } + 4l_\psi ^2\beta {\left\| {{B^{\rm T}}} \right\|^2}{\left\| B \right\|^2}\mu ){\left\| {{y^{k + 2}} - {y^{k + 1}}} \right\|^2}. \end{matrix}$

${\sigma _4}\sum\limits_{k = 1}^t {({{\left\| {x_1^{k + 1} - x_1^k} \right\|}^2} + {{\left\| {x_2^{k + 1} - x_2^k} \right\|}^2} + {{\left\| {{y^{k + 1}} - {y^k}} \right\|}^2})} \le \hat L({\hat w^1}) - \hat L({\hat w^{t + 1}}) < + \infty.$

${\sigma _4} > 0$, 以及$t$的任意性可知

$\sum\limits_{k = 0}^{ + \infty } {{{\left\| {{u^{k + 1}} - {u^k}} \right\|}^2}} < + \infty,$

$\sum\limits_{k = 0}^{ + \infty } {{{\left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|}^2}} < + \infty.$

$\sum\limits_{k = 0}^{ + \infty } {{{\left\| {{w^{k + 1}} - {w^k}} \right\|}^2}} < + \infty,$

$\begin{matrix} d\left( {0,\partial \hat L\left( {{{\hat w}^{k + 1}}} \right)} \right) &\le &\gamma (\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\| + \left\| {x_2^{k + 2} - x_2^{k + 1}} \right\| + \left\| {{y^{k + 2}} - {y^{k + 1}}} \right\| \\ & &+ \left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\|). \end{matrix}$

$$$\left\{ \begin{array}{ll} {\nabla _{{x_1}}}\hat L({{\hat w}^{k + 1}}) = \nabla {f_1}(x_1^{k + 1}) + {\nabla _{{x_1}}}g(x_1^k,x_2^k,{y^k}) + A_1^{\rm T}{\lambda ^{k + 1}}\\ + \beta A_1^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }(c)) - 2{\sigma _1}(x_1^{k + 2} - x_1^{k + 1}),\\ {\nabla _{{x_2}}}\hat L({{\hat w}^{k + 1}}) = \nabla {f_2}(x_2^{k + 1}) + {\nabla _{{x_2}}}g(x_1^k,x_2^k,{y^k}) + A_2^{\rm T}{\lambda ^{k + 1}}\\ + \beta A_2^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }(c)) - 2{\sigma _2}(x_2^{k + 2} - x_2^{k + 1}),\\ {\nabla _y}\hat L({{\hat w}^{k + 1}}) = {\nabla _y}g(x_1^k,x_2^k,{y^k}) + B_{}^{\rm T}{\lambda ^{k + 1}} - 2{\sigma _3}({y^{k + 2}} - {y^{k + 1}})\\ \ + \beta B_{}^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }(c)),\\ {\nabla _{{{\hat x}_1}}}\hat L({{\hat w}^{k + 1}}) = 2{\sigma _1}(x_1^{k + 2} - x_1^{k + 1}),\\ {\nabla _{{{\hat x}_2}}}\hat L({{\hat w}^{k + 1}}) = 2{\sigma _2}(x_2^{k + 2} - x_2^{k + 1}),\\ {\nabla _{{{\hat y}_{}}}}\hat L({{\hat w}^{k + 1}}) = 2{\sigma _3}({y^{k + 2}} - {y^{k + 1}}),\\ {\nabla _{{\lambda _{}}}}\hat L({{\hat w}^{k + 1}}) = \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k}). \end{array} \right.$$$

$$$\left\{ \begin{array}{ll} \nabla {f_1}(x_1^{k + 1}) = - {\nabla _{{x_1}}}g(x_1^k,x_2^k,{y^k}) - A_1^{\rm T}{\lambda ^k} + {\nabla _{{\varphi _1}}}(x_1^k) - {\nabla _{{\varphi _1}}}(x_1^{k + 1})\\ - \beta A_1^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^k + B{y^k}) - {\nabla _\psi }(c)),\\ \nabla {f_2}(x_2^{k + 1}) = - {\nabla _{{x_2}}}g(x_1^k,x_2^k,{y^k}) - A_2^{\rm T}{\lambda ^k} + {\nabla _{{\varphi _2}}}(x_2^k) - {\nabla _{{\varphi _2}}}(x_2^{k + 1})\\ - \beta A_2^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^k}) - {\nabla _\psi }(c)),\\ {\nabla _y}g(x_1^k,x_2^k,{y^k}) = - B_{}^{\rm T}{\lambda ^k} + {\nabla _\phi }(y_{}^k) - {\nabla _\phi }(y_{}^{k + 1})\\ - \beta B_{}^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }(c)),\\ {\lambda ^{k + 1}} = {\lambda ^k} + \beta ({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}} - c). \end{array} \right.$$$

$$$\left\{ \begin{array}{ll} \rho _1^k = A_1^{\rm T}({\lambda ^{k + 1}} - {\lambda ^k}) - 2{\sigma _1}(x_1^{k + 2} - x_1^{k + 1}) + {\nabla _{{\varphi _1}}}(x_1^k) - {\nabla _{{\varphi _1}}}(x_1^{k + 1})\\ + \beta A_1^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^k + B{y^k})),\\ \rho _2^k = A_2^{\rm T}({\lambda ^{k + 1}} - {\lambda ^k}) - 2{\sigma _2}(x_2^{k + 2} - x_2^{k + 1}) + {\nabla _{{\varphi _2}}}(x_2^k) - {\nabla _{{\varphi _2}}}(x_2^{k + 1})\\ + \beta A_2^{\rm T}({\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^k})),\\ \rho _3^k = B_{}^{\rm T}({\lambda ^{k + 1}} - {\lambda ^k}) - 2{\sigma _3}(y_{}^{k + 2} - y_{}^{k + 1}) + {\nabla _\phi }(y_{}^k) - {\nabla _\phi }(y_{}^{k + 1}),\\ \rho _4^k = 2{\sigma _1}(x_1^{k + 2} - x_1^{k + 1}),\\ \rho _5^k = 2{\sigma _2}(x_2^{k + 2} - x_2^{k + 1}),\\ \rho _6^k = 2{\sigma _3}({y^{k + 2}} - {y^{k + 1}}),\\ \rho _7^k = \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k}). \end{array} \right.$$$

$d\left( {0,\partial \hat L\left( {{{\hat w}^{k + 1}}} \right)} \right) \le \left\| {{{(\rho _1^k,\rho _2^k,\rho _3^k,\rho _4^k,\rho _5^k,\rho _6^k,\rho _7^k)}^{\rm T}}} \right\|.$

$\begin{matrix} &&d\left( {0,\partial \hat L\left( {{{\hat w}^{k + 1}}} \right)} \right) \\ &\le&{\gamma _1}(\left\| {x_1^{k + 2} - x_1^{k + 1}} \right\| + \left\| {x_2^{k + 2} - x_2^{k + 1}} \right\| + \left\| {{y^{k + 2}} - {y^{k + 1}}} \right\| + \left\| {{\lambda ^{k + 1}} - {\lambda ^k}} \right\|) \\ & &+\beta \left\| {A_1^{\rm T}} \right\|\left\| {{\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^k + B{y^k})} \right\| \\ &&+ \beta\left\| {A_2^{\rm T}} \right\|\left\| {{\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^{k + 1}}) - {\nabla _\psi }({A_1}x_1^{k + 1} + {A_2}x_2^{k + 1} + B{y^k})} \right\| \\ & &+\left\| {{\nabla _{{\varphi _1}}}(x_1^k) - {\nabla _{{\varphi _1}}}(x_1^{k + 1})} \right\| + \left\| {{\nabla _{{\varphi _2}}}(x_2^k) - {\nabla _{{\varphi _2}}}(x_2^{k + 1})} \right\| + \left\| {{\nabla _\phi }({y^k}) - {\nabla _\phi }({y^{k + 1}})} \right\|, \end{matrix}$

1) 故$\mathop {\lim }\limits_{k \to \infty } d({\hat w^k},S({\hat w^0})) = 0$;

2) 若$(x_1^*,x_2^*,{y^*},{\lambda ^*},\hat x_1^*,x_2^*,{y^*}) \in S({\hat w^0})$, 则${\hat w^*}$$\hat L( \cdot )的稳定点; 3) 则\hat L( \cdot )在集合S({\hat w^0})上取值为常数且\mathop {\inf }\limits_{k \in N} \hat L({\hat w^k}) = \mathop {\lim }\limits_{k \to \infty } \hat L({\hat w^k}). 1) 由定义可得\mathop {\lim }\limits_{k \to \infty } d({\hat w^k},S({\hat w^0})) = 0. 2) 设{\hat w^*} \in S({\hat w^0}), 则存在\left\{ {{{\hat w}^k}} \right\}的收敛子列\left\{ {{{\hat w}^{{k_j}}}} \right\}收敛于\left\{ {{{\hat w}^*}} \right\}.x_1^{k + 1}的最优性条件可得 $$\hat L(x_1^{k + 1},x_2^k,{y^k},{\lambda ^k},x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}) \le \hat L(x_1^*,x_2^k,{y^k},{\lambda ^k},x_1^{k + 1},x_2^{k + 1},{y^{k + 1}}),$$ \mathop {\lim }\limits_{j \to \infty } {\hat w^{{k_j}}} = {\hat w^*}, 可得 \begin{matrix} & &\mathop {\lim }\limits_{j \to \infty } \sup \hat L(x_1^{{k_j} + 1},x_2^{{k_j}},{y^{{k_j}}},{\lambda ^{{k_j}}},x_1^{{k_j} + 1},x_2^{{k_j} + 1},{y^{{k_j} + 1}}) \\ & \le& \mathop {\lim }\limits_{j \to \infty } \sup \hat L(x_1^{{k_j} + 1},x_2^{{k_j} + 1},{y^{{k_j} + 1}},{\lambda ^{{k_j} + 1}},x_1^{{k_j} + 2},x_2^{{k_j} + 2},{y^{{k_j} + 2}}) \\ &\le &\hat L({{\hat w}^*}). \end{matrix} 又由\hat L( \cdot )的下半连续性, 可知 $$\mathop {\lim }\limits_{j \to \infty } \inf \hat L({\hat w^{{k_j} + 1}}) \ge \hat L({\hat w^*}),$$ 结合(3.25)式和(3.26)式可得 $$\mathop {\lim }\limits_{j \to \infty } \hat L({\hat w^{{k_j} + 1}}) = \hat L({\hat w^*}).$$ 又由{\nabla _g}的连续性和\partial f的闭性并对(3.21)式中序列\left\{ {{{\hat w}^{{k_j} + 1}}} \right\}取极限, 则有 $$\left\{ \begin{array}{ll} \nabla {f_1}(x_1^*) = - {\nabla _{{x_1}}}g(x_1^*,x_2^*,{y^*}) - A_1^{\rm T}{\lambda ^*},\\ \nabla {f_2}(x_2^*) = - {\nabla _{{x_2}}}g(x_1^*,x_2^*,{y^*}) - A_2^{\rm T}{\lambda ^*},\\ {\nabla _y}g(x_1^*,x_2^*,{y^*}) = - B_{}^{\rm T}{\lambda ^*},\\ {A_1}x_1^* + {A_2}x_2^* + Bx_2^* = c. \end{array} \right.$$ {\hat w^*}$$\hat L( \cdot )$的稳定点.

3) 结合(3.25)式和${\left\{ {\hat L({{\hat w}^k})} \right\}_{k \in N}}$单调递减可得$\mathop {\lim }\limits_{k \to \infty } \hat L({\hat w^k}) = \hat L({\hat w^*})$, 并有函数$\hat L( \cdot )$在非空紧集$S({\hat w^0})$上是常数, 以及$\mathop {\inf }\limits_{k \in N} \hat L({\hat w^k}) = \mathop {\lim }\limits_{k \to \infty } \hat L({\hat w^k})$成立. 证毕.

(1) $\sum\limits_{k = 0}^{ + \infty } {{{\left\| {{w^{k + 1}} - {w^k}} \right\|}^2}} < + \infty$;

(2) 序列$\left\{ {{w^k}} \right\}$收敛到本文问题(1.3)的稳定点.

1a) 若存在整数${k_0}$, 有$\hat L({\hat w^{{k_0}}}) = \hat L({\hat w^*})$, 又由引理3.4可得, 对任意的$k > {k_0}$, 使得

$\begin{eqnarray*} &&{\sigma _4}\left( {{{\left\| {x_1^{k + 1} - x_1^k} \right\|}^2} + {{\left\| {x_2^{k + 1} - x_2^k} \right\|}^2} + {{\left\| {{y^{k + 1}} - {y^k}} \right\|}^2}} \right) \\ & \le& \hat L({{\hat w}^k}) - \hat L({{\hat w}^{k + 1}}) \le \hat L({{\hat w}^{{k_0}}}) - \hat L({{\hat w}^*}) = 0 \end{eqnarray*}$

1b) 对任意${k}$均有$\hat L({\hat w^{{k}}}) = \hat L({\hat w^*})$, 因为$\mathop {\lim }\limits_{k \to \infty } d({\hat w^k},S({\hat w^0})) = 0$, 所以对于任意给定的$\varepsilon > 0$, 存在${k_1} > 0$, 当$k > {k_1}$时, 有$d({\hat w^k},S({\hat w^0})) < \varepsilon$. 又因为$\mathop {\lim }\limits_{k \to \infty } \hat L({\hat w^k}) = \hat L({\hat w^*})$, 所以对任意给定$\eta > 0$, 存在${k_2} > 0$, 且$k > {k_2}$时, 有$\hat L({\hat w^k}) < L({\hat w^*}) + \eta$. 那么当$k > \tilde k = \max \left\{ {{k_1},{k_2}} \right\}$, 有$d({\hat w^k},S({\hat w^0})) < \varepsilon,L({\hat w^*}) < \hat L({\hat w^k}) < L({\hat w^*}) + \eta$. 因为$S({\hat w^0})$为紧集且非空时, 函数$\hat L( \cdot )$在集合$S({\hat w^0})$上为常数, 所以由一致K-L性质, 对任意$k > \tilde k$, 有$\varphi '(\hat L({\hat w^k}) - L({\hat w^*}))d(0,\partial L({\hat w^*})) \ge 1,$$\forall k > \tilde k$, 再由引理3.4, 则有

$\begin{matrix} \frac{1}{{\varphi '(\hat L({{\hat w}^k}) - \hat L({{\hat w}^*}))}}&\le& d(0,\partial \hat L({{\hat w}^*})) \\ & \le &\gamma (\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\| \\ && + \left\| {x_1^k - x_1^{k - 1}} \right\|+ \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\|). \end{matrix}$

$\begin{matrix} &&\varphi '(\hat L({\hat w^k}) - L({\hat w^*}))(\hat L({\hat w^k}) - L({\hat w^{k + 1}})) \\ &\le& \varphi (\hat L({\hat w^k}) - L({\hat w^*})) - \varphi (\hat L({\hat w^{k + 1}}) - L({\hat w^*})). \end{matrix}$

$\begin{matrix} {\sigma _4}{\left\| {{u^{k + 1}} - {u^k}} \right\|^2} &\le &\hat L({{\hat w}^k}) - \hat L({{\hat w}^{k + 1}}) \\ &\le& \frac{{\varphi (\hat L({{\hat w}^k}) - \hat L({{\hat w}^*})) - \varphi (\hat L({{\hat w}^{k + 1}}) - \hat L({{\hat w}^*}))}}{{\varphi '(\hat L({{\hat w}^k}) - \hat L({{\hat w}^*}))}} \\ &\le& \gamma (\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\| \\ &&+ \left\| {x_1^k - x_1^{k - 1}} \right\| + \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\|){\Delta _{k,k + 1}}. \end{matrix}$

$\begin{matrix} &&{\left\| {x_1^{k + 1} - x_1^k} \right\|^2} + {\left\| {x_2^{k + 1} - x_2^k} \right\|^2} + {\left\| {{y^{k + 1}} - {y^k}} \right\|^2} \\ & \le& \frac{\gamma }{{{\sigma _4}}}(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\| + \left\| {x_1^k - x_1^{k - 1}} \right\| \\ && + \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\|){\Delta _{k,k + 1}}. \end{matrix}$

$\begin{eqnarray*} && 3(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\|) \\ & \le& 3\sqrt 3 {({\left\| {x_1^{k + 1} - x_1^k} \right\|^2} + {\left\| {x_2^{k + 1} - x_2^k} \right\|^2} + {\left\| {{y^{k + 1}} - {y^k}} \right\|^2})^{\frac{1}{2}}} \\ &\le& 2(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\| \\ &&+\left\| {x_1^k - x_1^{k - 1}} \right\| + \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\|{)^{\frac{1}{2}}}{(\frac{{27\gamma }}{{4{\sigma _4}}}{\Delta _{k,k + 1}})^{\frac{1}{2}}} \\ & \le& \left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\| \\ & &+\left\| {x_1^k - x_1^{k - 1}} \right\| + \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\| + \frac{{27\gamma }}{{4{\sigma _4}}}{\Delta _{k,k + 1}}, \end{eqnarray*}$

$\begin{eqnarray*} && \sum\limits_{k = \tilde k + 1}^m {3(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\|)} \\ & \le& \sum\limits_{k = \tilde k + 1}^m {(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\|} \\ &&+ \left\| {x_1^k - x_1^{k - 1}} \right\| + \left\| {x_2^k - x_2^{k - 1}} \right\| + \left\| {{y^k} - {y^{k - 1}}} \right\| + \frac{{27\gamma }}{{4{\sigma _4}}}{\Delta _{k,k + 1}}), \end{eqnarray*}$

$\begin{matrix} &&\sum\limits_{k = \tilde k + 1}^m {\left\| {x_1^{k + 1} - x_1^k} \right\| + } \sum\limits_{k = \tilde k + 1}^m {\left\| {x_2^{k + 1} - x_2^k} \right\|} + \sum\limits_{k = \tilde k + 1}^m {\left\| {{y^{k + 1}} - {y^k}} \right\|} \\ & \le& \sum\limits_{k = \tilde k + 1}^m {(\left\| {x_1^k - x_1^{k - 1}} \right\| - \left\| {x_1^{k + 1} - x_1^k} \right\|)} + \sum\limits_{k = \tilde k + 1}^m {(\left\| {x_2^k - x_2^{k - 1}} \right\|} - \left\| {x_2^{k + 1} - x_2^k} \right\|) \\ & & + \sum\limits_{k = \tilde k + 1}^m {(\left\| {{y^k} - {y^{k - 1}}} \right\| - \left\| {{y^{k + 1}} - {y^k}} \right\|)} + \sum\limits_{k = \tilde k + 1}^m {\frac{{27\gamma }}{{4{\sigma _4}}}{\Delta _{k,k + 1}}} \\ &\le& \left\| {x_1^{\tilde k + 1} - x_1^{\tilde k}} \right\| - \left\| {x_1^{m + 1} - x_1^m} \right\| + \left\| {x_2^{\tilde k + 1} - x_2^{\tilde k}} \right\| - \left\| {x_2^{m + 1} - x_2^m} \right\| \\ & & + \left\| {{y^{\tilde k + 1}} - {y^{\tilde k}}} \right\| - \left\| {{y^{m + 1}} - {y^m}} \right\| + \frac{{27\gamma }}{{4{\sigma _4}}}( \varphi (\hat L({{\hat w}^{\tilde k + 1}}) - \hat L({{\hat w}^*})) \\ & \le& \left\| {x_1^{\tilde k + 1} - x_1^{\tilde k}} \right\| + \left\| {x_2^{\tilde k + 1} - x_2^{\tilde k}} \right\| + \left\| {{y^{\tilde k + 1}} - {y^{\tilde k}}} \right\|+ \frac{{27\gamma }}{{4{\sigma _4}}}\varphi (\hat L({{\hat w}^{\tilde k + 1}}) - \hat L({{\hat w}^*})). \end{matrix}$

$\sum\limits_{k = \tilde k + 1}^m {(\left\| {x_1^{k + 1} - x_1^k} \right\| + \left\| {x_2^{k + 1} - x_2^k} \right\| + \left\| {{y^{k + 1}} - {y^k}} \right\|)} < + \infty.$

$\sum\limits_{k = \tilde k + 1}^m {(\left\| {w_{}^{k + 1} - w_{}^k} \right\|)} < + \infty,$

$\sum\limits_{k = 0}^{ + \infty } {\left\| {{w^{k + 1}} - {w^k}} \right\|} < + \infty.$

(2) 根据上述证明以及$\left\{ {{w^k}} \right\}$是柯西序列, 可得$\left\{ {{w^k}} \right\}$收敛, 又由定理3.1(2) 可知该定理显然成立. 定理3.2得证.

## 参考文献 原文顺序 文献年度倒序 文中引用次数倒序 被引期刊影响因子

Banerjee A, Merugu S, Dhillon I, Ghosh J.

Clustering with bregman divergences

Journal of Machine Learning Research, 2005, 6: 1705-1749

Bolte J, Daniilidis A, Ley O, et al.

Characterizations of Lojasiewicz inequalities and applications: subgradient flows, talweg, convexity

Transactions of the American Mathematical Society, 2010, 362(6): 3319-3363

Boyd S, Parikh N, Chu E, et al.

Distributed optimization and statistical learning via the alternating direction method of multipliers

Foundations Trends in Machine Learning, 2010, 3(1): 1-122

Bregman L M.

The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming

USSR Computational Mathematics and Mathematical Physics, 1967, 7(3): 200-217

Chao M T, Cheng C Z, Zhang H B.

A linearized alternating direction method of multipliers with substitution procedure

Asia-Pacific Journal of Operational Research, 2015, 32(3): 1550011

Chao M T, Deng Z, Jian J B.

Convergence of linear Bregman ADMM for nonconvex and nonsmooth problems with nonseparable structure

Complexity, 2020, 2020: Art ID 6237942

Chao M T, Zhang Y, Jian J B.

An inertial proximal alternating direction method of multipliers for nonconvex optimization

International Journal of Computer Mathematics, 2021, 98(6): 1199-1217

Chen C H, He B S, Ye Y Y, et al.

The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent

Mathematical Programming, 2016, 155: 57-79

Deng W, Yin W T.

On the global and linear convergence of the generalized alternating direction method of multipliers

Journal of Scientific Computing, 2016, 66(3): 889-916

Feng J K, Zhang H B, Cheng C Z, et al.

Convergence analysis of L-ADMM for multi-block linear-constrained separable convex minimization problem

Journal of the Operations Research Society of China, 2015, 3(4): 563-579

Gabay D, Mercier B.

A dual algorithm for the solution of nonlinear variational problems via finite element approximation

Computers and Mathematics with Applications, 1976, 2(1): 17-40

Glowinski R, Marroco A.

Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires

Journal of E-quine Veterinary Science, 1975, 2(2): 41-76

Guo K, Han D R, Wang D Z W, et al.

Convergence of ADMM for multi-block nonconvex separable optimization models

Frontiers of Mathematics in China, 2017, 12(5): 1139-1162

For solving minimization problems whose objective function is the sum of two functions without coupled variables and the constrained function is linear, the alternating direction method of multipliers (ADMM) has exhibited its efficiency and its convergence is well understood. When either the involved number of separable functions is more than two, or there is a nonconvex function, ADMM or its direct extended version may not converge. In this paper, we consider the multi-block separable optimization problems with linear constraints and absence of convexity of the involved component functions. Under the assumption that the associated function satisfies the Kurdyka- Lojasiewicz inequality, we prove that any cluster point of the iterative sequence generated by ADMM is a critical point, under the mild condition that the penalty parameter is sufficiently large. We also present some sufficient conditions guaranteeing the sublinear and linear rate of convergence of the algorithm.

Guo K, Hand D R, Wu T T.

Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints

International Journal of Computer Mathematics, 2017, 94(8): 1653-1669

Guo K, Wang X.

Convergence of generalized alternating direction method of multipliers for nonseparable nonconvex objective with linear constraints

Journal of Mathematical Research with Applications, 2018, 38(5): 523-540

Hong M Y, Lou Z Q, Razaviyayn M.

Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems

SIAM Journal on Optimization, 2016, 26(1): 337-364

Li M, Sun D F, Toh K C.

A convergent 3-blocksemi-proximal ADMM for convex minimization problems with one strongly convex block

Asia-Pacific Journal of Operational Research, 2015, 32(4): 1550024

Rockafellar R T, Wets R J B.

Variational Analysis

Berlin: Springer Science and Business Media, 2009

Wang F H, Cao W F, Xu Z B.

Convergence of multi-block Bregman ADMM for nonconvex composite problems

Science China Information Sciences, 2018, 61(12): 1-12

Wang F H, Xu Z B, Xu H K.

Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems

arXiv: 1410.8625

Wang H H, Banerjee A.

Bregman alternating direction method of multipliers

//Ghahramani Z, Welling M, Cortes C, et al. Advances in Neural Information Processing Systems, 2014: 2816-2824

Xu Z B, Chang X Y, Xu F M, Zhang H.

$L_\frac{1}{2}$ regularization: a thresholding representation theory and a fast solver

IEEE Transactions on Neural Networks and Learning Systems, 2012, 23: 1013-1027

Yang W H, Han D.

Linear Convergence of the alternating direction method of multipliers for a class of convex optimization problems

SIAM Journal on Numerical Analysis, 2016, 54(2): 625-640

Yashtini M.

Multi-block nonconvex nonsmooth proximal ADMM: Convergence and rates Under Kurdyka-ojasiewicz property

Journal of Optimization Theory and Applications, 2021, 190(3): 966-998

Zeng J S, Fang J, Xu Z B.

Sparse SAR imaging based on $L_\frac{1}{2}$ regularization

Sci China Infor Sci, 2012, 55: 1755-1775

Zeng J S, Lin S B, Wang Y, et al.

$L_\frac{1}{2}$ Regularization: Convergence of iterative half thresholding algorithm

IEEE Transactions on Signal Processing, 2014, 62(9): 2317-2329

Zeng J S, Xu Z B, Zhang B, et al.

Accelerated $L_\frac{1}{2}$ regularization based SAR imaging via BCR and reduced Newton skills

Signal Processing, 2013, 93: 1831-184

/

 〈 〉