Riemannian Optimization Algorithms for a Class of Matrix Trace Function Extremum Problems in Feature Extraction

Figure 1 Objective function value versus iteration for Algorithm 1 and Algorithm AL at different dimensions

5.2 Numerical comparison with two mainstream nonmonotone line search techniques

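The line search (3.2) proposed in this paper is a new nonmonotone line search that combines the Zhang-Hager nonmonotone line search technique with the monotone Armijo-type line search of [40]. To assess its effectiveness, this section compares it with the two mainstream nonmonotone line searches mentioned earlier: the Zhang-Hager line search and the max-type nonmonotone line search. The representative Riemannian gradient-type algorithms using these two techniques are the curvilinear-search-based Riemannian gradient descent method for manifold optimization proposed in [42] (OptStiefelGBB), and the Riemannian nonlinear conjugate gradient method (RCG) of [32,43], which extends Dai's Euclidean nonlinear conjugate gradient method to manifold optimization. We first give the iterative frameworks of OptStiefelGBB and RCG applied to problem (1.7).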

OptStiefelGBB: Given parameters $\delta, \rho, \vartheta \in \left( {0,1} \right)$ and the search direction $\left({{\eta _k},{\xi _k}}\right)=-\mathrm{grad} f(Q_k, P_k)$, the step size ${\alpha _k}=\max\{\bar{\alpha}_k {\rho ^j}: j=0,1,2,\cdots \}$ is the largest trial step satisfying the Zhang-Hager nonmonotone condition

$f\left( \mathcal{R}_{{(Q_k, P_k)}}(\alpha _k({{\eta _k},{\xi _k}} ))\right) \le {C_k} - \delta \alpha _k\| (\eta_{k},\xi_{k}) \| ^{2},$

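where the retraction $\mathcal{R}_Q{(\alpha\eta)}$ on the Stiefel manifold is the Cayley-transform-based retraction^[42], the initial step size $\bar{\alpha}_k$ is chosen as in (5.1)-(5.2), and the reference value $C_{k}$ is chosen as in Algorithm 1.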
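To make the OptStiefelGBB iteration concrete, here is a minimal NumPy sketch of Riemannian steepest descent with Zhang-Hager nonmonotone backtracking and a Cayley-type retraction. It is simplified under explicit assumptions: it works on a single Stiefel manifold St(n, p) rather than the product manifold of problem (1.7); the objective $f(Q) = -\tfrac12\operatorname{tr}(Q^TAQ)$ is a hypothetical test function; the initial step (5.1)-(5.2) is replaced by a fixed trial step `alpha0`; and the reference value uses the standard Zhang-Hager update $q_{k+1} = \vartheta q_k + 1$, $C_{k+1} = (\vartheta q_k C_k + f_{k+1})/q_{k+1}$^[34]. The retraction is written as $\mathcal{R}_Q(\eta) = (I - \tfrac12 W)^{-1}(I + \tfrac12 W)Q$ with $W = P\eta Q^T - Q(P\eta)^T$ and $P = I - \tfrac12 QQ^T$, one common form of the Cayley retraction (cf. [32,42]).

```python
import numpy as np

def cayley_retraction(Q, eta):
    """Cayley-type retraction on St(n, p): (I - W/2)^{-1} (I + W/2) Q,
    with W = P eta Q^T - Q (P eta)^T, P = I - Q Q^T / 2 (W is skew-symmetric)."""
    n = Q.shape[0]
    P_eta = eta - 0.5 * Q @ (Q.T @ eta)
    W = P_eta @ Q.T - Q @ P_eta.T
    I = np.eye(n)
    return np.linalg.solve(I - 0.5 * W, (I + 0.5 * W) @ Q)

def f(A, Q):
    """Hypothetical test objective f(Q) = -tr(Q^T A Q) / 2."""
    return -0.5 * np.trace(Q.T @ A @ Q)

def rgrad(A, Q):
    """Riemannian gradient under the embedded metric: G - Q sym(Q^T G), G = -A Q."""
    G = -A @ Q
    QtG = Q.T @ G
    return G - Q @ (0.5 * (QtG + QtG.T))

def zhang_hager_descent(A, Q, alpha0=1.0, delta=1e-4, rho=0.5, theta=0.85,
                        tol=1e-8, max_iter=5000):
    """Riemannian steepest descent with Zhang-Hager nonmonotone backtracking."""
    f_k = f(A, Q)
    C, q = f_k, 1.0                       # reference value C_k and weight q_k
    for _ in range(max_iter):
        g = rgrad(A, Q)
        g_norm2 = float(np.sum(g * g))
        if np.sqrt(g_norm2) < tol:
            break
        d = -g                            # search direction -grad f
        alpha = alpha0
        # Backtrack alpha = alpha0 * rho^j until the Zhang-Hager condition holds.
        while True:
            Q_trial = cayley_retraction(Q, alpha * d)
            f_trial = f(A, Q_trial)
            if f_trial <= C - delta * alpha * g_norm2 or alpha < 1e-12:
                break
            alpha *= rho
        q_next = theta * q + 1.0          # Zhang-Hager update of q_k and C_k
        C = (theta * q * C + f_trial) / q_next
        q = q_next
        Q, f_k = Q_trial, f_trial
    return Q, f_k

A = np.diag(np.arange(1.0, 9.0))
rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((8, 2)))
Q_opt, f_opt = zhang_hager_descent(A, Q0)
```

For $A = \mathrm{diag}(1,\dots,8)$ and $p = 2$, the minimum value is $-(8+7)/2 = -7.5$, attained on the span of the two leading eigenvectors, so `f_opt` should approach that value. Because $C_k$ is a convex combination of past function values, the sufficient-decrease test is measured against a smoothed history rather than $f_k$ alone, which is what makes the search nonmonotone.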

RCG: The step size ${\alpha_k}$ is determined by the following max-type nonmonotone line search

$f\left( {{\mathcal{R}_{\left( {{Q_k},{P_k}} \right)}}\left( {{\alpha_k}\left( {{\eta _k},{\xi _k}} \right)} \right)} \right) \le \max \left\{ {f\left( {{Q_k},{P_k}} \right), \cdots, f\left( {{Q_{k - {h_k}}},{P_{k - {h_k}}}} \right)} \right\} + \delta {\alpha_k}\left\langle {\mathrm{grad}\ f\left( {{Q_k},{P_k}} \right),\left( {{\eta _k},{\xi _k}} \right)} \right\rangle,$

where the retraction $\mathcal{R}_Q{(\alpha\eta)}$ on the Stiefel manifold is the same as in OptStiefelGBB, $\delta \in \left( {0,1} \right)$, ${h_k} = \min \left\{ {h - 1,k} \right\}$, and $h$ is a positive integer. The search direction is updated by

$\left( {{\eta _{k + 1}},{\xi _{k + 1}}} \right) = - \mathrm{grad}\ f\left( {{Q_{k + 1}},{P_{k + 1}}} \right) + {\beta _{k + 1}}{\mathbf{T} _{{\alpha _k}\left( {{\eta _k},{\xi _k}} \right)}}\left( {{\eta _k},{\xi _k}} \right),$

where $\mathbf{T}$ is the vector transport based on the differentiated retraction^[32], the parameter ${\beta _{k+1}} \in \left[ {0,\beta _{k+1}^D} \right]$, and

$\beta _{k + 1}^D = \frac{{{{\left\| {\mathrm{grad} f\left( {{Q_{k + 1}},{P_{k + 1}}} \right)} \right\|}^2}}}{{\max \left\{ {M, - \left\langle {\mathrm{grad} f\left( {{Q_k},{P_k}} \right),\left( {{\eta _k},{\xi _k}} \right)} \right\rangle } \right\}}},$

where $M = \left\langle {\mathrm{grad} f\left( {{Q_{k + 1}},{P_{k + 1}}} \right),{\mathbf{T} _{{\alpha_k}\left( {{\eta _k},{\xi _k}} \right)}}\left( {{\eta _k},{\xi _k}} \right)} \right\rangle - \left\langle {\mathrm{grad} f\left( {{Q_k},{P_k}} \right),\left( {{\eta _k},{\xi _k}} \right)} \right\rangle$.
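The two ingredients of RCG, the max-type acceptance test and the Dai-type parameter $\beta_{k+1}^D$, can be sketched in a few lines of NumPy. Assumptions: tangent vectors $(\eta,\xi)$ of the product manifold are stored as one array and the inner product is the Frobenius one (`np.vdot` on real arrays); the transported direction $\mathbf{T}(\eta_k,\xi_k)$ is passed in precomputed; and the sketch takes $\beta_{k+1} = \beta_{k+1}^D$, the upper end of the allowed interval. The function names are illustrative only.

```python
import numpy as np

def max_type_accept(f_trial, f_history, alpha, gk_dot_dk, delta=1e-4, h=5):
    """Max-type nonmonotone Armijo test:
    f_trial <= max{f_k, ..., f_{k-h_k}} + delta * alpha * <grad f_k, d_k>,
    with h_k = min(h - 1, k) and f_history = [f_0, ..., f_k]."""
    k = len(f_history) - 1
    h_k = min(h - 1, k)
    f_ref = max(f_history[k - h_k:])      # largest of the last h_k + 1 values
    return f_trial <= f_ref + delta * alpha * gk_dot_dk

def beta_dai(g_new, g_old, d_old, Td_old):
    """beta^D_{k+1} = ||g_{k+1}||^2 / max{M, -<g_k, d_k>},
    with M = <g_{k+1}, T(d_k)> - <g_k, d_k>."""
    gk_dk = float(np.vdot(g_old, d_old))  # negative for a descent direction
    M = float(np.vdot(g_new, Td_old)) - gk_dk
    return float(np.vdot(g_new, g_new)) / max(M, -gk_dk)

def next_direction(g_new, Td_old, beta):
    """d_{k+1} = -grad f_{k+1} + beta_{k+1} T(d_k)."""
    return -g_new + beta * Td_old

# Toy example with an identity transport in R^2.
g_old = np.array([1.0, 0.0])
d_old = -g_old
g_new = np.array([0.5, 0.5])
Td = np.array([-1.0, 0.0])
b = beta_dai(g_new, g_old, d_old, Td)     # M = 0.5, denominator max(0.5, 1) = 1
d_new = next_direction(g_new, Td, b)
```

Here $\beta = 0.5/1 = 0.5$ and $d_{k+1} = (-1, -0.5)$, which is still a descent direction since $\langle g_{k+1}, d_{k+1}\rangle = -0.75 < 0$. Taking the maximum with $-\langle g_k, d_k\rangle$ in the denominator keeps it positive whenever $d_k$ is a descent direction.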

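Table 2 reports the numerical comparison of Algorithm 1 with OptStiefelGBB and RCG for different dimensions, where the column headers "CT.", "IT.", "Obj." and "${ {\rm grad}_{QP}}$" are defined as in Table 1. The first 6 rows of Table 2 correspond to the small-sample, high-dimensional setting, and the last 6 rows to the large-sample, low-dimensional setting. Figure 2 shows semilog plots of the Riemannian gradient norm versus iteration for the three algorithms. Table 2 shows that, under the same stopping criterion, the three algorithms reach nearly the same final objective value, with comparable Riemannian gradient norms at the computed solutions. In terms of CPU time and iteration count, however, Algorithm 1 is somewhat more efficient in most cases. Because OptStiefelGBB does not compute a vector transport, it needs relatively little CPU time to reach the same stopping criterion. Taken together, the results in Table 2 and the convergence curves in Figure 2 indicate that the proposed line search, which combines the Zhang-Hager nonmonotone technique with the monotone Armijo-type line search of [40], offers some efficiency advantage over the two mainstream nonmonotone line searches.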

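Table 2 Numerical comparison of Algorithm 1 with two Riemannian gradient-type algorithms based on mainstream nonmonotone line search techniques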

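Figure 2 Riemannian gradient norm versus iteration for Algorithm 1 and two Riemannian gradient-type algorithms based on mainstream nonmonotone line search techniques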

5.3 Comparison with the Riemannian first-order methods in the Manopt toolbox

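This subsection compares Algorithm 1 with the first-order methods available in the Riemannian optimization toolbox Manopt^[44]: steepest descent (RSD-Manopt), conjugate gradient (RCG-Manopt) and Barzilai-Borwein (BB-Manopt). Table 3 reports the results of the four algorithms in both the small-sample, high-dimensional and the large-sample, low-dimensional settings, and Figure 3 shows semilog plots of the Riemannian gradient norm versus iteration. For the three Manopt algorithms, all parameters and stopping criteria are left at their defaults, except that the maximum number of iterations is set to 50000. Table 3 and Figure 3 show that RSD-Manopt and BB-Manopt hit the maximum number of iterations in most cases; their Riemannian gradient norms decrease relatively slowly and stagnate once a certain accuracy is reached. Hence, for problem (1.7), the proposed algorithm also holds some advantage over the first-order algorithms in Manopt.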

Table 3 Numerical comparison of Algorithm 1 with RSD-Manopt, RCG-Manopt and BB-Manopt from the Manopt toolbox

Figure 3 Riemannian gradient norm versus iteration for Algorithm 1, RSD-Manopt, RCG-Manopt and BB-Manopt at different dimensions

5.4 Comparison with the Riemannian second-order methods in the Manopt toolbox

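Using the explicit formula (2.15) for the Riemannian Hessian given in Lemma 2.2, this subsection compares Algorithm 1 with the second-order algorithms available in Manopt: BFGS (BFGS-Manopt), trust regions (RTR-Manopt) and adaptive regularization by cubics (ARC-Manopt). Table 4 reports the results of the four algorithms in both the small-sample, high-dimensional and the large-sample, low-dimensional settings, and Figure 4 shows semilog plots of the Riemannian gradient norm versus iteration. For the three Manopt algorithms, all parameters and stopping criteria are left at their defaults, except that the maximum number of iterations is set to 10000. Table 4 shows that, under the respective stopping criteria, Algorithm 1 takes more iterations overall but less total time, since it requires no inner iterations. RTR-Manopt solves each trust-region subproblem with the truncated conjugate gradient method (tCG); although it converges superlinearly, its total time exceeds that of Algorithm 1. BFGS-Manopt has the longest running time in most cases. This is because the Riemannian BFGS method in Manopt is essentially a limited-memory BFGS method generalized to Riemannian manifolds, with a default memory size of 30, so each iteration evaluates the projection-based vector transport 30 times, which largely dominates the running time of BFGS-Manopt.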
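The cost argument above rests on the projection-based vector transport on the Stiefel manifold. A minimal NumPy sketch, assuming the embedded (Euclidean) metric and a single Stiefel factor (the $P$ component is ignored), transports a memory of 30 stored tangent vectors by orthogonal projection onto the new tangent space. It illustrates the operation count only and does not reproduce Manopt's own code; `project_tangent` and `transport_memory` are illustrative names.

```python
import numpy as np

def project_tangent(Q, V):
    """Orthogonal projection of V onto the tangent space of St(n, p) at Q
    (embedded metric): V - Q sym(Q^T V). Costs O(n p^2) flops."""
    QtV = Q.T @ V
    return V - Q @ (0.5 * (QtV + QtV.T))

def transport_memory(Q_new, memory):
    """Transport every stored tangent vector to T_{Q_new} St(n, p) by projection.
    With a memory of size 30 this is 30 projections per iteration."""
    return [project_tangent(Q_new, V) for V in memory]

# A memory of 30 random tangent vectors at Q_old, moved to Q_new.
rng = np.random.default_rng(1)
n, p, mem = 200, 5, 30
Q_old, _ = np.linalg.qr(rng.standard_normal((n, p)))
Q_new, _ = np.linalg.qr(rng.standard_normal((n, p)))
memory = [project_tangent(Q_old, rng.standard_normal((n, p))) for _ in range(mem)]
transported = transport_memory(Q_new, memory)
```

Each projection needs two $n \times p$ by $p \times p$ products, so the per-iteration transport cost grows linearly with the memory size, whereas a method such as Algorithm 1 transports only the previous search direction.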

Table 4 Numerical comparison of Algorithm 1 with BFGS-Manopt, RTR-Manopt and ARC-Manopt from the Manopt toolbox

Figure 4 Riemannian gradient norm versus iteration for Algorithm 1, BFGS-Manopt, RTR-Manopt and ARC-Manopt at different dimensions

6 Conclusion

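This paper studies a robust discriminant regression model arising in feature extraction. The model can be reformulated as a matrix trace function minimization problem on a Riemannian product manifold composed of the Stiefel manifold and a linear manifold, namely problem (1.6). Because the entries of the matrices $F, D$ in the objective of (1.6) depend on the variables $Q, P$, the original problem is hard to solve; we therefore consider the simplified version in which $F$ and $D$ are fixed, namely problem (1.7). Combining the Zhang-Hager nonmonotone line search with an Armijo-type line search, we propose a new nonmonotone line search criterion and, exploiting the geometry of the product manifold, construct a Riemannian nonlinear conjugate gradient method for problem (1.7), together with a global convergence analysis. Extensive numerical experiments demonstrate the feasibility and efficiency of the proposed algorithm for problem (1.7); the comparisons cover the existing approximate alternating least squares method, the two mainstream nonmonotone line search techniques, and the first- and second-order methods in the Manopt toolbox. The results show that, for problem (1.7), the proposed algorithm has some advantage over existing algorithms in solution accuracy, CPU time or iteration count.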

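This paper solves only problem (1.7) numerically; designing an effective algorithm for problem (1.6) is left for future work. A feasible approach is an alternating update scheme with the following iterative framework: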

1. Given the data matrix $X$ and the within-class similarity graph $W$ describing the local manifold structure of the data set, choose an initial point $\left(Q_0, P_0\right) \in \mathrm{St}(m, d) \times \mathcal{P}$ and set $k:=0$;

2. For $k=1,2,\cdots$, until the stopping criterion is satisfied:

3. Generate $F_k, D_k$;

4. Update $(Q_k, P_k)$ by Algorithm 1, i.e.,

$(Q_k, P_k)=\mathop{\arg\min}\limits_{(Q, P)\in \mathrm{St}(m, d) \times \mathcal{P}} \operatorname{tr}\left(X^T D_k X-2 P^T Q^T X^T F_k X+P^T Q^T X^T D_k X Q P+\psi P^T P\right).$

However, the choice of stopping criterion and the convergence analysis of this scheme require further study.
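A minimal sketch of the outer loop of this alternating framework is given below, under several stated assumptions. `make_FD` (step 3) and `inner_solve` (step 4, standing in for Algorithm 1) are hypothetical callbacks; $F_k, D_k$ are assumed to be generated from the previous iterate; and, since the stopping criterion is still open, the sketch uses a hypothetical relative-change test on the subproblem objective. The dimensions $X \in \mathbb{R}^{n\times m}$, $Q \in \mathbb{R}^{m\times d}$, $P \in \mathbb{R}^{d\times m}$ are our reading of the trace expression in step 4.

```python
import numpy as np

def trace_objective(X, F, D, Q, P, psi):
    """tr(X^T D X - 2 P^T Q^T X^T F X + P^T Q^T X^T D X Q P + psi P^T P)."""
    XQP = X @ Q @ P                        # n x m
    return np.trace(X.T @ D @ X - 2.0 * XQP.T @ F @ X
                    + XQP.T @ D @ XQP + psi * P.T @ P)

def alternating_scheme(Q0, P0, make_FD, inner_solve, objective,
                       tol=1e-6, max_outer=50):
    """Outer loop: regenerate (F_k, D_k), then re-solve the fixed-(F, D) subproblem."""
    Q, P = Q0, P0
    f_prev = np.inf
    k = 0
    for k in range(1, max_outer + 1):
        F, D = make_FD(Q, P)               # step 3 (hypothetical callback)
        Q, P = inner_solve(F, D, Q, P)     # step 4, warm-started inner solver
        f_cur = objective(F, D, Q, P)
        # Hypothetical stopping rule: small relative change of the objective.
        if abs(f_prev - f_cur) <= tol * max(1.0, abs(f_cur)):
            break
        f_prev = f_cur
    return Q, P, k

# Toy check with fixed F = D = I and an inner solver that returns its input.
X = np.eye(3)
I3 = np.eye(3)
Q = np.eye(3)[:, :2]
P = Q.T.copy()
Qk, Pk, k = alternating_scheme(
    Q, P,
    make_FD=lambda Q, P: (I3, I3),
    inner_solve=lambda F, D, Q, P: (Q, P),
    objective=lambda F, D, Q, P: trace_objective(X, F, D, Q, P, 0.0))
```

Warm-starting the inner solver from the previous $(Q_{k-1}, P_{k-1})$ is a natural design choice here, since consecutive subproblems differ only through $F_k$ and $D_k$; whether this loop converges is exactly the open question noted above.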

References


[1] … Least squares linear discriminant analysis//Proceedings of the 24th International Conference on Machine Learning, 2007: 1087-1093

[2] …, Nie, Yang, et al. Discriminating joint feature analysis for multimedia data understanding. IEEE Transactions on Multimedia, 2012, 14(6): 1662-1672

[3] …, Yang, Sebe, et al. Multimedia event detection using a classifier-specific intermediate representation. IEEE Transactions on Multimedia, 2013, 15(7): 1628-1637

[4] …, Tang, Yu, Ye. A shared-subspace learning framework for multi-label classification. ACM Transactions on Knowledge Discovery from Data (TKDD), 2010, 4(2): 1-29

[5] Seung H S, Lee D D. The manifold ways of perception. Science, 2000, 290(5500): 2268-2269

[6] Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323-2326. DOI: 10.1126/science.290.5500.2323

[7] Belkin, Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering//Advances in Neural Information Processing Systems 14. Cambridge: MIT Press, 2001

[8] Zhang, Ma, Tan. On the equivalence of HLLE and LTSA. IEEE Transactions on Cybernetics, 2017, 48(2): 742-753

[9] Lai, Mo, Wong W K, et al. Robust discriminant regression for feature extraction. IEEE Transactions on Cybernetics, 2017, 48(8): 2472-2484

[10] Nie, Huang, Cai, Ding C H. Efficient and robust feature selection via joint $L_{2, 1}$-norms minimization//Advances in Neural Information Processing Systems, 2010: 1813-1821

[11] Sato, Sato. Structure-preserving $H^{2}$ optimal model reduction based on the Riemannian trust-region method. IEEE Transactions on Automatic Control, 2017, 63(2): 505-512

[12] Sato, Sato. Riemannian gradient-based online identification method for linear systems with symmetric positive-definite matrix//2019 IEEE 58th Conference on Decision and Control (CDC), 2019: 3593-3598

[13] Sato. Riemannian optimal model reduction of linear port-Hamiltonian systems. Automatica, 2018, 93: 428-434

[14] Sato. Riemannian optimal control and model matching of linear port-Hamiltonian systems. IEEE Transactions on Automatic Control, 2017, 62(12): 6575-6581

[15] Sato, Sato, Damm. Riemannian optimal identification method for linear systems with symmetric positive-definite matrix. IEEE Transactions on Automatic Control, 2020, 65(11): 4493-4508

[16] Chiang C Y, Lin M M, Jin X Q. Riemannian inexact Newton method for structured inverse eigenvalue and singular value problems. BIT Numerical Mathematics, 2019, 59: 675-694

[17] Ishteva, Absil P A, Huffel S V, Lathauwer L D. Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme. SIAM Journal on Matrix Analysis and Applications, 2011, 32(1): 115-135

[18] Sato, Iwai. A Riemannian optimization approach to the matrix singular value decomposition. SIAM Journal on Optimization, 2013, 23(1): 188-212

[19] Wang, Zhao, Bai Z J. Riemannian Newton-CG methods for constructing a positive doubly stochastic matrix from spectral data. Inverse Problems, 2020, 36(11): 115006

[20] Yao T T, Bai Z J, Zhao, Ching W K. A Riemannian Fletcher-Reeves conjugate gradient method for doubly stochastic inverse eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 2016, 37(1): 215-234

[21] Yao T T, Bai Z J, Jin X Q, Zhao. A geometric Gauss-Newton method for least squares inverse eigenvalue problems. BIT Numerical Mathematics, 2020, 60: 825-852

[22] Zhao, Jin X Q, Bai Z J. A geometric nonlinear conjugate gradient method for stochastic inverse eigenvalue problems. SIAM Journal on Numerical Analysis, 2016, 54(4): 2015-2035

[23] Zhao, Bai Z J, Jin X Q. A Riemannian inexact Newton-CG method for constructing a nonnegative matrix with prescribed realizable spectrum. Numerische Mathematik, 2018, 140(4): 827-855

[24] Zhao, Jin X Q, Yao T T. A Riemannian under-determined BFGS method for least squares inverse eigenvalue problems. BIT Numerical Mathematics, 2022, 62(1): 311-337

[25] Zhao, Yao T T, Bai Z J, Jin X Q. A Riemannian inexact Newton dogleg method for constructing a symmetric nonnegative matrix with prescribed spectrum. Numerical Algorithms, 2023, 92: 1951-1981

[26] Jiang Y L, Xu K L. Riemannian modified Polak-Ribière-Polyak conjugate gradient order reduced model by tensor techniques. SIAM Journal on Matrix Analysis and Applications, 2020, 41(2): 432-463

[27] Oviedo. Global convergence of Riemannian line search methods with a Zhang-Hager-type condition. Numerical Algorithms, 2022, 91(3): 1183-1203

[28] Sato. Riemannian Optimization and Its Applications. Berlin: Springer, 2021

[29] Sato, Iwai. A new, globally convergent Riemannian conjugate gradient method. Optimization, 2015, 64(4): 1011-1031

[30] Sato. A Dai-Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Computational Optimization and Applications, 2016, 64(1): 101-118

[31] Sakai, Iiduka. Hybrid Riemannian conjugate gradient methods with global convergence properties. Computational Optimization and Applications, 2020, 77(3): 811-830

[32] Zhu. A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Computational Optimization and Applications, 2017, 67(1): 73-110

[33] Vandereycken. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 2013, 23(2): 1214-1236

[34] Zhang, Hager W W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 2004, 14(4): 1043-1056

[35] Absil P A, Mahony, Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton: Princeton University Press, 2009

[36] Ring, Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 2012, 22(2): 596-627

[37] Absil P A, Mahony, Trumpf. An extrinsic look at the Riemannian Hessian//International Conference on Geometric Science of Information. Berlin: Springer, 2013: 361-368

[38] Nocedal, Wright S J. Numerical Optimization. Springer, 1999

[39] Grippo, Lampariello, Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 1986, 23(4): 707-716

[40] Zhang, Zhou, Li. Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search. Numerische Mathematik, 2006, 104(4): 561-572

[41] Wang S G, Wu M X, Jia Z Z. Matrix Inequalities. Beijing: Science Press, 2006 (in Chinese)

[42] Wen, Yin. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 2013, 142(1/2): 397-434

[43] … J F, Li, Vong S W, et al. A Riemannian optimization approach for solving the generalized eigenvalue problem for nonsquare matrix pencils. Journal of Scientific Computing, 2020, 82: 1-43

[44] Boumal, Mishra, Absil P A, Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 2014, 15(1): 1455-1459