Speaker:
Institution:
Time:
Host:
Location:
Modern machine learning and scientific computing pose optimization challenges of unprecedented scale and complexity, demanding fundamental advances in both theory and algorithmic design for nonconvex optimization. This talk presents recent advances that address these challenges by exploiting matrix and tensor structures, integrating adaptivity, and leveraging sampling techniques.
In the first part, I introduce AdaGO, a new optimizer that combines orthogonalized momentum updates with adaptive learning rates. Building on the recent success of the Muon optimizer in large language model training, AdaGO incorporates an AdaGrad-type stepsize that scales the orthogonalized update directions using the accumulated norms of past gradients. This design preserves the structural advantage of orthogonalized updates while adapting stepsizes to noise and the optimization landscape. We establish optimal convergence rates for smooth nonconvex functions and demonstrate improved performance over Muon and Adam on classification and regression tasks.
The second part focuses on zeroth-order global optimization. We develop a theoretical framework for inexact proximal point (IPP) methods for global optimization, establishing convergence guarantees when proximal operators are estimated either deterministically or stochastically. The quadratic regularization in the proximal operator induces a concentrated Gibbs measure landscape that facilitates effective sampling. We propose two practical sampling-based algorithms: TT-IPP, which constructs a low-rank tensor-train (TT) approximation using a randomized TT cross algorithm, and MC-IPP, which employs Monte Carlo integration. Both IPP algorithms adaptively balance efficiency and accuracy in proximal operator estimation, and they outperform established solvers across diverse benchmark functions and applications.
Together, these works advance structure-aware adaptive first-order optimization for deep learning and zeroth-order global optimization in scientific computing.
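To make the first part concrete, here is a minimal sketch of an update that pairs an orthogonalized momentum direction with an AdaGrad-norm stepsize. The abstract does not give AdaGO's exact update rule, so this is an illustration under stated assumptions rather than the speaker's algorithm: the function names, the hyperparameters `lr`, `beta`, and `eps`, and the momentum form are all hypothetical, and the orthogonalization uses an exact SVD instead of the Newton-Schulz iteration that Muon uses in practice.

```python
import numpy as np


def orthogonalize(M):
    """Return the polar factor U @ Vt of M, i.e. M with all singular values set to 1.

    Assumption: an exact SVD stands in for Muon's approximate Newton-Schulz iteration.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt


def init_state(shape):
    """Create optimizer state: a momentum buffer and the running sum of squared gradient norms."""
    return {"momentum": np.zeros(shape), "sum_sq": 0.0}


def adago_like_step(W, G, state, lr=0.1, beta=0.9, eps=1e-8):
    """One illustrative step (hypothetical form, not the published AdaGO rule).

    The direction is the orthogonalized momentum; the stepsize shrinks with the
    accumulated squared Frobenius norms of past gradients (AdaGrad-norm style).
    """
    state["momentum"] = beta * state["momentum"] + (1.0 - beta) * G
    state["sum_sq"] += np.linalg.norm(G) ** 2
    direction = orthogonalize(state["momentum"])
    step = lr / (np.sqrt(state["sum_sq"]) + eps)
    return W - step * direction


# Usage on a toy matrix least-squares problem: minimize 0.5 * ||X W - Y||_F^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4))
Y = rng.standard_normal((32, 3))
W = np.zeros((4, 3))
state = init_state(W.shape)
for _ in range(100):
    G = X.T @ (X @ W - Y)  # gradient of the least-squares loss with respect to W
    W = adago_like_step(W, G, state)
```

Because the orthogonalized direction always has unit spectral norm, the size of each update in this sketch is set entirely by the adaptive stepsize, which is the separation of roles the abstract describes.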

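For the second part, the sketch below shows one way a proximal point can be estimated from function values alone by Monte Carlo integration. It treats the proximal point of f at x as the mean of the Gibbs measure proportional to exp(-(f(y) + ||y - x||^2 / (2λ)) / δ), samples from the Gaussian factor N(x, λδ I), and reweights the samples by exp(-f(y) / δ). This is an assumption-laden illustration, not the MC-IPP algorithm from the talk: the names `mc_prox` and `ipp`, the temperature `delta`, and the fixed values of `lam` and `n_samples` are hypothetical, whereas the abstract says the actual methods adapt the estimation accuracy.

```python
import numpy as np


def mc_prox(f, x, lam, delta, n_samples, rng):
    """Monte Carlo estimate of the proximal point of f at x (zeroth-order).

    Assumption: f is vectorized, mapping an (n, d) array of points to n values.
    """
    # Draw samples from the Gaussian factor of the Gibbs measure, N(x, lam * delta * I).
    y = x + np.sqrt(lam * delta) * rng.standard_normal((n_samples, x.size))
    fy = f(y)
    # Importance weights exp(-f(y) / delta), shifted by the minimum for numerical stability.
    w = np.exp(-(fy - fy.min()) / delta)
    # Self-normalized weighted mean approximates the mean of the Gibbs measure.
    return (w[:, None] * y).sum(axis=0) / w.sum()


def ipp(f, x0, lam=1.0, delta=0.1, n_samples=5000, n_iters=20, seed=0):
    """Inexact proximal point iteration: repeatedly replace x by its estimated proximal point."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = mc_prox(f, x, lam, delta, n_samples, rng)
    return x


def quadratic(y):
    """Sanity-check objective f(y) = 0.5 * ||y||^2, evaluated row-wise."""
    return 0.5 * np.sum(y ** 2, axis=-1)


# Usage: for this quadratic the exact proximal point is x / (1 + lam),
# so the estimate can be compared against a known answer.
x_start = np.array([0.4, -0.2])
one_step = mc_prox(quadratic, x_start, lam=1.0, delta=0.1,
                   n_samples=20000, rng=np.random.default_rng(0))
x_final = ipp(quadratic, x_start)
```

A smaller `delta` concentrates the Gibbs measure more tightly around the proximal point but makes the importance weights more uneven, so more samples are needed. That tension is one concrete form of the efficiency versus accuracy trade-off mentioned in the abstract.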