- We study the Wasserstein natural gradient in parametric statistical models with continuous sample space. Our approach is to pull back the $L^2$-Wasserstein metric tensor in probability density space to parameter space, under which the parameter space become a Riemannian manifold, named the Wasserstein statistical manifold. The gradient flow and natural gradient descent method in parameter space are then derived. When parameterized densities lie in $\bR$, we show the induced metric tensor establishes an explicit formula. Computationally, optimization problems can be accelerated by the proposed Wasserstein natural gradient descent, if the objective function is the Wasserstein distance. Examples are presented to demonstrate its effectiveness in several parametric statistical models.
- We show that it is possible to obtain an $O(\epsilon^{-4/3})$ runtime --- including computational cost --- for finding $\epsilon$-stationary points of nonconvex functions using cutting plane methods. This improves on the best known epsilon dependence achieved by cubic regularized Newton of $O(\epsilon^{-3/2})$ as proved by Nesterov and Polyak (2006)\nocitenesterov2006cubic. Our techniques utilize the convex until proven guilty principle proposed by Carmon, Duchi, Hinder, and Sidford (2017)\nocitecarmon2017convex.
- May 23 2018 math.OC arXiv:1805.08756v1We prove that a simple Sequential Quadratic Programming (SQP) algorithm for equality constrained optimization has local linear convergence with rate $1 - 1/\kappa_R$, where $\kappa_R$ is the condition number of the Riemannian Hessian. Our analysis builds on insights from Riemannian optimization and indicates that first-order Riemannian algorithms and "simple" SQP algorithms have nearly identical local behavior. Unlike Riemannian algorithms, SQP avoids calculating the projection or retraction of the points onto the manifold for each iterates. All the SQP iterates will automatically be quadratically close the the manifold near the local minimizer.
- We consider a new stochastic gradient descent algorithm for efficiently solving general min-max optimization problems that arise naturally in distributionally robust learning. By focusing on the entire dataset, current approaches do not scale well. We address this issue by initially focusing on a subset of the data and progressively increasing this support to statistically cover the entire dataset.
- May 23 2018 math.OC arXiv:1805.08207v1Gradient descent generalises naturally to Riemannian manifolds, and to hyperbolic $n$-space, in particular. Namely, having calculated the gradient at the point on the manifold representing the model parameters, the updated point is obtained by travelling along the geodesic passing in the direction of the gradient. Previous works employing optimisation in hyperbolic space have not attempted this procedure, however, employing instead various approximations to avoid a calculation that was considered to be too complicated. In this work, we demonstrate that in the hyperboloid model of hyperbolic space, the necessary calculations are in fact straight-forward. The advantages of the approach are then both illustrated and quantified for the optimisation problem of computing the Fréchet mean (the analogue of the barycentre) of points in hyperbolic space.
- May 23 2018 math.OC arXiv:1805.08537v1