results for au:Hefny_A in:math

- We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
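The core SVRG update for a finite-sum objective can be sketched as follows. This is a generic illustration, not the paper's exact algorithm: the step size, epoch length `m`, and the `grad_i` interface are assumptions chosen for readability.

```python
import numpy as np

def svrg(grad_i, x0, n, step, epochs, m, seed=0):
    """Sketch of SVRG for min_x (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component f_i;
    step, epochs, and m are illustrative tuning parameters.
    """
    x = x0.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        # Snapshot point: compute the full gradient once per epoch.
        x_snap = x.copy()
        full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient estimate: unbiased, and its
            # variance vanishes as x and x_snap approach a stationary point.
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x -= step * g
    return x
```

The correction term `grad_i(x, i) - grad_i(x_snap, i) + full_grad` is what distinguishes SVRG from plain SGD: it keeps the stochastic estimate unbiased while shrinking its variance near the snapshot, which is what enables the faster rates discussed in the abstract.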
- Jul 22 2015 math.NA arXiv:1507.05844v2. The Kaczmarz and Gauss-Seidel methods aim to solve a linear $m \times n$ system $\boldsymbol{X} \boldsymbol{\beta} = \boldsymbol{y}$ by iteratively refining the solution estimate; the former uses random rows of $\boldsymbol{X}$ to update $\boldsymbol{\beta}$ given the corresponding equations, and the latter uses random columns of $\boldsymbol{X}$ to update the corresponding coordinates in $\boldsymbol{\beta}$. Interest in these methods was recently revitalized by a proof of Strohmer and Vershynin showing linear convergence in expectation for a randomized Kaczmarz method variant (RK), and a similar result for the randomized Gauss-Seidel algorithm (RGS) was later proved by Lewis and Leventhal. Recent work unified the analysis of these algorithms for overdetermined and underdetermined systems, showing convergence to the ordinary least squares (OLS) solution and the minimum Euclidean norm solution, respectively. This paper considers the natural follow-up to the OLS problem, ridge regression, which solves $(\boldsymbol{X}^* \boldsymbol{X} + \lambda \boldsymbol{I}) \boldsymbol{\beta} = \boldsymbol{X}^* \boldsymbol{y}$. We present particular variants of RK and RGS for solving this system and derive their convergence rates. We compare these to a recent proposal by Ivanov and Zhdanov to solve this system, which can be interpreted as randomly sampling both rows and columns, and which we argue is often suboptimal. Instead, we claim that one should always use RGS (columns) when $m > n$ and RK (rows) when $m < n$. This difference in behavior is simply related to the minimum eigenvalue of two related positive semidefinite matrices: $\boldsymbol{X}^* \boldsymbol{X} + \lambda \boldsymbol{I}_n$ when $m > n$, and $\boldsymbol{X} \boldsymbol{X}^* + \lambda \boldsymbol{I}_m$ when $m < n$.
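For reference, the Strohmer–Vershynin randomized Kaczmarz iteration mentioned above, for a consistent system $\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{y}$, can be sketched in a few lines; this is the standard RK method, not the paper's ridge-regression variant, and the iteration count is an illustrative choice.

```python
import numpy as np

def randomized_kaczmarz(X, y, iters, seed=0):
    """Standard randomized Kaczmarz (Strohmer-Vershynin sampling):
    pick row i with probability proportional to ||x_i||^2, then
    project the iterate onto the hyperplane x_i . beta = y_i."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    row_norms = np.einsum('ij,ij->i', X, X)  # squared row norms
    probs = row_norms / row_norms.sum()
    beta = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Orthogonal projection onto the i-th equation's hyperplane.
        beta += (y[i] - X[i] @ beta) / row_norms[i] * X[i]
    return beta
```

Each step is an orthogonal projection onto one equation's solution hyperplane, which is why, for consistent systems, the distance to the solution never increases and shrinks linearly in expectation.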
- We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints without making the global iteration complexity depend exponentially on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous parallel; and (iv) stochastic. We illustrate the empirical behavior of our algorithms through simulation experiments.
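The difficulty with a coupling constraint is that changing one coordinate alone breaks feasibility. A common way around this, shown below for the illustrative special case of a single constraint $\sum_i x_i = c$, is to update a random *pair* of coordinates along a feasibility-preserving direction $e_i - e_j$. This is only a sketch of that general idea under these simplifying assumptions, not the paper's actual method; the step size and `grad` interface are hypothetical.

```python
import numpy as np

def cd_coupled(grad, x0, step, iters, seed=0):
    """Pairwise coordinate descent for min f(x) s.t. sum(x) = const.

    Moving x along the direction e_i - e_j leaves sum(x) unchanged,
    so every iterate stays feasible. grad(x) returns the full
    gradient of the smooth objective f.
    """
    x = x0.copy()
    rng = np.random.default_rng(seed)
    n = x.size
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        g = grad(x)
        # Gradient step along e_i - e_j; the coupling constraint is
        # preserved because the same delta is added and subtracted.
        delta = -step * (g[i] - g[j]) / 2.0
        x[i] += delta
        x[j] -= delta
    return x
```

With more than one coupling constraint, the same idea generalizes to updating a small block of coordinates within the null space of the constraint matrix, which is the setting the abstract's complexity claims concern.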