- This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest $\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples $y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This problem, also dubbed as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficiency of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent --- when randomly initialized --- yields an $\epsilon$-accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need of (i) carefully-designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
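As a rough illustration of the setting in this abstract, the sketch below runs randomly initialized gradient descent on the least-squares loss $f(\mathbf{x})=\frac{1}{4m}\sum_{i}\big((\mathbf{a}_{i}^{\top}\mathbf{x})^{2}-y_{i}\big)^{2}$ under a Gaussian design. The $1/(4m)$ normalization, the problem sizes, the step size, the iteration count, the unit-norm ground truth, and all function names are assumptions chosen for this demo, not values taken from the paper.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def loss(x, A, y):
    """Least-squares loss (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2."""
    total = 0.0
    for a, yi in zip(A, y):
        r = dot(a, x)
        total += (r * r - yi) ** 2
    return total / (4 * len(A))


def gradient(x, A, y):
    """Gradient of the loss: (1/m) * sum_i ((a_i^T x)^2 - y_i) (a_i^T x) a_i."""
    m = len(A)
    g = [0.0] * len(x)
    for a, yi in zip(A, y):
        r = dot(a, x)
        c = (r * r - yi) * r / m
        for j, aj in enumerate(a):
            g[j] += c * aj
    return g


def dist_up_to_sign(x, x_true):
    """min(||x - x_true||, ||x + x_true||): the global sign is not identifiable."""
    d_minus = math.sqrt(sum((u - v) ** 2 for u, v in zip(x, x_true)))
    d_plus = math.sqrt(sum((u + v) ** 2 for u, v in zip(x, x_true)))
    return min(d_minus, d_plus)


def run_demo(n=5, m=60, step=0.05, iters=1000, seed=0):
    """Return (initial loss, final loss, final distance up to sign)."""
    rng = random.Random(seed)
    # Ground truth, rescaled to unit norm (a demo assumption).
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(x_true, x_true))
    x_true = [v / norm for v in x_true]
    # Gaussian design vectors and noiseless quadratic samples.
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [dot(a, x_true) * dot(a, x_true) for a in A]
    # Random initialization with i.i.d. N(0, 1/n) entries.
    x = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
    initial_loss = loss(x, A, y)
    for _ in range(iters):
        g = gradient(x, A, y)
        x = [xj - step * gj for xj, gj in zip(x, g)]
    return initial_loss, loss(x, A, y), dist_up_to_sign(x, x_true)
```

Calling `run_demo()` returns the loss before and after the iterations together with the sign-invariant distance to the ground truth; the error is measured up to sign because $\mathbf{x}^{\natural}$ and $-\mathbf{x}^{\natural}$ produce identical samples.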
- Mar 22 2018 math.CO arXiv:1803.07694v1 Consider the following two ways to colour the vertices of a graph where the requirement that adjacent vertices get distinct colours is relaxed. A colouring has "defect" $d$ if each monochromatic component has maximum degree at most $d$. A colouring has "clustering" $c$ if each monochromatic component has at most $c$ vertices. This paper surveys research on these types of colourings, where the first priority is to minimise the number of colours, with small defect or small clustering as a secondary goal. List colouring variants are also considered. The following graph classes are studied: outerplanar graphs, planar graphs, graphs embeddable in surfaces, graphs with given maximum degree, graphs with given maximum average degree, graphs excluding a given subgraph, graphs with linear crossing number, linklessly or knotlessly embeddable graphs, graphs with given Colin de Verdière parameter, graphs with given circumference, graphs excluding a fixed graph as an immersion, graphs with given thickness, graphs with given stack- or queue-number, graphs excluding $K_t$ as a minor, graphs excluding $K_{s,t}$ as a minor, and graphs excluding an arbitrary graph $H$ as a minor. Several open problems are discussed.
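The two definitions in this abstract can be made concrete with a small sketch that measures the defect and the clustering of a given vertex colouring. The adjacency-dictionary representation and the function names are assumptions made for illustration; the survey itself does not prescribe any implementation.

```python
from collections import deque


def defect(adj, colour):
    """Largest number of same-coloured neighbours of any vertex, i.e. the
    maximum degree over all monochromatic components."""
    return max(
        (sum(1 for w in adj[v] if colour[w] == colour[v]) for v in adj),
        default=0,
    )


def clustering(adj, colour):
    """Number of vertices in the largest monochromatic component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search restricted to edges whose ends share a colour.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen and colour[w] == colour[v]:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


# A 6-cycle coloured 0,0,1,1,0,0: vertices 4,5,0,1 form a monochromatic path.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
relaxed = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
print(defect(cycle, relaxed), clustering(cycle, relaxed))
```

A proper colouring is the special case with defect 0 and clustering 1, which is why both notions relax the usual requirement.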
- This paper considers the word problem for free inverse monoids of finite rank from a language theory perspective. It is shown that no free inverse monoid has context-free word problem; that the word problem of the free inverse monoid of rank $1$ is both $2$-context-free (an intersection of two context-free languages) and ET0L; that the co-word problem of the free inverse monoid of rank $1$ is context-free; and that the word problem of a free inverse monoid of rank greater than $1$ is not poly-context-free.
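For rank $1$, deciding whether two words represent the same element is elementary, which the sketch below illustrates. It relies on the standard Munn tree description: the Cayley graph of the free group of rank $1$ is the integer line, so the Munn tree of a word is the interval of integers visited by its walk together with the final position. The encoding of the generator as `a` and its inverse as `A`, and the function names, are assumptions for this example; the sketch is not the language-theoretic construction used in the paper.

```python
def munn_triple(word):
    """Return (lowest, highest, final) position of the walk on the integers
    that steps +1 on 'a' and -1 on 'A' (the formal inverse of 'a')."""
    pos = low = high = 0
    for ch in word:
        if ch == "a":
            pos += 1
        elif ch == "A":
            pos -= 1
        else:
            raise ValueError(f"unexpected letter: {ch!r}")
        low = min(low, pos)
        high = max(high, pos)
    return low, high, pos


def same_element(u, v):
    """True if u and v represent the same element of the free inverse
    monoid of rank 1, i.e. their Munn triples coincide."""
    return munn_triple(u) == munn_triple(v)
```

For instance, `same_element("aAa", "a")` holds because $a a^{-1} a = a$ in any inverse monoid, whereas `same_element("aA", "")` fails because the idempotent $a a^{-1}$ is not the identity.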
- We study the Wasserstein metric $W_p$, a notion of distance between two probability distributions, from the perspective of Fourier Analysis and discuss applications. In particular, we bound the Earth Mover Distance $W_1$ between the distribution of quadratic residues in a finite field $\mathbb{F}_p$ and uniform distribution by $\lesssim p^{-1/2}$ (the Polya-Vinogradov inequality implies $\lesssim p^{-1/2} \log{p}$). We also show for continuous $f:\mathbb{T} \rightarrow \mathbb{R}$ with mean value 0 $$ (\mbox{number of roots of}~f) \cdot \left( \sum_{k=1}^{\infty} \frac{|\widehat{f}(k)|^2}{k^2}\right)^{\frac{1}{2}} \gtrsim \frac{\|f\|^2_{L^1(\mathbb{T})}}{\|f\|_{L^{\infty}(\mathbb{T})}}.$$ Moreover, we show that for a Laplacian eigenfunction $-\Delta_g \phi_{\lambda} = \lambda \phi_{\lambda}$ on a compact Riemannian manifold $W_p\left(\max\left\{\phi_{\lambda}, 0\right\}dx, \max\left\{-\phi_{\lambda}, 0\right\} dx\right) \lesssim_p \sqrt{\log{\lambda}/\lambda} \|\phi_{\lambda}\|_{L^1}^{1/p}$ which is at most a factor $\sqrt{\log{\lambda}}$ away from sharp. Several other problems are discussed.
- We interpret part of the experimental results of Shwartz-Ziv and Tishby [2017]. Inspired by these results, we formulate a conjecture on the dynamics of the machinery of deep neural networks. This conjecture can be used to explain the counterpart result by Saxe et al. [2018].
- In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case. In this case, convergence is guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that iterates approach a smaller neighborhood of size $O(\mu^2)$ around the minimizer rather than $O(\mu)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the algorithm, which helps clarify in greater detail the differences between sampling with and without replacement. We also explain the periodic behavior that is observed in random reshuffling implementations.
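The comparison described in this abstract can be tried on a toy problem. The sketch below runs constant step-size stochastic gradient descent on the scalar least-squares risk $\frac{1}{N}\sum_i \frac{1}{2}(w-b_i)^2$, whose minimizer is the sample mean, once with random reshuffling and once with uniform sampling with replacement, and reports the mean-square error measured at the end of each epoch. The toy risk, the data, the step size, the burn-in length, and the function name are all assumptions for illustration and are not taken from the paper.

```python
import random


def sgd_mse(b, step, epochs, reshuffle, seed, burn_in=100):
    """Constant step-size SGD on (1/N) * sum_i 0.5 * (w - b_i)^2.

    Returns the squared distance to the minimizer, averaged over the
    epoch-end iterates after `burn_in` epochs.
    """
    rng = random.Random(seed)
    n = len(b)
    w_star = sum(b) / n  # minimizer of the empirical risk
    w = 0.0
    sq_err = 0.0
    count = 0
    for epoch in range(epochs):
        if reshuffle:
            # Random reshuffling: every sample is used exactly once per epoch.
            order = list(range(n))
            rng.shuffle(order)
        else:
            # Uniform sampling with replacement.
            order = [rng.randrange(n) for _ in range(n)]
        for i in order:
            w -= step * (w - b[i])  # stochastic gradient step on sample i
        if epoch >= burn_in:
            sq_err += (w - w_star) ** 2
            count += 1
    return sq_err / count


data_rng = random.Random(1)
b = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
print("reshuffling:", sgd_mse(b, 0.05, 600, True, seed=2))
print("uniform:    ", sgd_mse(b, 0.05, 600, False, seed=2))
```

The error is recorded only at epoch boundaries; within an epoch the reshuffled iterates move away from and back toward the minimizer, which is one way to see the periodic behavior mentioned in the abstract.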
- Applications in machine learning, optimization, and control require the sequential selection of a few system elements, such as sensors, data, or actuators, to optimize the system performance across multiple time steps. However, in failure-prone and adversarial environments, sensors get attacked, data get deleted, and actuators fail. Hence, traditional sequential design paradigms become insufficient and, in contrast, resilient sequential designs that adapt against system-wide attacks, deletions, or failures become important. In general, resilient sequential design problems are computationally hard. Also, even though they often involve objective functions that are monotone and (possibly) submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first scalable algorithm that achieves the following characteristics: system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks, deletions, or failures; adaptiveness, i.e., at each time step, the algorithm selects system elements based on the history of inflicted attacks, deletions, or failures; and provable approximation performance, i.e., the algorithm guarantees for monotone objective functions a solution close to the optimal. We quantify the algorithm's approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with simulated experiments, by considering a control-aware sensor scheduling scenario, namely, sensing-constrained robot navigation.
- The existence of a unique Augustin mean is established for any positive order and probability mass function on the input set. The Augustin mean is shown to be the unique fixed point of an operator defined in terms of the order and the input distribution. The Augustin information is shown to be continuously differentiable in the order. For any channel and any convex constraint set with finite Augustin capacity, the existence of a unique Augustin center and associated van Erven-Harremoes bound are established. The Augustin-Legendre (A-L) information, capacity, center, and radius are introduced and the latter three are proved to be equal to the corresponding Renyi-Gallager quantities. The equality of the A-L capacity to the A-L radius for arbitrary channels and the existence of a unique A-L center for channels with finite A-L capacity are established. For all interior points of the feasible set of cost constraints, the cost constrained Augustin capacity and center are expressed in terms of the A-L capacity and center. Certain shift invariant families of probability measures and certain Gaussian channels are analyzed as examples.
- Mar 22 2018 math.AG arXiv:1803.07923v1 Given a complex manifold endowed with a $\mathbb{C}^\times$-action and a DQ-algebra equipped with a compatible holomorphic Frobenius action (F-action), we prove that if the $\mathbb{C}^\times$-action is free and proper, then the category of F-equivariant DQ-modules is equivalent to the category of modules over the sheaf of invariant sections of the DQ-algebra. As an application, we deduce the codimension three conjecture for formal microdifferential modules from the one for DQ-modules on a symplectic manifold.
- Mar 22 2018 math.AP arXiv:1803.07914v1 We study the stabilization of the Benjamin-Bona-Mahony (BBM) equation on a finite star-shaped network with a damping term acting on the central node. First, we prove the well-posedness of this system. Then, using the frequency domain method, we obtain the asymptotic stabilization result.
- Mar 22 2018 math.FA arXiv:1803.07912v1 In this paper we prove an $n$th root test for series as well as a Cauchy-Hadamard type formula and Abel's theorem for power series on universally complete Archimedean complex vector lattices. These results are aimed at developing an alternative approach to the classical theory of complex series and power series using the topology of order convergence.
- Mar 22 2018 math.OC arXiv:1803.07872v1 We study a differential game where two players separately control their own dynamics, pay a running cost, and moreover pay an exit cost (quitting the game) when they leave a fixed domain. In particular, each player has its own domain and the exit cost consists of three different exit costs, depending on whether only the first player leaves its domain, only the second player leaves its domain, or both simultaneously leave their own domains. We prove that, under suitable hypotheses, the lower and upper values are continuous and are, respectively, the unique viscosity solutions of a suitable Dirichlet problem for a Hamilton-Jacobi-Isaacs equation. The continuity of the values relies on the existence of suitable non-anticipative strategies respecting the domain constraint. This problem is also treated in this work.
- Mar 22 2018 math.CV arXiv:1803.07862v1 We generalize the notion of tame discrete sets introduced by Rosay and Rudin from complex-Euclidean space to arbitrary complex manifolds and establish their basic properties. We show that complex-linear algebraic groups different from the complex line or the punctured complex line contain tame discrete sets.
- In this paper, we describe a new Niederreiter cryptosystem based on quasi-cyclic $\frac{m-1}{m}$ codes that is quantum-secure. This new cryptosystem has good transmission rate compared to the one using binary Goppa codes and uses smaller keys.
- In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method. We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant step size choice). More importantly, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size, which is a comparable performance to a centralized stochastic gradient algorithm. Numerical examples further demonstrate the effectiveness of the method.
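A gradient tracking iteration of the kind this abstract refers to can be sketched as follows. The sketch uses one commonly stated form, $x_{k+1}=W(x_k-\alpha y_k)$ and $y_{k+1}=Wy_k+g_{k+1}-g_k$ with $y_0=g_0$, where $g_k$ collects the agents' sampled gradients; the paper's exact update may differ. The ring network with weights $1/3$, the scalar quadratic local costs $\frac{1}{2}(x-b_i)^2$, the Gaussian gradient noise, and all names are assumptions made for the example.

```python
import random


def ring_weights(n):
    """Doubly stochastic mixing matrix on a ring (n >= 3): each agent
    averages itself and its two neighbours with weight 1/3."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def mix(W, v):
    """One round of neighbour averaging: returns W @ v."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def gradient_tracking(b, step, iters, noise_std, seed):
    """Run the iteration on local costs f_i(x) = 0.5 * (x - b_i)^2.

    Returns the agents' iterates x, the trackers y, and the most recent
    sampled gradients g.
    """
    rng = random.Random(seed)
    n = len(b)
    W = ring_weights(n)

    def sample_grads(x):
        # Unbiased gradient estimates: exact gradient plus zero-mean noise.
        return [x[i] - b[i] + rng.gauss(0.0, noise_std) for i in range(n)]

    x = [0.0] * n
    g = sample_grads(x)
    y = list(g)  # trackers start at the first sampled gradients
    for _ in range(iters):
        x = mix(W, [xi - step * yi for xi, yi in zip(x, y)])
        g_new = sample_grads(x)
        y = [wy + gn - go for wy, gn, go in zip(mix(W, y), g_new, g)]
        g = g_new
    return x, y, g
```

Because the mixing matrix is doubly stochastic, the average of the trackers `y` always equals the average of the most recent sampled gradients `g`, which is the property that lets each agent follow the network-wide gradient using only neighbour communication.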
- We prove that unless P=NP, there exists no polynomial time (or even pseudo-polynomial time) algorithm that can test whether the optimal value of a nonlinear optimization problem where the objective and constraints are given by low-degree polynomials is attained. If the degrees of these polynomials are fixed, our results along with previously-known "Frank-Wolfe type" theorems imply that exactly one of two cases can occur: either the optimal value is attained on every instance, or it is strongly NP-hard to distinguish attainment from non-attainment. We also show that testing for some well-known sufficient conditions for attainment of the optimal value, such as coercivity of the objective function and closedness and boundedness of the feasible set, is strongly NP-hard. As a byproduct, our proofs imply that testing the Archimedean property of a quadratic module is strongly NP-hard, a property that is of independent interest to the convergence of the Lasserre hierarchy. Finally, we give semidefinite programming (SDP)-based sufficient conditions for attainment of the optimal value, in particular a new characterization of coercive polynomials that lends itself to an SDP hierarchy.
- Smoothing splines can be thought of as the posterior mean of a Gaussian process regression in a certain limit. By constructing a reproducing kernel Hilbert space with an appropriate inner product, the Bayesian form of the V-spline is derived when the penalty term is a fixed constant instead of a function. An extension to the usual generalized cross-validation formula is utilized to find the optimal V-spline parameters.
- We present criteria for establishing a triangulation of a manifold. Given a manifold M, a simplicial complex A, and a map H from the underlying space of A to M, our criteria are presented in local coordinate charts for M, and ensure that H is a homeomorphism. These criteria do not require a differentiable structure, or even an explicit metric on M. No Delaunay property of A is assumed. The result provides a triangulation guarantee for algorithms that construct a simplicial complex by working in local coordinate patches. Because the criteria are easily verified in such a setting, they are expected to be of general use.
- We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory. This characterization is achieved by bringing the full power of the Burkholder Method --- originally developed for certifying probabilistic martingale inequalities --- to bear on the online learning setting. To demonstrate the scope and effectiveness of the Burkholder method, we develop a novel online strategy for matrix prediction that attains a regret bound corresponding to the variance term in matrix concentration inequalities. We also present a linear-time/space prediction strategy for parameter free supervised learning with linear classes and general smooth norms.
- Mar 22 2018 math.AG arXiv:1803.07596v1 We define the notion of Mumford divisors, argue that they are the natural divisors to study on reduced but non-normal varieties, and prove a structure theorem for the Mumford class group.
- In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider a new distributed gradient-based method where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the decision variable is pushed to the neighbors, while the information about the gradients is pulled from the neighbors (hence giving the name "push-pull gradient method"). The method unifies the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the algorithm converges linearly for strongly convex and smooth objective functions over a directed static network. In our numerical test, the algorithm performs well even for time-varying directed networks.
- The PWP map was introduced by the second author as a tool for ranking nodes in networks. In this work we extend this technique so that it can be used to rank links as well. Applying the Girvan-Newman algorithm, a ranking method on links induces a deconstruction method for networks; we therefore obtain new methods for finding clustering and core-periphery structures on networks.
- Mar 22 2018 math.DS arXiv:1803.08032v1
- Mar 22 2018 math.OC arXiv:1803.08031v1
- Mar 22 2018 math.GT arXiv:1803.08025v1
- Mar 22 2018 math.NT arXiv:1803.08023v1
- Mar 22 2018 math.OA arXiv:1803.08012v1
- Mar 22 2018 math.GT arXiv:1803.08004v1
- Mar 22 2018 math.AG arXiv:1803.07992v1
- Mar 22 2018 math.AP arXiv:1803.07988v1
- Mar 22 2018 math.OC arXiv:1803.07984v1
- Mar 22 2018 math.OC arXiv:1803.07966v1
- Mar 22 2018 math.CO arXiv:1803.07963v1
- Mar 22 2018 math.OC arXiv:1803.07959v1
- Mar 22 2018 math.CT arXiv:1803.07956v1
- Mar 22 2018 math.RA arXiv:1803.07953v1
- Mar 22 2018 math.RA arXiv:1803.07941v1
- Mar 22 2018 math.RA arXiv:1803.07939v1