- Source localization and spectral estimation are among the most fundamental problems in statistical and array signal processing. Methods that rely on the orthogonality of the signal and noise subspaces, such as Pisarenko's method, MUSIC, and root-MUSIC, are among the most widely used algorithms for these problems. As a common feature, these methods require both a priori knowledge of the number of sources and an estimate of the noise subspace. Both requirements complicate the practical implementation of the algorithms and, when not satisfied exactly, can lead to severe errors. In this paper, we propose a new localization criterion based on the algebraic structure of the noise subspace, which, to the best of our knowledge, is described here for the first time. Using this criterion and the relationship between the source localization problem and the problem of computing the greatest common divisor (GCD), or more practically the approximate GCD, of polynomials, we propose two algorithms that adaptively learn the number of sources and estimate their locations. Simulation results show a significant improvement over root-MUSIC in challenging scenarios such as closely located sources, both in detecting the number of sources and in localizing them, over a broad and practical range of SNRs. Further, no loss of performance is observed in simple scenarios.
- We study the Wasserstein natural gradient in parametric statistical models with continuous sample space. Our approach is to pull back the $L^2$-Wasserstein metric tensor in probability density space to parameter space, under which the parameter space becomes a Riemannian manifold, named the Wasserstein statistical manifold. The gradient flow and natural gradient descent method in parameter space are then derived. When the parameterized densities lie in $\mathbb{R}$, we show that the induced metric tensor admits an explicit formula. Computationally, optimization problems can be accelerated by the proposed Wasserstein natural gradient descent if the objective function is the Wasserstein distance. Examples are presented to demonstrate its effectiveness in several parametric statistical models.
- Despite the remarkable successes of generative adversarial networks (GANs) in many applications, theoretical understanding of their performance is still limited. In this paper, we present a simple shallow GAN model fed by high-dimensional input data. The dynamics of the training process of the proposed model can be exactly analyzed in the high-dimensional limit. In particular, by using the tool of scaling limits of stochastic processes, we show that the macroscopic quantities measuring the quality of the training process converge to a deterministic process that is characterized as the unique solution of a finite-dimensional ordinary differential equation (ODE). The proposed model is simple, but its training process already exhibits several different phases that can mimic the behaviors of more realistic GAN models used in practice. Specifically, depending on the choice of the learning rates, the training process can reach a successful, a failed, or an oscillating phase. By studying the steady-state solutions of the limiting ODEs, we obtain a phase diagram that precisely characterizes the conditions under which each phase takes place. Although this work focuses on a simple GAN model, the analysis methods developed here might prove useful in the theoretical understanding of other variants of GANs with more advanced training algorithms.
- This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of *straggling* processors. To mitigate the effect of the stragglers, it has previously been proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We instead propose to encode the second moment of the data with a low-density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead, and the number of decoding iterations can be made to adjust automatically to the number of stragglers in the system. We show that, for a random model of stragglers, the proposed moment-encoding-based gradient descent method can be viewed as a stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment-encoding-based method is shown to outperform the existing schemes in a real distributed computing setup.
- The celebrated Monte Carlo method estimates a quantity that is expensive to compute by random sampling. We propose adaptive Monte Carlo optimization: a general framework for discrete optimization of an expensive-to-compute function by adaptive random sampling. Applications of this framework have already appeared in machine learning but are tied to their specific contexts and developed in isolation. We take a unified view and show that the framework has broad applicability by applying it to several common machine learning problems: $k$-nearest neighbors, hierarchical clustering, and maximum mutual information feature selection. On real data, we show that this framework allows us to develop algorithms that confer a gain of an order of magnitude or two over exact computation. We also characterize the performance gain theoretically under regularity assumptions on the data that we verify in real-world data. The code is available at https://github.com/govinda-kamath/combinatorial_MAB.
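
To illustrate the idea of discrete optimization by adaptive random sampling described in the last abstract, here is a minimal sketch of a nearest-neighbor search that estimates each candidate's mean squared coordinate difference from randomly sampled coordinates and discards candidates whose confidence interval rules them out. It is not the authors' algorithm or code: the function name `adaptive_nearest_neighbor`, the successive-elimination rule, and the parameters `batch`, `delta`, and `sigma` (an assumed noise scale for the confidence radius) are all hypothetical choices made for illustration.

```python
import math
import random


def adaptive_nearest_neighbor(query, points, batch=16, delta=0.01, sigma=1.0, seed=0):
    """Return the index of the point closest to `query`.

    Illustrative sketch only: each candidate's mean squared coordinate
    difference is estimated from randomly sampled coordinates, and a
    candidate is dropped once its lower confidence bound exceeds the
    smallest upper confidence bound. `sigma` is an assumed noise scale.
    """
    rng = random.Random(seed)
    d = len(query)
    sums = [0.0] * len(points)   # running sum of sampled squared differences
    counts = [0] * len(points)   # number of coordinates sampled per candidate
    exact = {}                   # candidates whose distance was computed exactly
    active = set(range(len(points)))

    def mean(i):
        if i in exact:
            return exact[i]
        return sums[i] / max(counts[i], 1)

    def radius(i):
        if i in exact:
            return 0.0
        return sigma * math.sqrt(2.0 * math.log(1.0 / delta) / counts[i])

    while len(active) > 1:
        for i in active:
            if i in exact:
                continue
            if counts[i] + batch >= d:
                # Sampling would cost as much as the exact computation: do that instead.
                exact[i] = sum((q - p) ** 2 for q, p in zip(query, points[i])) / d
            else:
                for _ in range(batch):
                    j = rng.randrange(d)
                    sums[i] += (query[j] - points[i][j]) ** 2
                counts[i] += batch
        if all(i in exact for i in active):
            break  # nothing left to sample; pick the exact minimum below
        best_upper = min(mean(i) + radius(i) for i in active)
        active = {i for i in active if mean(i) - radius(i) <= best_upper}

    return min(active, key=mean)
```

Calling `adaptive_nearest_neighbor(query, points)` with a list of equal-length coordinate lists returns the index of the estimated nearest point. The saving comes from candidates that are clearly far away being eliminated after only a few sampled coordinates, while close contenders receive more samples. The correctness guarantee of such a scheme is probabilistic and depends on how well `sigma` matches the data.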