results for au:Nikolov_A in:math

- We study the $A$-optimal design problem: given vectors $v_1,\ldots,v_n\in\mathbb{R}^d$ and an integer $k\geq d$, the goal is to select a set $S$ of $k$ vectors that minimizes the trace of $(\sum_{i\in S}v_iv_i^\top)^{-1}$. Traditionally, the problem is an instance of optimal design of experiments in statistics, where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick $k$ of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for $k$-means clustering, and matrix approximation. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for $A$-optimal design. Given a matrix, proportional volume sampling picks a set of columns $S$ of size $k$ with probability proportional to $\mu(S)$ times $\det(\sum_{i\in S}v_iv_i^\top)$ for some measure $\mu$. Our main result shows that the approximability of the $A$-optimal design problem can be reduced to approximate independence properties of the measure $\mu$. We appeal to hard-core distributions as candidate distributions $\mu$ that allow us to obtain improved approximation algorithms for $A$-optimal design. Our results include a $d$-approximation when $k=d$, a $(1+\epsilon)$-approximation when $k=\Omega\left(\frac{d}{\epsilon}+\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$, and a $\frac{k}{k-d+1}$-approximation when repetitions of vectors are allowed in the solution. We also consider a generalization of the problem for $k\leq d$ and obtain a $k$-approximation. This last result implies a restricted invertibility principle for the harmonic mean of singular values. Finally, we show that the problem is $\mathsf{NP}$-hard to approximate within a fixed constant when $k=d$.
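For intuition, the definitions above can be spelled out by brute force in tiny dimensions. The sketch below (pure Python, $d=2$; all function names are mine, not from the paper) enumerates all $k$-subsets, weights each by $\mu(S)\cdot\det(\sum_{i\in S}v_iv_i^\top)$, and samples accordingly; it also evaluates the $A$-design objective, which for a $2\times 2$ matrix $M$ is $\mathrm{tr}(M^{-1}) = (M_{11}+M_{22})/\det M$. This enumeration is exponential in $n$ and only meant to make the definitions concrete.

```python
import itertools
import random

def outer_sum(vectors):
    # sum of v v^T over the given 2-dimensional vectors, as a 2x2 matrix
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in vectors:
        m[0][0] += x * x
        m[0][1] += x * y
        m[1][0] += y * x
        m[1][1] += y * y
    return m

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def a_objective(vectors):
    # the A-optimality criterion tr((sum v v^T)^{-1}), specialized to d = 2
    m = outer_sum(vectors)
    return (m[0][0] + m[1][1]) / det2(m)

def proportional_volume_sample(vs, k, mu=None, rng=random):
    # pick S of size k with probability proportional to
    # mu(S) * det(sum_{i in S} v_i v_i^T), by explicit enumeration
    subsets = [frozenset(s) for s in itertools.combinations(range(len(vs)), k)]
    if mu is None:
        mu = lambda s: 1.0  # uniform base measure
    weights = [mu(s) * det2(outer_sum([vs[i] for i in s])) for s in subsets]
    r = rng.random() * sum(weights)
    for s, w in zip(subsets, weights):
        r -= w
        if r <= 0:
            return s
    return subsets[-1]
```

With a degenerate measure $\mu$ supported on a single set, the sampler returns that set with probability one, which is a convenient sanity check.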
- It is well-known that for every $N \geq 1$ and $d \geq 1$ there exist point sets $x_1, \dots, x_N \in [0,1]^d$ whose discrepancy with respect to the Lebesgue measure is of order at most $(\log N)^{d-1} N^{-1}$. In a more general setting, the first author proved together with Josef Dick that for any normalized measure $\mu$ on $[0,1]^d$ there exist points $x_1, \dots, x_N$ whose discrepancy with respect to $\mu$ is of order at most $(\log N)^{(3d+1)/2} N^{-1}$. The proof used methods from combinatorial mathematics, and in particular a result of Banaszczyk on balancings of vectors. In the present note we use a version of the so-called transference principle together with recent results on the discrepancy of red-blue colorings to show that for any $\mu$ there even exist points having discrepancy of order at most $(\log N)^{d-\frac12} N^{-1}$, which is almost as good as the discrepancy bound in the case of the Lebesgue measure.
- Combinatorial discrepancy is a complexity measure of a collection of sets which quantifies how well the sets in the collection can be simultaneously balanced. More precisely, we are given an $n$-point set $P$ and a collection $\mathcal{F} = \{F_1, ..., F_m\}$ of subsets of $P$, and our goal is to color $P$ with two colors, red and blue, so that the maximum over the $F_i$ of the absolute difference between the number of red elements and the number of blue elements (the discrepancy) is minimized. Combinatorial discrepancy has many applications in mathematics and computer science, including constructions of uniformly distributed point sets, and lower bounds for data structures and private data analysis algorithms. We investigate the combinatorial discrepancy of geometrically defined set systems, in which $P$ is an $n$-point set in $d$-dimensional space and $\mathcal{F}$ is the collection of subsets of $P$ induced by dilations and translations of a fixed convex polytope $B$. Such set systems include systems of sets induced by axis-aligned boxes, whose discrepancy is the subject of the well-known Tusnády problem. We prove new discrepancy upper and lower bounds for such set systems by extending the approach based on factorization norms previously used by the author and Matoušek. We improve the best known upper bound for the Tusnády problem by a logarithmic factor, using a result of Banaszczyk on signed series of vectors. We extend this improvement to an arbitrary convex polytope $B$ by using a decomposition due to Matoušek. Using Fourier analytic techniques, we also prove a nearly matching discrepancy lower bound for sets induced by any fixed bounded polytope $B$ satisfying a certain technical condition. We also outline applications of our results to geometric discrepancy, data structure lower bounds, and differential privacy.
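To make the definition of combinatorial discrepancy concrete, here is a brute-force computation (exponential in $|P|$, so an illustration only; the function name is mine): encode the red/blue coloring as $\pm 1$ signs, and minimize the maximum absolute imbalance over all sets.

```python
from itertools import product

def discrepancy(n, sets):
    # min over all +/-1 colorings of the n points of the
    # maximum |#red - #blue| imbalance over the given sets
    best = None
    for coloring in product((-1, 1), repeat=n):
        worst = max(abs(sum(coloring[i] for i in F)) for F in sets)
        best = worst if best is None else min(best, worst)
    return best
```

For instance, the three 2-element subsets of a 3-point set force discrepancy 2: every $\pm1$ coloring makes some pair monochromatic.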
- An important theorem of Banaszczyk (Random Structures & Algorithms '98) states that for any sequence of vectors of $\ell_2$ norm at most $1/5$ and any convex body $K$ of Gaussian measure $1/2$ in $\mathbb{R}^n$, there exists a signed combination of these vectors which lands inside $K$. A major open problem is to devise a constructive version of Banaszczyk's vector balancing theorem, i.e. to find an efficient algorithm which constructs the signed combination. We make progress towards this goal along several fronts. As our first contribution, we show an equivalence between Banaszczyk's theorem and the existence of $O(1)$-subgaussian distributions over signed combinations. For the case of symmetric convex bodies, our equivalence implies the existence of a universal signing algorithm (i.e. independent of the body), which simply samples from the subgaussian sign distribution and checks to see if the associated combination lands inside the body. For asymmetric convex bodies, we provide a novel recentering procedure, which allows us to reduce to the case where the body is symmetric. As our second main contribution, we show that the above framework can be efficiently implemented when the vectors have length $O(1/\sqrt{\log n})$, recovering Banaszczyk's results under this stronger assumption. More precisely, we use random walk techniques to produce the required $O(1)$-subgaussian signing distributions when the vectors have length $O(1/\sqrt{\log n})$, and use a stochastic gradient ascent method to implement the recentering procedure for asymmetric bodies.
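The "sample and check" shape of the universal signing algorithm can be caricatured by rejection sampling. The sketch below is a loose stand-in, not the paper's construction: it draws plain uniform random signs (where the paper would sample from a carefully built $O(1)$-subgaussian distribution) and checks whether the signed combination lands in the body, given as a membership predicate.

```python
import random

def universal_signing(vectors, in_body, tries=10000, rng=random):
    # rejection-sampling caricature of the universal signing algorithm:
    # draw signs, form the signed combination, test membership in the body
    d = len(vectors[0])
    for _ in range(tries):
        signs = [rng.choice((-1, 1)) for _ in range(len(vectors))]
        comb = [sum(s * v[j] for s, v in zip(signs, vectors)) for j in range(d)]
        if in_body(comb):
            return signs
    return None  # no signing found within the try budget
```

With short vectors and a generous symmetric body (here an $\ell_\infty$ ball), every signing succeeds, so the first draw is returned.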
- We show that every symmetric normed space admits an efficient nearest neighbor search data structure with doubly-logarithmic approximation. Specifically, for every $n$, $d = n^{o(1)}$, and every $d$-dimensional symmetric norm $\|\cdot\|$, there exists a data structure for $\mathrm{poly}(\log \log n)$-approximate nearest neighbor search over $\|\cdot\|$ for $n$-point datasets achieving $n^{o(1)}$ query time and $n^{1+o(1)}$ space. The main technical ingredient of the algorithm is a low-distortion embedding of a symmetric norm into a low-dimensional iterated product of top-$k$ norms. We also show that our techniques cannot be extended to general norms.
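The building block here, the top-$k$ norm, is simple to state: the sum of the $k$ largest coordinates in absolute value, so that top-$1$ is the $\ell_\infty$ norm and top-$d$ is the $\ell_1$ norm. A one-line illustration (function name mine):

```python
def top_k_norm(x, k):
    # sum of the k largest coordinates of x in absolute value
    return sum(sorted((abs(t) for t in x), reverse=True)[:k])
```

Top-$k$ norms are themselves symmetric norms, which is what makes them a plausible target for embedding a general symmetric norm.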
- The maximum volume $j$-simplex problem asks to compute the $j$-dimensional simplex of maximum volume inside the convex hull of a given set of $n$ points in $\mathbb{Q}^d$. We give a deterministic approximation algorithm for this problem which achieves an approximation ratio of $e^{j/2 + o(j)}$. The problem is known to be $\mathrm{NP}$-hard to approximate within a factor of $c^{j}$ for some constant $c > 1$. Our algorithm also gives a factor $e^{j + o(j)}$ approximation for the problem of finding the principal $j\times j$ submatrix of a rank $d$ positive semidefinite matrix with the largest determinant. We achieve our approximation by rounding solutions to a generalization of the $D$-optimal design problem, or, equivalently, the dual of an appropriate smallest enclosing ellipsoid problem. Our arguments give a short and simple proof of a restricted invertibility principle for determinants.
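For $j = 2$ in the plane the objective is easy to state exactly: maximize the area of a triangle with vertices among the input points. The $O(n^3)$ exhaustive search below (illustration only; the paper's contribution is the approximation for general $j$, where exhaustive search over $j$-simplices is infeasible) uses the cross-product formula for triangle area.

```python
from itertools import combinations

def max_triangle_area(points):
    # brute-force maximum-area triangle with vertices among `points` (j = 2)
    best = 0.0
    for (ax, ay), (bx, by), (cx, cy) in combinations(points, 3):
        # area = |det of the two edge vectors| / 2
        area = abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / 2.0
        best = max(best, area)
    return best
```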
- Aug 07 2014, math.CO, arXiv:1408.1376v2. The $\gamma_2$ norm of a real $m\times n$ matrix $A$ is the minimum number $t$ such that the column vectors of $A$ are contained in a $0$-centered ellipsoid $E\subseteq\mathbb{R}^m$ which in turn is contained in the hypercube $[-t, t]^m$. We prove that this classical quantity approximates the hereditary discrepancy $\mathrm{herdisc}\ A$ as follows: $\gamma_2(A) = O(\log m)\cdot \mathrm{herdisc}\ A$ and $\mathrm{herdisc}\ A = O(\sqrt{\log m}\,)\cdot\gamma_2(A)$. Since $\gamma_2$ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Both inequalities are shown to be asymptotically tight. We then demonstrate on several examples the power of the $\gamma_2$ norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of $\Omega(\log^{d-1} n)$ for the $d$-dimensional Tusnády problem, asking for the combinatorial discrepancy of an $n$-point set in $\mathbb{R}^d$ with respect to axis-parallel boxes. For $d>2$, this improves the previous best lower bound, which was of order approximately $\log^{(d-1)/2}n$, and it comes close to the best known upper bound of $O(\log^{d+1/2}n)$, for which we also obtain a new, very simple proof.
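Hereditary discrepancy, the quantity being approximated, is the maximum of the discrepancy over all restrictions of the set system to subsets of the points (i.e., over column subsets of the incidence matrix $A$). The doubly exponential brute force below makes the definition concrete (illustration only; the point of the $\gamma_2$ bound is that a polynomial-time computable quantity approximates this):

```python
from itertools import combinations, product

def disc(A, cols):
    # discrepancy of the set system restricted to the given columns:
    # min over +/-1 sign vectors of the maximum |row sum|
    best = None
    for signs in product((-1, 1), repeat=len(cols)):
        worst = max(abs(sum(s * row[j] for s, j in zip(signs, cols)))
                    for row in A)
        best = worst if best is None else min(best, worst)
    return best

def herdisc(A):
    # hereditary discrepancy: maximize disc over all nonempty column subsets
    n = len(A[0])
    return max(disc(A, c)
               for r in range(1, n + 1)
               for c in combinations(range(n), r))
```

For the incidence matrix of the three 2-element subsets of a 3-point set, every proper sub-system has discrepancy at most 1, but the full system has discrepancy 2, so the hereditary discrepancy is 2.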
- We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$.
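Concretely, the set system here consists of the prefixes $\{d, 2d, \ldots, kd\}$ of homogeneous arithmetic progressions inside $\{1, \ldots, n\}$. The brute-force computation below evaluates the ordinary (non-hereditary) discrepancy of this system for tiny $n$, and recovers the well-known small-case fact behind the Erdős discrepancy problem: $\pm1$ sequences with all HAP prefix sums bounded by 1 exist up to $n = 11$, and discrepancy 2 is forced at $n = 12$.

```python
from itertools import product

def hap_discrepancy(n):
    # min over +/-1 colorings f of {1..n} of
    # max over d, k of |f(d) + f(2d) + ... + f(kd)|
    haps = [[j * d for j in range(1, n // d + 1)] for d in range(1, n + 1)]
    best = None
    for f in product((-1, 1), repeat=n):
        worst = 0
        for hap in haps:
            s = 0
            for x in hap:
                s += f[x - 1]  # values are 1-indexed
                worst = max(worst, abs(s))
        best = worst if best is None else min(best, worst)
    return best
```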
- The Komlós conjecture in discrepancy theory states that there is a universal constant $K$ such that for any $m \times n$ matrix $A$ whose columns lie in the unit ball there exists a $\pm 1$ vector $x$ with $\|Ax\|_\infty \leq K$. This conjecture also implies the Beck-Fiala conjecture on the discrepancy of bounded degree hypergraphs. Here we prove a natural relaxation of the Komlós conjecture: if the columns of $A$ are assigned unit real vectors rather than $\pm 1$ signs, then the Komlós conjecture holds with $K=1$. Our result rules out the possibility of a counterexample to the conjecture based on semidefinite programming. It also opens the way to proving tighter efficient (polynomial-time computable) upper bounds for the conjecture using semidefinite programming techniques.
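The conjectured quantity can be checked exhaustively for tiny instances. The sketch below (function name mine; exponential in $n$) computes $\min_{x \in \{\pm1\}^n} \|Ax\|_\infty$, the value the Komlós conjecture asserts is at most a universal $K$ whenever the columns of $A$ have $\ell_2$ norm at most $1$.

```python
from itertools import product

def min_signed_infnorm(A):
    # min over x in {-1, 1}^n of the infinity norm of A x
    n = len(A[0])
    best = None
    for x in product((-1, 1), repeat=n):
        val = max(abs(sum(a * xi for a, xi in zip(row, x))) for row in A)
        best = val if best is None else min(best, val)
    return best
```

For the $2 \times 2$ identity every signing gives value 1; for columns $(0.6, 0.8)$ and $(0.6, -0.8)$ (both unit vectors) the best signing gives $1.2$, comfortably within any constant-$K$ bound.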