# Statistics Theory (math.ST)

• We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we characterize the minimax regret up to log factors by proving an upper bound matching a previously known lower bound. In a partial feedback model motivated by second-price auctions, we prove upper bounds for Lipschitz and semi-Lipschitz losses that improve on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.
• We put forward a framework for nonparametric density estimation in situations where the sample is supplemented by information and assumptions about shape, support, continuity, slope, location of modes, density values, etc. These supplements are incorporated as constraints that in conjunction with a maximum likelihood criterion lead to constrained infinite-dimensional optimization problems that we formulate over spaces of semicontinuous functions. These spaces, when equipped with an appropriate metric, offer a series of advantages including simple conditions for existence of estimators and their limits and, in particular, guarantee the convergence of modes of densities. Relying on the approximation theory---epi-convergence---for optimization problems, we provide general conditions under which estimators subject to essentially arbitrary constraints are consistent and illustrate the framework with a number of examples that span classical and novel shape constraints.
• In this paper we investigate the problem of detecting dynamically evolving signals. We model the signal as an $n$ dimensional vector that is either zero or has $s$ non-zero components. At each time step $t\in \mathbb{N}$ the non-zero components change their location independently with probability $p$. The statistical problem is to decide whether the signal is a zero vector or in fact it has non-zero components. This decision is based on $m$ noisy observations of individual signal components collected at times $t=1,\ldots,m$. We consider two different sensing paradigms, namely adaptive and non-adaptive sensing. For non-adaptive sensing the choice of components to measure has to be decided before the data collection process started, while for adaptive sensing one can adjust the sensing process based on observations collected earlier. We characterize the difficulty of this detection problem in both sensing paradigms in terms of the aforementioned parameters, with special interest to the speed of change of the active components. In addition we provide an adaptive sensing algorithm for this problem and contrast its performance to that of non-adaptive detection algorithms.
• We study the problem of using i.i.d. samples from an unknown multivariate probability distribution $p$ to estimate the mutual information of $p$. This problem has recently received attention in two settings: (1) where $p$ is assumed to be Gaussian and (2) where $p$ is assumed only to lie in a large nonparametric smoothness class. Estimators proposed for the Gaussian case converge in high dimensions when the Gaussian assumption holds, but are brittle, failing dramatically when $p$ is not Gaussian. Estimators proposed for the nonparametric case fail to converge with realistic sample sizes except in very low dimensions. As a result, there is a lack of robust mutual information estimators for many realistic data. To address this, we propose estimators for mutual information when $p$ is assumed to be a nonparanormal (a.k.a., Gaussian copula) model, a semiparametric compromise between Gaussian and nonparametric extremes. Using theoretical bounds and experiments, we show these estimators strike a practical balance between robustness and scaling with dimensionality.
• We consider the challenging problem of statistical inference for exponential-family random graph models given one observation of a random graph with complex dependence (e.g., transitivity). To facilitate statistical inference, we endow random graphs with additional structure. The basic idea is that random graphs are composed of subgraphs with complex dependence. We have shown elsewhere that when the composition of random graphs is known, $M$-estimators of canonical and curved exponential families with complex dependence are consistent. In practice, the composition is known in some applications, but is unknown in others. If the composition is unknown, the first and foremost question is whether it can be recovered. The main consistency results of the paper show that it is possible to do so as long as exponential families satisfy weak dependence and smoothness conditions. These results confirm that exponential-family random graph models with additional structure constitute a promising direction of statistical network analysis.
• The Allan Variance (AV) is a widely used quantity in areas focusing on error measurement as well as in the general analysis of variance for autocorrelated processes in domains such as engineering and, more specifically, metrology. The form of this quantity is widely used to detect noise patterns and indications of stability within signals. However, the properties of this quantity are not known for commonly occurring processes whose covariance structure is non-stationary and, in these cases, an erroneous interpretation of the AV could lead to misleading conclusions. This paper generalizes the theoretical form of the AV to some non-stationary processes while at the same time being valid also for weakly stationary processes. Some simulation examples show how this new form can help to understand the processes for which the AV is able to distinguish these from the stationary cases and hence allow for a better interpretation of this quantity in applied cases.

Alessandro Dec 09 2015 01:12 UTC

Hey, I've already seen this title! http://arxiv.org/abs/1307.0401

Richard Kueng Mar 08 2015 22:02 UTC

Neither, Frédéric! Replacing fidelity by superfidelity still requires optimizing over all density matrices. However, the Birkhoff-von Neumann Theorem (see Lemma 1) allows for further restricting this optimization to n scalar variables w.l.o.g.---Theorem 2. Arguably, this greatly simplifies the geome

...(continued)
Frédéric Grosshans Mar 05 2015 11:31 UTC

I fell for that clickbait title and read the paper. I still don’t get why von Neumann didn't want us to know about this weird trick? And which weird trick? The use of superfidelity or the use of non-physical density matrices like $\sigma^\sharp$?

Noon van der Silk Mar 03 2015 03:20 UTC

I took the liberty of uploading the IPython notebook as a github [gist](https://gist.github.com), so it's viewable [here](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/silky/b14fa42c6d5475a3a724/raw/887c19fb04581f1a33f9d03370e4b7b3a33c2ea8/ferrie_kueng_bayes_est_fid.ipynb).