# Statistics Theory (stat.TH)

• M-quantile random-effects regression represents an interesting approach for modelling multilevel data when the interest of researchers is focused on the conditional quantiles. When data are based on complex survey designs, sampling weights have to be incorporated in the analysis. A pseudo-likelihood approach for accommodating sampling weights in the M-quantile random-effects regression is presented. The proposed methodology is applied to the Italian sample of the "Program for International Student Assessment 2015" survey in order to study the gender gap in mathematics at various quantiles of the conditional distribution. Findings offer a possible explanation of the low share of females in "Science, Technology, Engineering and Mathematics" sectors.
• In this paper, we consider the situation in which the observations follow an isotonic generalized partly linear model. Under this model, the mean of the responses is modelled, through a link function, linearly on some covariates and nonparametrically on a univariate regressor in such a way that the nonparametric component is assumed to be a monotone function. A class of robust estimates for the monotone nonparametric component and for the regression parameter, related to the linear one, is defined. The robust estimators are based on a spline approach combined with a score function which bounds large values of the deviance. As an application, we consider the isotonic partly linear log-Gamma regression model. Through a Monte Carlo study, we investigate the performance of the proposed estimators under a partly linear log-Gamma regression model with increasing nonparametric component.
• In this paper we consider the problem of finding anomalies in a $d$-dimensional field of independent random variables $\{Y_i\}_{i \in \left\{1,...,n\right\}^d}$, each distributed according to a one-dimensional natural exponential family $\mathcal F = \left\{F_\theta\right\}_{\theta \in\Theta}$. Given some baseline parameter $\theta_0 \in\Theta$, the field is scanned using local likelihood ratio tests to detect from a (large) given system of regions $\mathcal{R}$ those regions $R \subset \left\{1,...,n\right\}^d$ with $\theta_i \neq \theta_0$ for some $i \in R$. We provide a unified methodology which controls the overall family-wise error rate (FWER), i.e., the probability of making a wrong detection, at a given level. Fundamental to our method is a Gaussian approximation of the asymptotic distribution of the underlying multiscale scanning test statistic with explicit rate of convergence. From this, we obtain a weak limit theorem which can be seen as a generalized weak invariance principle to non-identically distributed data and is of independent interest. Furthermore, we give an asymptotic expansion of the procedure's power, which yields minimax optimality in case of Gaussian observations.
• Estimating the entropy based on data is one of the prototypical problems in distribution property testing and estimation. For estimating the Shannon entropy of a distribution on $S$ elements with independent samples, [Paninski2004] showed that the sample complexity is sublinear in $S$, and [Valiant--Valiant2011] showed that consistent estimation of Shannon entropy is possible if and only if the sample size $n$ far exceeds $\frac{S}{\log S}$. In this paper we consider the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. We show that: (1) As long as the Markov chain mixes not too slowly, i.e., the relaxation time is at most $O(\frac{S}{\ln^3 S})$, consistent estimation is achievable when $n \gg \frac{S^2}{\log S}$. (2) As long as the Markov chain has some slight dependency, i.e., the relaxation time is at least $1+\Omega(\frac{\ln^2 S}{\sqrt{S}})$, consistent estimation is impossible when $n \lesssim \frac{S^2}{\log S}$. Under both assumptions, the optimal estimation accuracy is shown to be $\Theta(\frac{S^2}{n \log S})$. In comparison, the empirical entropy rate requires at least $\Omega(S^2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
• Applied researchers often construct a network from a random sample of nodes in order to infer properties of the parent network. Two of the most widely used sampling schemes are subgraph sampling, where we sample each vertex independently with probability $p$ and observe the subgraph induced by the sampled vertices, and neighborhood sampling, where we additionally observe the edges between the sampled vertices and their neighbors. In this paper, we study the problem of estimating the number of motifs as induced subgraphs under both models from a statistical perspective. We show that: for any connected $h$ on $k$ vertices, to estimate $s=\mathsf{s}(h,G)$, the number of copies of $h$ in the parent graph $G$ of maximum degree $d$, with a multiplicative error of $\epsilon$, (a) For subgraph sampling, the optimal sampling ratio $p$ is $\Theta_{k}(\max\{ (s\epsilon^2)^{-\frac{1}{k}}, \; \frac{d^{k-1}}{s\epsilon^{2}} \})$, achieved by Horvitz-Thompson type of estimators. (b) For neighborhood sampling, we propose a family of estimators, encompassing and outperforming the Horvitz-Thompson estimator and achieving the sampling ratio $O_{k}(\min\{ (\frac{d}{s\epsilon^2})^{\frac{1}{k-1}}, \; \sqrt{\frac{d^{k-2}}{s\epsilon^2}}\})$. This is shown to be optimal for all motifs with at most $4$ vertices and cliques of all sizes. The matching minimax lower bounds are established using certain algebraic properties of subgraph counts. These results quantify how much more informative neighborhood sampling is than subgraph sampling, as empirically verified by experiments on both synthetic and real-world data. We also address the issue of adaptation to the unknown maximum degree, and study specific problems for parent graphs with additional structures, e.g., trees or planar graphs.
• We study three different quasi-symmetry models and three different mixture models of $n\times n\times n$ tensors for modeling rater agreement data. For these models we give a geometric description of the associated varieties and we study their invariants, distinguishing between the case $n=2$ and the case $n>2$. Finally, for the two models for pairwise agreement we state some results about the pairwise Cohen's $\kappa$ coefficients.
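
For readers unfamiliar with the pairwise Cohen's $\kappa$ coefficients mentioned in the last abstract, the following is a minimal sketch of the textbook definition, $\kappa = (p_o - p_e)/(1 - p_e)$, applied to a three-rater count tensor. It is not taken from the paper: the function names `cohens_kappa` and `pairwise_kappas`, the nested-list tensor layout `tensor[i][j][k]` (counts of items rated `i`, `j`, `k` by raters 0, 1, 2), and the choice to obtain each pairwise table by summing out the third rater are all assumptions made for illustration.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    n = len(table)
    if n == 0 or any(len(row) != n for row in table):
        raise ValueError("expected a non-empty square table")
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table must contain at least one observation")
    # Observed agreement: mass on the diagonal.
    p_obs = sum(table[i][i] for i in range(n)) / total
    # Agreement expected under independence of the two raters' marginals.
    row_marg = [sum(table[i]) / total for i in range(n)]
    col_marg = [sum(table[i][j] for i in range(n)) / total for j in range(n)]
    p_exp = sum(r * c for r, c in zip(row_marg, col_marg))
    if abs(1.0 - p_exp) < 1e-15:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def pairwise_kappas(tensor):
    """Kappa for each pair of raters, marginalizing the n x n x n tensor over the third rater.

    Assumed (hypothetical) layout: tensor[i][j][k] is the number of items
    that raters 0, 1, 2 assigned to categories i, j, k respectively.
    """
    n = len(tensor)
    idx = range(n)
    t01 = [[sum(tensor[i][j][k] for k in idx) for j in idx] for i in idx]
    t02 = [[sum(tensor[i][j][k] for j in idx) for k in idx] for i in idx]
    t12 = [[sum(tensor[i][j][k] for i in idx) for k in idx] for j in idx]
    return {(0, 1): cohens_kappa(t01),
            (0, 2): cohens_kappa(t02),
            (1, 2): cohens_kappa(t12)}
```

As a usage example, `cohens_kappa([[20, 5], [10, 15]])` has observed agreement 0.7 and expected agreement 0.5, giving $\kappa = 0.4$ up to floating-point rounding.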

Alessandro Dec 09 2015 01:12 UTC

Hey, I've already seen this title! http://arxiv.org/abs/1307.0401

Richard Kueng Mar 08 2015 22:02 UTC

Neither, Frédéric! Replacing fidelity by superfidelity still requires optimizing over all density matrices. However, the Birkhoff-von Neumann Theorem (see Lemma 1) allows for further restricting this optimization to n scalar variables w.l.o.g.---Theorem 2. Arguably, this greatly simplifies the geome

Frédéric Grosshans Mar 05 2015 11:31 UTC

I fell for that clickbait title and read the paper. I still don’t get why von Neumann didn't want us to know about this weird trick? And which weird trick? The use of superfidelity or the use of non-physical density matrices like $\sigma^\sharp$?

Noon van der Silk Mar 03 2015 03:20 UTC

I took the liberty of uploading the IPython notebook as a github [gist](https://gist.github.com), so it's viewable [here](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/silky/b14fa42c6d5475a3a724/raw/887c19fb04581f1a33f9d03370e4b7b3a33c2ea8/ferrie_kueng_bayes_est_fid.ipynb).