Statistics (stat)

• In this article, we present an orthogonal basis expansion method for solving stochastic differential equations with a path-independent solution of the form $X_{t}=\phi(t,W_{t})$. For this purpose, we define a Hilbert space and construct an orthogonal basis for this inner product space with the aid of 2D-Hermite polynomials. With considering $X_{t}$ as orthogonal basis expansion, this method is implemented and the expansion coefficients are obtained by solving a system of nonlinear integro-differential equations. The strength of such a method is that expectation and variance of the solution is computed by these coefficients directly. Eventually, numerical results demonstrate its validity and efficiency in comparison with other numerical methods.
• This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective function evaluated on simulated combat engagements. Gaussian process Bayesian optimization (GPBO) techniques are developed to automatically learn global Gaussian Process (GP) surrogate models, which provide statistical performance predictions in both explored and unexplored areas of the parameter space. This allows a learning engine to sample full-combat simulations at parameter values that are most likely to optimize performance and also provide highly informative data points for improving future predictions. However, standard GPBO methods do not provide a reliable surrogate model for the highly volatile objective functions found in aerial combat, and thus do not reliably identify global maxima. These issues are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point Sampling (HRMS) techniques. Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.
• We study a variant of the source identification game with training data in which part of the training data is corrupted by an attacker. In the addressed scenario, the defender aims at deciding whether a test sequence has been drawn according to a discrete memoryless source $X \sim P_X$, whose statistics are known to him through the observation of a training sequence generated by $X$. In order to undermine the correct decision under the alternative hypothesis that the test sequence has not been drawn from $X$, the attacker can modify a sequence produced by a source $Y \sim P_Y$ up to a certain distortion, and corrupt the training sequence either by adding some fake samples or by replacing some samples with fake ones. We derive the unique rationalizable equilibrium of the two versions of the game in the asymptotic regime and by assuming that the defender bases its decision by relying only on the first order statistics of the test and the training sequences. By mimicking Stein's lemma, we derive the best achievable performance for the defender when the first type error probability is required to tend to zero exponentially fast with an arbitrarily small, yet positive, error exponent. We then use such a result to analyze the ultimate distinguishability of any two sources as a function of the allowed distortion and the fraction of corrupted samples injected into the training sequence.
• We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that uses content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures.
• We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e. each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known "kernel trick" to avoid working directly with the high-dimensional monomial matrix. We show the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low rank matrix completion and subspace clustering techniques in experiments with real data.
• Highly robust and efficient estimators for the generalized linear model with a dispersion parameter are proposed. The estimators are based on three steps. In the first step the maximum rank correlation estimator is used to consistently estimate the slopes up to a scale factor. In the second step, the scale factor, the intercept, and the dispersion parameter are consistently estimated using a MT-estimator of a simple regression model. The combined estimator is highly robust but inefficient. Then, randomized quantile residuals based on the initial estimators are used to detect outliers to be rejected and to define a set S of observations to be retained. Finally, a conditional maximum likelihood (CML) estimator given the observations in S is computed. We show that, under the model, S tends to the complete sample for increasing sample size. Therefore, the CML tends to the unconditional maximum likelihood estimator. It is therefore highly efficient, while maintaining the high degree of robustness of the initial estimator. The case of the negative binomial regression model is studied in detail.
• Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
• The estimation of a log-concave density on $\mathbb{R}$ is a canonical problem in the area of shape-constrained nonparametric inference. We present a Bayesian nonparametric approach to this problem based on an exponentiated Dirichlet process mixture prior and show that the posterior distribution converges to the log-concave truth at the (near-) minimax rate in Hellinger distance. Our proof proceeds by establishing a general contraction result based on the log-concave maximum likelihood estimator that prevents the need for further metric entropy calculations. We also present two computationally more feasible approximations and a more practical empirical Bayes approach, which are illustrated numerically via simulations.
• Analyzing time series data is important to predict future events and changes in finance, manufacturing and administrative decisions. Gaussian processes (GPs) solve regression and classification problems by choosing appropriate kernels capturing covariance structure of data. In time series analysis, GP based regression methods recently demonstrate competitive performance by decomposing temporal covariance structure. Such covariance structure decomposition allows exploiting shared parameters over a set of multiple but selected time series. In this paper, we propose an efficient variational inference algorithm for nonparametric clustering over multiple GP covariance structures. We handle multiple time series by placing an Indian Buffet Process (IBP) prior on the presence of the additive shared kernels. We propose a new variational inference algorithm to learn the nonparametric Bayesian models for the clustering and regression problems. Experiments are conducted on both synthetic data sets and real world data sets, showing promising results in term of structure discoveries. In addition, our model learns GP kernels faster but still preserves a good predictive performance.
• We provide a comprehensive study of the convergence of forward-backward algorithm under suitable geometric conditions leading to fast rates. We present several new results and collect in a unified view a variety of results scattered in the literature, often providing simplified proofs. Novel contributions include the analysis of infinite dimensional convex minimization problems, allowing the case where minimizers might not exist. Further, we analyze the relation between different geometric conditions, and discuss novel connections with a priori conditions in linear inverse problems, including source conditions, restricted isometry properties and partial smoothness.
• The paper addresses the issue of environmental awareness in the regions of the Russian Federation. Russia provides an important study area to investigate the relationship between economic development and environmental consciousness in general. This paper introduces an index of environmental awareness, which is derived as a latent variable from various categories of search entries in Yandex, the prominent Russian search engine, during two periods in years 2014 and 2015. These indicators are presumably dependent on certain causes, which are also integrated into the model. The resulting Multiple Indicators-Multiple Causes model allows to estimate the proposed index of environmental awareness and to rank the Russian regions for each period. Comparing the results of 2014 and 2015 is especially interesting, because of the RUR devaluation at the end of 2014. In additional, we answer the question of the existence of an Environmental Kuznets Curve with respect to environmental understanding.
• Accounting for undecided and uncertain voters is a challenging issue for predicting election results from public opinion polls. Undecided voters typify the uncertainty of swing voters in polls but are often ignored or allocated to each candidate in a simplistic manner. Historically this has been adequate because first, the undecided tend to settle on a candidate as the election day draws closer, and second, they are comparatively small enough to assume that the undecided voters do not affect the relative proportions of the decided voters. These assumptions are used by poll authors and meta-poll analysts, but in the presence of high numbers of undecided voters these static rules may bias election predictions. In this paper, we examine the effect of undecided voters in the 2016 US presidential election. This election was unique in that a) there was a relatively high number of undecided voters and b) the major party candidates had high unfavorability ratings. We draw on psychological theories of decision making such as decision field theory and prospect theory to explain the link between candidate unfavorability and voter indecisiveness, and to describe how these factors likely contributed to a systematic bias in polling. We then show that the allocation of undecided voters in the 2016 election biased polls and meta-polls in a manner consistent with these theories. These findings imply that, given the increasing number of undecided voters in recent elections, it will be important to take into account the underlying psychology of voting when making predictions about elections.
• We introduce the notion of the essential tangent bundle of a parametrized measure model and the notion of reduced Fisher metric on a (possibly singular) 2-integrable measure model. Using these notions and a new characterization of $k$-integrable parametrized measure models, we extend the Cramér-Rao inequality to $2$-integrable (possibly singular) statistical models for general $\varphi$-estimations, where $\varphi$ is a $V$-valued feature function and $V$ is a topological vector space.
• In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition function and (2) it is formulated in the form of functional optimization. We avoid Problem (1) by using Bethe approximation. Bethe approximation is an approximation technique equivalent to the loopy belief propagation. Problem (2) can be solved by using orthonormal function expansion. Orthonormal function expansion can reduce a functional optimization problem to a function optimization problem. Our method can provide an analytic form of the solution of the inverse problem within the framework of Bethe approximation.
• We show that the generalized method of moments (GMM) estimation problem in instrumental variable quantile regression (IVQR) models can be equivalently formulated as a mixed integer quadratic programming problem. This enables exact computation of the GMM estimators for the IVQR models. We illustrate the usefulness of our algorithm via Monte Carlo experiments and an application to demand for fish.
• This paper provides several statistical estimators for the drift and volatility parameters of an Ornstein-Uhlenbeck process driven by fractional Brownian motion, whose observations can be made either continuously or at discrete time instants. First and higher order power variations are used to estimate the volatility parameter. The almost sure convergence of the estimators and the corresponding central limit theorems are obtained for all the Hurst parameter range $H\in (0, 1)$. The least squares estimator is used for the drift parameter. A central limit theorem is proved when the Hurst parameter $H \in (0, 1/2)$ and a noncentral limit theorem is proved for $H\in[3/4, 1)$. Thus, the open problem left in the paper by Hu and Nualart (2010) is completely solved, where a central limit theorem for least squares estimator is proved for $H\in [1/2, 3/4)$.
• In this paper, we develop an upper bound for the SPARSEVA (SPARSe Estimation based on a VAlidation criterion) estimation error in a general scheme, i.e., when the cost function is strongly convex and the regularized norm is decomposable for a pair of subspaces. We show how this general bound can be applied to a sparse regression problem to obtain an upper bound for the traditional SPARSEVA problem. Numerical results are used to illustrate the effectiveness of the suggested bound.
• This paper proposes an algorithm to estimate the parameters, including time delay, of continuous time systems based on instrumental variable identification methods. To overcome the multiple local minima of the cost function associated with the estimation of a time delay system, we utilise the useful redundancy technique. Specifically, the cost function is filtered through a set of low-pass filters to improve convexity with the useful redundancy technique exploited to achieve convergence to the global minimum of the optimization problem. Numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.
• Software packages usually report the results of statistical tests using p-values. Users often interpret these by comparing them to standard thresholds, e.g. 0.1%, 1% and 5%, which is sometimes re-inforced by a star rating (***, **, *). In this article, we consider an arbitrary statistical test whose p-value p is not available explicitly, but can be approximated by Monte Carlo samples, e.g. bootstrap or permutation tests. The standard implementation of such tests usually draws a fixed number of samples to approximate p. However, the probability that the exact and the approximated p-value lie on different sides of a threshold (thus changing the interpretation) can be high, particularly for p-values close to a threshold. We present a method to overcome this. We consider a finite set of user-specified intervals which cover [0,1] and which can be overlapping. We call these p-value buckets. We present algorithms that, with high probability, return a p-value bucket containing p. Suitably chosen overlapping buckets allow decisions in finite time and lead to an extension of the star rating. The proposed methods are suitable for use in standard software and can be computationally more efficient than standard implementations. We illustrate our methods using a contingency table example.
• A flexible approach to modeling network data is based on exponential-family random graph models. We consider here exponential-family random graph models with additional structure in the form of local dependence, which have important conceptual and statistical advantages over models without additional structure. An open problem is how to estimate such models from large random graphs. We pave the ground for massive-scale estimation of such models by exploiting model structure for the purpose of parallel computing. The main idea is that we can first decompose random graphs into subgraphs with local dependence and then perform parallel computing on subgraphs. We hence propose a two-step likelihood-based approach. The first step estimates the local structure underlying random graphs. The second step estimates parameters given the estimated local structure of random graphs. Both steps can be implemented in parallel, which enables massive-scale estimation. We demonstrate the advantages of the two-step likelihood-based approach by simulations and an application to a large Amazon product network.
• There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method ran with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various different characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.
• Bands of vector-valued functions $f:T\mapsto\mathbb{R}^d$ are defined by considering convex hulls generated by their values concatenated at $m$ different values of the argument. The obtained $m$-bands are families of functions, ranging from the conventional band in case the time points are individually considered (for $m=1$) to the convex hull in the functional space if the number $m$ of simultaneously considered time points becomes large enough to fill the whole time domain. These bands give rise to a depth concept that is new both for real-valued and vector-valued functions.
• The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.

Noon van der Silk Mar 08 2017 04:45 UTC

I feel that while the proliferation of GUNs is unquestionable a good idea, there are many unsupervised networks out there that might use this technology in dangerous ways. Do you think Indifferential-Privacy networks are the answer? Also I fear that the extremist binary networks should be banned ent

...(continued)
Noon van der Silk Jan 27 2016 03:39 UTC

Great institute name ...

Alessandro Dec 09 2015 01:12 UTC

Hey, I've already seen this title! http://arxiv.org/abs/1307.0401

Chris Granade Sep 22 2015 19:15 UTC

Thank you for the kind comments, I'm glad that our paper, source code, and tutorial are useful!

Travis Scholten Sep 21 2015 17:05 UTC

This was a really well-written paper! Am very glad to see this kind of work being done.

In addition, the openness about source code is refreshing. By explicitly relating the work to [QInfer](https://github.com/csferrie/python-qinfer), this paper makes it more easy to check the authors' work. Furthe

...(continued)
Chris Granade Sep 15 2015 02:40 UTC

I fell for that clickbait title and read the paper. I still don’t get why von Neumann didn't want us to know about this weird trick? And which weird trick? The use of superfidelity or the use of non-physical density matrices like $\sigma^\sharp$?