# Machine Learning (stat.ML)

• This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective function evaluated on simulated combat engagements. Gaussian process Bayesian optimization (GPBO) techniques are developed to automatically learn global Gaussian Process (GP) surrogate models, which provide statistical performance predictions in both explored and unexplored areas of the parameter space. This allows a learning engine to sample full-combat simulations at parameter values that are most likely to optimize performance and also provide highly informative data points for improving future predictions. However, standard GPBO methods do not provide a reliable surrogate model for the highly volatile objective functions found in aerial combat, and thus do not reliably identify global maxima. These issues are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point Sampling (HRMS) techniques. Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.
• We study a variant of the source identification game with training data in which part of the training data is corrupted by an attacker. In the addressed scenario, the defender aims at deciding whether a test sequence has been drawn according to a discrete memoryless source $X \sim P_X$, whose statistics are known to him through the observation of a training sequence generated by $X$. In order to undermine the correct decision under the alternative hypothesis that the test sequence has not been drawn from $X$, the attacker can modify a sequence produced by a source $Y \sim P_Y$ up to a certain distortion, and corrupt the training sequence either by adding some fake samples or by replacing some samples with fake ones. We derive the unique rationalizable equilibrium of the two versions of the game in the asymptotic regime and by assuming that the defender bases its decision by relying only on the first order statistics of the test and the training sequences. By mimicking Stein's lemma, we derive the best achievable performance for the defender when the first type error probability is required to tend to zero exponentially fast with an arbitrarily small, yet positive, error exponent. We then use such a result to analyze the ultimate distinguishability of any two sources as a function of the allowed distortion and the fraction of corrupted samples injected into the training sequence.
• We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that uses content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures.
• We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e. each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known "kernel trick" to avoid working directly with the high-dimensional monomial matrix. We show the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low rank matrix completion and subspace clustering techniques in experiments with real data.
• Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
• Analyzing time series data is important to predict future events and changes in finance, manufacturing and administrative decisions. Gaussian processes (GPs) solve regression and classification problems by choosing appropriate kernels capturing covariance structure of data. In time series analysis, GP based regression methods recently demonstrate competitive performance by decomposing temporal covariance structure. Such covariance structure decomposition allows exploiting shared parameters over a set of multiple but selected time series. In this paper, we propose an efficient variational inference algorithm for nonparametric clustering over multiple GP covariance structures. We handle multiple time series by placing an Indian Buffet Process (IBP) prior on the presence of the additive shared kernels. We propose a new variational inference algorithm to learn the nonparametric Bayesian models for the clustering and regression problems. Experiments are conducted on both synthetic data sets and real world data sets, showing promising results in term of structure discoveries. In addition, our model learns GP kernels faster but still preserves a good predictive performance.
• We provide a comprehensive study of the convergence of forward-backward algorithm under suitable geometric conditions leading to fast rates. We present several new results and collect in a unified view a variety of results scattered in the literature, often providing simplified proofs. Novel contributions include the analysis of infinite dimensional convex minimization problems, allowing the case where minimizers might not exist. Further, we analyze the relation between different geometric conditions, and discuss novel connections with a priori conditions in linear inverse problems, including source conditions, restricted isometry properties and partial smoothness.
• In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition function and (2) it is formulated in the form of functional optimization. We avoid Problem (1) by using Bethe approximation. Bethe approximation is an approximation technique equivalent to the loopy belief propagation. Problem (2) can be solved by using orthonormal function expansion. Orthonormal function expansion can reduce a functional optimization problem to a function optimization problem. Our method can provide an analytic form of the solution of the inverse problem within the framework of Bethe approximation.