The main SciRate homepage is down (when not logged in). We are working to fix it. See https://github.com/scirate/scirate/issues/337 for updates.

Statistics (stat)

  • PDF
    We analyze the bias correction methods using jackknife, bootstrap, and Taylor series. We focus on the binomial model, and consider the problem of bias correction for estimating $f(p)$, where $f \in C[0,1]$ is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous functions, and demonstrate the in delete-$d$ jackknife, different values of $d$ may lead to drastically different behavior in jackknife. We show that in the binomial model, iterating the bootstrap bias correction infinitely many times may lead to divergence of bias and variance, and demonstrate that the bias properties of the bootstrap bias corrected estimator after $r-1$ rounds is exactly the same as that of the $r$-jackknife estimator if a bounded coefficients condition is satisfied.
  • PDF
    A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.
  • PDF
    Generative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems---using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the context of problems with one-dimensional outputs. We identify a class of function decompositions with properties that make them well suited to the critic role in a leading approach to GANs known as Wasserstein GANs. We show that Taylor and Fourier series decompositions belong to our class, provide examples of these critics outperforming standard GAN approaches, and suggest how they can be scaled to higher dimensional problems in the future.
  • PDF
    Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.
  • PDF
    Learning to remember sequences, especially long sequences with recurrent neural networks is still a challenge. Register memory and attention mechanisms were both proposed to address it with either high computational cost to retain memory differentiability, or by discounting the RNN representation learning towards encoding shorter local contexts than encouraging long sequence encoding. Associative memory, which studies how multiple patterns can be compressed in a fixed size memory, were rarely studied in recent years. Although some recent work tries to introduce associative memory in RNN and mimic the energy decay process in Hopfield nets, it inherits the shortcoming of rule based memory updates and the memory capacity is limited. This paper proposes a method to learn the memory update rule jointly with task objective in the hope to improve memory capacity, so that long sequences could be remembered faster. On top of the contribution on memory update approach, we propose an architecture that uses multiple such associative memory for more complex input encoding. We observed some interesting facts when compared to other RNN architectures on some well-studied sequence learning tasks.
  • PDF
    We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, $\sim$40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate $\sim$0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the "missing heritability" problem -- i.e., the gap between prediction R-squared and SNP heritability. The $\sim$20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
  • PDF
    Recurrent Neural Networks (RNNS) are now widely used on sequence generation tasks due to their ability to learn long-range dependencies and to generate sequences of arbitrary length. However, their left-to-right generation procedure only allows a limited control from a potential user which makes them unsuitable for interactive and creative usages such as interactive music generation. This paper introduces a novel architecture called Anticipation-RNN which possesses the assets of the RNN-based generative models while allowing to enforce user-defined positional constraints. We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J.S. Bach chorale harmonizations. Sampling using the Anticipation-RNN is of the same order of complexity than sampling from the traditional RNN model. This fast and interactive generation of musical sequences opens ways to devise real-time systems that could be used for creative purposes.
  • PDF
    Some real-world problems revolve to solve the optimization problem \max_x∈\mathcalXf\left(x\right) where f\left(.\right) is a black-box function and X might be the set of non-vectorial objects (e.g., distributions) where we can only define a symmetric and non-negative similarity score on it. This setting requires a novel view for the standard framework of Bayesian Optimization that generalizes the core insightful spirit of this framework. With this spirit, in this paper, we propose Analogical-based Bayesian Optimization that can maximize black-box function over a domain where only a similarity score can be defined. Our pathway is as follows: we first base on the geometric view of Gaussian Processes (GP) to define the concept of influence level that allows us to analytically represent predictive means and variances of GP posteriors and base on that view to enable replacing kernel similarity by a more genetic similarity score. Furthermore, we also propose two strategies to find a batch of query points that can efficiently handle high dimensional data.
  • PDF
    Besides the text content, documents and their associated words usually come with rich sets of meta informa- tion, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data argumentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta information. Extensive experiments on several real world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster.
  • PDF
    We study minimax lower bounds for function estimation problems on large graph when the target function is smoothly varying over the graph. We derive minimax rates in the context of regression and classification problems on graphs that satisfy an asymptotic shape assumption and with a smoothness condition on the target function, both formulated in terms of the graph Laplacian.
  • PDF
    Motivated by the reconstruction and the prediction of electricity consumption, we extend Nonnegative Matrix Factorization~(NMF) to take into account side information (column or row features). We consider general linear measurement settings, and propose a framework which models non-linear relationships between features and the response variables. We extend previous theoretical results to obtain a sufficient condition on the identifiability of the NMF in this setting. Based the classical Hierarchical Alternating Least Squares~(HALS) algorithm, we propose a new algorithm (HALSX, or Hierarchical Alternating Least Squares with eXogeneous variables) which estimates the factorization model. The algorithm is validated on both simulated and real electricity consumption datasets as well as a recommendation dataset, to show its performance in matrix recovery and prediction for new rows and columns.
  • PDF
    Strongly consistent estimates are shown, via relative frequency, for the probability of "white balls" inside a dichotomous urn when such a probability is an arbitrary continuous time dependent function over a bounded time interval. The asymptotic behaviour of relative frequency is studied in a nonstationary context using a Riemann-Dini type theorem for SLLN of random variables with arbitrarily different expectations; furthermore the theoretical results concerning the SLLN can be applied for estimating the mean function of unknown form of a general nonstationary process.
  • PDF
    We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they allow new components to be introduced on the fly as needed. This, however, posts an important challenge to distributed estimation -- how to handle new components efficiently and consistently. To tackle this problem, we propose a new estimation method, which allows new components to be created locally in individual computing nodes. Components corresponding to the same cluster will be identified and merged via a probabilistic consolidation scheme. In this way, we can maintain the consistency of estimation with very low communication cost. Experiments on large real-world data sets show that the proposed method can achieve high scalability in distributed and asynchronous environments without compromising the mixing performance.
  • PDF
    In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed.The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided.We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method which solves a sparse MDP and then prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems. The proposed method outperforms existing methods in terms of the convergence speed and performance.
  • PDF
    This article concerns a class of generalized linear mixed models for clustered data, where the random effects are mapped uniquely onto the grouping structure and are independent between groups. We derive necessary and sufficient conditions that enable the marginal likelihood of such class of models to be expressed in closed-form. Illustrations are provided using the Gaussian, Poisson, binomial and gamma distributions. These models are unified under a single umbrella of conjugate generalized linear mixed models, where "conjugate" refers to the fact that the marginal likelihood can be expressed in closed-form, rather than implying inference via the Bayesian paradigm. Having an explicit marginal likelihood means that these models are more computationally convenient, which can be important in big data contexts. Except for the binomial distribution, these models are able to achieve simultaneous conjugacy, and thus able to accommodate both unit and group level covariates.
  • PDF
    Consider the case that we observe $n$ independent and identically distributed copies of a random variable with a probability distribution known to be an element of a specified statistical model. We are interested in estimating an infinite dimensional target parameter that minimizes the expectation of a specified loss function. In \citegenerally_efficient_TMLE we defined an estimator that minimizes the empirical risk over all multivariate real valued cadlag functions with variation norm bounded by some constant $M$ in the parameter space, and selects $M$ with cross-validation. We referred to this estimator as the Highly-Adaptive-Lasso estimator due to the fact that the constrained can be formulated as a bound $M$ on the sum of the coefficients a linear combination of a very large number of basis functions. Specifically, in the case that the target parameter is a conditional mean, then it can be implemented with the standard LASSO regression estimator. In \citegenerally_efficient_TMLE we proved that the HAL-estimator is consistent w.r.t. the (quadratic) loss-based dissimilarity at a rate faster than $n^{-1/2}$ (i.e., faster than $n^{-1/4}$ w.r.t. a norm), even when the parameter space is completely nonparametric. The only assumption required for this rate is that the true parameter function has a finite variation norm. The loss-based dissimilarity is often equivalent with the square of an $L^2(P_0)$-type norm. In this article, we establish that under some weak continuity condition, the HAL-estimator is also uniformly consistent.
  • PDF
    This work obtains novel finite sample guarantees for Principal Component Analysis (PCA). These hold even when the corrupting noise is non-isotropic, and a part (or all of it) is data-dependent. Because of the latter, in general, the noise and the true data are correlated. The results in this work are a significant improvement over those given in our earlier work where this "correlated-PCA" problem was first studied. In fact, in certain regimes, our results imply that the sample complexity required to achieve subspace recovery error that is a constant fraction of the noise level is near-optimal. Useful corollaries of our result include guarantees for PCA in sparse data-dependent noise and for PCA with missing data. An important application of the former is in proving correctness of the subspace update step of a popular online algorithm for dynamic robust PCA.
  • PDF
    We introduce a new R package, BeSS, for solving the best subset selection problem in linear, logistic and Cox's proportional hazard (CoxPH) models. It utilizes a highly efficient active set algorithm based on primal and dual variables, and supports sequential and golden search strategies for best subset selection. We provide a C++ implementation of the algorithm using Rcpp interface. We demonstrate through numerical experiments based on enormous simulation and real datasets that the new BeSS package has competitive performance compared to other R packages for best subset selection purpose.
  • PDF
    In regression problems where there is no known true underlying model, conformal prediction methods enable prediction intervals to be constructed without any assumptions on the distribution of the underlying data, except that the training and test data are assumed to be exchangeable. However, these methods bear a heavy computational cost-and, to be carried out exactly, the regression algorithm would need to be fitted infinitely many times. In practice, the conformal prediction method is run by simply considering only a finite grid of finely spaced values for the response variable. This paper develops discretized conformal prediction algorithms that are guaranteed to cover the target value with the desired probability, and that offer a tradeoff between computational cost and prediction accuracy.
  • PDF
    A new test of normality based on a standardised empirical process is introduced in this article. The first step is to introduce a Cramér-von Mises type statistic with weights equal to the inverse of the standard normal density function supported on a symmetric interval $[-a_n,a_n]$ depending on the sample size $n.$ The sequence of end points $a_n$ tends to infinity, and is chosen so that the statistic goes to infinity at the speed of $\ln \ln n.$ After substracting the mean, a suitable test statistic is obtained, with the same asymptotic law as the well-known Shapiro-Wilk statistic. The performance of the new test is described and compared with three other well-known tests of normality, namely, Shapiro-Wilk, Anderson-Darling and that of del Barrio-Matrán, Cuesta Albertos, and Rodrı́guez Rodrı́guez, by means of power calculations under many alternative hypotheses.
  • PDF
    The health effects of environmental exposures have been studied for decades, typically using standard regression models to assess exposure-outcome associations found in observational non-experimental data. We propose and illustrate a different approach to examine causal effects of environmental exposures on health outcomes from observational data. Our strategy attempts to structure the observational data to approximate data from a hypothetical, but realistic, randomized experiment. This approach, based on insights from classical experimental design, involves four stages, and relies on modern computing to implement the effort in two of the four stages.More specifically, our strategy involves: 1) a conceptual stage that involves the precise formulation of the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; 2) a design stage that attempts to reconstruct (or approximate) a randomized experiment before any outcome data are observed, 3) a statistical analysis comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment, and 4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects of the exposure on outcomes. We illustrate our approach using an example examining the effect of parental smoking on children's lung function collected in families living in East Boston in the 1970's. To complement the traditional purely model-based approaches, our strategy, which includes outcome free matched-sampling, provides workable tools to quantify possible detrimental exposure effects on human health outcomes especially because it also includes transparent diagnostics to assess the assumptions of the four-stage statistical approach being applied.
  • PDF
    In recent years, a number of artificial intelligent services have been developed such as defect detection system or diagnosis system for customer services. Unfortunately, the core in these services is a black-box in which human cannot understand the underlying decision making logic, even though the inspection of the logic is crucial before launching a commercial service. Our goal in this paper is to propose an analytic method of a model explanation that is applicable to general classification models. To this end, we introduce the concept of a contribution matrix and an explanation embedding in a constraint space by using a matrix factorization. We extract a rule-like model explanation from the contribution matrix with the help of the nonnegative matrix factorization. To validate our method, the experiment results provide with open datasets as well as an industry dataset of a LTE network diagnosis and the results show our method extracts reasonable explanations.
  • PDF
    We present a formalization of nested Monte Carlo (NMC) estimation, whereby terms in an outer estimator are themselves the output of separate, nested, Monte Carlo (MC) estimators. We demonstrate that NMC can provide consistent estimates of nested expectations, including cases of repeated nesting, under mild conditions; establish corresponding rates of convergence; and provide empirical evidence that suggests these rates are observed in practice. We further establish a number of pitfalls that can arise from naive nesting of MC estimators and provide guidelines about how they can be avoided. Our results show that whenever an outer estimator depends nonlinearly on an inner estimator, then the number of samples used in both the inner and outer estimators must, in general, be driven to infinity for convergence. We also lay out novel methods for reformulating certain classes of nested expectation problems into a single expectation, leading to improved convergence rates compared with naive NMC. Finally, we derive a new estimator for use in discrete Bayesian experimental design problems which has a better convergence rate than existing methods.
  • PDF
    This paper investigates the theoretical foundations of metric learning, focused on three key questions that are not fully addressed in prior work: 1) we consider learning general low-dimensional (low-rank) metrics as well as sparse metrics; 2) we develop upper and lower (minimax)bounds on the generalization error; 3) we quantify the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric;4) we also bound the accuracy of the learned metric relative to the underlying true generative metric. All the results involve novel mathematical approaches to the metric learning problem, and lso shed new light on the special case of ordinal embedding (aka non-metric multidimensional scaling).
  • PDF
    We consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d samples from the joint distribution $f(x,y,z)$ of continuous random vectors $X,Y$ and $Z,$ we determine whether $X \perp Y | Z$. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better compared to the prior state of the art, for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution $f^{CI}(x,y,z) = f(x|z)f(y|z)f(z)$ -- the joint distribution if and only if $X \perp Y | Z.$ -- when given access only to i.i.d. samples from the true joint distribution $f(x,y,z)$. To tackle this problem we propose a novel nearest neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to $f^{CI}$ in terms of total variational distance. We then develop theoretical results regarding the generalization bounds for classification for our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher type classification bounds in the presence of non-i.i.d near-independent samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods.
  • PDF
    This paper explores the discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting state and stimulus based state. Our numerical analysis is consistent with that reported in previous study.
  • PDF
    We analyze the convergence of (stochastic) gradient descent algorithm for learning a convolutional filter with Rectified Linear Unit (ReLU) activation function. Our analysis does not rely on any specific form of the input distribution and our proofs only use the definition of ReLU, in contrast with previous works that are restricted to standard Gaussian input. We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and the convergence rate depends on the smoothness of the input distribution and the closeness of patches. To the best of our knowledge, this is the first recovery guarantee of gradient-based algorithms for convolutional filter on non-Gaussian input distributions. Our theory also justifies the two-stage learning rate strategy in deep neural networks. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
  • PDF
    We present a probabilistic framework for nonlinearities, based on doubly truncated Gaussian distributions. By setting the truncation points appropriately, we are able to generate various types of nonlinearities within a unified framework, including sigmoid, tanh and ReLU, the most commonly used nonlinearities in neural networks. The framework readily integrates into existing stochastic neural networks (with hidden units characterized as random variables), allowing one for the first time to learn the nonlinearities alongside model weights in these networks. Extensive experiments demonstrate the performance improvements brought about by the proposed framework when integrated with the restricted Boltzmann machine (RBM), temporal RBM and the truncated Gaussian graphical model (TGGM).
  • PDF
    Fractional Gaussian noise (fGn) is a stationary time series model with long memory properties applied in various fields like econometrics, hydrology and climatology. The computational cost in fitting an fGn model of length $n$ using a likelihood-based approach is ${\mathcal O}(n^{2})$, exploiting the Toeplitz structure of the covariance matrix. In most realistic cases, we do not observe the fGn process directly but only through indirect Gaussian observations, so the Toeplitz structure is easily lost and the computational cost increases to ${\mathcal O}(n^{3})$. This paper presents an approximate fGn model of ${\mathcal O}(n)$ computational cost, both with direct or indirect Gaussian observations, with or without conditioning. This is achieved by approximating fGn with a weighted sum of independent first-order autoregressive processes, fitting the parameters of the approximation to match the autocorrelation function of the fGn model. The resulting approximation is stationary despite being Markov and gives a remarkably accurate fit using only four components. The performance of the approximate fGn model is demonstrated in simulations and two real data examples.
  • PDF
    We consider the situation where a temporal process is composed of contiguous segments with differing slopes and replicated noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. We develop a Bayesian approach to infer the joint posterior distribution of the number and position of change-points as well as the unknown mean parameters. A-priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence. A Metropolis-Hastings Markov chain Monte Carlo (MCMC) sampler is constructed for approximating the posterior distribution. Our method is benchmarked using simulated data and is applied to uncover differences in the dynamics of fungal growth from imaging time course data collected from different strains. The source code is available online.
  • PDF
    In this note we prove a tight lower bound for the MNL-bandit assortment selection model that matches the upper bound given in (Agrawal et al., 2016) for all parameters, up to logarithmic factors.
  • PDF
    Change point analysis is a statistical tool to identify homogeneity within time series data. We propose a pruning approach for approximate nonparametric estimation of multiple change points. This general purpose change point detection procedure `cp3o' applies a pruning routine within a dynamic program to greatly reduce the search space and computational costs. Existing goodness-of-fit change point objectives can immediately be utilized within the framework. We further propose novel change point algorithms by applying cp3o to two popular nonparametric goodness of fit measures: `e-cp3o' uses E-statistics, and `ks-cp3o' uses Kolmogorov-Smirnov statistics. Simulation studies highlight the performance of these algorithms in comparison with parametric and other nonparametric change point methods. Finally, we illustrate these approaches with climatological and financial applications.
  • Sep 20 2017 stat.ME arXiv:1709.06418v1
    PDF
    In this paper we give a brief review of semiparametric theory, using as a running example the common problem of estimating an average causal effect. Semiparametric models allow at least part of the data-generating process to be unspecified and unrestricted, and can often yield robust estimators that nonetheless behave similarly to those based on parametric likelihood assumptions, e.g., fast rates of convergence to normal limiting distributions. We discuss the basics of semiparametric theory, focusing on influence functions.
  • PDF
    The difficulties of detecting association, measuring correlation, and establishing cause and effect have fascinated mankind since time immemorial. Democritus, the Greek philosopher, underscored well the importance and the difficulty of proving causality when he wrote, "I would rather discover one cause than gain the kingdom of Persia." To address the difficulties of relating cause and effect, statisticians have developed many inferential techniques. Perhaps the most well-known method stems from Karl Pearson's coefficient of correlation, which Pearson introduced in the late 19th century based on ideas of Francis Galton. I will describe in this lecture the recently-devised distance correlation coefficient and describe its advantages over the Pearson and other classical measures of correlation. We will examine an application of the distance correlation coefficient to data drawn from large astrophysical databases, where it is desired to classify galaxies according to various types. Further, the lecture will analyze data arising in the ongoing national discussion of the relationship between state-by-state homicide rates and the stringency of state laws governing firearm ownership. The lecture will also describe a remarkable singular integral which lies at the core of the theory of the distance correlation coefficient. We will see that this singular integral admits generalizations to the truncated Maclaurin expansions of the cosine function and to the theory of spherical functions on symmetric cones.
  • PDF
    When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest, publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss, or even worse, reliability degradation of a datacenter. We further propose a two-stage framework-DC-Prophet-based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and a F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.

Recent comments

Noon van der Silk Mar 08 2017 04:45 UTC

I feel that while the proliferation of GUNs is unquestionable a good idea, there are many unsupervised networks out there that might use this technology in dangerous ways. Do you think Indifferential-Privacy networks are the answer? Also I fear that the extremist binary networks should be banned ent

...(continued)
Noon van der Silk Jan 27 2016 03:39 UTC

Great institute name ...

Alessandro Dec 09 2015 01:12 UTC

Hey, I've already seen this title! http://arxiv.org/abs/1307.0401

Chris Granade Sep 22 2015 19:15 UTC

Thank you for the kind comments, I'm glad that our paper, source code, and tutorial are useful!

Travis Scholten Sep 21 2015 17:05 UTC

This was a really well-written paper! Am very glad to see this kind of work being done.

In addition, the openness about source code is refreshing. By explicitly relating the work to [QInfer](https://github.com/csferrie/python-qinfer), this paper makes it more easy to check the authors' work. Furthe

...(continued)
Chris Granade Sep 15 2015 02:40 UTC

As a quick addendum, please note that the [supplementary video](https://www.youtube.com/watch?v=22ejRV0Kx2g) for this work is available [on YouTube](https://www.youtube.com/watch?v=22ejRV0Kx2g). Thank you!

Richard Kueng Mar 08 2015 22:02 UTC

Neither, Frédéric! Replacing fidelity by superfidelity still requires optimizing over all density matrices. However, the Birkhoff-von Neumann Theorem (see Lemma 1) allows for further restricting this optimization to n scalar variables w.l.o.g.---Theorem 2. Arguably, this greatly simplifies the geome

...(continued)
Frédéric Grosshans Mar 05 2015 11:31 UTC

I fell for that clickbait title and read the paper. I still don’t get why von Neumann didn't want us to know about this weird trick? And which weird trick? The use of superfidelity or the use of non-physical density matrices like $\sigma^\sharp$?

Noon van der Silk Mar 03 2015 03:20 UTC

I took the liberty of uploading the IPython notebook as a github [gist](https://gist.github.com), so it's viewable [here](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/silky/b14fa42c6d5475a3a724/raw/887c19fb04581f1a33f9d03370e4b7b3a33c2ea8/ferrie_kueng_bayes_est_fid.ipynb).