In recent years, there has been a growing interest in applying data assimilation (DA) methods, originally designed for state estimation, to the model selection problem. Along these lines, Carrassi et al. (2017) introduced the contextual formulation of model evidence (CME) and showed that CME can be efficiently computed using a hierarchy of ensemble-based DA procedures. Although Carrassi et al. (2017) analyzed the DA methods most commonly used for operational atmospheric and oceanic prediction worldwide, they did not study these methods in conjunction with localization to a specific domain. Yet any application of ensemble DA methods to realistic geophysical models requires the study of such localization. The present study extends the theory for estimating CME to ensemble DA methods with domain localization. The domain-localized CME (DL-CME) developed herein is tested for model selection with two models: (i) the Lorenz 40-variable mid-latitude atmospheric dynamics model (L95); and (ii) the simplified global atmospheric SPEEDY model. The CME is compared to the root-mean-square error (RMSE) as a metric for model selection. The experiments show that CME improves systematically over the RMSE, and that this improved skill is further enhanced by applying localization in the estimate of the CME, using the DL-CME. The potential use and range of applications of the CME and DL-CME as a model selection metric are also discussed.
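To make the contrast between evidence and RMSE concrete, here is a minimal sketch under strong simplifying assumptions: a scalar, linear-Gaussian setting with independent observations, not the ensemble-based or domain-localized estimators of the study. In this setting the marginal likelihood of an observation y given a forecast x_f with error variance p_f and observation error variance r is Gaussian with mean x_f and variance p_f + r. The function names and the toy numbers are hypothetical.

```python
import math


def log_evidence(obs, forecasts, forecast_vars, obs_var):
    """Sum of Gaussian log marginal likelihoods of the observations.

    Assumes scalar observations that are independent in time, each with
    predictive distribution N(forecast, forecast_var + obs_var).
    """
    total = 0.0
    for y, xf, pf in zip(obs, forecasts, forecast_vars):
        s = pf + obs_var  # innovation variance
        total += -0.5 * (math.log(2.0 * math.pi * s) + (y - xf) ** 2 / s)
    return total


def rmse(obs, forecasts):
    """Root-mean-square error of the forecasts against the observations."""
    n = len(obs)
    return math.sqrt(sum((y - xf) ** 2 for y, xf in zip(obs, forecasts)) / n)


# Two hypothetical models with identical forecasts but different
# forecast uncertainty: RMSE cannot tell them apart, the evidence can.
obs = [1.0, -0.5, 0.3]
forecasts = [0.8, -0.1, 0.0]
evidence_sharp = log_evidence(obs, forecasts, [0.1, 0.1, 0.1], obs_var=0.1)
evidence_vague = log_evidence(obs, forecasts, [10.0, 10.0, 10.0], obs_var=0.1)
```

In this toy case both models share the same RMSE, while the evidence favors the model whose stated uncertainty matches the size of its errors. That sensitivity to the forecast uncertainty is one reason an evidence-based metric can discriminate between models where RMSE cannot.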

A multi-fidelity simulator is a numerical model in which one of the inputs controls a trade-off between the realism and the computational cost of the simulation. Our goal is to estimate the probability of exceeding a given threshold on a multi-fidelity stochastic simulator. We propose a fully Bayesian approach based on Gaussian processes to compute the posterior probability distribution of this probability. We pay special attention to the hyper-parameters of the model. Our methodology is illustrated on an academic example.
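As a small illustration of the quantity being estimated, the sketch below computes the exceedance probability at a single input when the predictive distribution of the simulator output is taken to be Gaussian with known mean and standard deviation. This is an assumption made for illustration, not the authors' procedure; the function name is hypothetical.

```python
import math


def exceedance_probability(threshold, mean, std):
    """P(Y > threshold) for Y ~ N(mean, std**2).

    Uses the identity 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)),
    where z is the standardized threshold.
    """
    if std <= 0.0:
        raise ValueError("std must be positive")
    z = (threshold - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical predictive mean and standard deviation at one input.
p = exceedance_probability(threshold=2.0, mean=1.0, std=0.5)
```

This plug-in value treats the mean and standard deviation as fixed. A fully Bayesian treatment, as described in the abstract, would instead propagate the uncertainty in the Gaussian process and its hyper-parameters to obtain a distribution over this probability.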

With the release of Stata 14 came the mestreg command to fit multilevel mixed-effects parametric survival models, assuming normally distributed random effects, estimated with maximum likelihood utilising Gaussian quadrature. In this article, I present the user-written stmixed command, which serves as both an alternative and a complementary program to mestreg for fitting multilevel parametric survival models. The key extensions include incorporation of the flexible parametric Royston-Parmar survival model, and the ability to fit multilevel relative survival models. The methods are illustrated with a commonly used dataset of patients with kidney disease suffering recurrent infections, and a simulated example, illustrating a simple approach to simulating clustered survival data using survsim (Crowther and Lambert 2012, 2013).
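The idea of simulating clustered survival data can be sketched as follows. This is a minimal Python illustration, not the survsim or stmixed interface: it assumes a Weibull baseline hazard, a normally distributed random intercept per cluster on the log-hazard scale, a single binary covariate, and administrative censoring. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_clustered_weibull(n_clusters, n_per_cluster, lam=0.1, gamma=1.5,
                               beta=-0.5, sigma=0.8, max_time=5.0, seed=1):
    """Simulate (cluster, x, time, event) rows by inverse-transform sampling.

    Survival function for a subject in cluster i with covariate x:
        S(t) = exp(-lam * t**gamma * exp(beta * x + b_i)),
    with b_i ~ N(0, sigma**2) shared by all subjects in the cluster.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        b = rng.gauss(0.0, sigma)  # cluster-level random intercept
        for _ in range(n_per_cluster):
            x = 1 if rng.random() < 0.5 else 0  # binary covariate
            eta = beta * x + b  # log hazard ratio
            u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
            t = (-math.log(u) / (lam * math.exp(eta))) ** (1.0 / gamma)
            event = 1 if t <= max_time else 0  # censor at max_time
            rows.append((cluster, x, min(t, max_time), event))
    return rows


rows = simulate_clustered_weibull(n_clusters=10, n_per_cluster=5)
```

Subjects in the same cluster share the random intercept, which induces the within-cluster correlation that a multilevel survival model is meant to capture.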

We introduce varbvs, a suite of functions written in R and MATLAB for regression analysis of large-scale data sets using Bayesian variable selection methods. We have developed numerical optimization algorithms based on variational approximation methods that make it feasible to apply Bayesian variable selection to very large data sets. With a focus on examples from genome-wide association studies, we demonstrate that varbvs scales well to data sets with hundreds of thousands of variables and thousands of samples, and has features that facilitate rapid data analyses. Moreover, varbvs allows for extensive model customization, which can be used to incorporate external information into the analysis. We expect that the combination of an easy-to-use interface and robust, scalable algorithms for posterior computation will encourage broader use of Bayesian variable selection in areas of applied statistics and computational biology. The most recent R and MATLAB source code is available for download at GitHub (https://github.com/pcarbo/varbvs), and the R package can be installed from CRAN (https://cran.r-project.org/package=varbvs).

Sep 21 2017

stat.CO arXiv:1709.06849v1

R is a programming language and environment that is a central tool in the applied sciences for writing programs. Its impact on the development of modern statistics is undeniable. Current research, especially on big data, may not be done solely in R and will likely involve other programming languages; hence, having a modern integrated development environment (IDE) is very important. Atom is a modern IDE developed by GitHub and described as "A hackable text editor for the 21st Century". This report presents Rbox, a package that allows users of the Atom editor to write and run R code professionally.