The Bayesian Information Criterion (BIC) has been widely used for estimating the number of data clusters in an observed data set for decades. The original derivation, referred to as classic BIC, does not include information about the specific model selection problem at hand, which renders it generic. However, very little effort has been made to check its appropriateness for cluster analysis. In this paper we derive BIC from first principles by formulating the problem of estimating the number of clusters in a data set as maximization of the posterior probability of candidate models given observations. We provide a general BIC expression which is independent of the data distribution, provided that some mild assumptions are satisfied. This serves as an important milestone when deriving BIC for specific data distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed observations. We show that incorporating the clustering problem during the derivation of BIC results in an expression whose penalty term is different from the penalty term of the classic BIC. We propose a two-step cluster enumeration algorithm that utilizes a model-based unsupervised learning algorithm to partition the observed data according to each candidate model and the proposed BIC for selecting the model with the optimal number of clusters. The performance of the proposed criterion is tested using synthetic and real-data examples. Simulation results show that our proposed criterion outperforms the existing BIC-based cluster enumeration methods. Our proposed criterion is particularly powerful in estimating the number of data clusters when the observations have unbalanced and overlapping clusters.
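For orientation, the classic BIC that the abstract contrasts against is commonly written in the form below. The notation is an assumption made here for illustration and is not taken from the paper: $M_l$ is a candidate model, $\hat{\theta}_l$ its maximum likelihood parameter estimate, $q_l$ the number of estimated parameters, and $N$ the number of observations in the data set $\mathcal{X}$. The criterion proposed in the paper has a different penalty term, which is not reproduced here.

```latex
\mathrm{BIC}(M_l) = 2 \ln \mathcal{L}(\hat{\theta}_l \mid \mathcal{X}) - q_l \ln N
```

In this form, the candidate model with the largest BIC value is selected; the penalty $q_l \ln N$ depends only on the parameter count and the sample size, which is what makes the classic criterion generic.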
Many problems in signal processing require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. When dealing with real-world data, the presence of outliers and impulsive noise must also be accounted for. In past decades, the vast majority of robust linear regression estimators has focused on robustness against rowwise contamination. Even so-called `high breakdown' estimators rely on the assumption that a majority of rows of the regression matrix is not affected by outliers. Only very recently have the first cellwise robust regression estimation methods been developed. In this paper, we define robust oracle properties, which an estimator must have in order to perform robust model selection for under-determined, or ill-conditioned, linear regression models that are contaminated by cellwise outliers in the regression matrix. We propose and analyze a robustly weighted and adaptive Lasso type regularization term which takes into account cellwise outliers for model selection. The proposed regularization term is integrated into the objective function of the MM-estimator, which yields the proposed MM-Robust Weighted Adaptive Lasso (MM-RWAL), for which we prove that at least the weak robust oracle properties hold. A performance comparison to existing robust Lasso estimators is provided using Monte Carlo experiments. Further, the MM-RWAL is applied to determine the temporal releases of the European Tracer Experiment (ETEX) at the source location. This ill-conditioned linear inverse problem contains cellwise and rowwise outliers and is sparse both in the regression matrix and the parameter vector. The proposed RWAL penalty is not limited to the MM-estimator but can easily be integrated into the objective function of other robust estimators.
Apr 21 2017 stat.AP
In this paper, direction-of-arrival (DOA) estimation using non-coherent processing for partly calibrated arrays composed of multiple subarrays is considered. The subarrays are assumed to compute locally the sample covariance matrices of their measurements and communicate them to the processing center. A sufficient condition for the unique identifiability of the sources in the aforementioned non-coherent processing scheme is presented. We prove that, under mild conditions, with the non-coherent system of subarrays, it is possible to identify more sources than identifiable by each individual subarray. This property of non-coherent processing has not been investigated before. We derive the Maximum Likelihood estimator (MLE) for DOA estimation at the processing center using the sample covariance matrices received from the subarrays. Moreover, the Cramér-Rao Bound (CRB) for our measurement model is derived and is used to assess the presented DOA estimators. The behaviour of the CRB at high signal-to-noise ratio (SNR) is analyzed. In contrast to coherent processing, we prove that the CRB approaches zero at high SNR only if at least one subarray can identify the sources individually.
Mar 20 2017 stat.ME
A distributed multi-speaker voice activity detection (DM-VAD) method for wireless acoustic sensor networks (WASNs) is proposed. DM-VAD is required in many signal processing applications, e.g. distributed speech enhancement based on multi-channel Wiener filtering, but no such method exists to date. The proposed method neither requires a fusion center nor prior knowledge about the node positions, microphone array orientations or the number of observed sources. It consists of two steps: (i) distributed source-specific energy signal unmixing and (ii) energy signal based voice activity detection. Existing computationally efficient methods to extract source-specific energy signals from the mixed observations, e.g., multiplicative non-negative independent component analysis (MNICA), quickly lose performance with an increasing number of sources and require a fusion center. To overcome these limitations, we introduce a distributed energy signal unmixing method based on a source-specific node clustering method to locate the nodes around each source. To determine the number of sources that are observed in the WASN, a source enumeration method that uses a Lasso penalized Poisson generalized linear model is developed. Each identified cluster estimates the energy signal of a single (dominant) source by applying a two-component MNICA. The VAD problem is transformed into a clustering task by extracting features from the energy signals and applying K-means type clustering algorithms. All steps of the proposed method are evaluated using numerical experiments. A VAD accuracy of $> 85\%$ is achieved for a challenging scenario where 20 nodes observe 7 sources in a simulated reverberant rectangular room.
We consider the problem of decentralized clustering and estimation over multi-task networks, where agents infer and track different models of interest. The agents do not know beforehand which model is generating their own data. They also do not know which agents in their neighborhood belong to the same cluster. We propose a decentralized clustering algorithm aimed at identifying and forming clusters of agents of similar objectives, and at guiding cooperation to enhance the inference performance. One key feature of the proposed technique is the integration of the learning and clustering tasks into a single strategy. We analyze the performance of the procedure and show that the error probabilities of types I and II decay exponentially to zero with the step-size parameter. While links between agents following different objectives are ignored in the clustering process, we nevertheless show how to exploit these links to relay critical information across the network for enhanced performance. Simulation results illustrate the performance of the proposed method in comparison to other useful techniques.
Oct 24 2016 stat.ML
We present a sparse estimation and dictionary learning framework for compressed fiber sensing based on a probabilistic hierarchical sparse model. To handle severe dictionary coherence, selective shrinkage is achieved using a Weibull prior, which can be related to non-convex optimization with $p$-norm constraints for $0 < p < 1$. In addition, we leverage the specific dictionary structure to promote collective shrinkage based on a local similarity model. This is incorporated in the form of a kernel function in the joint prior density of the sparse coefficients, thereby establishing a Markov random field relation. Approximate inference is accomplished using a hybrid technique that combines Hamiltonian Monte Carlo and Gibbs sampling. To estimate the dictionary parameter, we pursue two strategies, relying on either a deterministic or a probabilistic model for the dictionary parameter. In the first strategy, the parameter is estimated based on alternating estimation. In the second strategy, it is jointly estimated along with the sparse coefficients. The performance is evaluated in comparison to an existing method in various scenarios using simulations and experimental data.
We propose a compressed sampling and dictionary learning framework for fiber-optic sensing using wavelength-tunable lasers. A redundant dictionary is generated from a model for the reflected sensor signal. Imperfect prior knowledge is considered in terms of uncertain local and global parameters. To estimate a sparse representation and the dictionary parameters, we present an alternating minimization algorithm that is equipped with a pre-processing routine to handle dictionary coherence. The support of the obtained sparse signal indicates the reflection delays, which can be used to measure impairments along the sensing fiber. The performance is evaluated by simulations and experimental data for a fiber sensor system with common core architecture.
Jul 06 2016 stat.ME
A new robust and statistically efficient estimator for ARMA models called the bounded influence propagation (BIP) $\tau$-estimator is proposed. The estimator incorporates an auxiliary model, which prevents the propagation of outliers. Strong consistency and asymptotic normality of the estimator for ARMA models that are driven by independently and identically distributed (iid) innovations with symmetric distributions are established. To analyze the infinitesimal effect of outliers on the estimator, the influence function is derived and computed explicitly for an AR(1) model with additive outliers. To obtain estimates for the AR(p) model, a robust Durbin-Levinson type and a forward-backward algorithm are proposed. An iterative algorithm to robustly obtain ARMA(p,q) parameter estimates is also presented. The problem of finding a robust initialization is addressed, which for orders p+q>2 is a non-trivial matter. Numerical experiments are conducted to compare the finite sample performance of the proposed estimator to existing robust methodologies for different types of outliers both in terms of average and of worst-case performance, as measured by the maximum bias curve. To illustrate the practical applicability of the proposed estimator, a real-data example of outlier cleaning for R-R interval plots derived from electrocardiographic (ECG) data is considered. The proposed estimator is not limited to biomedical applications, but is also useful in any real-world problem whose observations can be modeled as an ARMA process disturbed by outliers or impulsive noise.
Jun 03 2016 stat.ME
Linear inverse problems are ubiquitous. Often the measurements do not follow a Gaussian distribution. Additionally, a model matrix with a large condition number can complicate the problem further by making it ill-posed. In this case, the performance of popular estimators may deteriorate significantly. We have developed a new estimator that is both nearly optimal in the presence of Gaussian errors and robust against outliers. Furthermore, it obtains meaningful estimates when the problem is ill-posed through the inclusion of $\ell_1$ and $\ell_2$ regularizations. The computation of our estimate involves minimizing a non-convex objective function. Hence, we are not guaranteed to find the global minimum in a reasonable amount of time. Thus, we propose two algorithms that converge to a good local minimum in a reasonable (and adjustable) amount of time, as an approximation of the global minimum. We also analyze how the introduction of the regularization term affects the statistical properties of our estimator. We confirm high robustness against outliers and asymptotic efficiency for Gaussian distributions by deriving measures of robustness such as the influence function, sensitivity curve, bias, asymptotic variance, and mean square error. We verify the theoretical results using numerical experiments and show that the proposed estimator outperforms recently proposed methods, especially for increasing amounts of outlier contamination. Python code for all of the algorithms is available online in the spirit of reproducible research.
Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used e.g. for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g. they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.
Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.
Feb 11 2016 stat.AP
Glucometers present an important self-monitoring tool for diabetes patients and therefore must exhibit high accuracy as well as good usability features. Based on an invasive, photometric measurement principle that drastically reduces the volume of the blood sample needed from the patient, we present a framework that is capable of dealing with small blood samples, while maintaining the required accuracy. The framework consists of two major parts: 1) image segmentation; and 2) convergence detection. Step 1) is based on iterative mode-seeking methods to estimate the intensity value of the region of interest. We present several variations of these methods and give theoretical proofs of their convergence. Our approach is able to deal with changes in the number and position of clusters without any prior knowledge. Furthermore, we propose a method based on sparse approximation to decrease the computational load, while maintaining accuracy. Step 2) is achieved by employing temporal tracking and prediction, thereby decreasing the measurement time, and, thus, improving usability. Our framework is validated on several real data sets with different characteristics. We show that we are able to estimate the underlying glucose concentration from much smaller blood samples than is currently state-of-the-art with sufficient accuracy according to the most recent ISO standards and reduce measurement time significantly compared to state-of-the-art methods.
Jul 01 2015 stat.AP
In this paper, we consider performance analysis of the decentralized power method for the eigendecomposition of the sample covariance matrix based on the averaging consensus protocol. An analytical expression of the second order statistics of the eigenvectors obtained from the decentralized power method which is required for computing the mean square error (MSE) of subspace-based estimators is presented. We show that the decentralized power method is not an asymptotically consistent estimator of the eigenvectors of the true measurement covariance matrix unless the averaging consensus protocol is carried out over an infinitely large number of iterations. Moreover, we introduce the decentralized ESPRIT algorithm which yields fully decentralized direction-of-arrival (DOA) estimates. Based on the performance analysis of the decentralized power method, we derive an analytical expression of the MSE of DOA estimators using the decentralized ESPRIT algorithm. The validity of our asymptotic results is demonstrated by simulations.
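The idea of a power method driven by averaging consensus can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm or analysis: real-valued data, one sensor per node, a ring topology with uniform weights, and the helper names `ring_weights`, `consensus_sum`, and `decentralized_power_method` are all hypothetical. Each node holds one row of the data matrix and one entry of the eigenvector estimate; every network-wide sum is replaced by a finite number of consensus rounds.

```python
import numpy as np

def ring_weights(n):
    """Doubly stochastic averaging matrix for a ring of n nodes (assumed topology)."""
    W = np.zeros((n, n))
    for k in range(n):
        W[k, k] = 1.0 / 3.0
        W[k, (k - 1) % n] = 1.0 / 3.0
        W[k, (k + 1) % n] = 1.0 / 3.0
    return W

def consensus_sum(local_values, W, n_iter):
    """Row k is node k's estimate of the network-wide sum after n_iter averaging rounds."""
    z = local_values.copy()
    for _ in range(n_iter):
        z = W @ z  # each node averages with its neighbours
    return local_values.shape[0] * z  # average times node count approximates the sum

def decentralized_power_method(X, W, n_power=50, n_cons=200, seed=0):
    """Estimate the principal eigenvector of R = X X^T / T; node k stores X[k, :] and u[k]."""
    n, T = X.shape
    u = np.random.default_rng(seed).standard_normal(n)
    for _ in range(n_power):
        # Node k contributes X[k, t] * u[k]; consensus approximates (X^T u)[t] at every node.
        s = consensus_sum(X * u[:, None], W, n_cons)
        # Node k computes its own entry of R u from its local consensus result.
        u = np.einsum("kt,kt->k", X, s) / T
        # The squared norm is another network-wide sum, again obtained by consensus.
        norm_sq = consensus_sum((u ** 2)[:, None], W, n_cons)[:, 0]
        u = u / np.sqrt(norm_sq)
    return u
```

With a small `n_cons`, the nodes disagree on the sums and the result deviates from the eigenvector of the sample covariance matrix, which is the finite-consensus effect the abstract refers to; increasing `n_cons` shrinks that deviation in this sketch.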
Multi-target tracking is an important problem in civilian and military applications. This paper investigates multi-target tracking in distributed sensor networks. Data association, which arises particularly in multi-object scenarios, can be tackled by various solutions. We consider sequential Monte Carlo implementations of the Probability Hypothesis Density (PHD) filter based on random finite sets. This approach circumvents the data association issue by jointly estimating all targets in the region of interest. To this end, we develop the Diffusion Particle PHD Filter (D-PPHDF) as well as a centralized version, called the Multi-Sensor Particle PHD Filter (MS-PPHDF). Their performance is evaluated in terms of the Optimal Subpattern Assignment (OSPA) metric, benchmarked against a distributed extension of the Posterior Cramér-Rao Lower Bound (PCRLB), and compared to the performance of an existing distributed PHD Particle Filter.
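The OSPA metric used for evaluation compares two finite sets of target states, penalizing both localization error and a mismatch in the number of targets. The sketch below follows the standard definition with cutoff `c` and order `p`; the function name `ospa`, the default parameter values, and the brute-force assignment search (practical only for small sets) are choices made here for illustration and are not taken from the paper.

```python
from itertools import permutations
import math

def ospa(X, Y, c=10.0, p=2):
    """OSPA distance between two lists of points (tuples) with cutoff c and order p."""
    if len(X) > len(Y):
        X, Y = Y, X  # ensure X is the smaller set
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0  # both sets are empty
    # Best assignment of the m points in X to distinct points in Y,
    # with every per-pair distance clipped at the cutoff c.
    best = min(
        sum(min(math.dist(x, Y[j]), c) ** p for x, j in zip(X, perm))
        for perm in permutations(range(n), m)
    )
    # Each of the n - m unassigned points costs the full cutoff c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

For example, `ospa([(0, 0)], [(3, 4)])` evaluates the single-pair distance, while `ospa([], [(1, 1)])` returns the cutoff because the only error is a missed target. For larger sets, an optimal assignment solver would replace the permutation search.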
Under some mild Markov assumptions it is shown that the problem of designing optimal sequential tests for two simple hypotheses can be formulated as a linear program. The result is derived by investigating the Lagrangian dual of the sequential testing problem, which is an unconstrained optimal stopping problem, depending on two unknown Lagrangian multipliers. It is shown that the derivative of the optimal cost function with respect to these multipliers coincides with the error probabilities of the corresponding sequential test. This property is used to formulate an optimization problem that is jointly linear in the cost function and the Lagrangian multipliers and can be solved for both with off-the-shelf algorithms. To illustrate the procedure, optimal sequential tests for Gaussian random sequences with different dependency structures are derived, including the Gaussian AR(1) process.
Jan 22 2015 stat.OT
A robust minimax test for two composite hypotheses, which are determined by the neighborhoods of two nominal distributions with respect to a set of distances called $\alpha$-divergence distances, is proposed. Sion's minimax theorem is adopted to characterize the saddle value condition. Least favorable distributions, the robust decision rule and the robust likelihood ratio test are derived. If the nominal probability distributions satisfy a symmetry condition, the design procedure is shown to be simplified considerably. The parameters controlling the degree of robustness are bounded from above, and the bounds are shown to result from the solution of a set of equations. Simulations evaluate and exemplify the theoretical derivations.