Recent advances in the field of inverse reinforcement learning (IRL) have yielded sophisticated frameworks which relax the original modeling assumption that the behavior of an observed agent reflects only a single intention. Instead, the demonstration data is typically divided into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework which provides smooth policy estimates that are consistent with the expert's plan and significantly outperform existing IRL solutions. Most notably, our framework naturally handles situations where the intentions of the agent change with time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in an active learning setting to guide the demonstration process of the expert.
We consider the problem of sequential binary hypothesis testing with a distributed sensor network in a non-Gaussian noise environment. To this end, we present a general formulation of the Consensus + Innovations Sequential Probability Ratio Test (CISPRT). Furthermore, we introduce two different concepts for robustifying the CISPRT and propose four different algorithms, namely, the Least-Favorable-Density-CISPRT, the Median-CISPRT, the M-CISPRT, and the Myriad-CISPRT. Subsequently, we analyze their suitability for different binary hypothesis tests before verifying and evaluating their performance in a shift-in-mean and a shift-in-variance scenario.
We derive a new Bayesian Information Criterion (BIC) from first principles by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as an important milestone when deriving the BIC for specific data distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed observations. We show that incorporating data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the optimal cluster number is determined as the one associated to the model for which the proposed BIC is maximal. The performance of the proposed criterion is tested using synthetic and real data sets. Despite the fact that the original BIC is a generic criterion which does not include information about the specific model selection problem at hand, it has been widely used in the literature to estimate the number of clusters in an observed data set. We, therefore, consider it as a benchmark comparison. Simulation results show that our proposed criterion outperforms the existing cluster enumeration methods that are based on the original BIC.
Distributed signal processing for wireless sensor networks enables that different devices cooperate to solve different signal processing tasks. A crucial first step is to answer the question: who observes what? Recently, several distributed algorithms have been proposed, which frame the signal/object labelling problem in terms of cluster analysis after extracting source-specific features, however, the number of clusters is assumed to be known. We propose a new method called Gravitational Clustering (GC) to adaptively estimate the time-varying number of clusters based on a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are considered as mass units of fixed position in the feature space, around which mobile mass units are injected at each time instant. The cluster enumeration exploits the fact that the highest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments concerning robustness against outliers, convergence and computational complexity are conducted. The application in a distributed cooperative multi-view camera network illustrates the applicability to real-world problems.
The problem of minimizing convex functionals of probability distributions is solved under the assumption that the density of every distribution is bounded from above and below. A system of sufficient and necessary first-order optimality conditions as well as a bound on the optimality gap of feasible candidate solutions are derived. Based on these results, two numerical algorithms are proposed that iteratively solve the system of optimality conditions on a grid of discrete points. Both algorithms use a block coordinate descent strategy and terminate once the optimality gap falls below the desired tolerance. While the first algorithm is conceptually simpler and more efficient, it is not guaranteed to converge for objective functions that are not strictly convex. This shortcoming is overcome in the second algorithm, which uses an additional outer proximal iteration, and, which is proven to converge under mild assumptions. Two examples are given to demonstrate the theoretical usefulness of the optimality conditions as well as the high efficiency and accuracy of the proposed numerical algorithms.
Feb 28 2017 cs.CV
Hyperspectral imaging is an important tool in remote sensing, allowing for accurate analysis of vast areas. Due to a low spatial resolution, a pixel of a hyperspectral image rarely represents a single material, but rather a mixture of different spectra. HSU aims at estimating the pure spectra present in the scene of interest, referred to as endmembers, and their fractions in each pixel, referred to as abundances. Today, many HSU algorithms have been proposed, based either on a geometrical or statistical model. While most methods assume that the number of endmembers present in the scene is known, there is only little work about estimating this number from the observed data. In this work, we propose a Bayesian nonparametric framework that jointly estimates the number of endmembers, the endmembers itself, and their abundances, by making use of the Indian Buffet Process as a prior for the endmembers. Simulation results and experiments on real data demonstrate the effectiveness of the proposed algorithm, yielding results comparable with state-of-the-art methods while being able to reliably infer the number of endmembers. In scenarios with strong noise, where other algorithms provide only poor results, the proposed approach tends to overestimate the number of endmembers slightly. The additional endmembers, however, often simply represent noisy replicas of present endmembers and could easily be merged in a post-processing step.
Learning from demonstrations has gained increasing interest in the recent past, enabling an agent to learn how to make decisions by observing an experienced teacher. While many approaches have been proposed to solve this problem, there is only little work that focuses on reasoning about the observed behavior. We assume that, in many practical problems, an agent makes its decision based on latent features, indicating a certain action. Therefore, we propose a generative model for the states and actions. Inference reveals the number of features, the features, and the policies, allowing us to learn and to analyze the underlying structure of the observed behavior. Further, our approach enables prediction of actions for new states. Simulations are used to assess the performance of the algorithm based upon this model. Moreover, the problem of learning a driver's behavior is investigated, demonstrating the performance of the proposed model in a real-world scenario.
The sequential analysis of the problem of joint signal detection and signal-to-noise ratio (SNR) estimation for a linear Gaussian observation model is considered. The problem is posed as an optimization setup where the goal is to minimize the number of samples required to achieve the desired (i) type I and type II error probabilities and (ii) mean squared error performance. This optimization problem is reduced to a more tractable formulation by transforming the observed signal and noise sequences to a single sequence of Bernoulli random variables; joint detection and estimation is then performed on the Bernoulli sequence. This transformation renders the problem easily solvable, and results in a computationally simpler sufficient statistic compared to the one based on the (untransformed) observation sequences. Experimental results demonstrate the advantages of the proposed method, making it feasible for applications having strict constraints on data storage and computation.
We consider the problem of decentralized clustering and estimation over multi-task networks, where agents infer and track different models of interest. The agents do not know beforehand which model is generating their own data. They also do not know which agents in their neighborhood belong to the same cluster. We propose a decentralized clustering algorithm aimed at identifying and forming clusters of agents of similar objectives, and at guiding cooperation to enhance the inference performance. One key feature of the proposed technique is the integration of the learning and clustering tasks into a single strategy. We analyze the performance of the procedure and show that the error probabilities of types I and II decay exponentially to zero with the step-size parameter. While links between agents following different objectives are ignored in the clustering process, we nevertheless show how to exploit these links to relay critical information across the network for enhanced performance. Simulation results illustrate the performance of the proposed method in comparison to other useful techniques.
We propose a compressed sampling and dictionary learning framework for fiber-optic sensing using wavelength-tunable lasers. A redundant dictionary is generated from a model for the reflected sensor signal. Imperfect prior knowledge is considered in terms of uncertain local and global parameters. To estimate a sparse representation and the dictionary parameters, we present an alternating minimization algorithm that is equipped with a pre-processing routine to handle dictionary coherence. The support of the obtained sparse signal indicates the reflection delays, which can be used to measure impairments along the sensing fiber. The performance is evaluated by simulations and experimental data for a fiber sensor system with common core architecture.
Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used e.g. for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g. they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.
Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.
The density band model proposed by Kassam for robust hypothesis testing is revisited in this paper. First, a novel criterion for the general characterization of least favorable distributions is proposed, which unifies existing results. This criterion is then used to derive an implicit definition of the least favorable distributions under band uncertainties. In contrast to the existing solution, it only requires two scalar values to be determined and eliminates the need for case-by-case statements. Based on this definition, a generic fixed-point algorithm is proposed that iteratively calculates the least favorable distributions for arbitrary band specifications. Finally, three different types of robust tests that emerge from band models are discussed and a numerical example is presented to illustrate their potential use in practice.
Multi-target tracking is an important problem in civilian and military applications. This paper investigates multi-target tracking in distributed sensor networks. Data association, which arises particularly in multi-object scenarios, can be tackled by various solutions. We consider sequential Monte Carlo implementations of the Probability Hypothesis Density (PHD) filter based on random finite sets. This approach circumvents the data association issue by jointly estimating all targets in the region of interest. To this end, we develop the Diffusion Particle PHD Filter (D-PPHDF) as well as a centralized version, called the Multi-Sensor Particle PHD Filter (MS-PPHDF). Their performance is evaluated in terms of the Optimal Subpattern Assignment (OSPA) metric, benchmarked against a distributed extension of the Posterior Cramér-Rao Lower Bound (PCRLB), and compared to the performance of an existing distributed PHD Particle Filter.
The minimax robust hypothesis testing problem for the case where the nominal probability distributions are subject to both modeling errors and outliers is studied in twofold. First, a robust hypothesis testing scheme based on a relative entropy distance is designed. This approach provides robustness with respect to modeling errors and is a generalization of a previous work proposed by Levy. Then, it is shown that this scheme can be combined with Huber's robust test through a composite uncertainty class, for which the existence of a saddle value condition is also proven. The composite version of the robust hypothesis testing scheme as well as the individual robust tests are extended to fixed sample size and sequential probability ratio tests. The composite model is shown to extend to robust estimation problems as well. Simulation results are provided to validate the proposed assertions.
Minimax decentralized detection is studied under two scenarios: with and without a fusion center when the source of uncertainty is the Bayesian prior. When there is no fusion center, the constraints in the network design are determined. Both for a single decision maker and multiple decision makers, the maximum loss in detection performance due to minimax decision making is obtained. In the presence of a fusion center, the maximum loss of detection performance between with- and without fusion center networks is derived assuming that both networks are minimax robust. The results are finally generalized.