Nov 29 2017 cs.AI
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but also accelerates transfer to new tasks. We demonstrate that the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al., we open-source a fast hybrid CPU-GPU implementation of CASL.
A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. Existing work addressing this challenge is practical only for small-scale synchronous decision-making scenarios or for a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities must compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.
Sep 21 2017 cs.SY
This paper proposes a statistical verification framework using Gaussian processes (GPs) for simulation-based verification of stochastic nonlinear systems with parametric uncertainties. Given a small number of stochastic simulations, the proposed framework constructs a GP regression model and predicts the system's performance over the entire set of possible uncertainties. Included in the framework is a new metric to estimate the confidence in those predictions based on the variance of the GP's cumulative distribution function. This variance-based metric forms the basis of active sampling algorithms that aim to minimize prediction error through careful selection of simulations. In three case studies, the new active sampling algorithms demonstrate up to a 35% improvement in prediction error over other approaches and are able to correctly identify regions with low prediction confidence through the variance metric.
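To make the active-sampling idea concrete, here is a minimal sketch assuming a one-dimensional uncertainty parameter, a zero-mean GP with a squared-exponential kernel (hypothetical hyperparameters), and plain GP predictive variance as the selection score. The abstract's metric is instead the variance of the GP's cumulative distribution function, so this is a simplified stand-in rather than the authors' algorithm; the helper names (`rbf`, `gp_predict`, `next_sample`) are hypothetical.

```python
import math

def rbf(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel on scalar inputs (hyperparameters are hypothetical)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with L @ L^T == K, for a small symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, queries, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at each query point."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = K^{-1} y
    means, variances = [], []
    for q in queries:
        k = [rbf(q, a) for a in X]
        means.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        v = solve_lower(L, k)  # v^T v equals k^T K^{-1} k
        variances.append(max(rbf(q, q) - sum(vi * vi for vi in v), 0.0))
    return means, variances

def next_sample(X, y, candidates):
    """Pick the candidate uncertainty value with the largest predictive variance."""
    _, variances = gp_predict(X, y, candidates)
    best = max(range(len(candidates)), key=lambda i: variances[i])
    return candidates[best]

print(next_sample([0.0, 1.0], [0.0, 1.0], [0.0, 0.5, 1.0, 2.0]))
```

In a verification loop, `y` would hold a performance measure from each stochastic simulation, and the selected point would be the next one simulated.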
Sep 21 2017 cs.RO
Due to the distributed nature of cooperative simultaneous localization and mapping (CSLAM), detecting inter-robot loop closures necessitates sharing sensory data with other robots. A naive approach to data sharing can easily lead to a waste of mission-critical resources. This paper investigates the logistical aspects of CSLAM. In particular, we present a general resource-efficient communication planning framework that takes into account both the total amount of exchanged data and the induced division of labor between the participating robots. Compared to other state-of-the-art approaches, our framework is able to verify the same set of potential inter-robot loop closures while exchanging considerably less data and influencing the induced workloads. We present a fast algorithm for finding globally optimal communication policies, and theoretical analysis to characterize the necessary and sufficient conditions under which simpler strategies are optimal. The proposed framework is extensively evaluated with data from the KITTI odometry benchmark datasets.
Sep 21 2017 cs.RO
Sparsity has been widely recognized as crucial for efficient optimization in graph-based SLAM. Because the sparsity and structure of the SLAM graph reflect the set of incorporated measurements, many methods for sparsification have been proposed in hopes of reducing computation. These methods often focus narrowly on reducing edge count without regard for structure at a global level. Such structurally naive techniques can fail to produce significant computational savings, even after aggressive pruning. In contrast, simple heuristics such as measurement decimation and keyframing are known to reliably produce significant computation reductions. To demonstrate why, we propose a quantitative metric called elimination complexity (EC) that bridges the existing analytic gap between graph structure and computation. EC quantifies the complexity of the primary computational bottleneck: the factorization step of a Gauss-Newton iteration. Using this metric, we show analytically that decimation and keyframing impose favorable global structures and therefore achieve computation reductions on the order of $r^2/9$ and $r^3$, respectively, where $r$ is the pruning rate. We additionally present numerical results showing that EC provides a good approximation of computation in both batch and incremental (iSAM2) optimization, and demonstrate that pruning methods promoting global sparsity patterns outperform those that do not.
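As a rough illustration of why global structure matters more than raw edge count, the sketch below computes a hypothetical elimination-cost proxy: variables are eliminated in a given order, each elimination connects the variable's remaining neighbors (fill-in), and each step is charged the square of the neighborhood size, in the spirit of the usual flop estimate for sparse Cholesky factorization. This proxy and the name `elimination_cost` are assumptions for illustration, not the paper's definition of EC.

```python
def elimination_cost(edges, order):
    """Hypothetical cost proxy: sum over eliminations of (neighborhood size)^2.

    `edges` are undirected factor-graph edges between variable ids; `order` lists
    every variable exactly once, in elimination order.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for v in order:
        adj.setdefault(v, set())
    cost = 0
    for v in order:
        nbrs = adj.pop(v)
        cost += len(nbrs) ** 2
        for a in nbrs:
            adj[a].discard(v)
            adj[a].update(nbrs - {a})  # fill-in: remaining neighbors become a clique
    return cost

chain = [(0, 1), (1, 2), (2, 3)]              # odometry-only pose chain
print(elimination_cost(chain, [0, 1, 2, 3]))  # eliminate from one end
print(elimination_cost(chain, [1, 0, 2, 3]))  # a worse order creates fill-in
```

Comparing pruned graphs (or orders) with the same function gives a feel for how structure, not just the number of edges, drives factorization cost.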
This paper presents a data-driven approach for multi-robot coordination in partially-observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations exist that involve a large team of heterogeneous robots and long planning horizons. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than the state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.
Jun 15 2017 cs.SY
Nonlinear, adaptive, or otherwise complex control techniques are increasingly relied upon to ensure the safety of systems operating in uncertain environments; however, the nonlinearity of the resulting closed-loop trajectories complicates verification that the system does in fact satisfy those safety requirements. Current techniques use analytical function-based approaches to certify the safe operation of the closed-loop system over a set of possible perturbations before actual implementation. The challenge is that, with a poor choice of analytical function, the certification process can produce very conservative results. Furthermore, selecting a suitable analytical function is very difficult in some applications. This work presents a data-driven verification procedure that instead constructs statistical learning models from simulated training data to separate the set of possible perturbations into "safe" and "unsafe" subsets. Binary evaluations of closed-loop system requirement satisfaction at various realizations of the uncertainties are obtained through temporal logic robustness metrics, which are then used to construct predictive models of requirement satisfaction over the full set of possible uncertainties. As the accuracy of these predictive statistical models is inherently coupled to the quality of the training data, an active learning algorithm selects additional sample points in order to maximize the expected change in the data-driven model and thus, indirectly, minimize the prediction error. This closed-loop verification procedure is demonstrated on several case studies and shows improvements over both baseline analytical certificates and data-driven models that rely on passive learning.
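The binary safe/unsafe labels described above come from the sign of a temporal logic robustness value. A minimal sketch, assuming a scalar trace sampled at discrete times and the standard quantitative semantics of signal temporal logic (always = worst-case margin, eventually = best-case margin, conjunction = minimum); the function names and the example requirement are hypothetical.

```python
def always_below(signal, threshold):
    """Robustness of G(x < threshold): the worst-case margin over the trace."""
    return min(threshold - x for x in signal)

def eventually_above(signal, threshold):
    """Robustness of F(x > threshold): the best-case margin over the trace."""
    return max(x - threshold for x in signal)

def conjunction(*rhos):
    """Robustness of a conjunction is the minimum of its parts."""
    return min(rhos)

def is_safe(rho):
    """Binary training label: positive robustness means the requirement is met."""
    return rho > 0

# Hypothetical requirement: stay below 1.0 at all times AND exceed 0.5 at some time.
trace = [0.2, 0.6, 0.9]
rho = conjunction(always_below(trace, 1.0), eventually_above(trace, 0.5))
print(rho, is_safe(rho))
```

Each simulated perturbation yields one such label; a classifier is then trained on (perturbation, label) pairs, and the magnitude of `rho` indicates how close a sample lies to the safe/unsafe boundary.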
May 04 2017 cs.SY
Increasingly demanding performance requirements for dynamical systems motivate the adoption of nonlinear and adaptive control techniques. One challenge is that the nonlinearity of the resulting closed-loop system complicates verification that the system actually satisfies the requirements at all possible operating conditions. This paper presents a data-driven procedure for efficient simulation-based, statistical verification without reliance on exhaustive simulations. In contrast to previous work, this approach introduces a method for online estimation of prediction accuracy without the use of external validation sets. This work also develops a novel active sampling algorithm that iteratively selects additional training points in order to maximize the accuracy of the predictions while remaining within a sample budget. Three case studies demonstrate the utility of the new approach, and the results show up to a 50% improvement over state-of-the-art techniques.
Mapping and self-localization in unknown environments are fundamental capabilities in many robotic applications. These tasks typically involve the identification of objects as unique features or landmarks, which requires the objects both to be detected and then assigned a unique identifier that can be maintained when viewed from different perspectives and in different images. The data association and simultaneous localization and mapping (SLAM) problems are, individually, well-studied in the literature. However, these two problems are inherently tightly coupled, and this coupling has not been well addressed. Without accurate SLAM, the number of possible data associations grows combinatorially and quickly becomes intractable. Without accurate data association, SLAM algorithms can easily diverge. This paper proposes a novel nonparametric pose graph that models data association and SLAM in a single framework. An algorithm is further introduced to alternate between inferring data association and performing SLAM. Experimental results show that our approach has the new capability of associating object detections and localizing objects at the same time, leading to significantly better performance on both the data association and SLAM problems than achieved by considering only one and ignoring imperfections in the other.
For robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules. However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity in people's behaviors. Existing works are mostly focused on using feature-matching techniques to describe and imitate human paths, but often do not generalize well since the feature values can vary from person to person, and even run to run. This work notes that while it is challenging to directly specify the details of what to do (precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.
Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings because agents have only local viewpoints and perceive the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to learn and store distinct policies for each task, but in practice identities of tasks are often non-observable, making these approaches inapplicable. This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability. We introduce a decentralized single-task learning approach that is robust to concurrent interactions of teammates, and present an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.
This paper presents the first ever approach for solving continuous-observation Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and their semi-Markovian counterparts, Dec-POSMDPs. This contribution is especially important in robotics, where a vast number of sensors provide continuous observation data. A continuous-observation policy representation is introduced using Stochastic Kernel-based Finite State Automata (SK-FSAs). An SK-FSA search algorithm titled Entropy-based Policy Search using Continuous Kernel Observations (EPSCKO) is introduced and applied to the first ever continuous-observation Dec-POMDP/Dec-POSMDP domain, where it significantly outperforms state-of-the-art discrete approaches. This methodology is equally applicable to Dec-POMDPs and Dec-POSMDPs, though the empirical analysis presented focuses on Dec-POSMDPs due to their higher scalability. To improve convergence, an entropy injection policy search acceleration approach for both continuous and discrete observation cases is also developed and shown to improve convergence rates without degrading policy quality.
Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain.
Mar 08 2017 cs.RO
In autonomous Mobility on Demand (MOD) systems, customers request rides from a fleet of shared vehicles that can be automatically positioned in response to customer demand. Recent approaches to MOD systems have focused on environments where customers can only request rides through an app or by waiting at a station. This paper develops MOD fleet management approaches for ride hailing, where customers may instead request rides simply by hailing a passing vehicle, an approach of particular importance for campus MOD systems. The challenge for ride hailing is that customer demand is not explicitly provided as it would be with an app; rather, customers are only served if a vehicle happens to be located at the arrival location. This work focuses on maximizing the number of served hailing customers in an MOD system by learning and utilizing customer demand. A Bayesian framework is used to define a novel customer demand model which incorporates observed pedestrian traffic to estimate customer arrival locations with a quantification of uncertainty. An exploration planner is proposed which routes MOD vehicles in order to reduce arrival rate uncertainty. A robust ride hailing fleet management planner is proposed which routes vehicles in the presence of uncertainty using a chance-constrained formulation. Simulation of a real-world MOD system on MIT's campus demonstrates the effectiveness of the planners. The customer demand model and exploration planner are demonstrated to reduce estimation error over time, and the ride hailing planner is shown to improve the fraction of served customers in the system by 73% over a baseline exploration approach.
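One common way to attach uncertainty to an arrival rate is a conjugate Gamma prior on a Poisson rate. The sketch below assumes that model for each location (it is not necessarily the paper's demand model) and shows how an exploration planner could target the location whose rate estimate is most uncertain; `GammaRate`, `most_uncertain`, and the location names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GammaRate:
    """Gamma(shape=alpha, rate=beta) belief over a Poisson arrival rate."""
    alpha: float
    beta: float

    def update(self, count, duration):
        """Conjugate update after observing `count` arrivals over `duration` time units."""
        return GammaRate(self.alpha + count, self.beta + duration)

    @property
    def mean(self):
        return self.alpha / self.beta

    @property
    def variance(self):
        return self.alpha / self.beta ** 2

def most_uncertain(beliefs):
    """Location whose arrival-rate belief has the largest variance."""
    return max(beliefs, key=lambda loc: beliefs[loc].variance)

prior = GammaRate(alpha=1.0, beta=1.0)
beliefs = {
    "library": prior.update(count=5, duration=4.0),  # observed for 4 time units
    "gym": prior,                                    # never observed
}
print(most_uncertain(beliefs))
```

Observation time shrinks the variance, so repeatedly routing vehicles to `most_uncertain` locations is one simple way to reduce arrival rate uncertainty over time.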
Mar 08 2017 cs.RO
Mobility On Demand (MOD) systems are revolutionizing transportation in urban settings by improving vehicle utilization and reducing parking congestion. A key factor in the success of an MOD system is the ability to measure and respond to real-time customer arrival data. Real-time traffic arrival rate data is traditionally difficult to obtain due to the need to install fixed sensors throughout the MOD network. This paper presents a framework for measuring pedestrian traffic arrival rates using sensors onboard the vehicles that make up the MOD fleet. A novel distributed fusion algorithm is presented which combines onboard LIDAR and camera sensor measurements to detect trajectories of pedestrians with a 90% detection hit rate and 1.5 false positives per minute. A novel moving observer method is introduced to estimate pedestrian arrival rates from pedestrian trajectories collected from mobile sensors. The moving observer method is evaluated in both simulation and hardware and is shown to achieve arrival rate estimates comparable to those that would be obtained with multiple stationary sensors.
Sep 27 2016 cs.MA
Finding feasible, collision-free paths for multiagent systems can be challenging, particularly in non-communicating scenarios where each agent's intent (e.g., goal) is unobservable to the others. In particular, finding time-efficient paths often requires anticipating interaction with neighboring agents, the process of which can be computationally prohibitive. This work presents a decentralized multiagent collision avoidance algorithm based on a novel application of deep reinforcement learning, which effectively offloads the online computation (for predicting interaction patterns) to an offline learning procedure. Specifically, the proposed approach develops a value network that encodes the estimated time to the goal given an agent's joint configuration (positions and velocities) with its neighbors. Use of the value network not only admits efficient (i.e., real-time implementable) queries for finding a collision-free velocity vector, but also considers the uncertainty in the other agents' motion. Simulation results show more than 26 percent improvement in path quality (i.e., time to reach the goal) when compared with optimal reciprocal collision avoidance (ORCA), a state-of-the-art collision avoidance strategy.
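The online query described above amounts to a one-step look-ahead over candidate velocities. A minimal sketch, assuming a single neighbor moving at constant velocity and a hand-written stand-in for the learned value network (straight-line time to goal $t_g$, discounted as $\gamma^{t_g \cdot v_{pref}}$); `GAMMA`, `DT`, `COLLISION_PENALTY`, and all function names are hypothetical.

```python
import math

GAMMA = 0.97               # hypothetical discount factor
DT = 0.5                   # hypothetical look-ahead step (s)
COLLISION_PENALTY = -0.25  # hypothetical reward for a predicted collision

def stand_in_value(pos, goal, v_pref):
    """Stand-in for the learned value network: discounted straight-line time to goal."""
    t_goal = math.dist(pos, goal) / v_pref
    return GAMMA ** (t_goal * v_pref)

def choose_velocity(pos, goal, v_pref, other_pos, other_vel, radius=0.3, n_headings=16):
    """One-step look-ahead: argmax over candidate velocities of R + gamma^(dt*v_pref) * V."""
    candidates = [(0.0, 0.0)] + [
        (v_pref * math.cos(2 * math.pi * k / n_headings),
         v_pref * math.sin(2 * math.pi * k / n_headings))
        for k in range(n_headings)
    ]
    # Propagate the neighbor under a constant-velocity assumption.
    other_next = (other_pos[0] + DT * other_vel[0], other_pos[1] + DT * other_vel[1])
    best, best_score = None, -math.inf
    for vx, vy in candidates:
        nxt = (pos[0] + DT * vx, pos[1] + DT * vy)
        reward = COLLISION_PENALTY if math.dist(nxt, other_next) < 2 * radius else 0.0
        score = reward + GAMMA ** (DT * v_pref) * stand_in_value(nxt, goal, v_pref)
        if score > best_score:
            best, best_score = (vx, vy), score
    return best

print(choose_velocity((0.0, 0.0), (5.0, 0.0), 1.0, (10.0, 10.0), (0.0, 0.0)))
print(choose_velocity((0.0, 0.0), (5.0, 0.0), 1.0, (0.5, 0.0), (0.0, 0.0)))
```

With a distant neighbor the agent heads straight for the goal; with a neighbor directly ahead, the collision penalty makes it sidestep. In a learned approach, `stand_in_value` would be replaced by the trained network evaluated on the joint state.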
Sep 27 2016 cs.MA
Autonomous Mobility On Demand (MOD) systems can utilize fleet management strategies in order to provide a high customer quality of service (QoS). Previous works on autonomous MOD systems have developed methods for rebalancing single-capacity vehicles, where QoS is maintained through large fleet sizing. This work focuses on MOD systems utilizing a small number of vehicles, such as those found on a campus, where additional vehicles cannot be introduced as demand for rides increases. A predictive positioning method is presented for improving customer QoS by identifying key locations to position the fleet in order to minimize expected customer wait time. Ridesharing is introduced as a means for improving customer QoS as arrival rates increase. However, with ridesharing, perceived QoS depends on an often unknown customer preference. To address this challenge, a customer ratings model, which learns customer preference from a 5-star rating, is developed and incorporated directly into a ridesharing algorithm. The predictive positioning and ridesharing methods are applied to simulation of a real-world campus MOD system. A combined predictive positioning and ridesharing approach is shown to reduce customer service times by up to 29%, and the customer ratings model is shown to provide the best overall MOD fleet management performance over a range of customer preferences.
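Predictive positioning can be illustrated as a small facility-location search: choose vehicle positions that minimize the demand-weighted travel time from the nearest vehicle. The sketch below assumes known per-location arrival rates and a travel-time table, and uses brute force, which only suits a handful of candidate locations; it is not the paper's method, and the names and the toy four-stop line are hypothetical.

```python
from itertools import combinations

def expected_wait(positions, demand, travel_time):
    """Demand-weighted mean travel time from the nearest positioned vehicle."""
    total = sum(demand.values())
    return sum(rate * min(travel_time[p][loc] for p in positions)
               for loc, rate in demand.items()) / total

def best_positions(candidates, n_vehicles, demand, travel_time):
    """Exhaustively pick the n_vehicles locations with the lowest expected wait."""
    return min(combinations(candidates, n_vehicles),
               key=lambda ps: expected_wait(ps, demand, travel_time))

# Toy example: four stops on a line, one unit of travel time between neighbors.
stops = ["A", "B", "C", "D"]
idx = {s: i for i, s in enumerate(stops)}
travel = {p: {q: abs(idx[p] - idx[q]) for q in stops} for p in stops}
demand = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 5.0}   # arrivals per unit time
print(best_positions(stops, 1, demand, travel))
print(best_positions(stops, 2, demand, travel))
```

A single vehicle gravitates to the busiest stop; with two vehicles, the second covers the remaining low-demand stops.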
Optimal control in non-stationary Markov decision processes (MDP) is a challenging problem. The aim in such a control problem is to maximize the long-term discounted reward when the transition dynamics or the reward function can change over time. When prior knowledge of the change statistics is available, the standard Bayesian approach to this problem is to reformulate it as a partially observable MDP (POMDP) and solve it using approximate POMDP solvers, which are typically computationally demanding. In this paper, the problem is analyzed through the viewpoint of quickest change detection (QCD), a set of tools for detecting a change in the distribution of a sequence of random variables. Current methods applying QCD to such problems only passively detect changes by following prescribed policies, without optimizing the choice of actions for long-term performance. We demonstrate that ignoring the reward-detection trade-off can cause a significant loss in long-term rewards, and propose a two-threshold switching strategy to solve the issue. A non-Bayesian problem formulation is also proposed for scenarios where a Bayesian formulation cannot be defined. The performance of the proposed two-threshold strategy is examined through numerical analysis on a non-stationary MDP task, and the strategy outperforms the state-of-the-art QCD methods in both Bayesian and non-Bayesian settings.
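For readers unfamiliar with QCD, the CUSUM statistic is one of its basic building blocks. The sketch below pairs a standard CUSUM recursion for a Gaussian mean shift with a two-threshold rule; how the two thresholds map to actions (exploit below the lower one, gather information between them, declare a change above the upper one) is our assumption for illustration, not a restatement of the paper's policy. All names are hypothetical.

```python
def gaussian_mean_shift_llr(mu0, mu1, sigma):
    """Log-likelihood ratio log f1(x)/f0(x) for N(mu0, sigma^2) -> N(mu1, sigma^2)."""
    def llr(x):
        return (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))
    return llr

def cusum_step(w, x, llr):
    """CUSUM recursion: W_t = max(0, W_{t-1} + log-likelihood ratio of x_t)."""
    return max(0.0, w + llr(x))

def two_threshold_action(w, low, high):
    """Hypothetical mapping from the CUSUM statistic to a control mode."""
    if w >= high:
        return "declare_change"
    if w >= low:
        return "gather_information"
    return "exploit"

llr = gaussian_mean_shift_llr(mu0=0.0, mu1=1.0, sigma=1.0)
w = 0.0
for x in [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]:  # change occurs after two samples
    w = cusum_step(w, x, llr)
    print(x, w, two_threshold_action(w, low=1.0, high=3.0))
```

The intermediate band is where a reward-aware controller can trade immediate reward for more informative actions before committing to a change decision.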
Structural regularities in man-made environments are reflected in the distribution of their surface normals. Describing these surface normal distributions is important in many computer vision applications, such as scene understanding, plane segmentation, and regularization of 3D reconstructions. Based on the small-variance limit of Bayesian nonparametric von-Mises-Fisher (vMF) mixture distributions, we propose two new flexible and efficient k-means-like clustering algorithms for directional data such as surface normals. The first, DP-vMF-means, is a batch clustering algorithm derived from the Dirichlet process (DP) vMF mixture. Recognizing the sequential nature of data collection in many applications, we extend this algorithm to DDP-vMF-means, which infers temporally evolving cluster structure from streaming data. Both algorithms naturally respect the geometry of directional data, which lies on the unit sphere. We demonstrate their performance on synthetic directional data and real 3D surface normals from RGB-D sensors. While our experiments focus on 3D data, both algorithms generalize to high-dimensional directional data such as protein backbone configurations and semantic word vectors.
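A DP-means-style sketch of the batch idea, assuming unit-norm input vectors and a hypothetical cosine-similarity threshold `lam_cos`: each point joins the most similar mean direction unless that similarity falls below the threshold, in which case it opens a new cluster, and each mean is re-estimated as the normalized resultant of its members. The paper derives its threshold from the small-variance limit of the DP-vMF mixture; this stand-in parameterization and the fixed iteration count are assumptions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dp_vmf_means(data, lam_cos, iters=10):
    """data: unit vectors; lam_cos: cosine below which a new cluster is opened (> -1)."""
    means = [list(data[0])]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment: nearest mean direction, or a new cluster if none is close enough.
        for i, x in enumerate(data):
            sims = [dot(m, x) for m in means]
            k = max(range(len(means)), key=sims.__getitem__)
            if sims[k] < lam_cos:
                means.append(list(x))
                k = len(means) - 1
            labels[i] = k
        # Update: normalized resultant of each non-empty cluster; relabel compactly.
        new_means, remap = [], {}
        for k in range(len(means)):
            members = [data[i] for i in range(len(data)) if labels[i] == k]
            if members:
                remap[k] = len(new_means)
                new_means.append(normalize([sum(c) for c in zip(*members)]))
        labels = [remap[k] for k in labels]
        means = new_means
    return means, labels

normals = [[0.0, 0.0, 1.0], normalize([0.1, 0.0, 1.0]),
           [1.0, 0.0, 0.0], normalize([1.0, 0.1, 0.0])]
means, labels = dp_vmf_means(normals, lam_cos=0.5)
print(len(means), labels)
```

Only cosine similarities are used, so the clustering respects the spherical geometry of the data, and the number of clusters is not fixed in advance.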
May 04 2016 cs.CV
Robust environment perception is essential for decision-making on robots operating in complex domains. Principled treatment of uncertainty sources in a robot's observation model is necessary for accurate mapping and object detection. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper presents an approach for filtering sequences of object classification probabilities using online modeling of the noise characteristics of the classifier outputs. A hierarchical Bayesian approach is used to model per-class noise distributions, while simultaneously allowing sharing of high-level noise characteristics between classes. The proposed filtering scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to outperform existing methods in classification accuracy. The paper also presents real-time filtered classification hardware experiments running fully onboard a moving quadrotor, where the proposed approach is demonstrated to work in a challenging domain where noise-agnostic filtering fails.
Mar 17 2016 cs.CV
Point cloud alignment is a common problem in computer vision and robotics, with applications ranging from 3D object recognition to reconstruction. We propose a novel approach to the alignment problem that utilizes Bayesian nonparametrics to describe the point cloud and surface normal densities, and branch and bound (BB) optimization to recover the relative transformation. BB uses a novel, refinable, near-uniform tessellation of rotation space using 4D tetrahedra, leading to more efficient optimization compared to the common axis-angle tessellation. We provide objective function bounds for pruning given the proposed tessellation, prove that BB converges to the optimum of the cost function, and characterize its computational complexity. Finally, we empirically demonstrate the efficiency of the proposed approach as well as its robustness to real-world conditions such as missing data and partial overlap.
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. In the proposed framework, processing nodes receive a sequence of data minibatches, compute a variational posterior for each, and make asynchronous streaming updates to a central model. In contrast to previous algorithms, the proposed framework is truly streaming, distributed, asynchronous, learning-rate-free, and truncation-free. The key challenge in developing the framework, arising from the fact that BNP models do not impose an inherent ordering on their components, is finding the correspondence between minibatch and central BNP posterior components before performing each update. To address this, the paper develops a combinatorial optimization problem over component correspondences, and provides an efficient solution technique. The paper concludes with an application of the methodology to the DP mixture model, with experimental results demonstrating its practical scalability and performance.
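The correspondence problem can be pictured as an assignment in which each minibatch component either matches a central component or is declared new. The toy sketch below assumes one-dimensional component means, an absolute-difference cost, and a fixed hypothetical `new_cost` for creating a component, and solves the problem by brute force; the paper's objective is defined over variational posteriors and uses a more efficient solution technique.

```python
from itertools import permutations

def match_components(batch_means, central_means, new_cost):
    """Map each minibatch component index to a central index, or to None (new component)."""
    n_b, n_c = len(batch_means), len(central_means)
    # Pad with "new component" slots so every minibatch component may stay unmatched.
    slots = list(range(n_c)) + [None] * n_b
    best, best_cost = None, float("inf")
    for perm in permutations(slots, n_b):  # each central component is used at most once
        cost = sum(new_cost if c is None else abs(batch_means[i] - central_means[c])
                   for i, c in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return dict(enumerate(best))

print(match_components([0.1, 5.0], [0.0, 1.0], new_cost=2.0))
```

Because components carry no inherent labels, the match has to be recomputed for every minibatch before its statistics are merged into the central model; brute force is factorial in size and only suits toy cases like this one.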
Sep 29 2015 cs.RO
Active SLAM is the task of actively planning robot paths while simultaneously building a map and localizing within it. Existing work has focused on planning paths with occupancy grid maps, which do not scale well and suffer from long-term drift. This work proposes a Topological Feature Graph (TFG) representation that scales well and develops an active SLAM algorithm with it. The TFG uses graphical models, which exploit conditional independences between variables, and enables a unified quantification of exploration and exploitation gains with a single entropy metric. Hence, it facilitates a natural and principled balance between map exploration and refinement. A probabilistic roadmap path-planner is used to generate robot paths in real time. Experimental results demonstrate that the proposed approach achieves better accuracy than a standard grid-map based approach while requiring orders of magnitude less computation and memory resources.
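A single entropy metric can score both exploration and refinement because information gain is simply the drop in entropy. A minimal sketch, assuming the map and pose variables are jointly Gaussian so that entropy is $\frac{1}{2}\ln\left((2\pi e)^d \det\Sigma\right)$; the function names are hypothetical, and the TFG's own graphical-model computation is not reproduced here.

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def gaussian_entropy(cov):
    """Differential entropy (nats) of a d-dimensional Gaussian with covariance cov."""
    d = len(cov)
    return 0.5 * (d * math.log(2 * math.pi * math.e) + math.log(det(cov)))

def information_gain(cov_before, cov_after):
    """Entropy reduction expected from an action, e.g. a planned observation."""
    return gaussian_entropy(cov_before) - gaussian_entropy(cov_after)

prior = [[1.0, 0.0], [0.0, 1.0]]
after_revisit = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical posterior after a planned path
print(information_gain(prior, after_revisit))
```

Candidate paths from the roadmap could then be ranked by their predicted information gain, whether that gain comes from new landmarks (exploration) or from tightening existing ones (refinement).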
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
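For intuition about the stick-breaking prior, the sketch below draws a truncated set of weights $\beta_k = v_k \prod_{j<k}(1 - v_j)$ with $v_k \sim \mathrm{Beta}(1, \alpha)$; smaller $\alpha$ concentrates mass on a few components, which is what lets the effective controller size adapt to the data. Dec-SBPR fits such quantities with variational Bayes rather than by sampling, and the truncation level `k_max` and function name here are hypothetical.

```python
import random

def stick_breaking_weights(alpha, k_max, rng):
    """Truncated stick-breaking draw: returns k_max weights and the leftover stick."""
    weights, remaining = [], 1.0
    for _ in range(k_max):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining

rng = random.Random(0)
for alpha in (0.5, 5.0):
    w, rest = stick_breaking_weights(alpha, k_max=8, rng=rng)
    print(alpha, [round(x, 3) for x in w], round(rest, 3))
```

The weights and the leftover stick always sum to one, so the prior never commits to a fixed number of components.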
The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the decentralized partially observable semi-Markov decision process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving this Dec-POSMDP which is much more scalable than previous methods since it can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed method's performance is evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions for large-scale problems.
In sparse target inference problems, it has been shown that significant gains can be achieved by adaptive sensing using convex criteria. We generalize previous work on adaptive sensing to (a) include multiple classes of targets with different levels of importance and (b) accommodate multiple sensor models. New optimization policies are developed to allocate a limited resource budget to simultaneously locate, classify and estimate a sparse number of targets embedded in a large space. Upper and lower bounds on the performance of the proposed policies are derived by analyzing a baseline policy, which allocates resources uniformly across the scene, and an oracle policy which has a priori knowledge of the target locations/classes. These bounds quantify analytically the potential benefit of adaptive sensing as a function of target frequency and importance. Numerical results indicate that the proposed policies perform close to the oracle bound when signal quality is sufficiently high (e.g., within 3 dB for SNR above 15 dB). Moreover, the proposed policies improve on previous policies in terms of reducing estimation error, reducing misclassification probability, and increasing expected return. To account for sensors with different levels of agility, three sensor models are considered: global adaptive (GA), which can allocate different amounts of resource to each location in the space; global uniform (GU), which can allocate resources uniformly across the scene; and local adaptive (LA), which can allocate fixed units to a subset of locations. Policies that use a mixture of GU and LA sensors are shown to perform similarly to those that use GA sensors while being more easily implementable.
May 23 2014 cs.RO
To plan safe trajectories in urban environments, autonomous vehicles must be able to quickly assess the future intentions of dynamic agents. Pedestrians are particularly challenging to model, as their motion patterns are often uncertain and/or unknown a priori. This paper presents a novel changepoint detection and clustering algorithm that, when coupled with offline unsupervised learning of a Gaussian process mixture model (DPGP), enables quick detection of changes in intent and online learning of motion patterns not seen in prior training data. The resulting long-term movement predictions demonstrate improved accuracy relative to offline learning alone, in terms of both intent and trajectory prediction. By embedding these predictions within a chance-constrained motion planner, trajectories that are probabilistically safe with respect to pedestrian motion can be identified in real time. Hardware experiments demonstrate that this approach can accurately predict pedestrian motion patterns from onboard sensor/perception data and facilitate robust navigation within a dynamic environment.
Mar 31 2014 cs.LG
This paper presents an approximate method for performing Bayesian inference in models with conditional independence over a decentralized network of learning agents. First, each learning agent runs variational inference to generate a local approximate posterior. Next, the agents transmit their local posteriors to other agents in the network. Finally, each agent combines its set of received local posteriors. The key insight in this work is that, for many Bayesian models, approximate inference schemes destroy symmetry and dependencies in the model that are crucial to the correct application of Bayes' rule when combining the local posteriors. The proposed method addresses this issue by including an additional optimization step in the combination procedure that accounts for these broken dependencies. Experiments on synthetic and real data demonstrate that the decentralized method provides advantages in computational performance and predictive test likelihood over previous batch and distributed methods.
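The combination step rests on Bayes' rule for conditionally independent data split across $N$ agents that share the prior $p(\theta)$:

$$p(\theta \mid x_1,\dots,x_N) \propto p(\theta)^{1-N}\prod_{i=1}^{N} p(\theta \mid x_i).$$

For an exponential-family posterior, this amounts to summing the local natural parameters and subtracting $N-1$ copies of the prior's. The sketch below shows this naive rule on a conjugate model chosen for illustration: an unknown Gaussian mean with known noise variance, written in (precision, precision times mean) form. In this model the rule is exact. Per the abstract, however, it breaks down when approximate inference destroys the model's symmetries and dependencies, and the paper's additional optimization step is designed to handle that case. That step is not shown here.

```python
def local_posterior(data, prior_prec, prior_h, noise_var):
    """Posterior over an unknown mean, as (precision, precision * mean)."""
    return (prior_prec + len(data) / noise_var,
            prior_h + sum(data) / noise_var)


def combine(local_posts, prior_prec, prior_h):
    """Naive combination: sum local natural parameters, remove N-1 extra priors."""
    n = len(local_posts)
    prec = sum(p for p, _ in local_posts) - (n - 1) * prior_prec
    h = sum(h for _, h in local_posts) - (n - 1) * prior_h
    return prec, h


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = [data[0:2], data[2:4], data[4:6]]            # one shard per agent
locals_ = [local_posterior(s, 1.0, 0.0, 2.0) for s in splits]
prec, h = combine(locals_, 1.0, 0.0)
print(prec, h / prec)                                  # 4.0 2.625
```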
We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes in which a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but communication limitations prevent the agents from sharing their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problems that can be practically solved as a Dec-POMDP. We describe this general model and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that use whatever opportunities for coordination are present in the problem, while balancing uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.
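Decentralized execution means each agent chooses its next macro-action using only what it has observed itself. The abstract does not specify how policies are represented. As an assumption, the sketch below represents each agent's policy as a finite-state controller. Each node emits a macro-action, and the transitions depend only on that agent's own macro-observations. The class, node, macro-action, and observation names are hypothetical warehouse placeholders.

```python
from dataclasses import dataclass


@dataclass
class MacroController:
    """Hypothetical finite-state controller for a single agent.

    Each node selects a macro-action; transitions use only this agent's own
    macro-observations, so no shared state is needed at execution time.
    """
    node_action: dict   # node -> macro-action name
    transitions: dict   # (node, macro-observation) -> next node
    node: str = "start"

    def act(self):
        return self.node_action[self.node]

    def observe(self, obs):
        # Remain in the current node if no transition is defined for obs.
        self.node = self.transitions.get((self.node, obs), self.node)


picker = MacroController(
    node_action={"start": "go_to_shelf", "carry": "deliver_box"},
    transitions={("start", "holding_box"): "carry", ("carry", "delivered"): "start"},
)
print(picker.act())            # go_to_shelf
picker.observe("holding_box")
print(picker.act())            # deliver_box
```

Because each agent's controller reads only its own observations, any coordination, such as signaling, has to be built into what agents do and what they can observe of one another. This is the kind of behavior the planner is described as synthesizing.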
Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm.
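iFDD grows a representation by adding conjunctions of existing binary features rather than searching a predefined pool. The sketch below shows one discovery step in a matching-pursuit style, under stated assumptions. Each candidate pairwise conjunction is scored by its normalized correlation with the per-sample residuals, $|\sum_s f_s r_s| / \sqrt{\sum_s f_s^2}$, and the best unseen candidate is returned. Only pairs of base features are considered, whereas iFDD can also conjoin previously discovered features. The scoring rule and function name are assumptions, not the paper's exact definitions.

```python
import math
from itertools import combinations


def best_conjunction(phi, residuals, existing):
    """Return the pairwise feature conjunction most correlated with the residuals.

    phi: list of rows, one per sample, each a list of 0/1 feature values.
    residuals: per-sample residual (e.g. TD error) of the current estimate.
    existing: set of frozensets naming conjunctions already added.
    """
    n_features = len(phi[0])
    best, best_score = None, 0.0
    for i, j in combinations(range(n_features), 2):
        key = frozenset((i, j))
        if key in existing:
            continue
        col = [row[i] * row[j] for row in phi]  # conjunction is active iff both are
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0:
            continue  # features never co-active, so the conjunction is empty
        score = abs(sum(c * r for c, r in zip(col, residuals))) / norm
        if score > best_score:
            best, best_score = key, score
    return best, best_score


phi = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 1]]
residuals = [1.0, 0.0, 0.0, 1.0, 0.0]   # error appears only when features 0 and 1 co-occur
print(best_conjunction(phi, residuals, set()))  # (frozenset({0, 1}), 1.414...)
```

Candidates are generated only from features that are already in the representation. This is what lets this style of method avoid enumerating a large feature pool up front.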
This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets.
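In the static, single-batch case, the low-variance limit of Gibbs sampling for a Dirichlet process mixture yields the DP-means algorithm. It works like k-means, except that a point whose squared distance to every existing center exceeds a threshold $\lambda$ opens a new cluster. The sketch below implements that static case only, to illustrate the k-means-like hard clustering the abstract describes. The paper's batch-sequential algorithm also carries clusters across batches as they evolve, and this sketch omits that. The function name and parameters are illustrative.

```python
def dp_means(points, lam, n_iters=20):
    """DP-means hard clustering for a single batch of 2-D points.

    lam: squared-distance threshold above which a point opens a new cluster.
    """
    centers = [points[0]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment: nearest center, or a new cluster if every center is too far.
        for idx, p in enumerate(points):
            d2 = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            k = min(range(len(centers)), key=d2.__getitem__)
            if d2[k] > lam:
                centers.append(p)
                k = len(centers) - 1
            labels[idx] = k
        # Update: move each center to the mean of its points; drop empty clusters.
        new_centers, relabel = [], {}
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                relabel[k] = len(new_centers)
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
        labels = [relabel[l] for l in labels]
        centers = new_centers
    return centers, labels


pts = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
centers, labels = dp_means(pts, lam=4.0)
print(len(centers), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

The number of clusters is not fixed in advance. It emerges from $\lambda$, which is what allows an unknown number of clusters while keeping the cost of a k-means-style iteration.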