Feb 22 2017 cs.AI
There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that learn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting game. The SSBM environment has complex dynamics and partial observability, making it challenging for human and machine alike. The multi-player aspect poses an additional challenge, as the vast majority of recent advances in RL have focused on single-agent environments. Nonetheless, we will show that it is possible to train agents that are competitive against and even surpass human professionals, a new result for the multi-player video game setting.
Mobile robots are increasingly being employed for performing complex tasks in dynamic environments. Reinforcement learning (RL) methods are recognized to be promising for specifying such tasks in a relatively simple manner. However, the strong dependency between the learning method and the task to learn is a well-known problem that restricts practical implementations of RL in robotics, often requiring major modifications of parameters and adding other techniques for each particular task. In this paper we present a practical core implementation of RL which enables the learning process for multiple robotic tasks with minimal per-task tuning or none. Based on value iteration methods, this implementation includes a novel approach for action selection, called Q-biased softmax regression (QBIASSR), which avoids poor performance of the learning process when the robot reaches new unexplored states. Our approach takes advantage of the structure of the state space by attending the physical variables involved (e.g., distances to obstacles, X,Y,\theta pose, etc.), thus experienced sets of states may favor the decision-making process of unexplored or rarely-explored states. This improvement has a relevant role in reducing the tuning of the algorithm for particular tasks. Experiments with real and simulated robots, performed with the software framework also introduced here, show that our implementation is effectively able to learn different robotic tasks without tuning the learning method. Results also suggest that the combination of true online SARSA(\lambda) with QBIASSR can outperform the existing RL core algorithms in low-dimensional robotic tasks.
Reason and inference require process as well as memory skills by humans. Neural networks are able to process tasks like image recognition (better than humans) but in memory aspects are still limited (by attention mechanism, size). Recurrent Neural Network (RNN) and it's modified version LSTM are able to solve small memory contexts, but as context becomes larger than a threshold, it is difficult to use them. The Solution is to use large external memory. Still, it poses many challenges like, how to train neural networks for discrete memory representation, how to describe long term dependencies in sequential data etc. Most prominent neural architectures for such tasks are Memory networks: inference components combined with long term memory and Neural Turing Machines: neural networks using external memory resources. Also, additional techniques like attention mechanism, end to end gradient descent on discrete memory representation are needed to support these solutions. Preliminary results of above neural architectures on simple algorithms (sorting, copying) and Question Answering (based on story, dialogs) application are comparable with the state of the art. In this paper, I explain these architectures (in general), the additional techniques used and the results of their application.
In order to obtain reliable accuracy estimates for automatic MOOC dropout predictors, it is important to train and test them in a manner consistent with how they will be used in practice. Yet most prior research on MOOC dropout prediction has measured test accuracy on the same course used for training the classifier, which can lead to overly optimistic accuracy estimates. In order to understand better how accuracy is affected by the training+testing regime, we compared the accuracy of a standard dropout prediction architecture (clickstream features + logistic regression) across 4 different training paradigms. Results suggest that (1) training and testing on the same course ("post-hoc") can overestimate accuracy by several percentage points; (2) dropout classifiers trained on proxy labels based on students' persistence are surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance does not vary significantly with the academic discipline. Finally, we also research new dropout prediction architectures based on deep, fully-connected, feed-forward neural networks and find that (4) networks with as many as 5 hidden layers can statistically significantly increase test accuracy over that of logistic regression.
We present a novel algorithm that synthesizes imperative programs for introductory programming courses. Given a set of input-output examples and a partial program, our algorithm generates a complete program that is consistent with every example. Our key idea is to combine enumerative program synthesis and static analysis, which aggressively prunes out a large search space while guaranteeing to find, if any, a correct solution. We have implemented our algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in introductory programming courses. The results show that SIMPL is able to solve the benchmark problems in 6.6 seconds on average.
Arising naturally in many fields, optimal stopping problems consider the question of deciding when to stop an observation-generating process. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on 3 domains taken from diverse fields.
Feb 22 2017 cs.AI
Generative model has been one of the most common approaches for solving the Dialog State Tracking Problem with the capabilities to model the dialog hypotheses in an explicit manner. The most important task in such Bayesian networks models is constructing the most reliable user models by learning and reflecting the training data into the probability distribution of user actions conditional on networks states. This paper provides an overall picture of the learning process in a Bayesian framework with an emphasize on the state-of-the-art theoretical analyses of the Expectation Maximization learning algorithm.