Training a task-completion dialogue agent with real users via reinforcement learning (RL) could be prohibitively expensive, because it requires many interactions with users. One alternative is to resort to a user simulator, while the discrepancy of between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into the dialogue policy learning based on Dyna-Q framework, and provides a more sample-efficient approach to learn the dialogue polices. The proposed agent consists of a planner trained on-line with limited real user experience that can generate large amounts of simulated experience to supplement with limited real user experience, and a policy model trained on these hybrid experiences. The effectiveness of our approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.
Jan 19 2018 cs.CL
While existing machine learning models have achieved great success for sentiment classification, they typically do not explicitly capture sentiment-oriented word interaction, which can lead to poor results for fine-grained analysis at the snippet level (a phrase or sentence). Factorization Machine provides a possible approach to learning element-wise interaction for recommender systems, but they are not directly applicable to our task due to the inability to model contexts and word sequences. In this work, we develop two Position-aware Factorization Machines which consider word interaction, context and position information. Such information is jointly encoded in a set of sentiment-oriented word interaction vectors. Compared to traditional word embeddings, SWI vectors explicitly capture sentiment-oriented word interaction and simplify the parameter learning. Experimental results show that while they have comparable performance with state-of-the-art methods for document-level classification, they benefit the snippet/sentence-level sentiment analysis.
Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the majority of datasets. We open-source our pretrained models and code to enable adoption by the community.
We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.