results for au:Zhou_B in:cs

- Aug 02 2017 cs.CL arXiv:1708.00308v1We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence. We argue that this novel formalism will help us not only visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable topics since we can now illustrate topics by sampling representative sentences instead of bag of words or phrases. We present a variational auto-encoder approach for learning in which we use a factorized variational encoder that independently models the posterior over topical mixture vectors of documents using a feed-forward network, and the posterior over topic assignments to sentences using an RNN. Our preliminary experiments on two different datasets indicate early promise, but also expose many challenges that remain to be addressed.
- Aug 01 2017 cs.CV arXiv:1707.09700v1Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner. Objects, phrases, and caption regions are first aligned with a dynamic graph based on their spatial and semantic connections. Then a feature refining structure is used to pass messages across the three levels of semantic tasks through the graph. We benchmark the learned model on three tasks, and show the joint learning across three tasks with our proposed method can bring mutual improvements over previous models. Particularly, on the scene graph generation task, our proposed method outperforms the state-of-art method with more than 3% margin.
- Jul 10 2017 cs.LG arXiv:1707.02198v1We propose discriminative adversarial networks (DAN) for semi-supervised learning and loss function learning. Our DAN approach builds upon generative adversarial networks (GANs) and conditional GANs but includes the key differentiator of using two discriminators instead of a generator and a discriminator. DAN can be seen as a framework to learn loss functions for predictors that also implements semi-supervised learning in a straightforward manner. We propose instantiations of DAN for two different prediction tasks: classification and ranking. Our experimental results on three datasets of different tasks demonstrate that DAN is a promising framework for both semi-supervised learning and learning loss functions for predictors. For all tasks, the semi-supervised capability of DAN can significantly boost the predictor performance for small labeled sets with minor architecture changes across tasks. Moreover, the loss functions automatically learned by DANs are very competitive and usually outperform the standard pairwise and negative log-likelihood loss functions for both semi-supervised and supervised learning.
- The majority of medical documents and electronic health records (EHRs) are in text format that poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to make the text processing automatic. One of the popular approaches to retrieve information based on discovering the themes in health & medical corpora is topic modeling, however, this approach still needs new perspectives. In this research we describe fuzzy latent semantic analysis (FLSA), a novel approach in topic modeling using fuzzy perspective. FLSA can handle health & medical corpora redundancy issue and provides a new method to estimate the number of topics. The quantitative evaluations show that FLSA produces superior performance and features to latent Dirichlet allocation (LDA), the most popular topic model.
- Relation detection is a core component for many NLP applications including Knowledge Base Question Answering (KBQA). In this paper, we propose a hierarchical recurrent neural network enhanced by residual learning that detects KB relations given an input question. Our method uses deep residual bidirectional LSTMs to compare questions and relation names via different hierarchies of abstraction. Additionally, we propose a simple KBQA system that integrates entity linking and our proposed relation detector to enable one enhance another. Experimental results evidence that our approach achieves not only outstanding relation detection performance, but more importantly, it helps our KBQA system to achieve state-of-the-art accuracy for both single-relation (SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.
- We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
- Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets. In this paper, we propose a new task that aims at parsing scenes with a large and open vocabulary, and several evaluation metrics are explored for this problem. Our proposed approach to this problem is a joint image pixel and word concept embeddings framework, where word concepts are connected by semantic relations. We validate the open vocabulary prediction ability of our framework on ADE20K dataset which covers a wide variety of scenes and objects. We further explore the trained joint embedding space to show its interpretability.
- This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
- Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios. To improve these systems' perceptive speed and robustness, we present SegICP, a novel integrated solution to object recognition and pose estimation. SegICP couples convolutional neural networks and multi-hypothesis point cloud registration to achieve both robust pixel-wise semantic segmentation as well as accurate and real-time 6-DOF pose estimation for relevant objects, even in the presence of occlusions and sensor noise. Our architecture achieves 1cm position error and <5^∘$ angle error in real time without an initial seed. We evaluate and benchmark SegICP against an annotated dataset generated by motion capture.
- Feb 21 2017 cs.CV arXiv:1702.05729v2Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance. Existing methods mainly focused on searching persons with image-based or attribute-based queries, which have major limitations for a practical usage. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the algorithm of the person search is required to rank all the samples in the person database then retrieve the most relevant sample corresponding to the queried description. Since there is no person dataset or benchmark with textual description available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed as CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines have been evaluated and compared on the person search benchmark. An Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish the state-of-the art performance on person search.
- Jan 26 2017 cs.OH arXiv:1701.07095v1In this paper we outline our results for validating the precision of the internal power meters of smart-phones under different workloads. We compare its results with an external power meter. This is the first step towards creating customized energy models on the fly and towards optimizing battery efficiency using genetic program improvements. Our experimental results indicate that the internal meters are sufficiently precise when large enough time windows are considered. This is part of our work on the "dreaming smart-phone". For a technical demonstration please watch our videos https://www.youtube.com/watch?v=xeeFz2GLFdU and https://www.youtube.com/watch?v=C7WHoLW1KYw.
- Jan 17 2017 cs.CL arXiv:1701.04027v1Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most of the current deep neural network (DNN) based methods consider these tasks as a sequence labeling problem, in which a word, rather than a chunk, is treated as the basic unit for labeling. These chunks are then inferred by the standard IOB (Inside-Outside-Beginning) labels. In this paper, we propose an alternative approach by investigating the use of DNN for sequence chunking, and propose three neural models so that each chunk can be treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models can achieve start-of-the-art performance on both the text chunking and slot filling tasks.
- Jan 05 2017 cs.CV arXiv:1701.01077v3Convolutional Neural Networks (CNNs) have become the state-of-the-art in various computer vision tasks, but they are still premature for most sensor data, especially in pervasive and wearable computing. A major reason for this is the limited amount of annotated training data. In this paper, we propose the idea of leveraging the discriminative power of pre-trained deep CNNs on 2-dimensional sensor data by transforming the sensor modality to the visual domain. By three proposed strategies, 2D sensor output is converted into pressure distribution imageries. Then we utilize a pre-trained CNN for transfer learning on the converted imagery data. We evaluate our method on a gait dataset of floor surface pressure mapping. We obtain a classification accuracy of 87.66%, which outperforms the conventional machine learning methods by over 10%.
- Dec 13 2016 cs.AI arXiv:1612.03364v1Sparsity-constrained optimization is an important and challenging problem that has wide applicability in data mining, machine learning, and statistics. In this paper, we focus on sparsity-constrained optimization in cases where the cost function is a general nonlinear function and, in particular, the sparsity constraint is defined by a graph-structured sparsity model. Existing methods explore this problem in the context of sparse estimation in linear models. To the best of our knowledge, this is the first work to present an efficient approximation algorithm, namely, Graph-structured Matching Pursuit (Graph-Mp), to optimize a general nonlinear function subject to graph-structured constraints. We prove that our algorithm enjoys the strong guarantees analogous to those designed for linear models in terms of convergence rate and approximation accuracy. As a case study, we specialize Graph-Mp to optimize a number of well-known graph scan statistic models for the connected subgraph detection task, and empirical evidence demonstrates that our general algorithm performs superior over state-of-the-art methods that are designed specifically for the task of connected subgraph detection.
- Nov 22 2016 cs.SI arXiv:1611.06947v2Censorship in social media has been well studied and provides insight into how governments stifle freedom of expression online. Comparatively less (or no) attention has been paid to detecting (self) censorship in traditional media (e.g., news) using social media as a bellweather. We present a novel unsupervised approach that views social media as a sensor to detect censorship in news media wherein statistically significant differences between information published in the news media and the correlated information published in social media are automatically identified as candidate censored events. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords, and a new near-linear-time algorithm (called GraphDPD) to identify the highest scoring clusters as indicators of censorship. We outline extensive experiments on semi-synthetic data as well as real datasets (with Twitter and local news media) from Mexico and Venezuela, highlighting the capability to accurately detect real-world self censorship events.
- Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. The unique challenge of TaaS is that it must satisfy a wide range of customers who have no experience and resources to tune DL hyper-parameters, and meticulous tuning for each user's dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that are applicable to all users. IBM Watson Natural Language Classifier (NLC) service, the most popular IBM cognitive service used by thousands of enterprise-level clients around the globe, is a typical TaaS service. By evaluating the NLC workloads, we show that only the conservative hyper-parameter setup (e.g., small mini-batch size and small learning rate) can guarantee acceptable model accuracy for a wide range of customers. We further justify theoretically why such a setup guarantees better model convergence in general. Unfortunately, the small mini-batch size causes a high volume of communication traffic in a parameter-server based system. We characterize the high communication bandwidth requirement of TaaS using representative industrial deep learning workloads and demonstrate that none of the state-of-the-art scale-up or scale-out solutions can satisfy such a requirement. We then present GaDei, an optimized shared-memory based scale-up parameter server design. We prove that the designed protocol is deadlock-free and it processes each gradient exactly once. Our implementation is evaluated on both commercial benchmarks and public benchmarks to demonstrate that it significantly outperforms the state-of-the-art parameter-server based implementation while maintaining the required accuracy and our implementation reaches near the best possible runtime performance, constrained only by the hardware limitation. Furthermore, to the best of our knowledge, GaDei is the only scale-up DL system that provides fault-tolerance.
- Nov 15 2016 cs.CL arXiv:1611.04230v1We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.
- Nov 15 2016 cs.CL arXiv:1611.04244v1We present two novel and contrasting Recurrent Neural Network (RNN) based architectures for extractive summarization of documents. The Classifier based architecture sequentially accepts or rejects each sentence in the original document order for its membership in the final summary. The Selector architecture, on the other hand, is free to pick one sentence at a time in any arbitrary order to piece together the summary. Our models under both architectures jointly capture the notions of salience and redundancy of sentences. In addition, these models have the advantage of being very interpretable, since they allow visualization of their predictions broken up by abstract features such as information content, salience and redundancy. We show that our models reach or outperform state-of-the-art supervised models on two different corpora. We also recommend the conditions under which one architecture is superior to the other based on experimental evidence.
- Nov 01 2016 cs.CL arXiv:1610.09996v2This paper proposes dynamic chunk reader (DCR), an end-to-end neural reading comprehension (RC) model that is able to extract and rank a set of answer candidates from a given document to answer questions. DCR is able to predict answers of variable lengths, whereas previous neural RC models primarily focused on predicting single tokens or entities. DCR encodes a document and an input question with recurrent neural networks, and then applies a word-by-word attention mechanism to acquire question-aware representations for the document, followed by the generation of chunk representations and a ranking module to propose the top-ranked chunk as the answer. Experimental results show that DCR achieves state-of-the-art exact match and F1 scores on the SQuAD dataset.
- The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state of the art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high-coverage and high-diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.
- Structured sparse optimization is an important and challenging problem for analyzing high-dimensional data in a variety of applications such as bioinformatics, medical imaging, social networks, and astronomy. Although a number of structured sparsity models have been explored, such as trees, groups, clusters, and paths, connected subgraphs have been rarely explored in the current literature. One of the main technical challenges is that there is no structured sparsity-inducing norm that can directly model the space of connected subgraphs, and there is no exact implementation of a projection oracle for connected subgraphs due to its NP-hardness. In this paper, we explore efficient approximate projection oracles for connected subgraphs, and propose two new efficient algorithms, namely, Graph-IHT and Graph-GHTP, to optimize a generic nonlinear objective function subject to connectivity constraint on the support of the variables. Our proposed algorithms enjoy strong guarantees analogous to several current methods for sparsity-constrained optimization, such as Projected Gradient Descent (PGD), Approximate Model Iterative Hard Thresholding (AM-IHT), and Gradient Hard Thresholding Pursuit (GHTP) with respect to convergence rate and approximation accuracy. We apply our proposed algorithms to optimize several well-known graph scan statistics in several applications of connected subgraph detection as a case study, and the experimental results demonstrate that our proposed algorithms outperform state-of-the-art methods.
- Aug 22 2016 cs.CV arXiv:1608.05442v1Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects.
- Using physical layer network coding, compute-and-forward is a promising relaying scheme that effectively exploits the interference between users and thus achieves high rates. In this paper, we consider the problem of finding the optimal integer-valued coefficient vector for a relay in the compute-and-forward scheme to maximize the computation rate at that relay. Although this problem turns out to be a shortest vector problem, which is suspected to be NP-hard, we show that it can be relaxed to a series of equality-constrained quadratic programmings. The solutions of the relaxed problems serve as real-valued approximations of the optimal coefficient vector, and are quantized to a set of integer-valued vectors, from which a coefficient vector is selected. The key to the efficiency of our method is that the closed-form expressions of the real-valued approximations can be derived with the Lagrange multiplier method. Numerical results demonstrate that compared with the existing methods, our method offers comparable rates at an impressively low complexity.
- Jun 13 2016 cs.CL arXiv:1606.03391v2This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity mention in the question by a character-level convolutional neural network (char-CNN), and match the predicate in that fact with the question by a word-level CNN (word-CNN). This work makes two main contributions. (i) A simple and effective entity linker over Freebase is proposed. Our entity linker outperforms the state-of-the-art entity linker over SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so that the predicate representation can be matched with the predicate-focused question representation more effectively. Experiments show that our system sets new state-of-the-art in this task.
- We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log- likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
- To improve the temporal and spatial storage efficiency, researchers have intensively studied various techniques, including compression and deduplication. Through our evaluation, we find that methods such as photo tags or local features help to identify the content-based similar- ity between raw images. The images can then be com- pressed more efficiently to get better storage space sav- ings. Furthermore, storing similar raw images together enables rapid data sorting, searching and retrieval if the images are stored in a distributed and large-scale envi- ronment by reducing fragmentation. In this paper, we evaluated the compressibility by designing experiments and observing the results. We found that on a statistical basis the higher similarity photos have, the better com- pression results are. This research helps provide a clue for future large-scale storage system design.
- The problem of rare and unknown words is an important issue that can potentially influence the performance of many NLP systems, including both the traditional count-based and the deep learning models. We propose a novel way to deal with the rare and unseen words for the neural network models using attention. Our model uses two softmax layers in order to predict the next word in conditional language models: one predicts the location of a word in the source sentence, and the other predicts a word in the shortlist vocabulary. At each time-step, the decision of which softmax layer to use choose adaptively made by an MLP which is conditioned on the context.~We motivate our work from a psychological evidence that humans naturally have a tendency to point towards objects in the context or the environment when the name of an object is not known.~We observe improvements on two tasks, neural machine translation on the Europarl English to French parallel corpora and text summarization on the Gigaword dataset using our proposed model.
- Caching in wireless device-to-device (D2D) networks can be utilized to offload data traffic during peak times. However, the design of incentive mechanisms is challenging due to the heterogeneous preference and selfish nature of user terminals (UTs). In this paper, we propose an incentive mechanism in which the base station (BS) rewards those UTs that share contents with others using D2D communication. We study the cost minimization problem for the BS and the utility maximization problem for each UT. In particular, the BS determines the rewarding policy to minimize his total cost, while each UT aims to maximize his utility by choosing his caching policy. We formulate the conflict among UTs and the tension between the BS and the UTs as a Stackelberg game. We show the existence of the equilibrium and propose an iterative gradient algorithm (IGA) to obtain the Stackelberg Equilibrium. Extensive simulations are carried out to evaluate the performance of the proposed caching scheme and comparisons are drawn with several baseline caching schemes with no incentives. Numerical results show that the caching scheme under our incentive mechanism outperforms other schemes in terms of the BS serving cost and the utilities of the UTs.
- Online media offers opportunities to marketers to deliver brand messages to a large audience. Advertising technology platforms enables the advertisers to find the proper group of audiences and deliver ad impressions to them in real time. The recent growth of the real time bidding has posed a significant challenge on monitoring such a complicated system. With so many components we need a reliable system that detects the possible changes in the system and alerts the engineering team. In this paper we describe the mechanism that we invented for recovering the representative metrics and detecting the change in their behavior. We show that this mechanism is able to detect the possible problems in time by describing some incident cases.
- Feb 22 2016 cs.CL arXiv:1602.06023v5In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.
- In this work, we propose Attentive Pooling (AP), a two-way attention mechanism for discriminative model training. In the context of pair-wise ranking or classification with neural networks, AP enables the pooling layer to be aware of the current input pair, in a way that information from the two input items can directly influence the computation of each other's representations. Along with such representations of the paired inputs, AP jointly learns a similarity measure over projected segments (e.g. trigrams) of the pair, and subsequently, derives the corresponding attention vector for each input to guide the pooling. Our two-way attention mechanism is a general framework independent of the underlying representation learning, and it has been applied to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in our studies. The empirical results, from three very different benchmark tasks of question answering/answer selection, demonstrate that our proposed models outperform a variety of strong baselines and achieve state-of-the-art performance in all the benchmarks.
- Jan 08 2016 cs.CL arXiv:1601.01530v4Recurrent Neural Network (RNN) and one of its specific architectures, Long Short-Term Memory (LSTM), have been widely used for sequence labeling. In this paper, we first enhance LSTM-based sequence labeling to explicitly model label dependencies. Then we propose another enhancement to incorporate the global information spanning over the whole input sequence. The latter proposed method, encoder-labeler LSTM, first encodes the whole input sequence into a fixed length vector with the encoder LSTM, and then uses this encoded vector as the initial state of another LSTM for sequence labeling. Combining these methods, we can predict the label sequence with considering label dependencies and information of whole input sequence. In the experiments of a slot filling task, which is an essential component of natural language understanding, with using the standard ATIS corpus, we achieved the state-of-the-art F1-score of 95.66%.
- Dec 17 2015 cs.CL arXiv:1512.05193v3How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNN; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNN achieves state-of-the-art performance on AS, PI and TE tasks.
- Dec 15 2015 cs.CV arXiv:1512.04150v1In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them
- We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we also provide an interactive web demo and open-source code. .
- Nov 20 2015 cs.CL arXiv:1511.06312v1We propose two methods of learning vector representations of words and phrases that each combine sentence context with structural features extracted from dependency trees. Using several variations of neural network classifier, we show that these combined methods lead to improved performance when used as input features for supervised term-matching.
- In this paper, we apply a general deep learning (DL) framework for the answer selection task, which does not depend on manually defined features or linguistic tools. The basic framework is to build the embeddings of questions and answers based on bidirectional long short-term memory (biLSTM) models, and measure their closeness by cosine similarity. We further extend this basic model in two directions. One direction is to define a more composite representation for questions and answers by combining convolutional neural network with the basic framework. The other direction is to utilize a simple but efficient attention mechanism in order to generate the answer representation according to the question context. Several variations of models are provided. The models are examined by two datasets, including TREC-QA and InsuranceQA. Experimental results demonstrate that the proposed models substantially outperform several strong baselines.
- This paper is an empirical study of the distributed deep learning for question answering subtasks: answer selection and question classification. Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms have been presented. Experimental results show that the distributed framework based on the message passing interface can accelerate the convergence speed at a sublinear scale. This paper demonstrates the importance of distributed training. For example, with 48 workers, a 24x speedup is achievable for the answer selection task and running time is decreased from 138.2 hours to 5.81 hours, which will increase the productivity significantly.
- In this paper we explore deep learning models with memory component or attention mechanism for question answering task. We combine and compare three models, Neural Machine Translation, Neural Turing Machine, and Memory Networks for a simulated QA data set. This paper is the first one that uses Neural Machine Translation and Neural Turing Machines for solving QA tasks. Our results suggest that the combination of attention and memory have potential to solve certain QA problem.
- Recently, there has been rising interest in Bayesian optimization -- the optimization of an unknown function with assumptions usually expressed by a Gaussian Process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no tradeoff parameter needs to be selected, and, moreover, we establish close connections to the popular GP-UCB and GP-PI strategies. Our approach can be understood as automatically and adaptively trading off exploration and exploitation in GP-UCB and GP-PI. We illustrate the effects of this adaptive tuning via bounds on the regret as well as an extensive empirical evaluation on robotics and vision tasks, demonstrating the robustness of this strategy for a range of performance criteria.
- Neural Turing Machines (NTM) contain memory component that simulates "working memory" in the brain to store and retrieve information to ease simple algorithms learning. So far, only linearly organized memory is proposed, and during experiments, we observed that the model does not always converge, and overfits easily when handling certain tasks. We think memory component is key to some faulty behaviors of NTM, and better organization of memory component could help fight those problems. In this paper, we propose several different structures of memory for NTM, and we proved in experiments that two of our proposed structured-memory NTMs could lead to better convergence, in term of speed and prediction accuracy on copy task and associative recall task as in (Graves et al. 2014).
- Caching at small base stations (SBSs) has demonstrated significant benefits in alleviating the backhaul requirement in heterogeneous cellular networks (HetNets). While many existing works focus on what contents to cache at each SBS, an equally important problem is what contents to deliver so as to satisfy dynamic user demands given the cache status. In this paper, we study optimal content delivery in cache-enabled HetNets by taking into account the inherent multicast capability of wireless medium. We consider stochastic content multicast scheduling to jointly minimize the average network delay and power costs under a multiple access constraint. We establish a content-centric request queue model and formulate this stochastic optimization problem as an infinite horizon average cost Markov decision process (MDP). By using \emphrelative value iteration and special properties of the request queue dynamics, we characterize some properties of the value function of the MDP. Based on these properties, we show that the optimal multicast scheduling policy is of threshold type. Then, we propose a structure-aware optimal algorithm to obtain the optimal policy. We also propose a low-complexity suboptimal policy, which possesses similar structural properties to the optimal policy, and develop a low-complexity algorithm to obtain this policy.
- Many Java applications instantiate objects within the Java heap that are persistent but seldom if ever referenced by the application. Examples include strings, such as error messages, and collections of value objects that are preloaded for fast access but they may include objects that are seldom referenced. This paper describes a stack-based framework for detecting these "cold" objects at runtime, with a view to marshaling and sequestering them in designated regions of the heap where they may be preferentially paged out to a backing store, thereby freeing physical memory pages for occupation by more active objects. Furthermore, we evaluate the correctness and efficiency of stack-based approach with an Access Barrier. The experimental results from a series of SPECjvm2008 benchmarks are presented.
- We apply a general deep learning framework to address the non-factoid question answering task. Our approach does not rely on any linguistic tools and can be applied to different languages or domains. Various architectures are presented and compared. We create and release a QA corpus and setup a new QA task in the insurance domain. Experimental results demonstrate superior performance compared to the baseline methods and various technologies give further improvements. For this highly challenging task, the top-1 accuracy can reach up to 65.3% on a test set, which indicates a great potential for practical use.
- Jul 10 2015 cs.CV arXiv:1507.02379v2Convolutional Neural Network (CNN) has been successful in image recognition tasks, and recent works shed lights on how CNN separates different classes with the learned inter-class knowledge through visualization. In this work, we instead visualize the intra-class knowledge inside CNN to better understand how an object class is represented in the fully-connected layers. To invert the intra-class knowledge into more interpretable images, we propose a non-parametric patch prior upon previous CNN visualization models. With it, we show how different "styles" of templates for an object class are organized by CNN in terms of location and content, and represented in a hierarchical and ensemble way. Moreover, such intra-class knowledge can be used in many interesting applications, e.g. style-based image retrieval and style-based object completion.
- In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To exploit both deep learning and linguistic structures, we propose a tree-based convolutional neural network model which exploit various long-distance relationships between words. Our model improves the sequential baselines on all three sentiment and question classification tasks, and achieves the highest published accuracy on TREC.
- Jun 02 2015 cs.CL arXiv:1506.00528v1In this paper, we present a novel approach for medical synonym extraction. We aim to integrate the term embedding with the medical domain knowledge for healthcare applications. One advantage of our method is that it is very scalable. Experiments on a dataset with more than 1M term pairs show that the proposed approach outperforms the baseline approaches by a large margin.
- Relation classification is an important semantic processing task for which state-ofthe-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We perform experiments using the the SemEval-2010 Task 8 dataset, which is designed for the task of classifying the relationship between two nominals marked in a sentence. Using CRCNN, we outperform the state-of-the-art for this dataset and achieve a F1 of 84.1 without using any costly handcrafted features. Additionally, our experimental results show that: (1) our approach is more effective than CNN followed by a softmax classifier; (2) omitting the representation of the artificial class Other improves both precision and recall; and (3) using only word embeddings as input features is enough to achieve state-of-the-art results if we consider only the text between the two target nominals.
- Caching and multicasting at base stations are two promising approaches to support massive content delivery over wireless networks. However, existing scheduling designs do not make full use of the advantages of the two approaches. In this paper, we consider the optimal dynamic multicast scheduling to jointly minimize the average delay, power, and fetching costs for cache-enabled content-centric wireless networks. We formulate this stochastic optimization problem as an infinite horizon average cost Markov decision process (MDP). It is well-known to be a difficult problem due to the curse of dimensionality, and there generally only exist numerical solutions. By using relative value iteration algorithm and the special structures of the request queue dynamics, we analyze the properties of the value function and the state-action cost function of the MDP for both the uniform and nonuniform channel cases. Based on these properties, we show that the optimal policy, which is adaptive to the request queue state, has a switch structure in the uniform case and a partial switch structure in the nonuniform case. Moreover, in the uniform case with two contents, we show that the switch curve is monotonically non-decreasing. Then, by exploiting these structural properties of the optimal policy, we propose two low-complexity optimal algorithms. Motivated by the switch structures of the optimal policy, to further reduce the complexity, we also propose a low-complexity suboptimal policy, which possesses similar structural properties to the optimal policy, and develop a low-complexity algorithm to compute this policy.
- In this paper, a macroblock classification method is proposed for various video processing applications involving motions. Based on the analysis of the Motion Vector field in the compressed video, we propose to classify Macroblocks of each video frame into different classes and use this class information to describe the frame content. We demonstrate that this low-computation-complexity method can efficiently catch the characteristics of the frame. Based on the proposed macroblock classification, we further propose algorithms for different video processing applications, including shot change detection, motion discontinuity detection, and outlier rejection for global motion estimation. Experimental results demonstrate that the methods based on the proposed approach can work effectively on these applications.
- Mar 03 2015 cs.CV arXiv:1503.00090v1Image deblurring techniques play important roles in many image processing applications. As the blur varies spatially across the image plane, it calls for robust and effective methods to deal with the spatially-variant blur problem. In this paper, a Saliency-based Deblurring (SD) approach is proposed based on the saliency detection for salient-region segmentation and a corresponding compensate method for image deblurring. We also propose a PDE-based deblurring method which introduces an anisotropic Partial Differential Equation (PDE) model for latent image prediction and employs an adaptive optimization model in the kernel estimation and deconvolution steps. Experimental results demonstrate the effectiveness of the proposed algorithm.
- Expression cloning plays an important role in facial expression synthesis. In this paper, a novel algorithm is proposed for facial expression cloning. The proposed algorithm first introduces a new elastic model to balance the global and local warping effects, such that the impacts from facial feature diversity among people can be minimized, and thus more effective geometric warping results can be achieved. Furthermore, a muscle-distribution-based (MD) model is proposed, which utilizes the muscle distribution of the human face and results in more accurate facial illumination details. In addition, we also propose a new distance-based metric to automatically select the optimal parameters such that the global and local warping effects in the elastic model can be suitably balanced. Experimental results show that our proposed algorithm outperforms the existing methods.
- Feb 23 2015 physics.soc-ph cs.SI arXiv:1502.05760v1The bidirectional selection between two classes widely emerges in various social lives, such as commercial trading and mate choosing. Until now, the discussions on bidirectional selection in structured human society are quite limited. We demonstrated theoretically that the rate of successfully matching is affected greatly by individuals neighborhoods in social networks, regardless of the type of networks. Furthermore, it is found that the high average degree of networks contributes to increasing rates of successful matches. The matching performance in different types of networks has been quantitatively investigated, revealing that the small-world networks reinforces the matching rate more than scale-free networks at given average degree. In addition, our analysis is consistent with the modeling result, which provides the theoretical understanding of underlying mechanisms of matching in complex networks.
- With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene classification. As scenes are composed of objects, the CNN for scene classification automatically discovers meaningful objects detectors, representative of the learned scene categories. With object detectors emerging as a result of learning to recognize scenes, our work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.
- Discovering visual knowledge from weakly labeled data is crucial to scale up computer vision recognition system, since it is expensive to obtain fully labeled data for a large number of concept categories. In this paper, we propose ConceptLearner, which is a scalable approach to discover visual concepts from weakly labeled image collections. Thousands of visual concept detectors are learned automatically, without human in the loop for additional annotation. We show that these learned detectors could be applied to recognize concepts at image-level and to detect concepts at image region-level accurately. Under domain-specific supervision, we further evaluate the learned concepts for scene recognition on SUN database and for object detection on Pascal VOC 2007. ConceptLearner shows promising performance compared to fully supervised and weakly supervised methods.
- We consider the problem of finding the optimal coefficient vector that maximizes the computation rate at a relay in the compute-and-forward scheme. Based on the idea of sphere decoding, we propose a highly efficient algorithm that finds the optimal coefficient vector. First, we derive a novel algorithm to transform the original quadratic form optimization problem into a shortest vector problem (SVP) using the Cholesky factorization. Instead of computing the Cholesky factor explicitly, the proposed algorithm realizes the Cholesky factorization with only $\bigO(n)$ flops by taking advantage of the structure of the Gram matrix in the quadratic form. Then, we propose some conditions that can be checked with $\bigO(n)$ flops, under which a unit vector is the optimal coefficient vector. Finally, by taking into account some useful properties of the optimal coefficient vector, we modify the Schnorr-Euchner search algorithm to solve the SVP. We show that the estimated average complexity of our new algorithm is $\bigO(n^{1.5}P^{0.5})$ flops for i.i.d. Gaussian channel entries with SNR $P$ based on the Gaussian heuristic. Simulations show that our algorithm is not only much more efficient than the existing ones that give the optimal solution, but also faster than some best known suboptimal methods. Besides, we show that our algorithm can be readily adapted to output a list of $L$ best candidate vectors for use in the compute-and-forward design. The estimated average complexity of the resultant list-output algorithm is $\bigO\left(n^{1.5}P^{0.5}\log L + nL\right)$ flops for i.i.d. Gaussian channel entries.
- Sep 19 2014 physics.soc-ph cs.SI arXiv:1409.5298v1The network topology can be described by the number of nodes and the interconnections among them. The degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network. Therefore, the degree is very important structural parameter of network topology. However, given the number of nodes and the degree of each node in a network, the topology of the network cannot be determined. Therefore, we propose the degree-layer theory of network topology to describe deeply the network topology. First, we propose the concept of degree-tree with the breadth-first search tree. The degrees of all nodes are layered and have a hierarchical structure. Second,the degree-layer theory is described in detail. Two new concepts are defined in the theory. An index is proposed to quantitatively distinguish the two network topologies. It also can quantitatively measure the stability of network topology built by a model mechanism. One theorem is given and proved, furthermore, and one corollary is derived directly from the theorem. Third, the applications of the degree-layer theory are discussed in the ER random network, WS small world network and BA scale-free network, and the influences of the degree distribution on the stability of network topology are studied in the three networks. In conclusion, the degree-layer theory is helpful for accurately describing the network topology, and provides a new starting point for researching the similarity and isomorphism between two network topologies.
- Optimal queueing control of multi-hop networks remains a challenging problem even in the simplest scenarios. In this paper, we consider a two-hop half-duplex relaying system with random channel connectivity. The relay is equipped with a finite buffer. We focus on stochastic link selection and transmission rate control to maximize the average system throughput subject to a half-duplex constraint. We formulate this stochastic optimization problem as an infinite horizon average cost Markov decision process (MDP), which is well-known to be a difficult problem. By using sample-path analysis and exploiting the specific problem structure, we first obtain an \emphequivalent Bellman equation with reduced state and action spaces. By using \emphrelative value iteration algorithm, we analyze the properties of the value function of the MDP. Then, we show that the optimal policy has a threshold-based structure by characterizing the \emphsupermodularity in the optimal control. Based on the threshold-based structure and Markov chain theory, we further simplify the original complex stochastic optimization problem to a static optimization problem over a small discrete feasible set and propose a low-complexity algorithm to solve the simplified static optimization problem by making use of its special structure. Furthermore, we obtain the closed-form optimal threshold for the symmetric case. The analytical results obtained in this paper also provide design insights for two-hop relaying systems with multiple relays equipped with finite relay buffers.
- Virtual machine (VM) scheduling is an important technique to efficiently operate the computing resources in a data center. Previous work has mainly focused on consolidating VMs to improve resource utilization and thus to optimize energy consumption. However, the interference between collocated VMs is usually ignored, which can result in very worse performance degradation to the applications running in those VMs due to the contention of the shared resources. Based on this observation, we aim at designing efficient VM assignment and scheduling strategies where we consider optimizing both the operational cost of the data center and the performance degradation of running applications and then, we propose a general model which captures the inherent tradeoff between the two contradictory objectives. We present offline and online solutions for this problem by exploiting the spatial and temporal information of VMs where VM scheduling is done by jointly consider the combinations and the life-cycle overlapping of the VMs. Evaluation results show that the proposed methods can generate efficient schedules for VMs, achieving low operational cost while significantly reducing the performance degradation of applications in cloud data centers.
- Mar 04 2014 cs.SI physics.soc-ph arXiv:1403.0448v1Aiming to understand real-world hierarchical networks whose degree distributions are neither power law nor exponential, we construct a hybrid clique network that includes both homogeneous and inhomogeneous parts, and introduce an inhomogeneity parameter to tune the ratio between the homogeneous part and the inhomogeneous one. We perform Monte-Carlo simulations to study various properties of such a network, including the degree distribution, the average shortest-path-length, the clustering coefficient, the clustering spectrum, and the communicability.
- In this paper, we focus on analyzing the period distribution of the inversive pseudorandom number generators (IPRNGs) over finite field $({\rm Z}_{N},+,\times)$, where $N>3$ is a prime. The sequences generated by the IPRNGs are transformed to 2-dimensional linear feedback shift register (LFSR) sequences. By employing the generating function method and the finite field theory, the period distribution is obtained analytically. The analysis process also indicates how to choose the parameters and the initial values such that the IPRNGs fit specific periods. The analysis results show that there are many small periods if $N$ is not chosen properly. The experimental examples show the effectiveness of the theoretical analysis.
- In 2009, Solé and Zinoviev (\emphEur. J. Combin., vol. 30, no. 2, pp. 458-467, 2009) proposed an open problem of arithmetic interest to study the period of the inversive pseudorandom number generators (IPRNGs) and to give conditions bearing on $a, b$ to achieve maximal period, we focus on resolving this open problem. In this paper, the period distribution of the IPRNGs over the Galois ring $({\rm Z}_{p^{e}},+,\times)$ is considered, where $p>3$ is a prime and $e\geq 2$ is an integer. The IPRNGs are transformed to 2-dimensional linear feedback shift registers (LFSRs) so that the analysis of the period distribution of the IPRNGs is transformed to the analysis of the period distribution of the LFSRs. Then, by employing some analytical approaches, the full information on the period distribution of the IPRNGs is obtained, which is to make exact statistics about the period of the IPRNGs then count the number of IPRNGs of a specific period when $a$, $b$ and $x_{0}$ traverse all elements in ${\rm Z}_{p^{e}}$. The analysis process also indicates how to choose the parameters and the initial values such that the IPRNGs fit specific periods.
- This paper consists of three parts. The first part presents a large class of new binary quasi-cyclic (QC)-LDPC codes with girth of at least 6 whose parity-check matrices are constructed based on cyclic subgroups of finite fields. Experimental results show that the codes constructed perform well over the binary-input AWGN channel with iterative decoding using the sum-product algorithm (SPA). The second part analyzes the ranks of the parity-check matrices of codes constructed based on finite fields with characteristic of 2 and gives combinatorial expressions for these ranks. The third part identifies a subclass of constructed QC-LDPC codes that have large minimum distances. Decoding of codes in this subclass with the SPA converges very fast.