results for au:Yu_H in:cs

- Dec 06 2017 cs.CV arXiv:1712.01493v1Attribute based person re-identification (Re-ID) aims to search persons in large-scale image databases using attribute queries. Despite the great practical significance,the huge gap between semantic attributes and visual images makes it a very challenging task. Existing researches usually focus on the match between query attributes and attributes scores predicted from human images, which suffers from imperfect attribute prediction and low discriminability. In this work, we propose to formulate the attribute based person Re-ID as a joint space learning problem. To alleviate the negative impact resulted by the huge heterogeneous gap between different modalities, we apply a novel adversary training strategy to generate homogeneous features for both modalities, providing distribution alignment between different modalities in the feature level and keeping semantic consistency across modalities. Our experiments validate the effectiveness of our model, and show great improvement on the performance over the state of the art models.
- In a directional modulation network, a general power iterative (GPI) based beamforming scheme is proposed to maximize the secrecy rate (SR), where there are two optimization variables required to be optimized. The first one is the useful precoding vector of transmitting confidential messages to the desired user while the second one is the artificial noise (AN) projection matrix of forcing more AN to eavesdroppers. In such a secure network, the paramount problem is how to design or optimize the two optimization variables by different criteria. To maximize the SR (Max-SR), an alternatively iterative structure (AIS) is established between the AN projection matrix and the precoding vector for confidential messages. To choose a good initial value of iteration process of GPI, the proposed Max-SR method can readily double its convergence speed compared to the random choice of initial value. With only four iterations, it may rapidly converge to its rate ceil. From simulation results, it follows that the SR performance of the proposed AIS of GPI-based Max-SR is much better than those of conventional leakage-based and null-space projection methods in the medium and large signal-to-noise ratio (SNR) regions, and its achievable SR performance gain gradually increases as SNR increases.
- This paper presents a novel variational inference framework for deriving a family of Bayesian sparse Gaussian process regression (SGPR) models whose approximations are variationally optimal with respect to the full-rank GPR model enriched with various corresponding correlation structures of the observation noises. Our variational Bayesian SGPR (VBSGPR) models jointly treat both the distributions of the inducing variables and hyperparameters as variational parameters, which enables the decomposability of the variational lower bound that in turn can be exploited for stochastic optimization. Such a stochastic optimization involves iteratively following the stochastic gradient of the variational lower bound to improve its estimates of the optimal variational distributions of the inducing variables and hyperparameters (and hence the predictive distribution) of our VBSGPR models and is guaranteed to achieve asymptotic convergence to them. We show that the stochastic gradient is an unbiased estimator of the exact gradient and can be computed in constant time per iteration, hence achieving scalability to big data. We empirically evaluate the performance of our proposed framework on two real-world, massive datasets.
- Sep 28 2017 cs.DS arXiv:1709.09574v1In the fillable array problem one must maintain an array A[1..n] of $w$-bit entries subject to random access reads and writes, and also a $\texttt{fill}(\Delta)$ operation which sets every entry of to some $\Delta\in\{0,\ldots,2^w-1\}$. We show that with just one bit of redundancy, i.e. a data structure using $nw+1$ bits of memory, $\texttt{read}/\texttt{fill}$ can be implemented in worst case constant time, and $\texttt{write}$ can be implemented in either amortized constant time (deterministically) or worst case expected constant (randomized). In the latter case, we need to store an additional $O(\log n)$ random bits to specify a permutation drawn from an $1/n^2$-almost pairwise independent family.
- In OFDM relay networks with IQ imbalances and full-duplex relay station (RS), how to optimize pilot pattern and power allocation using the criterion of minimizing the sum of mean square errors (Sum-MSE) for the frequency-domain least-squares channel estimator has a heavy impact on self-interference cancellation. Firstly, the design problem of pilot pattern is casted as a convex optimization. From the KKT conditions, the optimal analytical expression is derived given the fixed source power and RS power. Subsequently, an optimal power allocation (OPA) strategy is proposed and presented to further alleviate the effect of Sum-MSE under the total transmit power sum constraint of source node and RS. Simulation results show that the proposed OPA performs better than equal power allocation (EPA) in terms of Sum-MSE, and the Sum-MSE performance gain grows with deviating $\rho$ from the value of $\rho^o$ minimizing the Sum-MSE, where $\rho$ is defined as the average ratio of the residual SI channel at RS to the intended channel from source to RS. For example, the OPA achieves about 5dB SNR gain over EPA by shrinking or stretching $\rho$ with a factor $4$. More importantly, as $\rho$ decreases or increases more, the performance gain becomes more significant.
- Abstraction and realization are bilateral processes that are key in deriving intelligence and creativity. In many domains, the two processes are approached through rules: high-level principles that reveal invariances within similar yet diverse examples. Under a probabilistic setting for discrete input spaces, we focus on the rule realization problem which generates input sample distributions that follow the given rules. More ambitiously, we go beyond a mechanical realization that takes whatever is given, but instead ask for proactively selecting reasonable rules to realize. This goal is demanding in practice, since the initial rule set may not always be consistent and thus intelligent compromises are needed. We formulate both rule realization and selection as two strongly connected components within a single and symmetric bi-convex problem, and derive an efficient algorithm that works at large scale. Taking music compositional rules as the main example throughout the paper, we demonstrate our model's efficiency in not only music realization (composition) but also music interpretation and understanding (analysis).
- Sep 06 2017 cs.GR arXiv:1709.01221v2Edge bundling methods can effectively alleviate visual clutter and reveal high-level graph structures in large graph visualization. Researchers have devoted significant efforts to improve edge bundling according to different metrics. As the edge bundling family evolve rapidly, the quality of edge bundles receives increasing attention in the literature accordingly. In this paper, we present MLSEB, a novel method to generate edge bundles based on moving least squares (MLS) approximation. In comparison with previous edge bundling methods, we argue that our MLSEB approach can generate better results based on a quantitative metric of quality, and also ensure scalability and the efficiency for visualizing large graphs.
- Aug 29 2017 cs.CV arXiv:1708.08062v2While metric learning is important for Person re-identification (RE-ID), a significant problem in visual surveillance for cross-view pedestrian matching, existing metric models for RE-ID are mostly based on supervised learning that requires quantities of labeled samples in all pairs of camera views for training. However, this limits their scalabilities to realistic applications, in which a large amount of data over multiple disjoint camera views is available but not labelled. To overcome the problem, we propose unsupervised asymmetric metric learning for unsupervised RE-ID. Our model aims to learn an asymmetric metric, i.e., specific projection for each view, based on asymmetric clustering on cross-view person images. Our model finds a shared space where view-specific bias is alleviated and thus better matching performance can be achieved. Extensive experiments have been conducted on a baseline and five large-scale RE-ID datasets to demonstrate the effectiveness of the proposed model. Through the comparison, we show that our model works much more suitable for unsupervised RE-ID compared to classical unsupervised metric learning models. We also compare with existing unsupervised RE-ID methods, and our model outperforms them with notable margins. Specifically, we report the results on large-scale unlabelled RE-ID dataset, which is important but unfortunately less concerned in literatures.
- Aug 04 2017 cs.CV arXiv:1708.00961v1In this paper, we introduce a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity. The Wasserstein distance is a key concept of the optimal transform theory, and promises to improve the performance of the GAN. The perceptual loss compares the perceptual features of a denoised output against those of the ground truth in an established feature space, while the GAN helps migrate the data noise distribution from strong to weak. Therefore, our proposed method transfers our knowledge of visual perception to the image denoising task, is capable of not only reducing the image noise level but also keeping the critical information at the same time. Promising results have been obtained in our experiments with clinical CT images.
- Jul 28 2017 cs.CV arXiv:1707.08682v1SSD is one of the state-of-the-art object detection algorithms, and it combines high detection accuracy with real-time speed. However, it is widely recognized that SSD is less accurate in detecting small objects compared to large objects, because it ignores the context from outside the proposal boxes. In this paper, we present CSSD--a shorthand for context-aware single-shot multibox object detector. CSSD is built on top of SSD, with additional layers modeling multi-scale contexts. We describe two variants of CSSD, which differ in their context layers, using dilated convolution layers (DiCSSD) and deconvolution layers (DeCSSD) respectively. The experimental results show that the multi-scale context modeling significantly improves the detection accuracy. In addition, we study the relationship between effective receptive fields (ERFs) and the theoretical receptive fields (TRFs), particularly on a VGGNet. The empirical results further strengthen our conclusion that SSD coupled with context layers achieves better detection results especially for small objects ($+3.2\% {\rm AP}_{@0.5}$ on MS-COCO compared to the newest SSD), while maintaining comparable runtime performance.
- Jun 22 2017 cs.IR arXiv:1706.06716v2Understanding users' behavior and predicting their future purchase are critical for e-commerce companies to boost their revenue. While explicit user feedback such as ratings plays the most significant role in eliciting users' preferences, such feedback is scarce, which prompts the need for leveraging more abundant implicit user feedback such as purchase record. Consequently, recent studies focused on leveraging users' past purchase record to predict their purchase. However, their performance is not satisfactory due to 1) the lack of purchase history of users, and 2) more importantly the ill-posed assumption of non-purchased items equally being considered as negative feedback. In this paper, we define new pairwise relationships among items aiming at overcoming the limitations of existing works, and propose a novel method called P3S that stands for modeling pairwise relationships among three disjoint item sets, which leverages users' click record in conjunction with their purchase record. Precisely, we partition the items into three disjoint sets based on users' purchase and click record, define new pairwise relationships among them with respect to users, and reflect these relationships into our pairwise learning-to-rank method. Experiments on two real-world datasets demonstrate that our proposed method outperforms the state-of-the-art baselines in the task of predicting users' future purchase in e-commerce.
- Jun 13 2017 cs.SD arXiv:1706.03397v1In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition.
- May 30 2017 cs.CL arXiv:1705.09906v1One of the long-term goals of artificial intelligence is to build an agent that can communicate intelligently with human in natural language. Most existing work on natural language learning relies heavily on training over a pre-collected dataset with annotated labels, leading to an agent that essentially captures the statistics of the fixed external training data. As the training data is essentially a static snapshot representation of the knowledge from the annotator, the agent trained this way is limited in adaptiveness and generalization of its behavior. Moreover, this is very different from the language learning process of humans, where language is acquired during communication by taking speaking action and learning from the consequences of speaking action in an interactive manner. This paper presents an interactive setting for grounded natural language learning, where an agent learns natural language by interacting with a teacher and learning from feedback, thus learning and improving language skills while taking part in the conversation. To achieve this goal, we propose a model which incorporates both imitation and reinforcement by leveraging jointly sentence and reward feedbacks from the teacher. Experiments are conducted to validate the effectiveness of the proposed approach.
- May 09 2017 cs.LG arXiv:1705.02699v1Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
- May 08 2017 cs.CV arXiv:1705.02233v4One of the major challenges in object detection is to propose detectors with highly accurate localization of objects. The online sampling of high-loss region proposals (hard examples) uses the multitask loss with equal weight settings across all loss types (e.g, classification and localization, rigid and non-rigid categories) and ignores the influence of different loss distributions throughout the training process, which we find essential to the training efficacy. In this paper, we present the Stratified Online Hard Example Mining (S-OHEM) algorithm for training higher efficiency and accuracy detectors. S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling technique, to choose the training examples according to this influence during hard example mining, and thus enhance the performance of object detectors. We show through systematic experiments that S-OHEM yields an average precision (AP) improvement of 0.5% on rigid categories of PASCAL VOC 2007 for both the IoU threshold of 0.6 and 0.7. For KITTI 2012, both results of the same metric are 1.6%. Regarding the mean average precision (mAP), a relative increase of 0.3% and 0.5% (1% and 0.5%) is observed for VOC07 (KITTI12) using the same set of IoU threshold. Also, S-OHEM is easy to integrate with existing region-based detectors and is capable of acting with post-recognition level regressors.
- In this paper, we make an investigation on the sum-mean-square-error (sum-MSE) performance gain achieved by DFT-based least-square (LS) channel estimator over frequency-domain LS one in full-duplex OFDM system in the presence of colored interference and noise. The closed-form expression of the sum-MSE performance gain is given. Its simple upper and lower bounds are derived by using inequalities of matrix eigen-values. By simulation and analysis, the upper lower bound is shown to be close to the exact value of MSE gain as the ratio of the number N of total subcarriers to the cyclic prefix length L grows and the correlation factor of colored interference increases. More importantly, we also find that the MSE gain varies from one to N/L as the correlation among colored interferences decreases gradually. According to theoretical analysis, we also find the MSE gain has very simple forms in two extreme scenarios. In the first extreme case that the colored interferences over all subchannels are fully correlated, i.e., their covariance matrix is a matrix of all-ones, the sum-MSE gain reduces to 1. In other words, there is no performance gain. In the second extreme case that the colored-interference covariance matrix is an identity matrix, i.e, they are mutually independent, the achievable sum-MSE performance gain is N/L. A large ratio N/L will achieve a significant sum-MSE gain.
- Apr 25 2017 cs.CV arXiv:1704.07142v1With the increasing demands of applications in virtual reality such as 3D films, virtual Human-Machine Interactions and virtual agents, the analysis of 3D human face analysis is considered to be more and more important as a fundamental step for those virtual reality tasks. Due to information provided by an additional dimension, 3D facial reconstruction enables aforementioned tasks to be achieved with higher accuracy than those based on 2D facial analysis. The denser the 3D facial model is, the more information it could provide. However, most existing dense 3D facial reconstruction methods require complicated processing and high system cost. To this end, this paper presents a novel method that simplifies the process of dense 3D facial reconstruction by employing only one frame of depth data obtained with an off-the-shelf RGB-D sensor. The experiments showed competitive results with real world data.
- In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a closer and more natural correspondence to dynamic data structures than classic communication models do, and hence presents a new perspective on data structures. We first present a tight lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. We then apply the online communication model to prove data structure lower bounds for two dynamic data structure problems: the Group Range problem and the Dynamic Connectivity problem for forests. Both of the problems admit a worst case $O(\log n)$-time data structure. Using online communication complexity, we prove a tight cell-probe lower bound for each: spending $o(\log n)$ (even amortized) time per operation results in at best an $\exp(-\delta^2 n)$ probability of correctly answering a $(1/2+\delta)$-fraction of the $n$ queries.
- We consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To curb the high variance issue in off-policy TD learning, we propose a new scheme of setting the $\lambda$-parameters of TD, based on generalized Bellman equations. Our scheme is to set $\lambda$ according to the eligibility trace iterates calculated in TD, thereby easily keeping these traces in a desired bounded range. Compared to prior works, this scheme is more direct and flexible, and allows much larger $\lambda$ values for off-policy TD learning with bounded traces. Using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process. These results not only lead immediately to a characterization of the convergence behavior of least-squares based implementation of our scheme, but also prepare the ground for further analysis of gradient-based implementations.
- Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy.
- We tackle a task where an agent learns to navigate in a 2D maze-like environment called XWORLD. In each session, the agent perceives a sequence of raw-pixel frames, a natural language command issued by a teacher, and a set of rewards. The agent learns the teacher's language from scratch in a grounded and compositional manner, such that after training it is able to correctly execute zero-shot commands: 1) the combination of words in the command never appeared before, and/or 2) the command contains new object concepts that are learned from another task but never learned from navigation. Our deep framework for the agent is trained end to end: it learns simultaneously the visual representations of the environment, the syntax and semantics of the language, and the action module that outputs actions. The zero-shot learning capability of our framework results from its compositionality and modularity with parameter tying. We visualize the intermediate outputs of the framework, demonstrating that the agent truly understands how to solve the problem. We believe that our results provide some preliminary insights on how to train an agent with similar abilities in a 3D environment.
- This paper proves the first super-logarithmic lower bounds on the cell probe complexity of dynamic boolean (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new method for proving dynamic cell probe lower bounds and use it to prove a $\tilde{\Omega}(\log^{1.5} n)$ lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting over $\mathbb{F}_2$ ([Pat07]). Proving an $\omega(\lg n)$ lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai PÇŽtraÅŸcu's obituary [Tho13]. This result also implies the first $\omega(\lg n)$ lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D rectangle stabbing, and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient one-way communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].
- This paper studies a green paradigm for the underlay coexistence of primary users (PUs) and secondary users (SUs) in energy harvesting cognitive radio networks (EH-CRNs), wherein battery-free SUs capture both the spectrum and the energy of PUs to enhance spectrum efficiency and green energy utilization. To lower the transmit powers of SUs, we employ multi-hop transmission with time division multiple access, by which SUs first harvest energy from the RF signals of PUs and then transmit data in the allocated time concurrently with PUs, all in the licensed spectrum. In this way, the available transmit energy of each SU mainly depends on the harvested energy before the turn to transmit, namely energy causality. Meanwhile, the transmit powers of SUs must be strictly controlled to protect PUs from harmful interference. Thus, subject to the energy causality constraint and the interference power constraint, we study the end-to-end throughput maximization problem for optimal time and power allocation. To solve this nonconvex problem, we first equivalently transform it into a convex optimization problem and then propose the joint optimal time and power allocation (JOTPA) algorithm that iteratively solves a series of feasibility problems until convergence. Extensive simulations evaluate the performance of EH-CRNs with JOTPA in three typical deployment scenarios and validate the superiority of JOTPA by making comparisons with two other resource allocation algorithms.
- Both feedback of ratings and trust relationships can be used to reveal users' tastes for improving recommendation performance, especially for cold users. However, both of them are facing data sparsity problem, which may severely degrade recommendation performance. In this paper, we propose to utilize the idea of Denoising Auto-Encoders (DAE) to tackle this problem. Specially, we propose a novel deep learning model, the \textitTrust-aware Collaborative Denoising Auto-Encoder (TDAE), to learn compact and effective representations from both rating and trust data for top-N recommendation. In particular, we present a novel neutral network with a weighted hidden layer to balance the importance of these representations. Moreover, we propose a novel correlative regularization to bridge relations between user preferences in different perspectives. We also conduct comprehensive experiments on two public datasets to compare with several state-of-the-art approaches. The results demonstrate that the proposed method significantly outperforms other comparisons for top-N recommendation task.
- Mar 03 2017 cs.CL arXiv:1703.00538v2Background: Electronic health record (EHR) notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Objective: The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. Methods: We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views for term importance: patient use of medical concepts, document-level term salience, word-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and compared it with three benchmark unsupervised ensemble ranking methods. Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. It outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. Conclusions: FIT can automatically identify EHR terms important to patients and may help develop personalized interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
- Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.
- Mar 01 2017 cs.CL arXiv:1702.08563v2Item Response Theory (IRT) allows for measuring ability of Machine Learning models as compared to a human population. However, it is difficult to create a large dataset to train the ability of deep neural network models (DNNs). We propose Crowd-Informed Fine-Tuning (CIFT) as a new training process, where a pre-trained model is fine-tuned with a specialized supplemental training set obtained via IRT model-fitting on a large set of crowdsourced response patterns. With CIFT we can leverage the specialized set of data obtained through IRT to inform parameter tuning in DNNs. We experiment with two loss functions in CIFT to represent (i) memorization of fine-tuning items and (ii) learning a probability distribution over potential labels that is similar to the crowdsourced distribution over labels to simulate crowd knowledge. Our results show that CIFT improves ability for a state-of-the art DNN model for Recognizing Textual Entailment (RTE) tasks and is generalizable to a large-scale RTE test set.
- FPGA-based hardware accelerators for convolutional neural networks (CNNs) have obtained great attentions due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipelines stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies largely depending on the batch size of the data. Experiment results show that the proposed accelerator architecture for binary CNNs running on a Virtex-7 FPGA is 8.3x faster and 75x more energy-efficient than a Titan X GPU for processing online individual requests in small batch sizes. For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5x higher energy efficiency.
- Feb 17 2017 cs.CL arXiv:1702.04811v2Deep neural networks (DNNs) have made significant progress in a number of Machine Learning applications. However without a consistent set of evaluation tasks, interpreting performance across test datasets is impossible. In most previous work, characteristics of individual data points are not considered during evaluation, and each data point is treated equally. Using Item Response Theory (IRT) from psychometrics it is possible to model characteristics of specific data points that then inform an estimate of model ability as compared to a population of humans. We report the results of several experiments to determine how different Deep Neural Network (DNN) models perform under different training circumstances with respect to ability. As DNNs train on larger datasets, performance begins to look like human performance under the assumptions of IRT models. That is, easy questions start to have a higher probability of being answered correctly than harder questions. We also report the results of additional analyses regarding model robustness to noise and performance as a function of training set size that further inform our main conclusion
- With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.
- Feb 13 2017 cs.LG arXiv:1702.03006v1To estimate the value functions of policies from exploratory data, most model-free off-policy algorithms rely on importance sampling, where the use of importance sampling ratios often leads to estimates with severe variance. It is thus desirable to learn off-policy without using the ratios. However, such an algorithm does not exist for multi-step learning with function approximation. In this paper, we introduce the first such algorithm based on temporal-difference (TD) learning updates. We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner. Our new algorithm achieves stability using a two-timescale gradient-based TD update. A prior algorithm based on lookup table representation called Tree Backup can also be retrieved using action-dependent bootstrapping, becoming a special case of our algorithm. In two challenging off-policy tasks, we demonstrate that our algorithm is stable, effectively avoids the large variance issue, and can perform substantially better than its state-of-the-art counterpart.
- The backpressure algorithm has been widely used as a distributed solution to the problem of joint rate control and routing in multi-hop data networks. By controlling a parameter $V$ in the algorithm, the backpressure algorithm can achieve an arbitrarily small utility optimality gap. However, this in turn brings in a large queue length at each node and hence causes large network delay. This phenomenon is known as the fundamental utility-delay tradeoff. The best known utility-delay tradeoff for general networks is $[O(1/V), O(V)]$ and is attained by a backpressure algorithm based on a drift-plus-penalty technique. This may suggest that to achieve an arbitrarily small utility optimality gap, the existing backpressure algorithms necessarily yield an arbitrarily large queue length. However, this paper proposes a new backpressure algorithm that has a vanishing utility optimality gap, so utility converges to exact optimality as the algorithm keeps running, while queue lengths are bounded throughout by a finite constant. The technique uses backpressure and drift concepts with a new method for convex programming.
- Jan 13 2017 cs.SY arXiv:1701.03343v1In this work, we investigate a state estimation problem for a full-car semi-active suspension system. To account for the complex calculation and optimization problems, a vehicle-to- cloud-to-vehicle (V2C2V) scheme is utilized. Moving horizon estimation is introduced for the state estimation system design. All the optimization problems are solved in a remotely-embedded agent with high computational ability. Measurements and state estimates are transmitted between the vehicle and the remote agent via networked communication channels. The effectiveness of the proposed method is illustrated via a set of simulations.
- Dec 13 2016 cs.CV arXiv:1612.03630v1In this paper, we develop a binary convolutional encoder-decoder network (B-CEDNet) for natural scene text processing (NSTP). It converts a text image to a class-distinguished salience map that reveals the categorical, spatial and morphological information of characters. The existing solutions are either memory consuming or run-time consuming that cannot be applied to real-time applications on resource-constrained devices such as advanced driver assistance systems. The developed network can process multiple regions containing characters by one-off forward operation, and is trained to have binary weights and binary feature maps, which lead to both remarkable inference run-time speedup and memory usage reduction. By training with over 200, 000 synthesis scene text images (size of $32\times128$), it can achieve $90\%$ and $91\%$ pixel-wise accuracy on ICDAR-03 and ICDAR-13 datasets. It only consumes $4.59\ ms$ inference run-time realized on GPU with a small network size of 2.14 MB, which is up to $8\times$ faster and $96\%$ smaller than it full-precision version.
- Nov 30 2016 cs.CV arXiv:1611.09587v1Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications. However,pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (1) roughly parses the frames within the video segment, (2) estimates the optical flow between frames and (3) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation and temporal fusion are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-arts.
- Nov 29 2016 cs.NI arXiv:1611.09011v2The OpenFlow-based SDN is widely studied to better network performance through planning fine-grained paths. However, being designed to configure path hop-by-hop, it faces the scalability issue that both the flow table overhead and path setup delay are unacceptable for large-scale networks. In this paper, we propose PACO, a framework based on Source Routing to address that problem through quickly pushing paths into the packet header at network edges and pre-installing few rules at the network core. The straightforward implementation of SR is inefficient as it would incur too many path labels; other efficient approaches would sacrifice path flexibility (e.g., DEFO). To implement SR efficiently and flexibly, PACO presents each path as a concatenation of pathlets and introduces algorithms to compute pathlets and concatenate paths with minimum path labels. Our extensive simulations confirm the scalability of PACO as it saves the flow table overhead up to 94% compared with OpenFlow-SDN solutions and show that PACO outperforms SR-SDN solutions by supporting more than 40% paths with few label overhead.
- Deep learning, as a promising new area of machine learning, has attracted a rapidly increasing attention in the field of medical imaging. Compared to the conventional machine learning methods, deep learning requires no hand-tuned feature extractor, and has shown a superior performance in many visual object recognition applications. In this study, we develop a deep convolutional neural network (CNN) and apply it to thoracic CT images for the classification of lung nodules. We present the CNN architecture and classification accuracy for the original images of lung nodules. In order to understand the features of lung nodules, we further construct new datasets, based on the combination of artificial geometric nodules and some transformations of the original images, as well as a stochastic nodule shape model. It is found that simplistic geometric nodules cannot capture the important features of lung nodules.
- Nov 15 2016 cs.CL arXiv:1611.04491v1Objective: Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, medical jargon, which abounds in EHR notes, has been shown to be a barrier for patient EHR comprehension. Existing knowledge bases that link medical jargon to lay terms or definitions play an important role in alleviating this problem but have low coverage of medical jargon in EHRs. We developed a data-driven approach that mines EHRs to identify and rank medical jargon based on its importance to patients, to support the building of EHR-centric lay language resources. Methods: We developed an innovative adapted distant supervision (ADS) model based on support vector machines to rank medical jargon from EHRs. For distant supervision, we utilized the open-access, collaborative consumer health vocabulary, a large, publicly available resource that links lay terms to medical jargon. We explored both knowledge-based features from the Unified Medical Language System and distributed word representations learned from unlabeled large corpora. We evaluated the ADS model using physician-identified important medical terms. Results: Our ADS model significantly surpassed two state-of-the-art automatic term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus 0.710 and 0.667, respectively. Our model identified 10K important medical jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR narratives. Conclusion: Our work is an important step towards enriching lexical resources that link medical jargon to lay terms/definitions to support patient EHR comprehension. The identified medical jargon terms and their rankings are available upon request.
- Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However biomedical knowledge discovery is hypothesis-driven. The most related articles may not be ones with the highest text similarities. In this study, we first develop an innovative crowd-sourcing approach to build an expert-annotated document-ranking corpus. Using this corpus as the gold standard, we then evaluate the approaches of using text similarity to rank the relatedness of articles. Finally, we develop and evaluate a new supervised model to automatically rank related scientific articles. Our results show that authors' ranking differ significantly from rankings by text-similarity-based models. By training a learning-to-rank model on a subset of the annotated corpus, we found the best supervised learning-to-rank model (SVM-Rank) significantly surpassed state-of-the-art baseline systems.
- One of the biggest open problems in external memory data structures is the priority queue problem with DecreaseKey operations. If only Insert and ExtractMin operations need to be supported, one can design a comparison-based priority queue performing $O((N/B)\lg_{M/B} N)$ I/Os over a sequence of $N$ operations, where $B$ is the disk block size in number of words and $M$ is the main memory size in number of words. This matches the lower bound for comparison-based sorting and is hence optimal for comparison-based priority queues. However, if we also need to support DecreaseKeys, the performance of the best known priority queue is only $O((N/B) \lg_2 N)$ I/Os. The big open question is whether a degradation in performance really is necessary. We answer this question affirmatively by proving a lower bound of $\Omega((N/B) \lg_{\lg N} B)$ I/Os for processing a sequence of $N$ intermixed Insert, ExtraxtMin and DecreaseKey operations. Our lower bound is proved in the cell probe model and thus holds also for non-comparison-based priority queues.
- Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieve the state-of-the-art results showing an absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.
- Maximum Inner Product Search (MIPS) is an important task in many machine learning applications such as the prediction phase of a low-rank matrix factorization model for a recommender system. There have been some works on how to perform MIPS in sub-linear time recently. However, most of them do not have the flexibility to control the trade-off between search efficient and search quality. In this paper, we study the MIPS problem with a computational budget. By carefully studying the problem structure of MIPS, we develop a novel Greedy-MIPS algorithm, which can handle budgeted MIPS by design. While simple and intuitive, Greedy-MIPS yields surprisingly superior performance compared to state-of-the-art approaches. As a specific example, on a candidate set containing half a million vectors of dimension 200, Greedy-MIPS runs 200x faster than the naive approach while yielding search results with the top-5 precision greater than 75\%.
- Sep 08 2016 cs.GT arXiv:1609.01961v2Motivated by the recent efforts in extending LTE to the unlicensed spectrum, we propose a novel spectrum sharing framework for the coopetition (i.e., cooperation and competition) between LTE and Wi-Fi in the unlicensed band. Basically, the LTE network can choose to work in one of the two modes: in the competition mode, it randomly accesses an unlicensed channel, and interferes with the Wi-Fi access point using the same channel; in the cooperation mode, it delivers traffic for the Wi-Fi users in exchange for the exclusive access of the corresponding channel. Because the LTE network works in an interference-free manner in the cooperation mode, it can achieve a much larger data rate than that in the competition mode, which allows it to effectively serve both its own users and the Wi-Fi users. We design a second-price reverse auction mechanism, which enables the LTE provider and the Wi-Fi access point owners (APOs) to effectively negotiate the operation mode. Specifically, the LTE provider is the auctioneer (buyer), and the APOs are the bidders (sellers) who compete to sell their channel access opportunities to the LTE provider. In Stage I of the auction, the LTE provider announces a reserve rate. In Stage II of the auction, the APOs submit their bids. We show that the auction involves allocative externalities, i.e., the cooperation between the LTE provider and one APO benefits other APOs who are not directly involved in this cooperation. As a result, a particular APO's willingness to cooperate is affected by its belief about other APOs' willingness to cooperate. This makes our analysis much more challenging than that of the conventional second-price auction, where bidding truthfully is a weakly dominant strategy. We show that the APOs have a unique form of the equilibrium bidding strategies in Stage II, based on which we analyze the LTE provider's optimal reserve rate in Stage I.
- Sep 08 2016 cs.GT arXiv:1609.01951v3The proliferation of public Wi-Fi hotspots has brought new business potentials for Wi-Fi networks, which carry a significant amount of global mobile data traffic today. In this paper, we propose a novel Wi-Fi monetization model for venue owners (VOs) deploying public Wi-Fi hotspots, where the VOs can generate revenue by providing two different Wi-Fi access schemes for mobile users (MUs): (i) the premium access, in which MUs directly pay VOs for their Wi-Fi usage, and (ii) the advertising sponsored access, in which MUs watch advertisements in exchange of the free usage of Wi-Fi. VOs sell their ad spaces to advertisers (ADs) via an ad platform, and share the ADs' payments with the ad platform. We formulate the economic interactions among the ad platform, VOs, MUs, and ADs as a three-stage Stackelberg game. In Stage I, the ad platform announces its advertising revenue sharing policy. In Stage II, VOs determine the Wi-Fi prices (for MUs) and advertising prices (for ADs). In Stage III, MUs make access choices and ADs purchase advertising spaces. We analyze the sub-game perfect equilibrium (SPE) of the proposed game systematically, and our analysis shows the following useful observations. First, the ad platform's advertising revenue sharing policy in Stage I will affect only the VOs' Wi-Fi prices but not the VOs' advertising prices in Stage II. Second, both the VOs' Wi-Fi prices and advertising prices are non-decreasing in the advertising concentration level and non-increasing in the MU visiting frequency. Numerical results further show that the VOs are capable of generating large revenues through mainly providing one type of Wi-Fi access (the premium access or advertising sponsored access), depending on their advertising concentration levels and MU visiting frequencies.
- In 1982, Drezner proposed the (1|1)-centroid problem on the plane, in which two players, called the leader and the follower, open facilities to provide service to customers in a competitive manner. The leader opens the first facility, and then the follower opens the second. Each customer will patronize the facility closest to him (ties broken in favor of the leader's one), thereby decides the market share of the two players. The goal is to find the best position for the leader's facility so that his market share is maximized. The best algorithm for this problem is an $O(n^2 \log n)$-time parametric search approach, which searches over the space of possible market share values. In the same paper, Drezner also proposed a general version of (1|1)-centroid problem by introducing a minimal distance constraint $R$, such that the follower's facility is not allowed to be located within a distance $R$ from the leader's. He proposed an $O(n^5 \log n)$-time algorithm for this general version by identifying $O(n^4)$ points as the candidates of the optimal solution and checking the market share for each of them. In this paper, we develop a new parametric search approach searching over the $O(n^4)$ candidate points, and present an $O(n^2 \log n)$-time algorithm for the general version, thereby close the $O(n^3)$ gap between the two bounds.
- Aug 09 2016 cs.CV arXiv:1608.02023v2Study of cultural-heritage objects with embellished realistic and abstract designs made up of connected and intertwined curves crosscuts a number of related disciplines, including archaeology, art history, and heritage management. However, many objects, such as pottery sherds found in the archaeological record, are fragmentary, making the underlying complete designs unknowable at the scale of the sherd fragment. The challenge to reconstruct and study complete designs is stymied because 1) most fragmentary cultural-heritage objects contain only a small portion of the underlying full design, 2) in the case of a stamping application, the same design may be applied multiple times with spatial overlap on one object, and 3) curve patterns detected on an object are usually incomplete and noisy. As a result, classical curve-pattern matching algorithms, such as Chamfer matching, may perform poorly in identifying the underlying design. In this paper, we develop a new partial-to-global curve matching algorithm to address these challenges and better identify the full design from a fragmented cultural heritage object. Specifically, we develop the algorithm to identify the designs of the carved wooden paddles of the Southeastern Woodlands from unearthed pottery sherds. A set of pottery sherds from the Snow Collection, curated at Georgia Southern University, are used to test the proposed algorithm, with promising results.
- We study the cooperation of the mobile network operator (MNO) and the venue owners (VOs) on the public Wi-Fi deployment. We consider a one-to-many bargaining framework, where the MNO bargains with VOs sequentially to determine where to deploy Wi-Fi and how much to pay. Taking into account the negative externalities among different steps of bargaining, we analyze the following two cases: for the exogenous bargaining sequence case, we compute the optimal bargaining solution on the cooperation decisions and payments under a predetermined bargaining sequence; for the endogenous bargaining sequence case, the MNO decides the bargaining sequence to maximize its payoff. Through exploring the structural property of the optimal bargaining sequence, we design a low-complexity Optimal VO Bargaining Sequencing (OVBS) algorithm to search the optimal sequence. More specifically, we categorize the VOs into three types based on the impact of the Wi-Fi deployment at their venues, and show that it is optimal for the MNO to bargain with these three types of VOs sequentially. Numerical results show that compared with the random and worst bargaining sequences, the optimal bargaining sequence improves the MNO's payoff by up to 14.8% and 45.3%, respectively.
- Aug 03 2016 cs.CL arXiv:1608.00612v1Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In clinical domain one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain, presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
- We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publically available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
- Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic treebased recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binarytree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
- In this paper, we consider an intensity modulated direct detection MIMO optical wireless communication (OWC) system. For such a system, a novel super-rectangular cover theory is developed to characterize both the unique identifiability and full reliability. This theory states that a transmitted matrix signal can be uniquely identified if and only if the cover order is equal to the transmitter aperture number, i.e., full cover. In addition, we prove that full reliability is guaranteed for space-time block coded MIMO-OWC over commonly used log-normal fading channels with an ML detector if and only if the STBC enables full cover. In addition, the diversity gain can be geometrically interpreted as the cover order of the super-rectangle, which should be maximized, and the volume of this super-rectangle, as the diversity loss, should be minimized. Using this established error performance criterion, the optimal linear STBC for block fading channels is proved to be spatial repetition code with an optimal power allocation. The design of the optimal non-linear STBC is shown to be equivalent to constructing the optimal multi-dimensional constellation. Specifically, a multi-dimensional constellation from Diophantine equations is proposed and then, shown to be more energy-efficient than the commonly used nonnegative pulse amplitude modulation constellation.
- Sequence labeling for extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics including pharmacovigilance and drug surveillance. The state of the art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explored various recurrent neural network frameworks and show that they significantly outperformed the CRF models.
- Jun 28 2016 cs.CL arXiv:1606.07993v1Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and therefore it has been an active research. Unlike existing reviews covering a holistic view on BioIE, this review focuses on mainly recent advances in learning based approaches, by systematically summarizing them into different aspects of methodological development. In addition, we dive into open information extraction and deep learning, two emerging and influential techniques and envision next generation of BioIE.
- Throughout music history, theorists have identified and documented interpretable rules that capture the decisions of composers. This paper asks, "Can a machine behave like a music theorist?" It presents MUS-ROVER, a self-learning system for automatically discovering rules from symbolic music. MUS-ROVER performs feature learning via $n$-gram models to extract compositional rules --- statistical patterns over the resulting features. We evaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover known rules, as well as identify new, characteristic patterns for further study. We discuss how the extracted rules can be used in both machine and human composition.
- Jun 16 2016 cs.CL arXiv:1606.04597v1We introduce an agreement-based approach to learning parallel lexicons and phrases from non-parallel corpora. The basic idea is to encourage two asymmetric latent-variable translation models (i.e., source-to-target and target-to-source) to agree on identifying latent phrase and word alignments. The agreement is defined at both word and phrase levels. We develop a Viterbi EM algorithm for jointly training the two unidirectional models efficiently. Experiments on the Chinese-English dataset show that agreement-based learning significantly improves both alignment and translation performance.
- May 31 2016 cs.CL arXiv:1605.08889v2Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
- Previous studies in Open Information Extraction (Open IE) are mainly based on extraction patterns. They manually define patterns or automatically learn them from a large corpus. However, these approaches are limited when grasping the context of a sentence, and they fail to capture implicit relations. In this paper, we address this problem with the following methods. First, we exploit long short-term memory (LSTM) networks to extract higher-level features along the shortest dependency paths, connecting headwords of relations and arguments. The path-level features from LSTM networks provide useful clues regarding contextual information and the validity of arguments. Second, we constructed samples to train LSTM networks without the need for manual labeling. In particular, feedback negative sampling picks highly negative samples among non-positive samples through a model trained with positive samples. The experimental results show that our approach produces more precise and abundant extractions than state-of-the-art open IE systems. To the best of our knowledge, this is the first work to apply deep learning to Open IE.
- May 10 2016 cs.LG arXiv:1605.02099v1This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretic perspective. It supplements the latter analysis with simulation results and illustrates the behavior of some of the ETD algorithms using three example problems.
- Apr 12 2016 cs.DS arXiv:1604.03030v1This paper develops a new technique for proving amortized, randomized cell-probe lower bounds on dynamic data structure problems. We introduce a new randomized nondeterministic four-party communication model that enables "accelerated", error-preserving simulations of dynamic data structures. We use this technique to prove an $\Omega(n(\log n/\log\log n)^2)$ cell-probe lower bound for the dynamic 2D weighted orthogonal range counting problem (2D-ORC) with $n/\mathrm{poly}\log n$ updates and $n$ queries, that holds even for data structures with $\exp(-\tilde{\Omega}(n))$ success probability. This result not only proves the highest amortized lower bound to date, but is also tight in the strongest possible sense, as a matching upper bound can be obtained by a deterministic data structure with worst-case operational time. This is the first demonstration of a "sharp threshold" phenomenon for dynamic data structures. Our broader motivation is that cell-probe lower bounds for exponentially small success facilitate reductions from dynamic to static data structures. As a proof-of-concept, we show that a slightly strengthened version of our lower bound would imply an $\Omega((\log n /\log\log n)^2)$ lower bound for the static 3D-ORC problem with $O(n\log^{O(1)}n)$ space. Such result would give a near quadratic improvement over the highest known static cell-probe lower bound, and break the long standing $\Omega(\log n)$ barrier for static data structures.
- This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional projection based online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\beta,1-\beta\}})$ regret and $O(T^{1-\beta/2})$ constraint violations where $\beta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with finite constraint violations.
- Apr 08 2016 cs.DC arXiv:1604.01950v1Demand response is widely employed by today's data centers to reduce energy consumption in response to the increasing of electricity cost. To incentivize users of data centers participate in the demand response programs, i.e., breaking the "split incentive" hurdle, some prior researches propose market-based mechanisms such as dynamic pricing and static monetary rewards. However, these mechanisms are either intrusive or unfair. In this paper, we use time-varying rewards to incentivize users, who have flexible deadlines and are willing to trading performance degradation for monetary rewards, grant time-shifting of their requests. With a game-theoretic framework, we model the game between a single data center and its users. Further, we extend our design via integrating it with two other emerging practical demand response strategies: server shutdown and local renewable energy generation. With real-world data traces, we show that a DC with our design can effectively shed its peak electricity load and overall electricity cost without reducing its profit, when comparing it with the current practice where no incentive mechanism is established.
- Hashing has emerged as a popular technique for large-scale similarity search. Most learning-based hashing methods generate compact yet correlated hash codes. However, this redundancy is storage-inefficient. Hence we propose a lossless variable-length hashing (VLH) method that is both storage- and search-efficient. Storage efficiency is achieved by converting the fixed-length hash code into a variable-length code. Search efficiency is obtained by using a multiple hash table structure. With VLH, we are able to deliberately add redundancy into hash codes to improve retrieval performance with little sacrifice in storage efficiency or search complexity. In particular, we propose a block K-means hashing (B-KMH) method to obtain significantly improved retrieval performance with no increase in storage and marginal increase in computational cost.
- Feb 22 2016 cs.SI arXiv:1602.06033v8Modeling and predicting the popularity of online content is a significant problem for the practice of information dissemination, advertising, and consumption. Recent work analyzing massive datasets advances our understanding of popularity, but one major gap remains: To precisely quantify the relationship between the popularity of an online item and the external promotions it receives. This work supplies the missing link between exogenous inputs from public social media platforms, such as Twitter, and endogenous responses within the content platform, such as YouTube. We develop a novel mathematical model, the Hawkes intensity process, which can explain the complex popularity history of each video according to its type of content, network of diffusion, and sensitivity to promotion. Our model supplies a prototypical description of videos, called an endo-exo map. This map explains popularity as the result of an extrinsic factor - the amount of promotions from the outside world that the video receives, acting upon two intrinsic factors - sensitivity to promotion, and inherent virality. We use this model to forecast future popularity given promotions on a large 5-months feed of the most-tweeted videos, and found it to lower the average error by 28.6% from approaches based on popularity history. Finally, we can identify videos that have a high potential to become viral, as well as those for which promotions will have hardly any effect.
- Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.
- This paper considers dynamic transmit covariance design in point-to-point MIMO fading systems with unknown channel state distributions and inaccurate channel state information subject to both long term and short term power constraints. First, the case of instantaneous but possibly inaccurate channel state information at the transmitter (CSIT) is treated. By extending the drift-plus-penalty technique, a dynamic transmit covariance policy is developed and is shown to approach optimality with an $O(\delta)$ gap, where $\delta$ is the inaccuracy measure of CSIT, regardless of the channel state distribution and without requiring knowledge of this distribution. Next, the case of delayed and inaccurate channel state information is considered. The optimal transmit covariance solution that maximizes the ergodic capacity is fundamentally different in this case, and a different online algorithm based on convex projections is developed. The proposed algorithm for this delayed-CSIT case also has an $O(\delta)$ optimality gap, where $\delta$ is again the inaccuracy measure of CSIT.
- Dec 23 2015 cs.AI arXiv:1512.07162v1Attribute reduction is one of the most important topics in rough set theory. Heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. It is generally known that fitness functions play a key role in developing heuristic attribute reduction algorithms. The monotonicity of fitness functions can guarantee the validity of heuristic attribute reduction algorithms. In probabilistic rough set model, distribution reducts can ensure the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on developing heuristic attribute reduction algorithms for finding distribution reducts. This is partly due to the fact that there are no monotonic fitness functions that are used to design heuristic attribute reduction algorithms in probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in probabilistic rough set model. For one thing, two monotonic fitness functions are constructed, from which equivalence definitions of distribution reducts can be obtained. For another, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed based on addition-deletion method and deletion method. In particular, the monotonicity of fitness functions guarantees the rationality of the proposed heuristic attribute reduction algorithms. Results of experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.
- Dec 22 2015 cs.NI arXiv:1512.06428v2The explosive growth of global mobile traffic has lead to a rapid growth in the energy consumption in communication networks. In this paper, we focus on the energy-aware design of the network selection, subchannel, and power allocation in cellular and Wi-Fi networks, while taking into account the traffic delay of mobile users. The problem is particularly challenging due to the two-timescale operations for the network selection (large timescale) and subchannel and power allocation (small timescale). Based on the two-timescale Lyapunov optimization technique, we first design an online Energy-Aware Network Selection and Resource Allocation (ENSRA) algorithm. The ENSRA algorithm yields a power consumption within O(1/V) bound of the optimal value, and guarantees an O(V) traffic delay for any positive control parameter V. Motivated by the recent advancement in the accurate estimation and prediction of user mobility, channel conditions, and traffic demands, we further develop a novel predictive Lyapunov optimization technique to utilize the predictive information, and propose a Predictive Energy-Aware Network Selection and Resource Allocation (P-ENSRA) algorithm. We characterize the performance bounds of P-ENSRA in terms of the power-delay tradeoff theoretically. To reduce the computational complexity, we finally propose a Greedy Predictive Energy-Aware Network Selection and Resource Allocation (GP-ENSRA) algorithm, where the operator solves the problem in P-ENSRA approximately and iteratively. Numerical results show that GP-ENSRA significantly improves the power-delay performance over ENSRA in the large delay regime. For a wide range of system parameters, GP-ENSRA reduces the traffic delay over ENSRA by 20~30% under the same power consumption.
- Dec 07 2015 cs.DS arXiv:1512.01293v1In this paper, we develop a new communication model to prove a data structure lower bound for the dynamic interval union problem. The problem is to maintain a multiset of intervals $\mathcal{I}$ over $[0, n]$ with integer coordinates, supporting the following operations: - insert(a, b): add an interval $[a, b]$ to $\mathcal{I}$, provided that $a$ and $b$ are integers in $[0, n]$; - delete(a, b): delete a (previously inserted) interval $[a, b]$ from $\mathcal{I}$; - query(): return the total length of the union of all intervals in $\mathcal{I}$. It is related to the two-dimensional case of Klee's measure problem. We prove that there is a distribution over sequences of operations with $O(n)$ insertions and deletions, and $O(n^{0.01})$ queries, for which any data structure with any constant error probability requires $\Omega(n\log n)$ time in expectation. Interestingly, we use the sparse set disjointness protocol of HÃ¥stad and Wigderson [ToC'07] to speed up a reduction from a new kind of nondeterministic communication games, for which we prove lower bounds. For applications, we prove lower bounds for several dynamic graph problems by reducing them from dynamic interval union.
- Nov 25 2015 cs.LG arXiv:1511.07471v3We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently proposed by Sutton, Mahmood, and White to solve a long-standing divergence problem of the standard TD algorithm when it is applied to off-policy training, where data from an exploratory policy are used to evaluate other policies of interest. The almost sure convergence of ETD($\lambda$) has been proved in our recent work under general off-policy training conditions, but for a narrow range of diminishing stepsize. In this paper we present convergence results for constrained versions of ETD($\lambda$) with constant stepsize and with diminishing stepsize from a broad range. Our results characterize the asymptotic behavior of the trajectory of iterates produced by those algorithms, and are derived by combining key properties of ETD($\lambda$) with powerful convergence theorems from the weak convergence methods in stochastic approximation theory. For the case of constant stepsize, in addition to analyzing the behavior of the algorithms in the limit as the stepsize parameter approaches zero, we also analyze their behavior for a fixed stepsize and bound the deviations of their averaged iterates from the desired solution. These results are obtained by exploiting the weak Feller property of the Markov chains associated with the algorithms, and by using ergodic theorems for weak Feller Markov chains, in conjunction with the convergence results we get from the weak convergence methods. Besides ETD($\lambda$), our analysis also applies to the off-policy TD($\lambda$) algorithm, when the divergence issue is avoided by setting $\lambda$ sufficiently large.
- Nov 19 2015 cs.CV arXiv:1511.05914v1We make available to the community a new dataset to support action-recognition research. This dataset is different from prior datasets in several key ways. It is significantly larger. It contains streaming video with long segments containing multiple action occurrences that often overlap in space and/or time. All actions were filmed in the same collection of backgrounds so that background gives little clue as to action class. We had five humans replicate the annotation of temporal extent of action occurrences labeled with their class and measured a surprisingly low level of intercoder agreement. A baseline experiment shows that recent state-of-the-art methods perform poorly on this dataset. This suggests that this will be a challenging dataset to foster advances in action-recognition research. This manuscript serves to describe the novel content and characteristics of the LCA dataset, present the design decisions made when filming the dataset, and document the novel methods employed to annotate the dataset.