results for au:Yu_H in:cs

- Apr 25 2017 cs.CV arXiv:1704.07142v1 With the increasing demands of virtual reality applications such as 3D films, virtual human-machine interactions and virtual agents, 3D facial analysis is considered increasingly important as a fundamental step for these virtual reality tasks. Due to the information provided by an additional dimension, 3D facial reconstruction enables the aforementioned tasks to be achieved with higher accuracy than those based on 2D facial analysis. The denser the 3D facial model is, the more information it can provide. However, most existing dense 3D facial reconstruction methods require complicated processing and a high system cost. To this end, this paper presents a novel method that simplifies the process of dense 3D facial reconstruction by employing only one frame of depth data obtained with an off-the-shelf RGB-D sensor. The experiments showed competitive results with real-world data.
- In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result before it presents Bob with the next piece. This model has a closer and more natural correspondence to dynamic data structures than the classic communication models do, and hence presents a new perspective on data structures. We first present a lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. Then we apply the online communication model to data structure lower bounds by studying the Group Range Problem, a dynamic data structure problem. This problem admits an $O(\log n)$-time worst-case data structure. Using online communication complexity, we prove a tight cell-probe lower bound: spending $o(\log n)$ (even amortized) time per operation results in at best an $\exp(-\delta^2 n)$ probability of correctly answering a $(1/2+\delta)$-fraction of the $n$ queries.
- We consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To curb the high variance issue in off-policy TD learning, we propose a new scheme of setting the $\lambda$-parameters of TD, based on generalized Bellman equations. Our scheme is to set $\lambda$ according to the eligibility trace iterates calculated in TD, thereby easily keeping these traces in a desired bounded range. Compared to prior works, this scheme is more direct and flexible, and allows much larger $\lambda$ values for off-policy TD learning with bounded traces. Using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process. These results not only lead immediately to a characterization of the convergence behavior of least-squares based implementation of our scheme, but also prepare the ground for further analysis of gradient-based implementations.
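The idea of setting λ from the eligibility-trace iterates to keep traces bounded can be sketched in code. This is a minimal illustrative rule under linear function approximation, not the paper's exact scheme: at each step λ is shrunk just enough that the updated trace norm stays under a chosen bound (`rho` is the importance-sampling ratio for the current step, `phi`/`phi_next` are feature vectors).

```python
import numpy as np

def td_lambda_step(theta, e, phi, phi_next, reward, rho, gamma, alpha, trace_bound):
    """One off-policy TD update where lambda is chosen from the current
    eligibility trace so that the trace norm stays below trace_bound."""
    grow = gamma * rho * np.linalg.norm(e)
    if grow > 0.0:
        # Largest lambda in [0, 1] keeping ||gamma*lam*rho*e + phi|| <= trace_bound.
        lam = min(1.0, max(0.0, (trace_bound - np.linalg.norm(phi)) / grow))
    else:
        lam = 1.0
    e = gamma * lam * rho * e + phi                           # eligibility trace
    delta = reward + gamma * theta @ phi_next - theta @ phi   # TD error
    theta = theta + alpha * delta * e                         # parameter update
    return theta, e
```

Because λ is chosen from quantities already computed in TD, the boundedness of the traces holds by construction, which is the property the ergodicity analysis relies on.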
- Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on the alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotype discovery while respecting privacy.
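The aggregation loop in such an ADMM-based federated scheme can be sketched generically. This is a textbook consensus-ADMM round, not the paper's exact updates: each hospital shares only its current factor matrix (the "summarized information"), the server averages these into a global factor, and each site takes a dual ascent step; `rho` is the ADMM penalty parameter.

```python
import numpy as np

def consensus_admm_round(local_factors, duals, rho):
    """One federated round: the server averages the sites' factors into a
    global factor z, then each site updates its dual toward consensus."""
    z = np.mean([F + u / rho for F, u in zip(local_factors, duals)], axis=0)
    new_duals = [u + rho * (F - z) for F, u in zip(local_factors, duals)]
    return z, new_duals
```

Each site would then re-solve its local factorization subproblem with a penalty pulling its factor toward `z`, so patient-level tensors never leave the hospital.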
- We tackle a task where an agent learns to navigate in a 2D maze-like environment called XWORLD. In each session, the agent perceives a sequence of raw-pixel frames, a natural language command issued by a teacher, and a set of rewards. The agent learns the teacher's language from scratch in a grounded and compositional manner, such that after training it is able to correctly execute zero-shot commands: 1) the combination of words in the command never appeared before, and/or 2) the command contains new object concepts that are learned from another task but never learned from navigation. Our deep framework for the agent is trained end to end: it learns simultaneously the visual representations of the environment, the syntax and semantics of the language, and the action module that outputs actions. The zero-shot learning capability of our framework results from its compositionality and modularity with parameter tying. We visualize the intermediate outputs of the framework, demonstrating that the agent truly understands how to solve the problem. We believe that our results provide some preliminary insights on how to train an agent with similar abilities in a 3D environment.
- This paper proves the first super-logarithmic lower bounds on the cell probe complexity of dynamic boolean (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new method for proving dynamic cell probe lower bounds and use it to prove a $\tilde{\Omega}(\log^{1.5} n)$ lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting over $\mathbb{F}_2$ ([Pat07]). Proving an $\omega(\lg n)$ lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Pǎtraşcu's obituary [Tho13]. This result also implies the first $\omega(\lg n)$ lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D rectangle stabbing, and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient one-way communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebyshev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].
- This paper studies a green paradigm for the underlay coexistence of primary users (PUs) and secondary users (SUs) in energy harvesting cognitive radio networks (EH-CRNs), wherein battery-free SUs capture both the spectrum and the energy of PUs to enhance spectrum efficiency and green energy utilization. To lower the transmit powers of SUs, we employ multi-hop transmission with time division multiple access, by which SUs first harvest energy from the RF signals of PUs and then transmit data in the allocated time concurrently with PUs, all in the licensed spectrum. In this way, the available transmit energy of each SU mainly depends on the energy harvested before its turn to transmit, namely energy causality. Meanwhile, the transmit powers of SUs must be strictly controlled to protect PUs from harmful interference. Thus, subject to the energy causality constraint and the interference power constraint, we study the end-to-end throughput maximization problem for optimal time and power allocation. To solve this nonconvex problem, we first equivalently transform it into a convex optimization problem and then propose the joint optimal time and power allocation (JOTPA) algorithm that iteratively solves a series of feasibility problems until convergence. Extensive simulations evaluate the performance of EH-CRNs with JOTPA in three typical deployment scenarios and validate the superiority of JOTPA by comparison with two other resource allocation algorithms.
- Both feedback of ratings and trust relationships can be used to reveal user preferences to improve recommendation performance, especially for cold users. However, the high-order correlations between the two kinds of data are always ignored by existing works. To address this problem, we propose a Correlative Denoising Autoencoder (CoDAE) model to learn correlations from both rating and trust data for top-N recommendation. First, we design a novel deep learning model, CoDAE, in which two mid-layers from separate stacked denoising autoencoders are fused into one shared layer. Advancing previous works, which utilize these data at a shallow level, this model can effectively extract high-order correlations from low-level representations of these data for recommendation. Second, to further learn implicit correlations between these two autoencoders, we develop a novel correlative regularization to build relations between the other hidden layers of the two separate autoencoders. In this way, the model can learn correlations more effectively and thus improve recommendation quality. Comprehensive experiments on two public datasets demonstrate that CoDAE significantly outperforms other state-of-the-art approaches in the top-N recommendation task.
- Mar 03 2017 cs.CL arXiv:1703.00538v2 Background: Electronic health record (EHR) notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on the medical terms that matter most to them. Objective: The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. Methods: We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views of term importance: patient use of medical concepts, document-level term salience, word co-occurrence-based term relatedness, and topic coherence. It also incorporates partial information on term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and compared it with three benchmark unsupervised ensemble ranking methods. Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When term identification is included, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. It outperformed the three ensemble rankers on most metrics. Its performance is relatively insensitive to its parameters. Conclusions: FIT can automatically identify EHR terms important to patients and may help develop personalized interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
- Deep neural networks have been successfully applied in applications with a large amount of labeled data. However, there are major drawbacks of the neural networks that are related to rapid generalization with small data and continual learning of new concepts without forgetting. We present a novel meta learning method, Meta Networks (MetaNet), that acquires a meta-level knowledge across tasks and shifts its inductive bias via fast parameterization for the rapid generalization. When tested on the standard one-shot learning benchmarks, our MetaNet models achieved near human-level accuracy. We demonstrated several appealing properties of MetaNet relating to generalization and continual learning.
- Mar 01 2017 cs.CL arXiv:1702.08563v1 Item Response Theory (IRT) allows for measuring the ability of machine learning models as compared to a human population. However, it is difficult to create a large enough dataset to train deep neural network (DNN) models. We propose fine-tuning as a new training process, where a model pre-trained on a large dataset is fine-tuned with a small supplemental training set. Our results show that fine-tuning can improve the ability of a state-of-the-art DNN model on Recognizing Textual Entailment tasks.
- FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than their GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. Experimental results show that the proposed architecture is 8.3x faster and 75x more energy-efficient than a Titan X GPU for processing online individual requests (in small batch sizes). For processing static data (in large batch sizes), the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5x higher energy efficiency.
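The bitwise convolution at the core of such binary accelerators reduces each dot product to an XOR (or XNOR) followed by a popcount. A minimal sketch of the underlying arithmetic identity, with ±1 values packed as bits; the accelerator-specific pipelining and normalization are not shown:

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1,+1} vectors packed as bit masks
    (bit set = +1).  Positions where the signs differ each contribute -1
    instead of +1, so the dot product equals n - 2 * popcount(a XOR w),
    which is the quantity a bitwise convolution engine accumulates."""
    return n - 2 * bin((a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
```

This is why binarized networks map so well to FPGA fabric: a wide multiply-accumulate becomes a row of XOR gates and an adder tree.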
- Feb 17 2017 cs.CL arXiv:1702.04811v1 Deep neural networks (DNNs) have set state-of-the-art results in many machine learning and NLP tasks. However, we do not have a strong understanding of what DNN models learn. In this paper, we examine learning in DNNs through an analysis of their outputs. We compare DNN performance directly to a human population, and use characteristics of individual data points, such as difficulty, to see how well models perform on easy and hard examples. We investigate how training size and the incorporation of noise affect a DNN's ability to generalize and learn. Our experiments show that unlike traditional machine learning models (e.g., Naive Bayes, Decision Trees), DNNs exhibit human-like learning properties. As they are trained with more data, they become better able to distinguish between easy and difficult items, and performance on easy items improves at a higher rate than on difficult items. We find that different DNN models exhibit different strengths in learning and are robust to noise in training data.
- With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.
- Feb 13 2017 cs.LG arXiv:1702.03006v1 To estimate the value functions of policies from exploratory data, most model-free off-policy algorithms rely on importance sampling, where the use of importance sampling ratios often leads to estimates with severe variance. It is thus desirable to learn off-policy without using the ratios. However, such an algorithm does not exist for multi-step learning with function approximation. In this paper, we introduce the first such algorithm based on temporal-difference (TD) learning updates. We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner. Our new algorithm achieves stability using a two-timescale gradient-based TD update. A prior algorithm based on lookup-table representation, called Tree Backup, can also be retrieved using action-dependent bootstrapping, becoming a special case of our algorithm. In two challenging off-policy tasks, we demonstrate that our algorithm is stable, effectively avoids the large variance issue, and can perform substantially better than its state-of-the-art counterpart.
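For reference, the lookup-table special case mentioned above (Tree Backup) replaces importance-sampling ratios with target-policy probabilities that damp the bootstrapped return at each step. A tabular sketch of the n-step Tree Backup return (our own illustration, not code from the paper); `traj` holds `(s, a, r, s_next)` transitions collected under the behavior policy, and `Q[s][a]`, `pi[s][a]` are dictionaries:

```python
def tree_backup_return(traj, Q, pi, gamma):
    """n-step Tree Backup return: no importance-sampling ratios; instead,
    the continuation is weighted by pi(a_taken | s), and the off-path
    actions are closed off with their current Q estimates."""
    # Horizon bootstrap: expected value under the target policy.
    s_last = traj[-1][3]
    G = traj[-1][2] + gamma * sum(pi[s_last][b] * Q[s_last][b] for b in Q[s_last])
    # Walk the trajectory backwards, damping G by pi(a_next | s_next).
    for k in range(len(traj) - 2, -1, -1):
        s, a, r, s_next = traj[k]
        a_next = traj[k + 1][1]  # action actually taken at s_next
        G = r + gamma * (sum(pi[s_next][b] * Q[s_next][b]
                             for b in Q[s_next] if b != a_next)
                         + pi[s_next][a_next] * G)
    return G
```

When the target policy is deterministic and agrees with the behavior actions, the per-step weights are all 1 and this collapses to the ordinary n-step return.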
- The backpressure algorithm has been widely used as a distributed solution to the problem of joint rate control and routing in multi-hop data networks. By controlling a parameter $V$ in the algorithm, the backpressure algorithm can achieve an arbitrarily small utility optimality gap. However, this in turn brings in a large queue length at each node and hence causes large network delay. This phenomenon is known as the fundamental utility-delay tradeoff. The best known utility-delay tradeoff for general networks is $[O(1/V), O(V)]$ and is attained by a backpressure algorithm based on a drift-plus-penalty technique. This may suggest that to achieve an arbitrarily small utility optimality gap, the existing backpressure algorithms necessarily yield an arbitrarily large queue length. However, this paper proposes a new backpressure algorithm that has a vanishing utility optimality gap, so utility converges to exact optimality as the algorithm keeps running, while queue lengths are bounded throughout by a finite constant. The technique uses backpressure and drift concepts with a new method for convex programming.
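The per-link decision rule of the classical backpressure algorithm (the baseline this paper improves on) is easy to sketch: each link serves the commodity with the largest queue-length differential, provided it is positive. A toy single-slot version, assuming `queues[node][commodity]` backlogs and independent link activations (no interference constraints):

```python
def backpressure_slot(queues, links):
    """One slot of basic backpressure routing.  links: list of (u, v, capacity).
    Returns (u, v, commodity, amount) transmissions for this slot."""
    decisions = []
    for u, v, cap in links:
        # Pick the commodity with the largest backlog differential on (u, v).
        best = max(queues[u], key=lambda c: queues[u][c] - queues[v].get(c, 0))
        diff = queues[u][best] - queues[v].get(best, 0)
        if diff > 0:  # only transmit when pressure points downstream
            decisions.append((u, v, best, min(cap, queues[u][best])))
    return decisions
```

The paper's contribution is precisely about what this rule cannot avoid: serving only positive differentials means backlogs must build up as the utility gap shrinks, whereas the proposed algorithm keeps queues bounded by a finite constant.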
- Jan 13 2017 cs.SY arXiv:1701.03343v1 In this work, we investigate a state estimation problem for a full-car semi-active suspension system. To account for the complex calculation and optimization problems, a vehicle-to-cloud-to-vehicle (V2C2V) scheme is utilized. Moving horizon estimation is introduced for the state estimation system design. All the optimization problems are solved in a remotely embedded agent with high computational ability. Measurements and state estimates are transmitted between the vehicle and the remote agent via networked communication channels. The effectiveness of the proposed method is illustrated via a set of simulations.
- Dec 13 2016 cs.CV arXiv:1612.03630v1 In this paper, we develop a binary convolutional encoder-decoder network (B-CEDNet) for natural scene text processing (NSTP). It converts a text image to a class-distinguished salience map that reveals the categorical, spatial and morphological information of characters. The existing solutions are either memory-consuming or run-time-consuming, and cannot be applied to real-time applications on resource-constrained devices such as advanced driver assistance systems. The developed network can process multiple regions containing characters in a single forward operation, and is trained to have binary weights and binary feature maps, which lead to both remarkable inference run-time speedup and memory usage reduction. Trained with over 200,000 synthetic scene text images (of size $32\times128$), it can achieve $90\%$ and $91\%$ pixel-wise accuracy on the ICDAR-03 and ICDAR-13 datasets. It only consumes $4.59\ ms$ of inference run-time on a GPU with a small network size of 2.14 MB, which is up to $8\times$ faster and $96\%$ smaller than its full-precision version.
- Nov 30 2016 cs.CV arXiv:1611.09587v1 Surveillance video parsing, which segments video frames into several labels, e.g., face, pants, left-leg, has wide applications. However, pixel-wise annotation of all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (1) roughly parses the frames within the video segment, (2) estimates the optical flow between frames and (3) fuses the rough parsing results, warped by optical flow, to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-art methods.
- Nov 29 2016 cs.NI arXiv:1611.09011v2 OpenFlow-based SDN is widely studied as a way to improve network performance by planning fine-grained paths. However, being designed to configure paths hop-by-hop, it faces a scalability issue: both the flow table overhead and the path setup delay are unacceptable for large-scale networks. In this paper, we propose PACO, a framework based on source routing (SR) that addresses this problem by quickly pushing paths into the packet header at network edges and pre-installing a few rules at the network core. A straightforward implementation of SR is inefficient, as it would incur too many path labels; other efficient approaches sacrifice path flexibility (e.g., DEFO). To implement SR efficiently and flexibly, PACO represents each path as a concatenation of pathlets and introduces algorithms to compute pathlets and concatenate paths with a minimum number of path labels. Our extensive simulations confirm the scalability of PACO: it saves up to 94% of the flow table overhead compared with OpenFlow-SDN solutions, and it outperforms SR-SDN solutions by supporting more than 40% more paths with little label overhead.
- Deep learning, as a promising new area of machine learning, has attracted rapidly increasing attention in the field of medical imaging. Compared to conventional machine learning methods, deep learning requires no hand-tuned feature extractor, and has shown superior performance in many visual object recognition applications. In this study, we develop a deep convolutional neural network (CNN) and apply it to thoracic CT images for the classification of lung nodules. We present the CNN architecture and classification accuracy for the original images of lung nodules. In order to understand the features of lung nodules, we further construct new datasets based on the combination of artificial geometric nodules and some transformations of the original images, as well as a stochastic nodule shape model. It is found that simplistic geometric nodules cannot capture the important features of lung nodules.
- Nov 15 2016 cs.CL arXiv:1611.04491v1 Objective: Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, medical jargon, which abounds in EHR notes, has been shown to be a barrier to patient EHR comprehension. Existing knowledge bases that link medical jargon to lay terms or definitions play an important role in alleviating this problem but have low coverage of the medical jargon in EHRs. We developed a data-driven approach that mines EHRs to identify and rank medical jargon based on its importance to patients, to support the building of EHR-centric lay language resources. Methods: We developed an innovative adapted distant supervision (ADS) model based on support vector machines to rank medical jargon from EHRs. For distant supervision, we utilized the open-access, collaborative consumer health vocabulary, a large, publicly available resource that links lay terms to medical jargon. We explored both knowledge-based features from the Unified Medical Language System and distributed word representations learned from large unlabeled corpora. We evaluated the ADS model using physician-identified important medical terms. Results: Our ADS model significantly surpassed two state-of-the-art automatic term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus 0.710 and 0.667, respectively. Our model identified 10K important medical jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR narratives. Conclusion: Our work is an important step towards enriching lexical resources that link medical jargon to lay terms/definitions to support patient EHR comprehension. The identified medical jargon terms and their rankings are available upon request.
- Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However, biomedical knowledge discovery is hypothesis-driven; the most related articles may not be the ones with the highest text similarities. In this study, we first develop an innovative crowd-sourcing approach to build an expert-annotated document-ranking corpus. Using this corpus as the gold standard, we then evaluate approaches that use text similarity to rank the relatedness of articles. Finally, we develop and evaluate a new supervised model to automatically rank related scientific articles. Our results show that authors' rankings differ significantly from rankings by text-similarity-based models. By training a learning-to-rank model on a subset of the annotated corpus, we found that the best supervised learning-to-rank model (SVM-Rank) significantly surpassed state-of-the-art baseline systems.
- One of the biggest open problems in external memory data structures is the priority queue problem with DecreaseKey operations. If only Insert and ExtractMin operations need to be supported, one can design a comparison-based priority queue performing $O((N/B)\lg_{M/B} N)$ I/Os over a sequence of $N$ operations, where $B$ is the disk block size in number of words and $M$ is the main memory size in number of words. This matches the lower bound for comparison-based sorting and is hence optimal for comparison-based priority queues. However, if we also need to support DecreaseKeys, the performance of the best known priority queue is only $O((N/B) \lg_2 N)$ I/Os. The big open question is whether a degradation in performance really is necessary. We answer this question affirmatively by proving a lower bound of $\Omega((N/B) \lg_{\lg N} B)$ I/Os for processing a sequence of $N$ intermixed Insert, ExtractMin and DecreaseKey operations. Our lower bound is proved in the cell probe model and thus holds also for non-comparison-based priority queues.
- Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory-augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to the language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieve state-of-the-art results, showing an absolute improvement of 1.2% to 2.6% in accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.
- Maximum Inner Product Search (MIPS) is an important task in many machine learning applications, such as the prediction phase of a low-rank matrix factorization model for a recommender system. There have been some recent works on how to perform MIPS in sub-linear time. However, most of them do not have the flexibility to control the trade-off between search efficiency and search quality. In this paper, we study the MIPS problem with a computational budget. By carefully studying the problem structure of MIPS, we develop a novel Greedy-MIPS algorithm, which can handle budgeted MIPS by design. While simple and intuitive, Greedy-MIPS yields surprisingly superior performance compared to state-of-the-art approaches. As a specific example, on a candidate set containing half a million vectors of dimension 200, Greedy-MIPS runs 200x faster than the naive approach while yielding search results with top-5 precision greater than 75\%.
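The budget/quality trade-off at the heart of budgeted MIPS can be illustrated with a toy screening scheme. This is not Greedy-MIPS itself (the abstract does not spell out its candidate-generation order); it is just a hypothetical baseline in which only the query's largest-magnitude coordinates are scored during screening, so the budget knob directly controls how much work is done per candidate:

```python
import numpy as np

def mips_exact(candidates, query, k=5):
    """Exact MIPS: score every candidate with a full inner product."""
    return np.argsort(-(candidates @ query))[:k]

def mips_budgeted(candidates, query, budget, k=5):
    """Toy budgeted MIPS: rough-score candidates using only the `budget`
    largest-magnitude query coordinates, then rank a small survivor pool
    exactly.  Illustrative stand-in, not the paper's Greedy-MIPS."""
    dims = np.argsort(-np.abs(query))[: max(1, budget)]
    rough = candidates[:, dims] @ query[dims]       # cheap partial scores
    pool = np.argsort(-rough)[: 10 * k]             # survivors to re-rank
    exact = candidates[pool] @ query
    return pool[np.argsort(-exact)[:k]]
```

Raising `budget` toward the full dimensionality recovers the exact ranking at full cost; lowering it trades precision for speed, which is the regime the paper studies.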
- Sep 08 2016 cs.GT arXiv:1609.01961v2 Motivated by the recent efforts in extending LTE to the unlicensed spectrum, we propose a novel spectrum sharing framework for the coopetition (i.e., cooperation and competition) between LTE and Wi-Fi in the unlicensed band. Basically, the LTE network can choose to work in one of two modes: in the competition mode, it randomly accesses an unlicensed channel, and interferes with the Wi-Fi access point using the same channel; in the cooperation mode, it delivers traffic for the Wi-Fi users in exchange for exclusive access to the corresponding channel. Because the LTE network works in an interference-free manner in the cooperation mode, it can achieve a much larger data rate than in the competition mode, which allows it to effectively serve both its own users and the Wi-Fi users. We design a second-price reverse auction mechanism, which enables the LTE provider and the Wi-Fi access point owners (APOs) to effectively negotiate the operation mode. Specifically, the LTE provider is the auctioneer (buyer), and the APOs are the bidders (sellers) who compete to sell their channel access opportunities to the LTE provider. In Stage I of the auction, the LTE provider announces a reserve rate. In Stage II of the auction, the APOs submit their bids. We show that the auction involves allocative externalities, i.e., the cooperation between the LTE provider and one APO benefits other APOs who are not directly involved in this cooperation. As a result, a particular APO's willingness to cooperate is affected by its belief about other APOs' willingness to cooperate. This makes our analysis much more challenging than that of the conventional second-price auction, where bidding truthfully is a weakly dominant strategy. We show that the APOs have a unique form of equilibrium bidding strategies in Stage II, based on which we analyze the LTE provider's optimal reserve rate in Stage I.
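The conventional mechanism the paper departs from is the textbook second-price reverse auction: the lowest ask wins and is paid the second-lowest ask, capped by the buyer's reserve. A minimal sketch of that baseline, deliberately ignoring the allocative externalities that make the paper's analysis harder (the seller names are illustrative):

```python
def reverse_second_price(bids, reserve):
    """bids: {seller: asking_rate}.  The lowest ask at or below the reserve
    wins; the payment is the second-lowest ask, capped at the reserve."""
    eligible = sorted((rate, seller) for seller, rate in bids.items()
                      if rate <= reserve)
    if not eligible:
        return None, 0.0                      # no trade happens
    winner = eligible[0][1]
    payment = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, min(payment, reserve)
```

In this baseline, asking one's true cost is weakly dominant; the paper's point is that externalities among the APOs break exactly this property.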
- Sep 08 2016 cs.GT arXiv:1609.01951v3 The proliferation of public Wi-Fi hotspots has brought new business potential for Wi-Fi networks, which carry a significant amount of global mobile data traffic today. In this paper, we propose a novel Wi-Fi monetization model for venue owners (VOs) deploying public Wi-Fi hotspots, where the VOs can generate revenue by providing two different Wi-Fi access schemes for mobile users (MUs): (i) premium access, in which MUs directly pay VOs for their Wi-Fi usage, and (ii) advertising-sponsored access, in which MUs watch advertisements in exchange for free usage of Wi-Fi. VOs sell their ad spaces to advertisers (ADs) via an ad platform, and share the ADs' payments with the ad platform. We formulate the economic interactions among the ad platform, VOs, MUs, and ADs as a three-stage Stackelberg game. In Stage I, the ad platform announces its advertising revenue sharing policy. In Stage II, VOs determine the Wi-Fi prices (for MUs) and advertising prices (for ADs). In Stage III, MUs make access choices and ADs purchase advertising spaces. We analyze the subgame perfect equilibrium (SPE) of the proposed game systematically, and our analysis yields the following useful observations. First, the ad platform's advertising revenue sharing policy in Stage I affects only the VOs' Wi-Fi prices, not their advertising prices, in Stage II. Second, both the VOs' Wi-Fi prices and advertising prices are non-decreasing in the advertising concentration level and non-increasing in the MU visiting frequency. Numerical results further show that the VOs are capable of generating large revenues mainly through providing one type of Wi-Fi access (premium or advertising-sponsored), depending on their advertising concentration levels and MU visiting frequencies.
- In 1982, Drezner proposed the (1|1)-centroid problem on the plane, in which two players, called the leader and the follower, open facilities to provide service to customers in a competitive manner. The leader opens the first facility, and then the follower opens the second. Each customer patronizes the facility closest to him (ties broken in favor of the leader), thereby deciding the market share of the two players. The goal is to find the best position for the leader's facility so that his market share is maximized. The best algorithm for this problem is an $O(n^2 \log n)$-time parametric search approach, which searches over the space of possible market share values. In the same paper, Drezner also proposed a general version of the (1|1)-centroid problem by introducing a minimal distance constraint $R$, such that the follower's facility is not allowed to be located within a distance $R$ from the leader's. He proposed an $O(n^5 \log n)$-time algorithm for this general version by identifying $O(n^4)$ points as candidates for the optimal solution and checking the market share of each of them. In this paper, we develop a new parametric search approach that searches over the $O(n^4)$ candidate points, and present an $O(n^2 \log n)$-time algorithm for the general version, thereby closing the $O(n^3)$ gap between the two bounds.
- Aug 09 2016 cs.CV arXiv:1608.02023v2 The study of cultural-heritage objects embellished with realistic and abstract designs made up of connected and intertwined curves crosscuts a number of related disciplines, including archaeology, art history, and heritage management. However, many objects, such as pottery sherds found in the archaeological record, are fragmentary, making the underlying complete designs unknowable at the scale of the sherd fragment. Efforts to reconstruct and study complete designs are stymied because 1) most fragmentary cultural-heritage objects contain only a small portion of the underlying full design, 2) in the case of a stamping application, the same design may be applied multiple times with spatial overlap on one object, and 3) curve patterns detected on an object are usually incomplete and noisy. As a result, classical curve-pattern matching algorithms, such as Chamfer matching, may perform poorly in identifying the underlying design. In this paper, we develop a new partial-to-global curve matching algorithm to address these challenges and better identify the full design from a fragmented cultural-heritage object. Specifically, we develop the algorithm to identify the designs of the carved wooden paddles of the Southeastern Woodlands from unearthed pottery sherds. A set of pottery sherds from the Snow Collection, curated at Georgia Southern University, is used to test the proposed algorithm, with promising results.
- We study cooperation between a mobile network operator (MNO) and venue owners (VOs) on public Wi-Fi deployment. We consider a one-to-many bargaining framework, where the MNO bargains with the VOs sequentially to determine where to deploy Wi-Fi and how much to pay. Taking into account the negative externalities among different steps of the bargaining, we analyze the following two cases: in the exogenous bargaining sequence case, we compute the optimal bargaining solution for the cooperation decisions and payments under a predetermined bargaining sequence; in the endogenous bargaining sequence case, the MNO decides the bargaining sequence to maximize its payoff. By exploring the structural properties of the optimal bargaining sequence, we design a low-complexity Optimal VO Bargaining Sequencing (OVBS) algorithm to search for the optimal sequence. More specifically, we categorize the VOs into three types based on the impact of Wi-Fi deployment at their venues, and show that it is optimal for the MNO to bargain with these three types of VOs sequentially. Numerical results show that, compared with random and worst-case bargaining sequences, the optimal bargaining sequence improves the MNO's payoff by up to 14.8% and 45.3%, respectively.
- Aug 03 2016 cs.CL arXiv:1608.00612v1 Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves the extraction of medical entities such as medication, indication, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experiment with various CRF-based structured learning models combined with recurrent neural networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
- We present a memory-augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrate the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
- Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. Recursive networks, in contrast, explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic trees. In this paper, we introduce a robust, syntactic-parsing-independent tree-structured model, Neural Tree Indexers (NTI), that provides a middle ground between the sequential RNNs and the syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. An attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary-tree model of NTI, showing that the model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
- In this paper, we consider an intensity-modulated direct-detection MIMO optical wireless communication (OWC) system. For such a system, a novel super-rectangular cover theory is developed to characterize both unique identifiability and full reliability. This theory states that a transmitted matrix signal can be uniquely identified if and only if the cover order is equal to the transmitter aperture number, i.e., full cover. In addition, we prove that full reliability is guaranteed for space-time block coded MIMO-OWC over commonly used log-normal fading channels with an ML detector if and only if the STBC enables full cover. Moreover, the diversity gain can be geometrically interpreted as the cover order of the super-rectangle, which should be maximized, while the volume of this super-rectangle, as the diversity loss, should be minimized. Using this established error performance criterion, the optimal linear STBC for block fading channels is proved to be a spatial repetition code with an optimal power allocation. The design of the optimal non-linear STBC is shown to be equivalent to constructing the optimal multi-dimensional constellation. Specifically, a multi-dimensional constellation based on Diophantine equations is proposed and shown to be more energy-efficient than the commonly used nonnegative pulse amplitude modulation constellation.
- Sequence labeling for the extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics, including pharmacovigilance and drug surveillance. The state-of-the-art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explore various recurrent neural network frameworks and show that they significantly outperform the CRF models.
- Jun 28 2016 cs.CL arXiv:1606.07993v1 Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and has therefore been an active research area. Unlike existing reviews covering a holistic view of BioIE, this review focuses mainly on recent advances in learning-based approaches, systematically summarizing them along different aspects of methodological development. In addition, we dive into open information extraction and deep learning, two emerging and influential techniques, and envision the next generation of BioIE.
- Throughout music history, theorists have identified and documented interpretable rules that capture the decisions of composers. This paper asks, "Can a machine behave like a music theorist?" It presents MUS-ROVER, a self-learning system for automatically discovering rules from symbolic music. MUS-ROVER performs feature learning via $n$-gram models to extract compositional rules --- statistical patterns over the resulting features. We evaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover known rules, as well as identify new, characteristic patterns for further study. We discuss how the extracted rules can be used in both machine and human composition.
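The n-gram statistics at the heart of such a rule-discovery system can be illustrated with a toy sketch (the feature choice, function names, and threshold below are illustrative assumptions, not MUS-ROVER's actual pipeline):

```python
from collections import Counter

def ngram_counts(sequence, n=2):
    """Count n-grams over a symbolic feature sequence (e.g., pitch classes)."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

def extract_rules(sequences, n=2, min_prob=0.5):
    """Keep n-gram continuations frequent enough to read as a 'rule':
    P(next symbol | context) >= min_prob across the corpus."""
    counts = Counter()
    for seq in sequences:
        counts.update(ngram_counts(seq, n))
    context_totals = Counter()
    for gram, c in counts.items():
        context_totals[gram[:-1]] += c
    rules = {}
    for gram, c in counts.items():
        p = c / context_totals[gram[:-1]]
        if p >= min_prob:
            rules[gram] = p
    return rules

# Toy pitch-class melodies (0 = C, 4 = E, 7 = G)
melodies = [[0, 7, 0, 4, 7], [0, 7, 4, 7, 0]]
rules = extract_rules(melodies, n=2, min_prob=0.5)
```

A real system would run this over richer features (intervals, voice-leading) and many n-gram orders; the principle of thresholding conditional frequencies into candidate rules is the same.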
- Jun 16 2016 cs.CL arXiv:1606.04597v1 We introduce an agreement-based approach to learning parallel lexicons and phrases from non-parallel corpora. The basic idea is to encourage two asymmetric latent-variable translation models (i.e., source-to-target and target-to-source) to agree on identifying latent phrase and word alignments. The agreement is defined at both word and phrase levels. We develop a Viterbi EM algorithm for jointly training the two unidirectional models efficiently. Experiments on the Chinese-English dataset show that agreement-based learning significantly improves both alignment and translation performance.
- May 31 2016 cs.CL arXiv:1605.08889v2 Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model evaluates NLP systems against the performance of a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
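The item characteristics IRT works with are easy to state concretely. A minimal sketch of the two-parameter logistic (2PL) item response function, with illustrative parameter values (the abstract does not specify which IRT variant the paper fits):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that a subject with
    ability theta answers correctly, given item discrimination a
    and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An item of average difficulty (b=0): a subject of average ability
# (theta=0) answers correctly with probability exactly 0.5.
baseline = p_correct(0.0, a=1.5, b=0.0)

# A highly discriminating item (large a) separates abilities sharply
# around its difficulty b.
easy_for_strong = p_correct(1.0, a=2.0, b=0.0)
hard_for_weak = p_correct(-1.0, a=2.0, b=0.0)
```

Fitting a and b per test item from many human responses is what lets the model weight items by difficulty and discriminating power rather than treating all items as equal.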
- Previous studies in Open Information Extraction (Open IE) are mainly based on extraction patterns. They manually define patterns or automatically learn them from a large corpus. However, these approaches are limited in grasping the context of a sentence, and they fail to capture implicit relations. In this paper, we address this problem with the following methods. First, we exploit long short-term memory (LSTM) networks to extract higher-level features along the shortest dependency paths connecting the headwords of relations and arguments. The path-level features from LSTM networks provide useful clues regarding contextual information and the validity of arguments. Second, we construct samples to train LSTM networks without the need for manual labeling. In particular, feedback negative sampling picks highly negative samples from among the non-positive samples through a model trained with positive samples. The experimental results show that our approach produces more precise and abundant extractions than state-of-the-art Open IE systems. To the best of our knowledge, this is the first work to apply deep learning to Open IE.
- May 10 2016 cs.LG arXiv:1605.02099v1 This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretical perspective. It supplements the latter analysis with simulation results and illustrates the behavior of some of the ETD algorithms using three example problems.
- Apr 12 2016 cs.DS arXiv:1604.03030v1 This paper develops a new technique for proving amortized, randomized cell-probe lower bounds on dynamic data structure problems. We introduce a new randomized nondeterministic four-party communication model that enables "accelerated", error-preserving simulations of dynamic data structures. We use this technique to prove an $\Omega(n(\log n/\log\log n)^2)$ cell-probe lower bound for the dynamic 2D weighted orthogonal range counting problem (2D-ORC) with $n/\mathrm{poly}\log n$ updates and $n$ queries, which holds even for data structures with $\exp(-\tilde{\Omega}(n))$ success probability. This result not only proves the highest amortized lower bound to date, but is also tight in the strongest possible sense, as a matching upper bound can be obtained by a deterministic data structure with worst-case operational time. This is the first demonstration of a "sharp threshold" phenomenon for dynamic data structures. Our broader motivation is that cell-probe lower bounds for exponentially small success probability facilitate reductions from dynamic to static data structures. As a proof of concept, we show that a slightly strengthened version of our lower bound would imply an $\Omega((\log n /\log\log n)^2)$ lower bound for the static 3D-ORC problem with $O(n\log^{O(1)}n)$ space. Such a result would give a near-quadratic improvement over the highest known static cell-probe lower bound, and break the long-standing $\Omega(\log n)$ barrier for static data structures.
- This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computational complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long-term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems, and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\beta,1-\beta\}})$ regret and $O(T^{1-\beta/2})$ constraint violations, where $\beta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with finite constraint violations.
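The flavor of this line of algorithms can be conveyed with a minimal virtual-queue sketch on a toy one-dimensional instance. Everything below (the loss, the constraint, the step size, the queue update) is an illustrative assumption, not the paper's exact algorithm: a dual-like "queue" accumulates constraint violation and pushes the iterate back toward feasibility, so the constraint holds in the long run even though individual rounds may violate it.

```python
import math

# Toy instance: per-round loss f_t(x) = (x - 1)^2 with long-term
# constraint g(x) = x - 0.5 <= 0 over the decision set X = [0, 2].
T = 5000
eta = 1.0 / math.sqrt(T)        # gradient step size
x, Q = 0.0, 0.0                 # decision variable and virtual queue
total_violation = 0.0
for t in range(T):
    grad = 2.0 * (x - 1.0) + Q          # d/dx [ f_t(x) + Q * g(x) ]
    x = min(max(x - eta * grad, 0.0), 2.0)   # projected gradient step
    g = x - 0.5
    total_violation += g
    Q = max(Q + g, 0.0)         # queue grows on violation, drains otherwise

avg_violation = total_violation / T
```

The unconstrained minimizer is x = 1, but the queue pressure settles the iterate near the constraint boundary x = 0.5, with the average violation vanishing as T grows, which is the long-term-constraint guarantee in miniature.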
- Apr 08 2016 cs.DC arXiv:1604.01950v1 Demand response is widely employed by today's data centers to reduce energy consumption in response to rising electricity costs. To incentivize users of data centers to participate in demand response programs, i.e., to break the "split incentive" hurdle, some prior research proposes market-based mechanisms such as dynamic pricing and static monetary rewards. However, these mechanisms are either intrusive or unfair. In this paper, we use time-varying rewards to incentivize users, who have flexible deadlines and are willing to trade performance degradation for monetary rewards, to grant time-shifting of their requests. With a game-theoretic framework, we model the game between a single data center and its users. Further, we extend our design by integrating it with two other emerging practical demand response strategies: server shutdown and local renewable energy generation. With real-world data traces, we show that a data center with our design can effectively shed its peak electricity load and overall electricity cost without reducing its profit, compared with the current practice where no incentive mechanism is established.
- Hashing has emerged as a popular technique for large-scale similarity search. Most learning-based hashing methods generate compact yet correlated hash codes. However, this redundancy is storage-inefficient. Hence we propose a lossless variable-length hashing (VLH) method that is both storage- and search-efficient. Storage efficiency is achieved by converting the fixed-length hash code into a variable-length code. Search efficiency is obtained by using a multiple hash table structure. With VLH, we are able to deliberately add redundancy into hash codes to improve retrieval performance with little sacrifice in storage efficiency or search complexity. In particular, we propose a block K-means hashing (B-KMH) method to obtain significantly improved retrieval performance with no increase in storage and marginal increase in computational cost.
- Feb 22 2016 cs.SI arXiv:1602.06033v7 Modeling and predicting the popularity of online content is a significant problem for the practice of information dissemination, advertising, and consumption. Recent work analyzing massive datasets advances our understanding of popularity, but one major gap remains: to precisely quantify the relationship between the popularity of an online item and the external promotions it receives. This work supplies the missing link between exogenous inputs from public social media platforms, such as Twitter, and endogenous responses within the content platform, such as YouTube. We develop a novel mathematical model, the Hawkes intensity process, which can explain the complex popularity history of each video according to its type of content, network of diffusion, and sensitivity to promotion. Our model supplies a prototypical description of videos, called an endo-exo map. This map explains popularity as the result of an extrinsic factor - the amount of promotion from the outside world that the video receives - acting upon two intrinsic factors - sensitivity to promotion, and inherent virality. We use this model to forecast future popularity given promotions on a large 5-month feed of the most-tweeted videos, and find that it lowers the average error by 28.6% relative to approaches based on popularity history. Finally, we can identify videos that have a high potential to become viral, as well as those for which promotions will have hardly any effect.
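The endo-exo idea can be sketched in discrete time: expected popularity at each step is an exogenous promotion term plus a self-exciting response to the process's own past, decaying as a power law. The parameter names and values below are illustrative assumptions, not the paper's fitted model:

```python
def hawkes_intensity(exo, mu=0.5, C=0.4, c=1.0, theta=0.5):
    """Discrete-time sketch of a Hawkes-style intensity: xi[t] is driven
    by exogenous promotion exo[t] (scaled by mu) plus an endogenous
    power-law-decaying response (weight C) to its own past values."""
    xi = []
    for t in range(len(exo)):
        endo = sum(xi[t - tau] * (tau + c) ** -(1 + theta)
                   for tau in range(1, t + 1))
        xi.append(mu * exo[t] + C * endo)
    return xi

# A single burst of promotion at t=0: the response persists through
# self-excitation and decays over the following steps.
series = hawkes_intensity([10, 0, 0, 0, 0])
```

Here mu plays the role of sensitivity to promotion and C the inherent virality: with no further promotion, the endogenous term alone carries the popularity forward before it fades.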
- Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.
- This paper considers dynamic transmit covariance design in point-to-point MIMO fading systems with unknown channel state distributions and inaccurate channel state information subject to both long term and short term power constraints. First, the case of instantaneous but possibly inaccurate channel state information at the transmitter (CSIT) is treated. By extending the drift-plus-penalty technique, a dynamic transmit covariance policy is developed and is shown to approach optimality with an $O(\delta)$ gap, where $\delta$ is the inaccuracy measure of CSIT, regardless of the channel state distribution and without requiring knowledge of this distribution. Next, the case of delayed and inaccurate channel state information is considered. The optimal transmit covariance solution that maximizes the ergodic capacity is fundamentally different in this case, and a different online algorithm based on convex projections is developed. The proposed algorithm for this delayed-CSIT case also has an $O(\delta)$ optimality gap, where $\delta$ is again the inaccuracy measure of CSIT.
- Dec 23 2015 cs.AI arXiv:1512.07162v1 Attribute reduction is one of the most important topics in rough set theory. Heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. It is generally known that fitness functions play a key role in developing heuristic attribute reduction algorithms. The monotonicity of fitness functions can guarantee the validity of heuristic attribute reduction algorithms. In the probabilistic rough set model, distribution reducts can ensure that the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on developing heuristic attribute reduction algorithms for finding distribution reducts. This is partly due to the fact that there are no monotonic fitness functions for designing heuristic attribute reduction algorithms in the probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in the probabilistic rough set model. First, two monotonic fitness functions are constructed, from which equivalent definitions of distribution reducts can be obtained. Second, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed, based on the addition-deletion method and the deletion method. In particular, the monotonicity of the fitness functions guarantees the rationality of the proposed heuristic attribute reduction algorithms. Results of an experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.
- Dec 22 2015 cs.NI arXiv:1512.06428v2 The explosive growth of global mobile traffic has led to a rapid growth in the energy consumption of communication networks. In this paper, we focus on the energy-aware design of network selection, subchannel allocation, and power allocation in cellular and Wi-Fi networks, while taking into account the traffic delay of mobile users. The problem is particularly challenging due to the two-timescale operations: network selection (large timescale) and subchannel and power allocation (small timescale). Based on the two-timescale Lyapunov optimization technique, we first design an online Energy-Aware Network Selection and Resource Allocation (ENSRA) algorithm. The ENSRA algorithm yields a power consumption within an O(1/V) bound of the optimal value, and guarantees an O(V) traffic delay for any positive control parameter V. Motivated by recent advances in the accurate estimation and prediction of user mobility, channel conditions, and traffic demands, we further develop a novel predictive Lyapunov optimization technique to utilize the predictive information, and propose a Predictive Energy-Aware Network Selection and Resource Allocation (P-ENSRA) algorithm. We characterize the performance bounds of P-ENSRA in terms of the power-delay tradeoff theoretically. To reduce the computational complexity, we finally propose a Greedy Predictive Energy-Aware Network Selection and Resource Allocation (GP-ENSRA) algorithm, in which the operator solves the problem in P-ENSRA approximately and iteratively. Numerical results show that GP-ENSRA significantly improves the power-delay performance over ENSRA in the large-delay regime. For a wide range of system parameters, GP-ENSRA reduces the traffic delay over ENSRA by 20-30% under the same power consumption.
- Dec 07 2015 cs.DS arXiv:1512.01293v1 In this paper, we develop a new communication model to prove a data structure lower bound for the dynamic interval union problem. The problem is to maintain a multiset of intervals $\mathcal{I}$ over $[0, n]$ with integer coordinates, supporting the following operations:
  - insert(a, b): add an interval $[a, b]$ to $\mathcal{I}$, provided that $a$ and $b$ are integers in $[0, n]$;
  - delete(a, b): delete a (previously inserted) interval $[a, b]$ from $\mathcal{I}$;
  - query(): return the total length of the union of all intervals in $\mathcal{I}$.
  The problem is related to the two-dimensional case of Klee's measure problem. We prove that there is a distribution over sequences of operations with $O(n)$ insertions and deletions, and $O(n^{0.01})$ queries, for which any data structure with any constant error probability requires $\Omega(n\log n)$ time in expectation. Interestingly, we use the sparse set disjointness protocol of Håstad and Wigderson [ToC'07] to speed up a reduction from a new kind of nondeterministic communication game, for which we prove lower bounds. For applications, we prove lower bounds for several dynamic graph problems by reducing them from dynamic interval union.
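For contrast with the lower bound, a naive baseline for the dynamic interval union problem is easy to state (a sketch, not from the paper): constant-time updates and a sort-and-sweep query over the k stored intervals, costing O(k log k) per query.

```python
from collections import Counter

class IntervalUnion:
    """Naive baseline: O(1) updates, O(k log k) queries over k stored
    intervals. The paper's lower bound rules out making both operations
    polylogarithmic simultaneously."""
    def __init__(self):
        self.intervals = Counter()   # multiset of (a, b) pairs

    def insert(self, a, b):
        self.intervals[(a, b)] += 1

    def delete(self, a, b):
        self.intervals[(a, b)] -= 1
        if self.intervals[(a, b)] == 0:
            del self.intervals[(a, b)]

    def query(self):
        """Total length of the union, via a left-to-right sweep."""
        total, covered_until = 0, None
        for a, b in sorted(self.intervals):
            if covered_until is None or a > covered_until:
                total += b - a          # disjoint from what came before
                covered_until = b
            elif b > covered_until:
                total += b - covered_until   # extends current covered run
                covered_until = b
        return total

s = IntervalUnion()
s.insert(0, 5); s.insert(3, 8); s.insert(10, 12)
len1 = s.query()   # union [0,8] and [10,12] -> 10
s.delete(3, 8)
len2 = s.query()   # union [0,5] and [10,12] -> 7
```

Note the multiset semantics: duplicate insertions don't change the union length, but each must be matched by its own delete before the interval disappears.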
- Nov 25 2015 cs.LG arXiv:1511.07471v3 We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently proposed by Sutton, Mahmood, and White to solve a long-standing divergence problem of the standard TD algorithm when it is applied to off-policy training, where data from an exploratory policy are used to evaluate other policies of interest. The almost sure convergence of ETD($\lambda$) has been proved in our recent work under general off-policy training conditions, but for a narrow range of diminishing stepsize. In this paper we present convergence results for constrained versions of ETD($\lambda$) with constant stepsize and with diminishing stepsize from a broad range. Our results characterize the asymptotic behavior of the trajectory of iterates produced by those algorithms, and are derived by combining key properties of ETD($\lambda$) with powerful convergence theorems from the weak convergence methods in stochastic approximation theory. For the case of constant stepsize, in addition to analyzing the behavior of the algorithms in the limit as the stepsize parameter approaches zero, we also analyze their behavior for a fixed stepsize and bound the deviations of their averaged iterates from the desired solution. These results are obtained by exploiting the weak Feller property of the Markov chains associated with the algorithms, and by using ergodic theorems for weak Feller Markov chains, in conjunction with the convergence results we get from the weak convergence methods. Besides ETD($\lambda$), our analysis also applies to the off-policy TD($\lambda$) algorithm, when the divergence issue is avoided by setting $\lambda$ sufficiently large.
- Nov 19 2015 cs.CV arXiv:1511.05914v1 We make available to the community a new dataset to support action-recognition research. This dataset is different from prior datasets in several key ways. It is significantly larger. It contains streaming video with long segments containing multiple action occurrences that often overlap in space and/or time. All actions were filmed in the same collection of backgrounds so that background gives little clue as to action class. We had five humans replicate the annotation of temporal extent of action occurrences labeled with their class and measured a surprisingly low level of intercoder agreement. A baseline experiment shows that recent state-of-the-art methods perform poorly on this dataset. This suggests that this will be a challenging dataset to foster advances in action-recognition research. This manuscript serves to describe the novel content and characteristics of the LCA dataset, present the design decisions made when filming the dataset, and document the novel methods employed to annotate the dataset.
- In collaborative recommendation systems, privacy may be compromised, as users' opinions are used to generate recommendations for others. In this paper, we consider an online collaborative recommendation system, and we measure users' privacy in terms of the standard differential privacy. We give the first quantitative analysis of the trade-offs between recommendation quality and users' privacy in such a system by showing a lower bound on the best achievable privacy for any non-trivial algorithm, and proposing a near-optimal algorithm. From our results, we find that there is actually little trade-off between recommendation quality and privacy for any non-trivial algorithm. Our results also identify the key parameters that determine the best achievable privacy.
- Oct 28 2015 cs.CV arXiv:1510.07712v2 We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. Our hierarchical framework contains a sentence generator and a paragraph generator. The sentence generator produces one simple short sentence that describes a specific short video interval. It exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. The paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial state for the sentence generator. We evaluate our approach on two large-scale benchmark datasets: YouTubeClips and TACoS-MultiLevel. The experiments demonstrate that our approach significantly outperforms the current state-of-the-art methods with BLEU@4 scores 0.499 and 0.305 respectively.
- High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable and able to deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix completion approaches that have previously been successfully applied to large-scale noisy data, but which fail to adequately model high-dimensional time series due to temporal dependencies. We present a novel temporal regularized matrix factorization (TRMF) framework which supports data-driven temporal dependency learning and equips our new matrix factorization approach with forecasting ability. TRMF is highly general, and subsumes many existing matrix factorization approaches for time series data. We make interesting connections to graph-regularized matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF outperforms several existing approaches for common time series tasks.
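The core idea of adding a temporal regularizer to a factorization objective can be sketched as plain gradient descent with an AR(1)-style smoothness penalty on the latent time series. The sizes, weights, fixed AR coefficient, and solver below are illustrative simplifications (TRMF learns the temporal dependencies rather than fixing them):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, k = 8, 50, 2            # series, time steps, rank (illustrative)
F_true = rng.normal(size=(n, k))
X_true = np.cumsum(rng.normal(scale=0.1, size=(k, T)), axis=1)
Y = F_true @ X_true           # synthetic low-rank, temporally smooth data

F = rng.normal(scale=0.1, size=(n, k))   # series loadings
X = rng.normal(scale=0.1, size=(k, T))   # latent time series
lam, w, lr = 0.1, 1.0, 0.005  # temporal weight, AR(1) coeff, step size

def loss(F, X):
    fit = np.sum((Y - F @ X) ** 2)
    temporal = np.sum((X[:, 1:] - w * X[:, :-1]) ** 2)  # AR(1) smoothness
    return fit + lam * temporal

initial = loss(F, X)
for _ in range(1000):
    R = F @ X - Y
    D = X[:, 1:] - w * X[:, :-1]
    gF = 2 * R @ X.T                    # gradient of fit term w.r.t. F
    gX = 2 * F.T @ R                    # gradient of fit term w.r.t. X
    gX[:, 1:] += 2 * lam * D            # temporal penalty, w.r.t. X[:, t]
    gX[:, :-1] -= 2 * lam * w * D       # temporal penalty, w.r.t. X[:, t-1]
    F -= lr * gF
    X -= lr * gX
final = loss(F, X)
```

The same regularizer is what gives the model forecasting ability: once X follows the learned autoregressive dynamics, future latent values can be rolled forward and multiplied by F.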
- Sep 29 2015 cs.DC arXiv:1509.08231v1Solving software dependency issues in HPC environments has always been a difficult task for both computing system administrators and application scientists. This work tackles the issue by introducing modern container technology, specifically Docker. By integrating the auto-scaling feature of service discovery with the light-weight virtualization that Docker provides, we attempt to construct a virtual cluster on top of physical cluster hardware. Isolating the computing environment in this way offers a remedy for software dependency problems in HPC environments.
- In this paper, we consider a multiple-input-multiple-output optical wireless communication (MIMO-OWC) system in the presence of log-normal fading. In this scenario, a general criterion for the design of full-diversity space codes (FDSC) with the maximum likelihood (ML) detector is developed. This criterion reveals that in a high signal-to-noise ratio (SNR) regime, MIMO-OWC offers both a large-scale diversity gain, governing the exponential decay of the error curve, and a small-scale diversity gain, producing the traditional power-law decay. In particular, for a two-by-two MIMO-OWC system with unipolar pulse amplitude modulation (PAM), a closed-form solution to the design problem of a linear FDSC optimizing both diversity gains is attained by exploiting known properties of successive terms of Farey sequences in number theory, and by developing new properties of the disjoint intervals formed by Farey sequence terms, to attack the mixed continuous-discrete max-min design problem. This specific design not only proves that a repetition code (RC) is the optimal linear FDSC optimizing both diversity gains, but also uncovers a significant difference between MIMO radio frequency (RF) communications and MIMO-OWC: the space dimension alone suffices to achieve full large-scale diversity. Computer simulations demonstrate that the FDSC substantially outperforms uncoded spatial multiplexing with the same total optical power and spectral efficiency, with the latter providing only the small-scale diversity gain.
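The Farey-sequence property exploited above is that successive terms a/b < c/d of F_n satisfy bc - ad = 1, which is what makes the disjoint-interval arguments work. A short generator using the classic next-term recurrence (purely illustrative, not the paper's design procedure):

```python
from fractions import Fraction

def farey(n):
    """Farey sequence F_n: all reduced fractions in [0, 1] with
    denominator <= n, in increasing order. Uses the standard
    next-term recurrence: given successive terms a/b, c/d, the
    following term is (kc - a)/(kd - b) with k = (n + b) // d."""
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq
```

For example, `farey(5)` yields 0, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1, and every adjacent pair satisfies the unimodularity property bc - ad = 1.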
- Sep 08 2015 cs.CV arXiv:1509.01654v1Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even when a group of people with similar appearances is present in the view. More specifically, we detect a set of persons in each frame as candidates for the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns across different videos and high spatial-temporal consistency in each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm.
- We present a unified framework which supports grounding natural-language semantics in robotic driving. This framework supports acquisition (learning grounded meanings of nouns and prepositions from human annotation of robotic driving paths), generation (using such acquired meanings to generate sentential description of new robotic driving paths), and comprehension (using such acquired meanings to support automated driving to accomplish navigational goals specified in natural language). We evaluate the performance of these three tasks by having independent human judges rate the semantic fidelity of the sentences associated with paths, achieving overall average correctness of 94.6% and overall average completeness of 85.6%.
- Aug 24 2015 cs.NI arXiv:1508.05344v1Wireless communication is a basis of the vision of connected and automated vehicles (CAVs). Given the heterogeneity of both wireless communication technologies and CAV applications, one question that is critical to technology road-mapping and policy making is which communication technology is more suitable for a specific CAV application. Focusing on the technical aspect of this question, we present a multi-scale spatiotemporal perspective of wireless communication technologies as well as canonical CAV applications in active safety, fuel economy and emission control, vehicle automation, and vehicular infotainment. Our analysis shows that CAV applications with small spatiotemporal scale communication requirements are best supported by V2V communications; applications with large spatiotemporal scale requirements are better supported by cellular communications; and applications with small spatial scale but medium-to-large temporal scale requirements can be supported by both V2V and cellular communications, providing the opportunity to leverage heterogeneous communication resources.
- Aug 11 2015 cs.CL arXiv:1508.02225v2The widely-used automatic evaluation metrics cannot adequately reflect the fluency of translations. N-gram-based metrics such as BLEU limit the maximum length of matched fragments to n and cannot capture matched fragments longer than n, so they reflect fluency only indirectly. METEOR, which is not limited by n-grams, uses the number of matched chunks but does not consider the length of each chunk. In this paper, we propose an entropy-based method that sufficiently reflects the fluency of translations through the distribution of matched words. This method can easily be combined with widely-used automatic evaluation metrics to improve the evaluation of fluency. Experiments show that the sentence-level correlations of BLEU and METEOR on WMT 2010 and WMT 2012 improve after combining them with the entropy-based method.
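One plausible reading of the entropy idea: treat the lengths of contiguous matched chunks as a distribution over matched words, and score its entropy. Fewer, longer chunks concentrate the mass and give lower entropy, which reads as higher fluency. A minimal sketch (the paper's exact formulation may differ; the function name is ours):

```python
import math

def fluency_entropy(chunk_lens):
    """Entropy of the distribution of matched words over chunks.
    chunk_lens: lengths of the contiguous matched chunks between a
    hypothesis and its reference. Unlike METEOR's chunk count, this
    is sensitive to how matched words are distributed among chunks."""
    total = sum(chunk_lens)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log2(l / total) for l in chunk_lens)
```

A single 6-word chunk scores 0 bits; two 3-word chunks score 1 bit; six isolated words score log2(6) ≈ 2.58 bits, correctly penalizing the least fluent matching.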
- Aug 11 2015 cs.CL arXiv:1508.01996v2Most syntax-based metrics obtain similarity by comparing sub-structures extracted from the trees of the hypothesis and reference. These sub-structures are defined by humans and, because of their limited length, cannot express all the information in the trees. In addition, the overlapping parts of these sub-structures are computed repeatedly. To avoid these problems, we propose a novel automatic evaluation metric based on a dependency parsing model, with no need for human-defined sub-structures. First, we train a dependency parsing model on the reference dependency tree. Then we generate the hypothesis dependency tree and its probability with the dependency parsing model; the quality of the hypothesis can be judged by this probability. To capture lexical similarity, we also incorporate the unigram F-score into the new metric. Experimental results show that the new metric achieves state-of-the-art performance at the system level, and is comparable with METEOR at the sentence level.
- Jul 29 2015 cs.SY arXiv:1507.07844v2Composite adaptive control (CAC), which integrates direct and indirect adaptive control techniques, can achieve smaller tracking errors and faster parameter convergence than either technique alone. However, the condition of persistent excitation (PE) still has to be satisfied to guarantee parameter convergence in CAC. This paper proposes a novel model reference composite learning control (MRCLC) strategy for a class of affine nonlinear systems with parametric uncertainties that guarantees parameter convergence without the PE condition. In the composite learning, an integral over a moving time window is utilized to construct a prediction error, a linear filter is applied to avoid the need for derivatives of plant states, and both the tracking error and the prediction error are applied to update the parameter estimates. It is proven that the closed-loop system achieves global exponential-like stability under interval excitation, rather than PE, of the regression functions. The effectiveness of the proposed MRCLC is verified by applying it to an inverted pendulum control problem.
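The composite update can be illustrated with a simplified discrete-time sketch: the parameter estimate is driven both by the instantaneous tracking-error term (direct adaptation) and by a prediction error built from data stacked over a moving window (the composite learning part). This is our own toy discretization, not the paper's continuous-time law; all gains and names are illustrative:

```python
import numpy as np

def composite_update(theta, phi, e, Phi_w, E_w, gamma=0.05, kp=0.5):
    """One discrete-time composite learning step (simplified sketch).
    theta: parameter estimate (p,); phi: current regressor (p,);
    e: scalar tracking error; Phi_w: (m, p) regressors stacked over a
    moving window; E_w: (m,) corresponding filtered plant outputs.
    The window-based prediction error lets theta converge whenever the
    window data is informative (interval excitation), without PE."""
    pred_err = E_w - Phi_w @ theta                      # prediction error
    return theta + gamma * (phi * e + kp * Phi_w.T @ pred_err)
```

With zero tracking error, the update reduces to gradient descent on the windowed prediction error, so theta converges to the true parameters as soon as the stacked window regressor has full column rank.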
- Jul 14 2015 cs.CV arXiv:1507.03060v2One popular approach to interactively segmenting the foreground object of interest from an image is to annotate a bounding box that covers the foreground object. Then, a binary labeling is performed to achieve a refined segmentation. One major issue with existing algorithms for such interactive image segmentation is their preference for an input bounding box that tightly encloses the foreground object. This increases the annotation burden, and prevents these algorithms from utilizing automatically detected bounding boxes. In this paper, we develop a new LooseCut algorithm that can handle cases where the input bounding box only loosely covers the foreground object. We propose a new Markov Random Fields (MRF) model for segmentation with loosely bounded boxes, including a global similarity constraint to better distinguish the foreground and background, and an additional energy term to encourage consistent labeling of similar-appearance pixels. This MRF model is then solved by an iterated max-flow algorithm. In the experiments, we evaluate LooseCut on three publicly available image datasets, and compare its performance against several state-of-the-art interactive image segmentation algorithms. We also show that LooseCut can be used to enhance the performance of unsupervised video segmentation and image saliency detection.
- Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linear function approximation. This paper serves as a unified summary of the available results from both works. In addition, we demonstrate the empirical benefits from the flexibility of emphatic algorithms, including state-dependent discounting, state-dependent bootstrapping, and the user-specified allocation of function approximation resources.
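The core ETD(λ) recursion from Sutton, Mahmood and White (2015) is short enough to sketch directly: a followon trace F tracks accumulated (discounted, importance-weighted) interest, the emphasis M mixes it with the current interest, and the eligibility trace and TD update are otherwise standard. A minimal illustrative implementation for linear policy evaluation (class name and defaults are ours):

```python
import numpy as np

class ETD:
    """Emphatic TD(lambda) for linear off-policy policy evaluation,
    following the recursions of Sutton, Mahmood and White (2015).
    rho is the importance sampling ratio pi(a|s)/mu(a|s); i is the
    user-specified interest in the current state."""
    def __init__(self, n, alpha=0.01, gamma=0.9, lam=0.0):
        self.theta = np.zeros(n)   # weight vector
        self.e = np.zeros(n)       # eligibility trace
        self.F = 0.0               # followon trace
        self.rho_prev = 0.0        # rho from the previous step
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def step(self, phi, r, phi_next, rho, i=1.0):
        self.F = self.gamma * self.rho_prev * self.F + i      # followon
        M = self.lam * i + (1 - self.lam) * self.F            # emphasis
        self.e = rho * (self.gamma * self.lam * self.e + M * phi)
        delta = r + self.gamma * self.theta @ phi_next - self.theta @ phi
        self.theta += self.alpha * delta * self.e
        self.rho_prev = rho
        return delta
```

The emphasis M is exactly the state-dependent reweighting that restores stability under off-policy training; setting i differently per state implements the user-specified allocation of function approximation resources mentioned above.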
- Jun 19 2015 cs.SY arXiv:1506.05713v1In this paper, several necessary and sufficient graphical conditions are derived for the controllability of multi-agent systems by taking advantage of the proposed concept of controllability destructive nodes. A key step in arriving at this result is establishing a relationship between the topology structures of the controllability destructive nodes and a specific eigenvector of the Laplacian matrix. The results on double, triple and quadruple controllability destructive nodes constitute a novel approach to studying controllability. In particular, the approach is applied to graphs with five nodes to obtain a complete graphical characterization of controllability.
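The eigenvector connection can be made concrete with the classical PBH-style test for leader-follower networks (this is the standard algebraic test, not the paper's graphical conditions): for dynamics x' = -Lx + Bu with B selecting leader nodes and L symmetric, the system is controllable iff no eigenvector of L vanishes on every leader entry. A numpy sketch:

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A from a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def is_controllable(adj, leaders, tol=1e-8):
    """PBH-style controllability test for x' = -L x + B u with B
    selecting the leader nodes (valid for undirected graphs, where L
    is symmetric). For each eigenspace we check whether some vector
    in it vanishes on all leader rows, i.e. whether the leader rows
    of an eigenspace basis are rank-deficient."""
    vals, vecs = np.linalg.eigh(laplacian(adj))
    i, n = 0, len(vals)
    while i < n:
        j = i
        while j < n and vals[j] - vals[i] < tol:   # group equal eigenvalues
            j += 1
        V = vecs[:, i:j][leaders, :]               # leader rows of eigenspace
        if np.linalg.matrix_rank(V, tol) < j - i:
            return False    # some eigenvector is zero on all leaders
        i = j
    return True
```

For the 3-node path graph, a leader at an end node gives controllability, while a leader at the middle node does not: the eigenvector (1, 0, -1) of the Laplacian vanishes there, making the middle node destructive in the paper's terminology.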
- Jun 09 2015 cs.CV arXiv:1506.02059v2We tackle the problem of video object codetection by leveraging the weak semantic constraint implied by sentences that describe the video content. Unlike most existing work that focuses on codetecting large objects which are usually salient both in size and appearance, we can codetect objects that are small or medium sized. Our method assumes no human pose or depth information such as is required by the most recent state-of-the-art method. We employ weak semantic constraint on the codetection process by pairing the video with sentences. Although the semantic information is usually simple and weak, it can greatly boost the performance of our codetection framework by reducing the search space of the hypothesized object detections. Our experiment demonstrates an average IoU score of 0.423 on a new challenging dataset which contains 15 object classes and 150 videos with 12,509 frames in total, and an average IoU score of 0.373 on a subset of an existing dataset, originally intended for activity recognition, which contains 5 object classes and 75 videos with 8,854 frames in total.
- Jun 09 2015 cs.LG arXiv:1506.02582v2We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence proofs for two emphatic algorithms, ETD($\lambda$) and ELSTD($\lambda$). We prove, under general off-policy conditions, the convergence in $L^1$ for ELSTD($\lambda$) iterates, and the almost sure convergence of the approximate value functions calculated by both algorithms using a single infinitely long trajectory. Our analysis involves new techniques with applications beyond emphatic algorithms leading, for example, to the first proof that standard TD($\lambda$) also converges under off-policy training for $\lambda$ sufficiently large.