This paper proposes XML-Defined Network policies (XDNP), a new high-level language based on XML notation, to describe network control rules in Software Defined Network environments. We rely on existing OpenFlow controllers, specifically Floodlight, but the novelty of this project is to separate complicated language- and framework-specific APIs from policy descriptions. This separation makes it possible to extend the current work as a northbound, higher-level abstraction that can support a wide range of controllers that are based on different programming languages. With this approach, we believe that network administrators can develop and deploy network control policies more easily and quickly.
This paper studies energy efficient resource allocation for a machine-to-machine (M2M) enabled cellular network with non-linear energy harvesting, especially focusing on two different multiple access strategies, namely non-orthogonal multiple access (NOMA) and time division multiple access (TDMA). Our goal is to minimize the total energy consumption of the network via joint power control and time allocation while taking into account circuit power consumption. For both NOMA and TDMA strategies, we show that it is optimal for each machine type communication device (MTCD) to transmit with the minimum throughput, and the energy consumption of each MTCD is a convex function with respect to the allocated transmission time. Based on the derived optimal conditions for the transmission power of MTCDs, we transform the original optimization problem for NOMA to an equivalent problem which can be solved suboptimally via an iterative power control and time allocation algorithm. Through an appropriate variable transformation, we also transform the original optimization problem for TDMA to an equivalent tractable problem, which can be iteratively solved. Numerical results verify the theoretical findings and demonstrate that NOMA consumes less total energy than TDMA at low circuit power regime of MTCDs, while at high circuit power regime of MTCDs TDMA achieves better network energy efficiency than NOMA.
Nov 22 2017 cs.CL
Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a better quality teaching signal for the agents compared to static datasets, as the Turkers naturally adapt the training data to the agent's abilities.
Nov 21 2017 cs.LG
Visual attributes, which refer to human-labeled semantic annotations, have gained increasing popularity in a wide range of real world applications. Generally, the existing attribute learning methods fall into two categories: one focuses on learning user-specific labels separately for different attributes, while the other one focuses on learning crowd-sourced global labels jointly for multiple attributes. However, both categories ignore the joint effect of the two mentioned factors: the personal diversity with respect to the global consensus; and the intrinsic correlation among multiple attributes. To overcome this challenge, we propose a novel model to learn user-specific predictors across multiple attributes. In our proposed model, the diversity of personalized opinions and the intrinsic relationship among multiple attributes are unified in a common-to-special manner. To this end, we adopt a three-component decomposition. Specifically, our model integrates a common cognition factor, an attribute-specific bias factor and a user-specific bias factor. Meanwhile Lasso and group Lasso penalties are adopted to leverage efficient feature selection. Furthermore, theoretical analysis is conducted to show that our proposed method could reach reasonable performance. Eventually, the empirical study carried out in this paper demonstrates the effectiveness of our proposed method.
Nov 17 2017 cs.CV
It has recently been shown that a convolutional neural network can learn optical flow estimation with unsupervised learning. However, the performance of unsupervised methods still lags well behind that of their supervised counterparts. Occlusion and large motion are among the major factors that limit current unsupervised methods for learning optical flow. In this work we introduce a new method that models occlusion explicitly and a new warping scheme that facilitates the learning of large motion. Our method shows promising results on the Flying Chairs, MPI-Sintel and KITTI benchmark datasets. In particular, on the KITTI dataset, where abundant unlabeled samples exist, our unsupervised method outperforms its counterpart trained with supervised learning.
This paper presents a robust matrix elastic net based canonical correlation analysis (RMEN-CCA) for multiple view unsupervised learning problems, which emphasizes the combination of CCA and the robust matrix elastic net (RMEN) used as coupled feature selection. The RMEN-CCA leverages the strength of the RMEN to distill naturally meaningful features without any prior assumption and to measure effectively correlations between different 'views'. We can further employ directly the kernel trick to extend the RMEN-CCA to the kernel scenario with theoretical guarantees, which takes advantage of the kernel trick for highly complicated nonlinear feature learning. Rather than simply incorporating existing regularization minimization terms into CCA, this paper provides a new learning paradigm for CCA and is the first to derive a coupled feature selection based CCA algorithm that guarantees convergence. More significantly, for CCA, the newly-derived RMEN-CCA bridges the gap between measurement of relevance and coupled feature selection. Moreover, it is nontrivial to tackle directly the RMEN-CCA by previous optimization approaches derived from its sophisticated model architecture. Therefore, this paper further offers a bridge between a new optimization problem and an existing efficient iterative approach. As a consequence, the RMEN-CCA can overcome the limitation of CCA and address large-scale and streaming data problems. Experimental results on four popular competing datasets illustrate that the RMEN-CCA performs more effectively and efficiently than do state-of-the-art approaches.
In this paper, we study the influence of a fan-shaped buffer zone on the performance of a toll plaza. A two-dimensional traffic flow model and a comprehensive evaluation model based on a mechanical model and the psychological field are established. The traffic flow model is simulated in a coordinate system. We first establish a queueing theory model to analyze vehicles entering the toll plaza. Then, a two-dimensional steady car-following model is established based on the psychological field to analyze vehicles leaving the toll plaza. According to psychological field theory, we analyze the forces acting on each vehicle. The force on each vehicle is contributed by the vehicles in its observation area and by obstacles. By projecting these vehicles and obstacles via the equipotential lines of the psychological field, we obtain their influence on the magnitude and direction of the acceleration of following vehicles. Consequently, the changes in each vehicle's speed and position are obtained as well. Next, we build a simulation based on the states of vehicles and define the rules for vehicle state changes. By simulating the system, we obtain the input and output throughput of the toll plaza. We then obtain the bearing pressure on the road from the maximum throughput and the demand on the roads, and use the number of cars per unit area as the safety factor. Finally, a comprehensive evaluation model is established based on the bearing pressure on the road, the cost and the safety factor.
Nov 15 2017 cs.DB
Destination prediction is an essential task in a variety of mobile applications. In this paper, we optimize the matrix operation and adapt a semi-lazy framework to improve the prediction accuracy and efficiency of a state-of-the-art approach. To this end, we employ efficient dynamic programming by devising several data constructs, including Efficient Transition Probability and Transition Probabilities with Detours, that are capable of pinpointing the minimum amount of computation. We prove that our method reduces both time and space complexity by one order. The experimental results on real-world and synthetic datasets show that our solution consistently outperforms its state-of-the-art counterparts in terms of both efficiency (approximately over 100 times faster) and accuracy (above 30% increase).
Nov 13 2017 cs.CV
Learning to reconstruct depth from a single image by watching unlabeled videos via a deep convolutional network (DCN) has attracted significant attention in recent years. In this paper, we introduce a surface normal representation into an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with predicted normals, yielding more robust geometry results. Specifically, we formulate an edge-aware depth-normal consistency term, and solve it by constructing a depth-to-normal layer and a normal-to-depth layer inside the DCN. The depth-to-normal layer takes estimated depths as input and computes normal directions using cross products based on neighboring pixels. Then, given the estimated normals, the normal-to-depth layer outputs a regularized depth map through local planar smoothness. Both layers are computed with awareness of edges inside the image to help address the issue of depth/normal discontinuity and preserve sharp edges. Finally, to train the network, we apply the photometric error and gradient smoothness for both depth and normal predictions. We conducted experiments on both outdoor (KITTI) and indoor (NYUv2) datasets, and show that our algorithm vastly outperforms the state of the art, which demonstrates the benefits of our approach.
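The depth-to-normal step described above can be illustrated with a minimal sketch. It assumes a pinhole camera with hypothetical intrinsics `f`, `cx`, `cy`, uses only the right and lower neighbors of a pixel, and omits the edge-aware weighting mentioned in the abstract; the function names are illustrative and not taken from the paper.

```python
import math

def backproject(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point (pinhole model)."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def depth_to_normal(depth_map, u, v, f=1.0, cx=0.0, cy=0.0):
    """Unit normal at (u, v) from the 3D offsets to the right and lower neighbors."""
    p = backproject(u, v, depth_map[v][u], f, cx, cy)
    right = backproject(u + 1, v, depth_map[v][u + 1], f, cx, cy)
    down = backproject(u, v + 1, depth_map[v + 1][u], f, cx, cy)
    dx = tuple(r - q for r, q in zip(right, p))
    dy = tuple(d - q for d, q in zip(down, p))
    n = cross(dy, dx)  # this order makes the normal point toward the camera
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)

# A fronto-parallel plane at constant depth has a normal along the optical axis.
flat = [[2.0] * 3 for _ in range(3)]
normal = depth_to_normal(flat, 0, 0)
```

For the flat depth map the two tangent vectors lie in the image plane, so the normal has no x or y component.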
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively.
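The rank argument behind the bottleneck can be written out as a short derivation. The symbols below ($N$ contexts, $M$ vocabulary words, embedding dimension $d$, context vectors $h_i$, word embeddings $w_j$) are notation assumed here for illustration, not quoted from the abstract.

```latex
% Log-probability of word x_j in context c_i under a Softmax model:
\[
  \log P_\theta(x_j \mid c_i)
    = h_i^\top w_j - \log \sum_{k=1}^{M} \exp\!\left(h_i^\top w_k\right).
\]
% Stacking all contexts and words gives an N x M matrix
\[
  A_\theta = H W^\top - z\,\mathbf{1}^\top,
  \qquad H \in \mathbb{R}^{N \times d},\;
         W \in \mathbb{R}^{M \times d},\;
         z_i = \log \sum_{k=1}^{M} \exp\!\left(h_i^\top w_k\right),
\]
% so its rank is bounded by the embedding dimension plus one:
\[
  \operatorname{rank}(A_\theta)
    \le \operatorname{rank}(H W^\top) + \operatorname{rank}(z\,\mathbf{1}^\top)
    \le d + 1 .
\]
```

If the matrix of true log-probabilities has rank larger than $d + 1$, no choice of $H$ and $W$ can reproduce it exactly, which is the sense in which the Softmax layer acts as a bottleneck.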
Nov 10 2017 cs.CR
The light-emitting diode (LED) is widely used as an indicator on information devices. As early as 2002, Loughry et al. studied exfiltration via LED indicators and found that LEDs that are unmodulated and merely indicate some state of the device can hardly be used to establish covert channels. In this paper, a novel approach is proposed to modulate this kind of LED. We use binary frequency shift keying (B-FSK) in place of on-off keying (OOK) for modulation. To verify its validity, we implement a prototype of an exfiltration malware. Our experiments show a great improvement in the imperceptibility of the covert communication. It is feasible to leak data covertly from air-gapped networks via unmodulated LED status indicators.
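As a rough illustration of B-FSK on an on/off light source, the sketch below maps each bit to a square wave at one of two blink frequencies and recovers the bits by counting rising edges. The frequencies, symbol duration, sampling rate and function names are all assumptions chosen for the example; the abstract does not give the prototype's parameters.

```python
def bfsk_modulate(bits, f0=10, f1=20, symbol_s=0.5, sample_hz=200):
    """Return LED states (0/1) per sample; bit 0 blinks at f0 Hz, bit 1 at f1 Hz."""
    n = int(symbol_s * sample_hz)  # samples per symbol
    samples = []
    for bit in bits:
        f = f1 if bit else f0
        for k in range(n):
            # Square wave with 50% duty cycle, computed in integer arithmetic.
            phase = (k * f) % sample_hz
            samples.append(1 if 2 * phase < sample_hz else 0)
    return samples

def bfsk_demodulate(samples, f0=10, f1=20, symbol_s=0.5, sample_hz=200):
    """Estimate each symbol's blink frequency from its rising edges."""
    n = int(symbol_s * sample_hz)
    bits = []
    for start in range(0, len(samples), n):
        chunk = samples[start:start + n]
        rising = sum(1 for a, b in zip(chunk, chunk[1:]) if a == 0 and b == 1)
        estimate = rising / symbol_s  # approximate frequency in Hz
        bits.append(1 if abs(estimate - f1) < abs(estimate - f0) else 0)
    return bits

message = [1, 0, 1, 1, 0]
signal = bfsk_modulate(message)
recovered = bfsk_demodulate(signal)
duty_cycle = sum(signal) / len(signal)
```

In this sketch both symbols keep the LED on for half of the time, so the average brightness does not depend on the data, whereas OOK would leave the LED dark for every 0 bit. Whether this is the mechanism behind the reported imperceptibility gain is an assumption of the example.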
Nov 07 2017 cs.CL
Traditional Chinese Medicine (TCM) has accumulated a large amount of precious resources over its long history of development. TCM prescriptions, which consist of TCM herbs, are an important form of TCM treatment; they are similar to natural language documents, but only weakly ordered. Directly adapting language-modeling-style methods to learn embeddings of the herbs can be problematic because the herbs are not strictly ordered: the herbs at the front of a prescription can be connected to the very last ones. In this paper, we propose to represent TCM herbs with distributed representations via Prescription Level Language Modeling (PLLM). In one of our experiments, the correlation between our calculated similarity between medicines and the judgment of professionals achieves a Spearman score of 55.35, indicating a strong correlation, which surpasses human beginners (bachelor students in TCM-related fields) by a big margin (over 10%).
In this paper, we consider the network utility maximization problem with various user priorities via jointly optimizing user association, load distribution and power control in a load-coupled heterogeneous network. In order to tackle the nonconvexity of the problem, we first analyze the problem by obtaining the optimal resource allocation strategy in closed form and characterizing the optimal base station load distribution pattern. Both observations are shown essential in simplifying the original problem and making it possible to transform the nonconvex load distribution and power control problem into convex reformulation via exponential variable transformation. An iterative algorithm with low complexity is accordingly presented to obtain a suboptimal solution to the joint optimization problem. Simulation results show that the proposed algorithm achieves better performance than conventional approaches.
This letter investigates the power control and channel assignment problem in device-to-device (D2D) communications underlaying a non-orthogonal multiple access (NOMA) cellular network. With the successive interference cancellation decoding order constraints, our target is to maximize the sum rate of D2D pairs while guaranteeing the minimum rate requirements of NOMA-based cellular users. Specifically, the optimal conditions for power control of cellular users on each subchannel are derived first. Then, based on these results, we propose a dual-based iterative algorithm to solve the resource allocation problem. Simulation results validate the superiority of proposed resource allocation algorithm over the existing orthogonal multiple access scheme.
Oct 11 2017 cs.CV
The timely provision of traffic sign information to drivers is essential for drivers to respond in a timely manner, to ensure safe driving, and to avoid traffic accidents. We propose a quantitative evaluation method for the timely visual recognizability of traffic signs in large-scale transportation environments. To achieve this goal, we first introduce the concept of a visibility field to reflect the visibility distribution in three-dimensional (3D) space and construct a traffic sign Visibility Evaluation Model (VEM) to measure traffic sign visibility for a given viewpoint. Then, based on the VEM, we propose the concept of the Visual Recognizability Field (VRF) to reflect the visual recognizability distribution in 3D space and establish a Visual Recognizability Evaluation Model (VREM) to measure a traffic sign's visual recognizability for a given viewpoint. Next, we propose a Traffic Sign Timely Visual Recognizability Evaluation Model (TSTVREM) by combining the VREM, the actual maximum continuous visual recognizable distance, and traffic big data to measure a traffic sign's visual recognizability in different lanes. Finally, we present an automatic algorithm to implement the TSTVREM through traffic sign and road marking detection and classification, traffic sign environment point cloud segmentation, viewpoint calculation, and TSTVREM realization. The performance of our method for traffic sign timely visual recognizability evaluation is tested on three road point clouds acquired by a mobile laser scanning system (RIEGL VMX-450) according to Road Traffic Signs and Markings (GB 5768-1999 in China), showing that our method is feasible and efficient.
Oct 10 2017 cs.CV
For hyperspectral image (HSI) datasets, each class has its own salient features, and classifiers classify HSI datasets according to these class-specific salient features. However, the salient features differ when different normalization methods are used. In this letter, we report the effect of different normalization methods on classifiers and, after analyzing this effect, recommend the best normalization methods for classifiers. The Pavia University, Indian Pines and Kennedy Space Center datasets are applied to several typical classifiers in order to evaluate and analyze the impact of different normalization methods on typical classifiers.
Sep 20 2017 cs.NE
Research on the performance of recycled concrete as a building material is currently an important subject. Given the complex composition of recycled concrete, conventional methods for forecasting slump rarely obtain satisfactory results. Based on nonlinear prediction theory, we propose a recycled concrete slump prediction model based on geometric semantic genetic programming (GSGP) combined with recycled concrete features. Tests in which the established prediction model is used to calculate the slump of recycled concrete with different mixing ratios in practical projects, and the predicted values are compared with the experimental values, show that the model can accurately predict recycled concrete slump. By comparing the model with several other nonlinear prediction models, we conclude that GSGP has higher accuracy and reliability than conventional methods.
There is a special type of text in which the order of the rows makes no difference (e.g., a word list). For compressing such texts, traditional lossless compression methods are not the ideal choice. A new method that achieves better compression results for this type of text is proposed. The texts are pre-processed by a method named SSE and are then compressed with a traditional lossless compression method. A comparison shows that an improved compression result is achieved.
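The abstract does not say what the SSE pre-processing step does, so the sketch below is not SSE. It only illustrates the underlying observation: when row order carries no information, a compressor may rewrite the rows in any canonical order (here, sorted) before handing them to a standard lossless codec such as zlib. The function names are hypothetical, and rows are assumed to be non-empty and free of newline characters.

```python
import zlib

def compress_unordered(lines):
    """Sort the rows into a canonical order, then apply DEFLATE via zlib."""
    canonical = "\n".join(sorted(lines)).encode("utf-8")
    return zlib.compress(canonical, 9)

def decompress_unordered(blob):
    """Recover the rows; their original order is intentionally not preserved."""
    text = zlib.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []

words = ["pear", "apple", "fig", "banana", "apricot"]
blob = compress_unordered(words)
restored = decompress_unordered(blob)
```

Sorting places rows with shared prefixes next to each other, which tends to help dictionary-based codecs on long word lists, although the gain depends on the data.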
Sep 13 2017 cs.CV
Although extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is due to the random weights and bias of ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper, we propose a new framework for ELM based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCF) are employed respectively to derive the optimized output weights and extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP). Second, the sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum in order to solve the ill-posed problem of ELM. The variable splitting and the augmented Lagrangian are subsequently used to further reduce the computation complexity of the proposed algorithm and it has been proven a more efficient method for speed improvement. Third, the spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.
Sep 12 2017 cs.CV
In this letter, to break the limit of the traditional linear models for SAR image despeckling, we propose a novel deep learning approach that learns a non-linear end-to-end mapping between noisy and clean SAR images with a dilated residual network (SAR-DRN). SAR-DRN is based on dilated convolutions, which can enlarge the receptive field while maintaining the filter size and layer depth with a lightweight structure. In addition, skip connections are added to the despeckling model to reduce the vanishing gradient problem. The proposed method shows superior performance over traditional and state-of-the-art despeckling methods in both quantitative and visual assessments, especially for strong speckle noise.
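The claim that dilation enlarges the receptive field without changing filter size or depth can be checked with a small calculation. The sketch assumes stride-1 convolutions with a square kernel; the dilation schedule used in the example is hypothetical and is not the configuration of SAR-DRN.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels per side) of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        # A k-tap filter with dilation d spans (k - 1) * d + 1 input pixels.
        field += (kernel_size - 1) * d
    return field

plain = receptive_field(3, [1, 1, 1, 1])    # four ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 3, 4])  # same depth and filter size, growing dilation
```

With four 3x3 layers the receptive field grows from 9 to 21 pixels per side, while the number of weights per layer stays the same.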
Sep 11 2017 cs.CV
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
As a new machine learning approach, extreme learning machine (ELM) has received wide attention due to its good performance. However, when directly applied to hyperspectral image (HSI) classification, the recognition rate is too low. This is because ELM does not use the spatial information, which is very important for HSI classification. In view of this, this paper proposes a new framework for spectral-spatial classification of HSI by combining ELM with loopy belief propagation (LBP). The original ELM is linear, and the nonlinear ELMs (or Kernel ELMs) are improvements of the linear ELM (LELM). However, based on many experiments and analyses, we found that the LELM is a better choice than nonlinear ELM for spectral-spatial classification of HSI. Furthermore, we exploit the marginal probability distribution that uses the whole information in the HSI and learn this distribution using the LBP. The proposed method not only maintains the fast speed of ELM but also greatly improves the classification accuracy. The experimental results on the well-known HSI data sets, Indian Pines and Pavia University, demonstrate the good performance of the proposed method.
Motivated by statistical physics models connected to computation problems, we devise a tensor network technique that is suited to problems with or without translation invariance and with arbitrary boundary conditions. We introduce a compression-decimation algorithm as an efficient iterative scheme to optimize tensor networks that encode generalized vertex models on regular lattices. The algorithm first propagates local constraints to longer ranges via repeated contraction-decomposition sweeps over all lattice bonds, thus achieving compression on a given length scale. It then decimates the lattice via coarse-graining tensor contractions. Repeated iterations of these two steps allow us to gradually collapse the tensor network while keeping the tensor dimensions under control, such that ultimately the full tensor trace can be taken for relatively large systems. As a benchmark, we demonstrate the efficiency of the algorithm by computing the ground state entropy density of the planar ice model and the eight-vertex model. We then apply it to reversible classical computational problems based on a recently proposed vertex model representation of classical computations [Nat. Commun. 8, 15303 (2017)]. Our protocol allows us to obtain the exact number of solutions for computations where a naive enumeration would take astronomically long times, suggesting that the algorithm is a promising practical tool for the solution of a plethora of problems in physics and computer science.
Aug 31 2017 cs.NE
Genetic programming has been widely used in the engineering field. Compared with conventional genetic programming and artificial neural networks, geometric semantic genetic programming (GSGP) is superior in convergence and computing efficiency. In this paper, GSGP is adopted for the classification and regression analysis of a sample dataset. Furthermore, a model for slope stability analysis is established on the basis of geometric semantics. According to the results of the study based on GSGP, the method can analyze slope stability objectively and is highly precise in predicting slope stability and safety factors. Hence, the predicted results can be used as a reference for slope safety design.
Distributed machine learning algorithms enable processing of datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional indeed happen in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional distributed learning tasks. In this paper, two variants of an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) are developed and analyzed that solve distributed learning problems in the presence of Byzantine failures. Theoretical analysis as well as numerical experiments presented in the paper highlight the usefulness of ByRDiE for high-dimensional distributed learning in the presence of Byzantine failures.
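The abstract does not spell out how ByRDiE screens the values a node receives from its neighbors. One common way to make a scalar, per-coordinate update robust to a bounded number of Byzantine neighbors is a trimmed mean, sketched below as an assumption rather than as the paper's exact rule. The parameter `b` (the assumed maximum number of faulty neighbors) and the function names are hypothetical.

```python
def trimmed_mean(values, b):
    """Average after discarding the b smallest and b largest values."""
    if len(values) <= 2 * b:
        raise ValueError("need more than 2*b values to trim b from each side")
    kept = sorted(values)[b:len(values) - b]
    return sum(kept) / len(kept)

def screened_coordinate(neighbor_vectors, k, b):
    """Robust estimate of coordinate k from the vectors sent by neighbors."""
    return trimmed_mean([w[k] for w in neighbor_vectors], b)

# Three honest neighbors agree on roughly 1.0; two faulty ones send extreme values.
received = [[1.0, 5.0], [1.1, 5.0], [0.9, 5.0], [1000.0, 5.0], [-1000.0, 5.0]]
robust = screened_coordinate(received, 0, 1)
naive = sum(w[0] for w in received) / len(received)
```

Working one coordinate at a time keeps the screening step a scalar operation, which is one plausible reason a coordinate descent structure suits high-dimensional problems.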
Hierarchical organization is an important, prevalent characteristic of complex systems; in order to understand their organization, the study of the underlying (generally complex) networks that describe the interactions between their constituents plays a central role. Numerous previous works have shown that many real-world networks in social, biological and technical systems present hierarchical organization, often in the form of a hierarchy of community structures. Many artificial benchmark graphs have been proposed in order to test different community detection methods, but no benchmark has been developed to thoroughly test the detection of hierarchical community structures. In this study, we fill this gap by extending the Lancichinetti-Fortunato-Radicchi (LFR) ensemble of benchmark graphs, adopting the rule for constructing hierarchical networks proposed by Ravasz and Barabási. We employ this benchmark to test three of the most popular community detection algorithms, and quantify their accuracy using the traditional Mutual Information and the recently introduced Hierarchical Mutual Information. The results indicate that the Ravasz-Barabási-Lancichinetti-Fortunato-Radicchi (RB-LFR) benchmark generates a complex hierarchical structure constituting a challenging benchmark for the considered community detection methods.
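For readers unfamiliar with how mutual information scores a detected partition against a planted one, here is a minimal sketch for flat (non-hierarchical) partitions. The normalization by the mean of the two entropies is one common convention and is an assumption here; the Hierarchical Mutual Information mentioned in the abstract is not implemented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a community assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two assignments of the same nodes."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def normalized_mutual_information(a, b):
    """Score of 1 for identical partitions (up to relabeling), 0 for independent ones."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual_information(a, b) / (ha + hb)

planted = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same communities, different label names
unrelated = [0, 1, 0, 1]   # statistically independent of the planted split
same_score = normalized_mutual_information(planted, relabeled)
independent_score = normalized_mutual_information(planted, unrelated)
```

The score depends only on how nodes are grouped, not on the label names, which is why it is suitable for comparing the output of different detection algorithms.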
In a device-to-device (D2D) underlaid massive MIMO system, D2D transmitters reuse the uplink spectrum of cellular users (CUs), leading to cochannel interference. To decrease pilot overhead, we assume pilot reuse (PR) among D2D pairs. We first derive the minimum-mean-square-error (MMSE) estimation of all channels and give a lower bound on the ergodic achievable rate of both cellular and D2D links. To mitigate pilot contamination caused by PR, we then propose a pilot scheduling and pilot power control algorithm based on the criterion of minimizing the sum mean-square-error (MSE) of channel estimation of D2D links. We show that, with an appropriate PR ratio and a well designed pilot scheduling scheme, each D2D transmitter could transmit its pilot with maximum power. In addition, we also maximize the sum rate of all D2D links while guaranteeing the quality of service (QoS) of CUs, and develop an iterative algorithm to obtain a suboptimal solution. Simulation results show that the effect of pilot contamination can be greatly decreased by the proposed pilot scheduling algorithm, and the PR scheme provides significant performance gains over the conventional orthogonal training scheme in terms of system spectral efficiency.
Aug 02 2017 cs.CV
In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. It is an important and challenging task, as accurately localizing human actions in both time and space is important for analyzing large-scale video data. To tackle this problem, we propose a cascade proposal and location anticipation (CPLA) model for frame-level action detection. There are several salient points of our model: (1) a cascade region proposal network (casRPN) is adopted for action proposal generation and shows better localization accuracy compared with a single region proposal network (RPN); (2) action spatio-temporal consistencies are exploited via a location anticipation network (LAN), and thus frame-level action detection is not conducted independently. Frame-level detections are then linked by solving a linking score maximization problem, and temporally trimmed into spatio-temporal action tubes. We demonstrate the effectiveness of our model on the challenging UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
In this paper, we investigate the problems of sum power minimization and sum rate maximization for multi-cell networks with non-orthogonal multiple access. Considering the sum power minimization, we obtain closed-form solutions to the optimal power allocation strategy and then successfully transform the original problem to a linear one with a much smaller size, which can be optimally solved by using the standard interference function. To solve the nonconvex sum rate maximization problem, we first prove that the power allocation problem for a single cell is a convex problem. By analyzing the Karush-Kuhn-Tucker conditions, the optimal power allocation for users in a single cell is derived in closed form. Based on the optimal solution in each cell, a distributed algorithm is accordingly proposed to acquire efficient solutions. Numerical results verify our theoretical findings showing the superiority of our solutions compared to the orthogonal frequency division multiple access and broadcast channel.
Jul 18 2017 cs.CV
Action anticipation aims to detect an action before it happens. Many real world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations to actions. However, anticipation is based on a single past frame's representation, which ignores the history trend. Besides, it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets.
In this paper, we consider the problems of minimizing sum power and maximizing sum rate for multi-cell networks with load coupling, where a coupling relation occurs among cells due to inter-cell interference. This coupling relation is characterized by the signal-to-interference-and-noise-ratio (SINR) coupling model with the cell load vector and the cell power vector as the variables. Due to the nonlinear SINR coupling model, the optimization problems for multi-cell networks with load coupling are nonconvex. To solve these nonconvex problems, we first consider the optimization problems for single-cell networks. Through variable transformations, the optimization problems can be equivalently transformed into convex problems. By solving the Karush-Kuhn-Tucker (KKT) conditions, the optimal solutions to the power minimization and rate maximization problems can be obtained in closed form. Based on the theoretical findings of optimization problems for single-cell networks, we develop a distributed time allocation and power control algorithm with low complexity for sum power minimization in multi-cell networks. This algorithm is proved to be convergent and globally optimal by using the properties of the standard interference function. For sum rate optimization in multi-cell networks, we also provide a distributed algorithm which yields a suboptimal solution, and we prove its convergence. Numerical results illustrate the theoretical findings, showing the superiority of our solutions compared to the conventional solution of allocating uniform power for users in the same cell.
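The convergence argument via the standard interference function can be illustrated with the classic fixed-point power-control iteration p <- I(p). The sketch below is not the algorithm of the paper: the gain matrix, noise levels and SINR targets are made-up values, and the function names are hypothetical. It only shows how repeatedly applying a standard interference function drives the powers to the minimum-power point that meets every SINR target, assuming the targets are feasible.

```python
def power_control(gains, noise, targets, iters=200):
    """Fixed-point iteration p <- I(p) with
    I_i(p) = targets[i] * (noise[i] + sum_{j != i} gains[i][j] * p[j]) / gains[i][i].
    """
    n = len(targets)
    p = [0.0] * n  # start from zero power
    for _ in range(iters):
        p = [
            targets[i]
            * (noise[i] + sum(gains[i][j] * p[j] for j in range(n) if j != i))
            / gains[i][i]
            for i in range(n)
        ]
    return p


def sinr(gains, noise, p, i):
    """SINR of link i under power vector p."""
    interference = sum(gains[i][j] * p[j] for j in range(len(p)) if j != i)
    return gains[i][i] * p[i] / (noise[i] + interference)


# Illustrative (assumed) three-cell setup with weak cross-cell gains.
GAINS = [
    [1.00, 0.10, 0.05],
    [0.08, 1.00, 0.10],
    [0.05, 0.12, 1.00],
]
NOISE = [0.01, 0.01, 0.01]
TARGETS = [2.0, 1.5, 1.0]

POWERS = power_control(GAINS, NOISE, TARGETS)
ACHIEVED = [sinr(GAINS, NOISE, POWERS, i) for i in range(3)]
```

With these weak cross gains the iteration is a contraction, so `ACHIEVED` matches `TARGETS` to numerical precision after a few dozen iterations.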
Jun 27 2017 cs.DS
We present a novel sparsity-based space-time adaptive processing (STAP) technique based on the alternating direction method to overcome the severe performance degradation caused by array gain/phase (GP) errors. The proposed algorithm reformulates the STAP problem as a joint optimization problem of the spatio-Doppler profile and GP errors in both single and multiple snapshots, and introduces a target detector using the reconstructed spatio-Doppler profiles. Simulations are conducted to illustrate the benefits of the proposed algorithm.
Jun 27 2017 cs.SE
Self-adaptive software is considered the most advanced approach, and its development attracts a lot of attention. Decentralization is an effective way to design and manage the complexity of modern self-adaptive software systems. However, there are still tremendous challenges. One major challenge is to unify decentralization with the traditional self-adaptive implementation framework during design and implementation. Another is to guarantee the required global goals and performance of decentralized self-adaptive systems operating in highly dynamic and uncertain environments. A third challenge is to predict the influence of a system's internal changes on its self-adaptability to the environment. To solve these problems, we combine separation of concerns with a modeling method based on timed automata so that the system can be analyzed and verified. Timed computation tree logic is used to specify system goals, and stochastic simulations in a dynamic environment are used to verify the adaptation properties of decentralized self-adaptive systems. We extract a motivating example from practical applications in UAV emergency mission scenarios. The whole approach is evaluated and illustrated with this motivating example, and the statistical results can be used as a reference for arrangement planning of UAVs in cyber-physical spaces.
Jun 09 2017 cs.CV
Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extracts information about objects and their spatial relationships in images, and a deep recurrent neural network (RNN) based on long short-term memory (LSTM) units with an attention mechanism for sentence generation. Each word of the description is automatically aligned to different objects of the input image when it is generated. This is similar to the attention mechanism of the human visual system. Experimental results on the COCO dataset showcase the merit of the proposed method, which outperforms previous benchmark models.
Deep generative models have achieved impressive success in recent years. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as powerful frameworks for deep generative model learning, have largely been considered as two distinct paradigms and have each received extensive independent study. This paper establishes formal connections between deep generative modeling approaches through a new formulation of GANs and VAEs. We show that GANs and VAEs essentially minimize KL divergences of the respective posterior and inference distributions in opposite directions, extending the two learning phases of the classic wake-sleep algorithm, respectively. The unified view provides a powerful tool to analyze a diverse set of existing model variants, and enables the exchange of ideas across research lines in a principled way. For example, we transfer the importance weighting method from the VAE literature to improve GAN learning, and enhance VAEs with an adversarial mechanism for leveraging generated samples. Quantitative experiments show the generality and effectiveness of the imported extensions.
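The phrase "KL divergences with opposite directions" rests on the fact that the KL divergence is not symmetric. The short sketch below is only an illustration of that asymmetry on two made-up discrete distributions; it is not the formulation used in the paper, and the names `kl`, `P` and `Q` are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.
    Terms with p_i == 0 contribute zero; q_i is assumed positive where p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two illustrative distributions over three outcomes (assumed values).
P = [0.8, 0.1, 0.1]
Q = [0.4, 0.3, 0.3]

FORWARD = kl(P, Q)  # KL(P || Q)
REVERSE = kl(Q, P)  # KL(Q || P)
```

Here `FORWARD` and `REVERSE` differ, so a method that minimizes one direction generally ends up with a different fit from a method that minimizes the other.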
In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.
May 25 2017 cs.CV
Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations which can be inserted into CNNs to learn deep representations equivariant to rotation. However, in their approach feature maps must be copied and rotated four times in each layer, which causes considerable running time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speed-up of more than 2 times with even less memory overhead. We evaluate DRENs on the Rotated MNIST and CIFAR-10 datasets and demonstrate that they can improve the performance of state-of-the-art architectures. Our code is released on GitHub.
Inferring the relations between two images is an important class of tasks in computer vision. Examples of such tasks include computing optical flow and stereo disparity. We treat the relation inference tasks as a machine learning problem and tackle it with neural networks. A key to the problem is learning a representation of relations. We propose a new neural network module, contrast association unit (CAU), which explicitly models the relations between two sets of input variables. Due to the non-negativity of the weights in CAU, we adopt a multiplicative update algorithm for learning these weights. Experiments show that neural networks with CAUs are more effective in learning five fundamental image transformations than conventional neural networks.
May 08 2017 cs.CV
This paper focuses on temporal localization of actions in untrimmed videos. Existing methods typically train classifiers for a pre-defined list of actions and apply them in a sliding window fashion. However, activities in the wild consist of a wide combination of actors, actions and objects; it is difficult to design a proper activity list that meets users' needs. We propose to localize activities by natural language queries. Temporal Activity Localization via Language (TALL) is challenging as it requires: (1) suitable design of text and video representations to allow cross-modal matching of actions and language queries; (2) the ability to locate actions accurately given features from sliding windows of limited granularity. We propose a novel Cross-modal Temporal Regression Localizer (CTRL) to jointly model the text query and video clips, and to output alignment scores and action boundary regression results for candidate clips. For evaluation, we adopt the TaCoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA. We also build complex sentence queries in Charades-STA for testing. Experimental results show that CTRL outperforms previous methods significantly on both datasets.
May 04 2017 cs.CV
Temporal action detection in long videos is an important problem. State-of-the-art methods address this problem by applying action classifiers on sliding windows. Although sliding windows may contain an identifiable portion of the actions, they may not necessarily cover the entire action instance, which would lead to inferior performance. We adapt a two-stage temporal action detection pipeline with a Cascaded Boundary Regression (CBR) model. Class-agnostic proposals and specific actions are detected in the first and the second stage, respectively. CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows. The salient aspect of the refinement process is that, inside each stage, the temporal boundaries are adjusted in a cascaded way by feeding the refined windows back to the system for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and achieve state-of-the-art performance on both datasets. The performance gain is especially remarkable under high IoU thresholds, e.g. mAP@tIoU=0.5 on THUMOS-14 is improved from 19.0% to 31.0%.
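The tIoU threshold mentioned above is the temporal intersection-over-union between a predicted window and a ground-truth action segment. The sketch below computes it and shows how shifting a window's boundaries changes the score. The `refine` helper and the offsets passed to it are purely illustrative stand-ins for a learned boundary regressor, not the CBR model itself.

```python
def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def refine(window, start_offset, end_offset):
    """Shift both boundaries of a window by the given (hypothetical) offsets."""
    return (window[0] + start_offset, window[1] + end_offset)


GROUND_TRUTH = (12.0, 20.0)
WINDOW = (10.0, 18.0)  # sliding window that only partly covers the action

BEFORE = temporal_iou(WINDOW, GROUND_TRUTH)
# One refinement step with made-up offsets; a cascade would repeat this,
# feeding the refined window back in for further adjustment.
AFTER = temporal_iou(refine(WINDOW, 2.0, 2.0), GROUND_TRUTH)
```

In this toy case the window moves from a tIoU of 0.6 to a perfect overlap, which is why boundary refinement matters most under high tIoU thresholds.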
Apr 24 2017 cs.CV
The use of color in QR codes brings extra data capacity, but also inflicts tremendous challenges on the decoding process due to chromatic distortion, cross-channel color interference and illumination variation. In particular, we discover a new type of chromatic distortion in high-density color QR codes, cross-module color interference, caused by the high density, which also makes geometric distortion correction more challenging. To address these problems, we propose two approaches, namely LSVM-CMI and QDA-CMI, which jointly model these different types of chromatic distortion. Extended from SVM and QDA, respectively, both LSVM-CMI and QDA-CMI optimize over a particular objective function to learn a color classifier. Furthermore, a robust geometric transformation method is proposed to accurately correct the geometric distortion for high-density color QR codes. We put forth and implement a framework for high-capacity color QR codes equipped with our methods, called HiQ. To evaluate the performance of HiQ, we collect a challenging large-scale color QR code dataset, CUHK-CQRC, which consists of 5390 high-density color QR code samples. The comparison with the baseline method on CUHK-CQRC shows that HiQ outperforms the baseline by at least 188% in decoding success rate and 60% in bit error rate. Our implementation of HiQ in iOS and Android also demonstrates the effectiveness of our framework in real-world applications.
Apr 05 2017 cs.SE
Self-adaptive systems (SASs) are capable of adjusting their behavior in response to meaningful changes in the operational context and in themselves. The adaptation needs to be performed automatically through self-managed reactions and decision-making processes at runtime. To support this kind of automatic behavior, SASs must be endowed with rich runtime support that can detect requirements violations and reason about adaptation decisions. Requirements Engineering (RE) for SASs primarily aims to model adaptation logic and mechanisms. Requirements models guide the design decisions and runtime behaviors of the system-to-be. This paper proposes a model-driven approach for achieving adaptation against non-functional requirements (NFRs), i.e. reliability and performance. The approach begins with the models in the RE stage and provides runtime support for self-adaptation. We capture adaptation mechanisms as graphical elements in the goal model. By assigning reliability and performance attributes to related system tasks, we derive the tagged sequential diagram for specifying the reliability and performance of system behaviors. To formalize system behavior, we transform the requirements model to the corresponding behavior model, expressed by Labeled Transition Systems (LTS). To analyze the reliability requirements and performance requirements, we merge the sequential diagram and LTS into a variable Discrete-Time Markov Chain (DTMC) and a variable Continuous-Time Markov Chain (CTMC), respectively. Adaptation candidates are characterized by the variable states. The optimal decision is derived by verifying the concerned NFRs and reducing the decision space. Our approach is implemented through the demonstration of a mobile information system.
Self-adaptive software (SAS) is capable of adjusting its behavior in response to meaningful changes in the operational context and itself. Due to the inherent volatility of the open and changeable environment in which SAS is embedded, the ability of adaptation is highly demanded by many software-intensive systems. Two concerns, i.e., the requirements uncertainty and the context uncertainty, are most important among others at the Requirements Engineering (RE) stage. However, requirements analyzers can hardly figure out the mathematical relation between requirements, system behavior and context, especially for complex and nonlinear systems, due to the existence of the above uncertainties, misunderstanding and ambiguity of prior knowledge. An essential issue to be addressed is how to model and specify these uncertainties at the RE stage and how to utilize the prior knowledge to achieve adaptation. In this paper, we propose a fuzzy-based approach to modeling uncertainty and achieving evolution. The approach introduces specifications to describe fuzziness. Based on the specifications, we derive a series of reasoning rules as knowledge base for achieving adaptation and evolution. These two targets are implemented through four reasoning schemas from a control theory perspective. Specifically, forward reasoning schema is used for direct adaptation; backward reasoning schema is used for optimal adaptation. Parameter-identified schema implements learning evolution by considering SAS as the gray-box system, while system-identified reasoning schema implements learning evolution by considering SAS as the gray-box system. The former two schemas function as the control group, while the latter two are designed as the experimental groups to illustrate the learning ability. Our approach is implemented under three types of context through the demonstration of a mobile computing application.
Apr 05 2017 cs.NI
The topologies of predictable dynamic networks change continuously in terms of node position, network connectivity and link metric. However, their dynamics are almost predictable compared with those of ad-hoc networks. Existing routing protocols designed for static or ad-hoc networks do not consider this predictability and thus are not very efficient in some cases. We present a topology model based on a Divide-and-Merge methodology that formulates the dynamic topology as a series of static topologies, which can reflect the topology dynamics correctly with the least number of static topologies. We then design a dynamic programming algorithm to solve that model and determine the timing of routing updates and the topology to be used. Besides, for the classic predictable dynamic network, the space Internet, the links in some regions have shorter delay, which causes most traffic to converge on these links. Meanwhile, the connectivity and metric of these links vary continuously, which results in great end-to-end path variation and frequent routing updates. In this paper, we propose a stable routing scheme which adds link lifetime to its metric to eliminate these dynamics. We then make use of the greedy feature of Dijkstra's algorithm to release some paths from the dynamic links, achieving the goal of routing stability. Experimental results show that our method can significantly decrease the number of changed paths and affected network nodes, and thus greatly improve network stability. Interestingly, our method can also achieve better network performance, including fewer lost packets, smoother variation of end-to-end delay and higher throughput.
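The idea of adding link lifetime to the routing metric can be sketched with a plain Dijkstra search whose edge weight is swappable. The abstract does not give the actual metric, so the weight `delay + penalty / lifetime` below is an assumption chosen only to make short-lived links less attractive; the graph, the field names and the penalty constant are likewise hypothetical.

```python
import heapq


def dijkstra(graph, source, target, weight):
    """Shortest path from source to target; `weight` maps a link dict to a cost."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, link in graph.get(u, {}).items():
            nd = d + weight(link)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy topology: the A-B link has the lowest delay but is about to disappear.
GRAPH = {
    "A": {"B": {"delay": 1.0, "lifetime": 2.0}, "C": {"delay": 2.0, "lifetime": 100.0}},
    "B": {"D": {"delay": 1.0, "lifetime": 100.0}},
    "C": {"D": {"delay": 2.0, "lifetime": 100.0}},
}


def delay_only(link):
    return link["delay"]


def lifetime_aware(link, penalty=10.0):
    # Assumed metric: short-lived links pay an extra cost of penalty / lifetime.
    return link["delay"] + penalty / link["lifetime"]


FAST_PATH = dijkstra(GRAPH, "A", "D", delay_only)
STABLE_PATH = dijkstra(GRAPH, "A", "D", lifetime_aware)
```

With the delay-only metric the route goes through the short-lived A-B link, while the lifetime-aware metric prefers the slightly slower but longer-lived route through C, which is the kind of trade-off a stable routing scheme aims for.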
Apr 04 2017 cs.SE
Over the last decade, researchers and engineers have developed a vast body of methodologies and technologies in requirements engineering for self-adaptive systems. Although existing studies have explored various aspects of this topic, few of them have categorized and summarized these areas of research in requirements modeling and analysis. This study aims to investigate the research themes based on the utilized modeling methods and RE activities. We conduct a thematic study in the systematic literature review. The results are derived by synthesizing the extracted data with statistical methods. This paper provides an updated review of the research literature, enabling researchers and practitioners to better understand the research themes in these areas and identify research gaps which need to be further studied.
Apr 04 2017 cs.SE
Self-adaptive systems are capable of adjusting their behavior to cope with changes in the environment and in themselves. These changes may cause runtime uncertainty, which refers to the system state of failing to achieve appropriate reconfigurations. However, it is often infeasible to exhaustively anticipate all the changes. Thus, providing dynamic adaptation mechanisms for mitigating runtime uncertainty becomes a big challenge. This paper suggests solving this challenge at the requirements phase by presenting REDAPT, short for REquirement-Driven adAPTation. We propose an adaptive goal model (AGM) by introducing adaptive elements, specify dynamic properties of the AGM by providing a logic-based grammar, derive adaptation mechanisms from AGM specifications, and achieve adaptation by monitoring variables, diagnosing requirements violations, determining reconfigurations and executing them. Our approach is demonstrated with an example from the Intelligent Transportation System domain and evaluated through a series of simulation experiments.
Apr 04 2017 cs.SE
Context: Over the last decade, software researchers and engineers have developed a vast body of methodologies and technologies in requirements engineering for self-adaptive systems. Although existing studies have explored various aspects of this field, no systematic study has been performed on summarizing modeling methods and corresponding requirements activities. Objective: This study summarizes the state-of-the-art research trends, details the modeling methods and corresponding requirements activities, identifies relevant quality attributes and application domains and assesses the quality of each study. Method: We perform a systematic literature review underpinned by a rigorously established and reviewed protocol. To ensure the quality of the study, we choose 21 highly regarded publication venues and 8 popular digital libraries. In addition, we apply text mining to derive search strings and use Kappa coefficient to mitigate disagreements of researchers. Results: We selected 109 papers during the period of 2003-2013 and presented the research distributions over various kinds of factors. We extracted 29 modeling methods which are classified into 8 categories and identified 14 requirements activities which are classified into 4 requirements timelines. We captured 8 concerned software quality attributes based on the ISO 9126 standard and 12 application domains. Conclusion: The frequency of application of modeling methods varies greatly. Enterprise models were more widely used while behavior models were more rigorously evaluated. Requirements-driven runtime adaptation was the most frequently studied requirements activity. Activities at runtime were conveyed with more details. Finally, we draw other conclusions by discussing how well modeling dimensions were considered in these modeling methods and how well assurance dimensions were conveyed in requirements activities.
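The Kappa coefficient mentioned in the method measures agreement between reviewers beyond what chance would produce. The abstract does not say which variant was used, so the sketch below assumes Cohen's kappa for two raters; the include/exclude decisions are invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items.
    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up screening decisions for ten candidate papers.
REVIEWER_A = ["in", "in", "in", "in", "in", "out", "out", "out", "out", "out"]
REVIEWER_B = ["in", "in", "in", "in", "out", "out", "out", "out", "out", "in"]

KAPPA = cohens_kappa(REVIEWER_A, REVIEWER_B)
```

Here the reviewers agree on 8 of 10 papers while chance agreement is 0.5, giving a kappa of about 0.6.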
A self-adaptive system (SAS) is capable of adjusting its behavior in response to meaningful changes in the operational context and itself. Due to the inherent volatility of the open and changeable environment in which an SAS is embedded, the ability of adaptation is highly demanded by many software-intensive systems. Two concerns, i.e., the requirements uncertainty and the context uncertainty, are most important among others. An essential issue to be addressed is how to dynamically adapt non-functional requirements (NFRs) and task configurations of SASs with context uncertainty. In this paper, we propose a model-based fuzzy control approach that is underpinned by the feedforward-feedback control mechanism. This approach identifies and represents NFR uncertainties, task uncertainties and context uncertainties with linguistic variables, and then designs an inference structure and rules for the fuzzy controller based on the relations between the requirements model and the context model. The adaptation of NFRs and task configurations is achieved through fuzzification, inference, defuzzification and readaptation. Our approach is demonstrated with a mobile computing application and is evaluated through a series of simulation experiments.
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.
Mar 27 2017 cs.CV
Recent years have witnessed the great success of convolutional neural networks (CNNs) for various problems in both low-level and high-level vision. Especially noteworthy is the residual network, which was originally proposed to handle high-level vision problems and enjoys several merits. This paper aims to extend the merits of the residual network, such as the fast training induced by skip connections, to a typical low-level vision problem, i.e., single image super-resolution. In general, the two main challenges of existing deep CNNs for super-resolution lie in the gradient exploding/vanishing problem and the large number of parameters or computational cost as the CNN goes deeper. Correspondingly, skip connections or identity mapping shortcuts are utilized to avoid the gradient exploding/vanishing problem. In addition, the skip connections naturally center the activations, which leads to better performance. To tackle the second problem, a lightweight CNN architecture with carefully designed width, depth and skip connections is proposed. In particular, a strategy of gradually varying the shape of the network is proposed for the residual network. Different residual architectures for image super-resolution are also compared. Experimental results demonstrate that the proposed CNN model can not only achieve state-of-the-art PSNR and SSIM results for single image super-resolution but also produce visually pleasant results. This paper extends the MMM 2017 oral conference paper with considerable new analyses and more experiments, especially from the perspective of centering activations and ensemble behaviors of the residual network.
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These improvements lead to improvements over the current state-of-the-art on several well-studied tasks.
Mar 21 2017 cs.CV
Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important segments (e.g. human actions) from untrimmed videos is an important step for large-scale video analysis. We propose a novel Temporal Unit Regression Network (TURN) model. There are two salient aspects of TURN: (1) TURN jointly predicts action proposals and refines the temporal boundaries by temporal coordinate regression; (2) fast computation is enabled by unit feature reuse: a long untrimmed video is decomposed into video units, which are reused as basic building blocks of temporal proposals. TURN outperforms the state-of-the-art methods under average recall (AR) by a large margin on the THUMOS-14 and ActivityNet datasets, and runs at over 880 frames per second (FPS) on a TITAN X GPU. We further apply TURN as a proposal generation stage for existing temporal action localization pipelines, where it outperforms state-of-the-art performance on THUMOS-14 and ActivityNet.
Mar 16 2017 cs.CL
This paper proposes an approach for applying GANs to NMT. We build a conditional sequence generative adversarial net which comprises two adversarial sub-models, a generator and a discriminator. The generator aims to generate sentences which are hard to discriminate from human-translated sentences (i.e., the golden target sentences), and the discriminator strives to discriminate the machine-generated sentences from human-translated ones. The two sub-models play a mini-max game and achieve a win-win situation when they reach a Nash equilibrium. Additionally, the static sentence-level BLEU is utilized as the reinforced objective for the generator, which biases the generation towards high BLEU points. During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences and feed back the evaluations to guide the learning of the generator. Experimental results show that the proposed model consistently outperforms the traditional RNNSearch and the newly emerged state-of-the-art Transformer on English-German and Chinese-English translation tasks.
Mar 09 2017 cs.CL
Training recurrent neural networks to model long term dependencies is difficult. Hence, we propose to use external linguistic knowledge as an explicit signal to inform the model which memories it should utilize. Specifically, external knowledge is used to augment a sequence with typed edges between arbitrarily distant elements, and the resulting graph is decomposed into directed acyclic subgraphs. We introduce a model that encodes such graphs as explicit memory in recurrent neural networks, and use it to model coreference relations in text. We apply our model to several text comprehension tasks and achieve new state-of-the-art results on all considered benchmarks, including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out of the 20 tasks with only 1000 training examples per task. Analysis of the learned representations further demonstrates the ability of our model to encode fine-grained entity information across a document.
In this paper, we propose a faster-than-Nyquist (FTN) non-orthogonal frequency-division multiplexing (NOFDM) scheme for visible light communications (VLC) where the multiplexing/demultiplexing employs the inverse fractional cosine transform (IFrCT)/FrCT. Unlike the common fractional Fourier transform-based NOFDM (FrFT-NOFDM) signal, the FrCT-based NOFDM (FrCT-NOFDM) signal is real-valued, so it can be directly applied to VLC systems without expensive upconversion. Thus, FrCT-NOFDM is more suitable for cost-sensitive VLC systems. Meanwhile, under the same transmission rate, the FrCT-NOFDM signal occupies a smaller bandwidth than the OFDM signal. When the bandwidth compression factor $\alpha$ is set to $0.8$, a $20\%$ bandwidth saving can be obtained. Therefore, FrCT-NOFDM has higher spectral efficiency and suffers less high-frequency distortion compared to OFDM, which benefits bandwidth-limited VLC systems. As the simulation results show, the bit error rate (BER) performance of FrCT-NOFDM with $\alpha$ of $0.9$ or $0.8$ is better than that of OFDM. Moreover, FrCT-NOFDM has superior security performance. In conclusion, FrCT-NOFDM shows great potential for application in future VLC systems.
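The $20\%$ figure follows directly from the compression factor. As a worked step, assume (this is an assumption of the sketch, with $N$, $\Delta f$ and $B$ introduced only for illustration) that $\alpha$ scales the subcarrier spacing, so $N$ subcarriers occupy roughly $\alpha N \Delta f$ instead of $N \Delta f$:

$$
\frac{B_{\mathrm{OFDM}} - B_{\mathrm{NOFDM}}}{B_{\mathrm{OFDM}}}
= \frac{N \Delta f - \alpha N \Delta f}{N \Delta f}
= 1 - \alpha
= 1 - 0.8
= 20\%.
$$

By the same relation, $\alpha = 0.9$ would correspond to a $10\%$ saving.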
Generic generation and manipulation of text is challenging and has limited success compared to recent deep generative modeling in visual domain. This paper aims at generating plausible natural language sentences, whose attributes are dynamically controlled by learning disentangled latent representations with designated semantics. We propose a new neural generative model which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures. With differentiable approximation to discrete text samples, explicit constraints on independent attribute controls, and efficient collaborative learning of generator and discriminators, our model learns highly interpretable representations from even only word annotations, and produces realistic sentences with desired attributes. Quantitative evaluation validates the accuracy of sentence and attribute generation.
Mar 03 2017 cs.CL
Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithm framework to utilize structured information and encode it into vector representations that can boost the performance of algorithms for machine comprehension. We evaluate our approach using a state-of-the-art neural attention model on the SQuAD dataset. Experimental results demonstrate that our model can accurately identify the syntactic boundaries of the sentences and extract answers that are more syntactically coherent than those of the baseline methods.
Mar 03 2017 cs.SE
Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms lay an almost unbearable burden on software testing engineers who have to manually analyze the causes of these alarms. The causes are critical because they decide which stakeholders are responsible for fixing the bugs detected during the testing. In this paper, we present a novel approach that aims to relieve the burden by automating the procedure. Our approach, called Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes based on test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves an accuracy of 58.3% and 65.8%, respectively, which outperforms the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per cause analysis. Due to the attractive experimental results, our industrial partner, a leading information and communication technology company in the world, has deployed the tool and it achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions.
Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
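To make the link between dilation and effective context concrete, the following minimal sketch computes the receptive field of a stack of stride-1 dilated 1-D convolutions using the standard formula 1 + sum((k - 1) * d). The kernel size and the two dilation schedules are hypothetical examples chosen for illustration, not the configurations used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions that influence one output position
    after stacking stride-1 dilated convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical decoders: a shallow stack sees few previous words,
# a deeper stack with larger dilations sees many more.
small_context = receptive_field(3, [1, 2, 4])          # 1 + 2 * 7  = 15
large_context = receptive_field(3, [1, 2, 4, 8, 16])   # 1 + 2 * 31 = 63

print(small_context, large_context)
```

Under this reading, shrinking the dilation schedule limits how much the decoder can rely on its own history, which is the knob the abstract describes for trading decoder context against use of the encoded information.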
Feb 28 2017 cs.AI
We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.
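One common way to picture inference as a sequence of differentiable operations is to represent each relation as a matrix and each entity as a one-hot vector, so that following a chain rule becomes a series of vector-matrix products. The sketch below illustrates that idea only; the entities, the `parent` relation, and the rule confidence are hypothetical, and it is not the authors' implementation.

```python
def matvec(vec, mat):
    """Row vector times matrix: one 'hop' along a relation."""
    n_cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(n_cols)]


# Hypothetical toy knowledge base.
entities = ["ann", "bob", "cat"]
# parent[i][j] = 1.0 if entities[i] is a parent of entities[j]
parent = [
    [0.0, 1.0, 0.0],  # ann is a parent of bob
    [0.0, 0.0, 1.0],  # bob is a parent of cat
    [0.0, 0.0, 0.0],
]


def one_hot(name):
    return [1.0 if e == name else 0.0 for e in entities]


def grandparent_scores(name, confidence=0.9):
    """Score the rule grandparent(X, Z) <- parent(X, Y), parent(Y, Z).
    The confidence weight is an assumed, hand-set rule parameter."""
    hop1 = matvec(one_hot(name), parent)
    hop2 = matvec(hop1, parent)
    return [confidence * s for s in hop2]


scores = grandparent_scores("ann")
print(dict(zip(entities, scores)))
```

Because every step is a product or a sum, the rule confidence (and, in a learned system, the choice of which relation to apply at each hop) can in principle be tuned by gradient descent.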
We study the extent to which we can infer users' geographical locations from social media. Location inference from social media can benefit many applications, such as disaster management, targeted advertising, and news content tailoring. In recent years, a number of algorithms have been proposed for identifying user locations on social media platforms such as Twitter and Facebook from message contents, friend networks, and interactions between users. In this paper, we propose a novel probabilistic model based on factor graphs for location inference that offers several unique advantages for this task. First, the model generalizes previous methods by incorporating content, network, and deep features learned from social context. The model is also flexible enough to support both supervised learning and semi-supervised learning. Second, we explore several learning algorithms for the proposed model, and present a Two-chain Metropolis-Hastings (MH+) algorithm, which improves the inference accuracy. Third, we validate the proposed model on three different genres of data - Twitter, Weibo, and Facebook - and demonstrate that the proposed model can substantially improve the inference accuracy (+3.3-18.5% by F1-score) over that of several state-of-the-art methods.
We study the problem of semi-supervised question answering, that is, utilizing unlabeled text to boost the performance of question answering models. We propose a novel training framework, the Generative Domain-Adaptive Nets. In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models. We develop novel domain adaptation algorithms, based on reinforcement learning, to alleviate the discrepancy between the model-generated data distribution and the human-generated data distribution. Experiments show that our proposed framework obtains substantial improvement from unlabeled text.
Feb 07 2017 cs.AI
With the advent of modern computer networks, fault diagnosis has been a focus of research activity. This paper reviews the history of fault diagnosis in networks and discusses the main methods used in the information gathering, information analyzing, and diagnosing and resolving sections of fault diagnosis in networks. Emphasis is placed upon knowledge-based methods, with a discussion of the advantages and shortcomings of the different methods. The survey concludes with a description of some open problems.
In this paper, we examine the physical layer security for cooperative wireless networks with multiple intermediate nodes, where the decode-and-forward (DF) protocol is considered. We propose a new joint relay and jammer selection (JRJS) scheme for protecting wireless communications against eavesdropping, where an intermediate node is selected as the relay for the sake of forwarding the source signal to the destination and meanwhile, the remaining intermediate nodes are employed to act as friendly jammers which broadcast the artificial noise for disturbing the eavesdropper. We further investigate the power allocation among the source, relay and friendly jammers for maximizing the secrecy rate of the proposed JRJS scheme and derive a closed-form sub-optimal solution. Specifically, all the intermediate nodes which successfully decode the source signal are considered as relay candidates. For each candidate, we derive the sub-optimal closed-form power allocation solution and obtain the secrecy rate result of the corresponding JRJS scheme. Then, the candidate which is capable of achieving the highest secrecy rate is selected as the relay. Two assumptions about the channel state information (CSI), namely the full CSI (FCSI) and partial CSI (PCSI), are considered. Simulation results show that the proposed JRJS scheme outperforms the conventional pure relay selection, pure jamming and GSVD based beamforming schemes in terms of secrecy rate. Additionally, the proposed FCSI based power allocation (FCSI-PA) and PCSI based power allocation (PCSI-PA) schemes both achieve higher secrecy rates than the equal power allocation (EPA) scheme.
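The selection step described above (score every candidate, keep the one with the highest secrecy rate) can be sketched as follows. This is a simplified illustration that uses the textbook secrecy rate, max(0, log2(1 + SNR_d) - log2(1 + SNR_e)); the paper's own expressions also account for jamming and power allocation, and the relay names and SNR values below are hypothetical.

```python
import math


def secrecy_rate(snr_destination, snr_eavesdropper):
    """Textbook secrecy rate in bits/s/Hz, floored at zero."""
    return max(0.0, math.log2(1 + snr_destination) - math.log2(1 + snr_eavesdropper))


def select_relay(candidates):
    """candidates maps a relay id to (snr_destination, snr_eavesdropper).
    Returns the id whose link achieves the highest secrecy rate."""
    return max(candidates, key=lambda relay: secrecy_rate(*candidates[relay]))


# Hypothetical candidates that decoded the source signal successfully.
candidates = {
    "r1": (15.0, 3.0),  # log2(16) - log2(4) = 2
    "r2": (31.0, 1.0),  # log2(32) - log2(2) = 4
    "r3": (7.0, 7.0),   # equal SNRs give a secrecy rate of 0
}
best = select_relay(candidates)
print(best, secrecy_rate(*candidates[best]))
```

In the full scheme, the per-candidate score would come from the closed-form power allocation rather than from fixed SNR pairs.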
Modern statistical machine learning (SML) methods share a major limitation with the early approaches to AI: there is no scalable way to adapt them to new domains. Human learning solves this in part by leveraging a rich, shared, updateable world model. Such scalability requires modularity: updating part of the world model should not impact unrelated parts. We have argued that such modularity will require both "correctability" (so that errors can be corrected without introducing new errors) and "interpretability" (so that we can understand what components need correcting). To achieve this, one could attempt to adapt state of the art SML systems to be interpretable and correctable; or one could see how far the simplest possible interpretable, correctable learning methods can take us, and try to control the limitations of SML methods by applying them only where needed. Here we focus on the latter approach and we investigate two main ideas: "Teacher Assisted Learning", which leverages crowd sourcing to learn language; and "Factored Dialog Learning", which factors the process of application development into roles where the language competencies needed are isolated, enabling non-experts to quickly create new applications. We test these ideas in an "Automated Personal Assistant" (APA) setting, with two scenarios: that of detecting user intent from a user-APA dialog; and that of creating a class of event reminder applications, where a non-expert "teacher" can then create specific apps. For the intent detection task, we use a dataset of a thousand labeled utterances from user dialogs with Cortana, and we show that our approach matches state of the art SML methods, but in addition provides full transparency: the whole (editable) model can be summarized on one human-readable page. For the reminder app task, we ran small user studies to verify the efficacy of the approach.
In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there are no automatic methods for generating meaningful questions of various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions of varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and then generates the questions. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.
A faster-than-Nyquist (FTN) signal achieves higher spectral efficiency and capacity than a Nyquist signal due to its smaller pulse interval or narrower subcarrier spacing. The Shannon limit typically defines the upper-limit capacity of a Nyquist signal. To the best of our knowledge, the mathematical expression for the capacity limit of the FTN non-orthogonal frequency-division multiplexing (NOFDM) signal is first demonstrated in this paper. The mathematical expression shows that the FTN NOFDM signal has the potential to achieve a higher capacity limit than the Nyquist signal. In this paper, we demonstrate the principle of FTN NOFDM by taking fractional cosine transform-based NOFDM (FrCT-NOFDM) as an example. FrCT-NOFDM is first proposed and implemented by both simulation and experiment. When the bandwidth compression factor $\alpha$ is set to $0.8$ in FrCT-NOFDM, the subcarrier spacing is equal to $40\%$ of the symbol rate per subcarrier, thus the transmission rate is about $25\%$ faster than the Nyquist rate. FTN NOFDM with higher capacity would be promising in future communication systems, especially in bandwidth-limited applications.
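The figures quoted for $\alpha = 0.8$ can be checked with a few lines of arithmetic. The sketch below assumes that the orthogonal (uncompressed) cosine-transform-based scheme uses a subcarrier spacing of half the symbol rate per subcarrier; that baseline of 0.5 is an assumption introduced here because it reproduces the $40\%$ figure, and the function name is hypothetical.

```python
def ftn_figures(alpha, orthogonal_spacing=0.5):
    """Return (subcarrier spacing as a fraction of the symbol rate per subcarrier,
    bandwidth saving at a fixed rate, rate gain at a fixed bandwidth).
    orthogonal_spacing=0.5 is an assumed baseline for the uncompressed scheme."""
    spacing = alpha * orthogonal_spacing
    bandwidth_saving = 1.0 - alpha
    rate_gain = 1.0 / alpha - 1.0
    return spacing, bandwidth_saving, rate_gain


spacing, saving, gain = ftn_figures(0.8)
print(f"spacing = {spacing:.0%}, bandwidth saving = {saving:.0%}, rate gain = {gain:.0%}")
```

The same compression factor therefore reads as a $20\%$ bandwidth saving when the rate is held fixed, or as a $25\%$ rate increase (1/0.8 = 1.25) when the bandwidth is held fixed.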
Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach can improve the performance on reading comprehension tasks, achieving new state-of-the-art results on the Children's Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task.
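The difference between scalar weighting and a fine-grained gate can be sketched in a few lines: a scalar weight applies one mixing coefficient to the whole vector, while a fine-grained gate applies a separate coefficient to every dimension. The sketch below is illustrative only; in a trained model the gate logits would be computed from properties of the word, whereas here they are supplied directly as hypothetical values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def scalar_mix(word_vec, char_vec, g):
    """Scalar weighting: a single coefficient g for every dimension."""
    return [g * c + (1.0 - g) * w for w, c in zip(word_vec, char_vec)]


def fine_grained_gate(word_vec, char_vec, gate_logits):
    """Fine-grained gating: one sigmoid gate per dimension.
    gate_logits stands in for a learned function of word properties (assumption)."""
    gates = [sigmoid(z) for z in gate_logits]
    return [g * c + (1.0 - g) * w for g, w, c in zip(gates, word_vec, char_vec)]


word_vec = [0.0, 2.0]   # hypothetical word-level representation
char_vec = [2.0, 0.0]   # hypothetical character-level representation

print(scalar_mix(word_vec, char_vec, 0.5))
print(fine_grained_gate(word_vec, char_vec, [0.0, 0.0]))    # gates of 0.5 match the scalar mix
print(fine_grained_gate(word_vec, char_vec, [10.0, -10.0])) # dim 0 favors characters, dim 1 favors the word
```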
Nov 08 2016 cs.CL
We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g., language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks show that our model variants outperform models based on deterministic attention.