In this paper, we consider the network utility maximization problem with various user priorities by jointly optimizing user association, load distribution, and power control in a load-coupled heterogeneous network. To tackle the nonconvexity of the problem, we first analyze it by obtaining the optimal resource allocation strategy in closed form and characterizing the optimal base station load distribution pattern. Both observations prove essential in simplifying the original problem, making it possible to transform the nonconvex load distribution and power control problem into a convex reformulation via an exponential variable transformation. An iterative algorithm with low complexity is accordingly presented to obtain a suboptimal solution to the joint optimization problem. Simulation results show that the proposed algorithm achieves better performance than conventional approaches.
This letter investigates the power control and channel assignment problem in device-to-device (D2D) communications underlaying a non-orthogonal multiple access (NOMA) cellular network. Under the successive interference cancellation decoding order constraints, our target is to maximize the sum rate of D2D pairs while guaranteeing the minimum rate requirements of NOMA-based cellular users. Specifically, the optimal conditions for power control of cellular users on each subchannel are derived first. Then, based on these results, we propose a dual-based iterative algorithm to solve the resource allocation problem. Simulation results validate the superiority of the proposed resource allocation algorithm over the existing orthogonal multiple access scheme.
Oct 11 2017 cs.CV
The timely provision of traffic sign information is essential for drivers to respond in time, ensuring safe driving and avoiding traffic accidents. We propose a quantitative method for evaluating the timely visual recognizability of traffic signs in large-scale transportation environments. To achieve this goal, we first introduce the concept of a visibility field to reflect the visible distribution of three-dimensional (3D) space and construct a traffic sign Visibility Evaluation Model (VEM) to measure traffic sign visibility for a given viewpoint. Then, based on the VEM, we propose the concept of the Visual Recognizability Field (VRF) to reflect the visual recognizability distribution in 3D space and establish a Visual Recognizability Evaluation Model (VREM) to measure traffic sign visual recognizability for a given viewpoint. Next, we propose a Traffic Sign Timely Visual Recognizability Evaluation Model (TSTVREM) by combining the VREM, the actual maximum continuous visual recognizable distance, and traffic big data to measure traffic sign visual recognizability in different lanes. Finally, we present an automatic algorithm to implement the TSTVREM model through traffic sign and road marking detection and classification, traffic sign environment point cloud segmentation, viewpoint calculation, and TSTVREM model realization. The performance of our method is tested on three road point clouds acquired by a mobile laser scanning system (RIEGL VMX-450) according to Road Traffic Signs and Markings (GB 5768-1999 in China), showing that our method is feasible and efficient.
Oct 10 2017 cs.CV
For hyperspectral image (HSI) datasets, each class has its own salient features, and classifiers separate classes according to these features; however, different normalization methods produce different salient features. In this letter, we report the effect of different normalization methods on classifiers and, after analyzing their impact, recommend the best normalization method for each classifier. The Pavia University, Indian Pines, and Kennedy Space Center datasets are applied to several typical classifiers in order to evaluate and analyze the impact of different normalization methods.
Sep 20 2017 cs.NE
The performance of recycled concrete as a building material is currently an important research subject. Given the complex composition of recycled concrete, conventional methods for forecasting slump scarcely obtain satisfactory results. Based on the theory of nonlinear prediction, we propose a recycled concrete slump prediction model based on geometric semantic genetic programming (GSGP), combined with recycled concrete features. Tests show that the model can accurately predict the recycled concrete slump: the established model is used to calculate the slump for different mixing ratios in practical projects, and the predicted values are compared with the experimental values. Comparing the model with several other nonlinear prediction models, we conclude that GSGP offers higher accuracy and reliability than conventional methods.
There is a special type of text for which the order of the rows makes no difference (e.g., a word list). For compressing such texts, traditional lossless compression methods are not the ideal choice. We propose a new method that achieves better compression results for this type of text: the texts are pre-processed by a method named SSE and then compressed by a traditional lossless compression method. Comparison shows that an improved compression result is achieved.
Sep 13 2017 cs.CV
Although the extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is the random weights and bias of the ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper, we propose a new framework for ELM-based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCFs) are employed respectively to derive the optimized output weights and extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP) criterion. Second, sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum, solving the ill-posed problem of ELM. Variable splitting and the augmented Lagrangian are subsequently used to further reduce the computational complexity of the proposed algorithm, and this has been proven an efficient way to improve its speed. Third, spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.
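To make the ELM baseline concrete, the following is a minimal NumPy sketch of the basic ELM: a random, untrained hidden layer followed by a least-squares solve for the output weights (the step whose random weights and bias can make the problem ill-posed). The function names, activation choice and shapes are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def elm_train(X, Y, n_hidden=500, seed=0):
        """Basic ELM: random hidden layer + least-squares output weights."""
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (untrained)
        b = rng.standard_normal(n_hidden)                # random bias (untrained)
        H = np.tanh(X @ W + b)                           # hidden-layer activations
        beta, *_ = np.linalg.lstsq(H, Y, rcond=None)     # least-squares readout
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta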
Sep 12 2017 cs.CV
In this letter, to break the limit of traditional linear models for SAR image despeckling, we propose a novel deep learning approach that learns a non-linear end-to-end mapping between noisy and clean SAR images with a dilated residual network (SAR-DRN). SAR-DRN is based on dilated convolutions, which can enlarge the receptive field while maintaining the filter size and layer depth with a lightweight structure. In addition, skip connections are added to the despeckling model to reduce the vanishing gradient problem. The proposed method shows superior performance over both traditional despeckling methods and state-of-the-art approaches in quantitative and visual assessments, especially for strong speckle noise.
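As an illustration of the two ingredients named above, here is a minimal PyTorch sketch of a dilated convolutional block with a skip connection; the channel count, dilation rate and activation are illustrative assumptions, not SAR-DRN's exact configuration.

    import torch.nn as nn

    class DilatedResBlock(nn.Module):
        """3x3 dilated convs keep spatial size when padding = dilation."""
        def __init__(self, channels=64, dilation=2):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3,
                                   padding=dilation, dilation=dilation)
            self.conv2 = nn.Conv2d(channels, channels, 3,
                                   padding=dilation, dilation=dilation)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.conv1(x))
            out = self.conv2(out)
            return self.relu(out + x)  # skip connection mitigates vanishing gradients

Dilation enlarges the receptive field (a 3x3 kernel with dilation 2 covers a 5x5 area) without adding parameters, which is what keeps the structure lightweight.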
Sep 11 2017 cs.CV
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high-dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. To tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weights and biases. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR by minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via variable splitting and augmented Lagrangian (LORSAL) is adopted in the proposed framework to reduce the computational time. Experiments conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, show the fast and robust performance of the proposed ESMLR framework.
As a new machine learning approach, the extreme learning machine (ELM) has received wide attention due to its good performance. However, when directly applied to hyperspectral image (HSI) classification, its recognition rate is low. This is because ELM does not use spatial information, which is very important for HSI classification. In view of this, this paper proposes a new framework for spectral-spatial classification of HSI by combining ELM with loopy belief propagation (LBP). The original ELM is linear, and the nonlinear ELMs (or kernel ELMs) are improvements of the linear ELM (LELM). However, based on extensive experiments and analysis, we find that LELM is a better choice than nonlinear ELM for spectral-spatial classification of HSI. Furthermore, we exploit the marginal probability distribution that uses the whole information in the HSI and learn this distribution using LBP. The proposed method not only maintains the fast speed of ELM but also greatly improves the classification accuracy. Experimental results on the well-known HSI data sets, Indian Pines and Pavia University, demonstrate the good performance of the proposed method.
Motivated by statistical physics models connected to computation problems, we devise a tensor network technique that is suited to problems with or without translation invariance and with arbitrary boundary conditions. We introduce a compression-decimation algorithm as an efficient iterative scheme to optimize tensor networks that encode generalized vertex models on regular lattices. The algorithm first propagates local constraints to longer ranges via repeated contraction-decomposition sweeps over all lattice bonds, thus achieving compression on a given length scale. It then decimates the lattice via coarse-graining tensor contractions. Repeated iterations of these two steps allow us to gradually collapse the tensor network while keeping the tensor dimensions under control, such that ultimately the full tensor trace can be taken for relatively large systems. As a benchmark, we demonstrate the efficiency of the algorithm by computing the ground state entropy density of the planar ice model and the eight-vertex model. We then apply it to reversible classical computational problems based on a recently proposed vertex model representation of classical computations [Nat. Commun. 8, 15303 (2017)]. Our protocol allows us to obtain the exact number of solutions for computations where a naive enumeration would take astronomically long times, suggesting that the algorithm is a promising practical tool for the solution of a plethora of problems in physics and computer science.
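As a toy illustration of one contraction-decomposition sweep over a single bond, the NumPy sketch below contracts two tensors sharing a bond and re-splits them with a truncated SVD, the basic device for keeping tensor dimensions under control; the matrix shapes and truncation rule are illustrative assumptions, not the paper's full algorithm.

    import numpy as np

    def compress_bond(A, B, max_dim=32, tol=1e-10):
        """Contract A (i,k) with B (k,j), then re-split with a truncated SVD."""
        theta = A @ B                                     # contract the shared bond
        U, s, Vh = np.linalg.svd(theta, full_matrices=False)
        keep = min(max_dim, int(np.sum(s > tol * s[0])))  # drop negligible singular values
        root = np.sqrt(s[:keep])
        A_new = U[:, :keep] * root                        # absorb sqrt of weights on each side
        B_new = root[:, None] * Vh[:keep]
        return A_new, B_new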
Genetic programming has been widely used in the engineering field. Compared with conventional genetic programming and artificial neural networks, geometric semantic genetic programming (GSGP) is superior in convergence and computing efficiency. In this paper, GSGP is adopted for the classification and regression analysis of a sample dataset. Furthermore, a model for slope stability analysis is established on the basis of geometric semantics. According to the results of the study based on GSGP, the method can analyze slope stability objectively and is highly precise in predicting slope stability and safety factors. Hence, the predicted results can be used as a reference for slope safety design.
Distributed machine learning algorithms enable processing of datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional indeed happen in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional distributed learning tasks. In this paper, two variants of an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) are developed and analyzed that solve distributed learning problems in the presence of Byzantine failures. Theoretical analysis as well as numerical experiments presented in the paper highlight the usefulness of ByRDiE for high-dimensional distributed learning in the presence of Byzantine failures.
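To give the flavor of Byzantine-resilient coordinate descent, here is a simplified NumPy sketch in which the per-worker gradients for one coordinate are combined with a trimmed mean, so that up to b arbitrarily corrupted reports cannot drag the update; this illustrates the resilience idea, not ByRDiE's exact screening rule.

    import numpy as np

    def trimmed_mean(values, b):
        """Average after dropping the b smallest and b largest values."""
        v = np.sort(values)
        return v[b:len(v) - b].mean()

    def robust_coordinate_step(w, worker_grads, k, lr=0.1, b=1):
        """One coordinate-descent step on coordinate k; worker_grads has shape
        (n_workers, dim) and up to b rows may be Byzantine (arbitrary)."""
        g_k = trimmed_mean(worker_grads[:, k], b)
        w = w.copy()
        w[k] -= lr * g_k
        return w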
Hierarchical organization is an important, prevalent characteristic of complex systems; in order to understand their organization, the study of the underlying (generally complex) networks that describe the interactions between their constituents plays a central role. Numerous previous works have shown that many real-world networks in social, biological, and technical systems present hierarchical organization, often in the form of a hierarchy of community structures. Many artificial benchmark graphs have been proposed to test different community detection methods, but no benchmark has been developed to thoroughly test the detection of hierarchical community structures. In this study, we fill this gap by extending the Lancichinetti-Fortunato-Radicchi (LFR) ensemble of benchmark graphs, adopting the rule for constructing hierarchical networks proposed by Ravasz and Barabási. We employ this benchmark to test three of the most popular community detection algorithms, and quantify their accuracy using the traditional Mutual Information and the recently introduced Hierarchical Mutual Information. The results indicate that the Ravasz-Barabási-Lancichinetti-Fortunato-Radicchi (RB-LFR) benchmark generates a complex hierarchical structure that constitutes a challenging benchmark for the considered community detection methods.
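For reference, the traditional (flat) accuracy measure mentioned here is readily computed with scikit-learn by comparing a benchmark's planted partition against the detected one; the labels below are made up, and the Hierarchical Mutual Information requires a dedicated implementation.

    from sklearn.metrics import normalized_mutual_info_score

    ground_truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # planted communities
    detected     = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # output of a detection algorithm
    print(normalized_mutual_info_score(ground_truth, detected))  # 1.0 = perfect recovery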
In a device-to-device (D2D) underlaid massive MIMO system, D2D transmitters reuse the uplink spectrum of cellular users (CUs), leading to cochannel interference. To decrease pilot overhead, we assume pilot reuse (PR) among D2D pairs. We first derive the minimum-mean-square-error (MMSE) estimation of all channels and give a lower bound on the ergodic achievable rate of both cellular and D2D links. To mitigate pilot contamination caused by PR, we then propose a pilot scheduling and pilot power control algorithm based on the criterion of minimizing the sum mean-square-error (MSE) of channel estimation of D2D links. We show that, with an appropriate PR ratio and a well designed pilot scheduling scheme, each D2D transmitter could transmit its pilot with maximum power. In addition, we also maximize the sum rate of all D2D links while guaranteeing the quality of service (QoS) of CUs, and develop an iterative algorithm to obtain a suboptimal solution. Simulation results show that the effect of pilot contamination can be greatly decreased by the proposed pilot scheduling algorithm, and the PR scheme provides significant performance gains over the conventional orthogonal training scheme in terms of system spectral efficiency.
Aug 02 2017 cs.CV
In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. It is an important and challenging task, as finding accurate human actions in both temporal and spatial space is essential for analyzing large-scale video data. To tackle this problem, we propose a cascade proposal and location anticipation (CPLA) model for frame-level action detection. There are several salient points of our model: (1) a cascade region proposal network (casRPN) is adopted for action proposal generation and shows better localization accuracy compared with a single region proposal network (RPN); (2) action spatio-temporal consistencies are exploited via a location anticipation network (LAN), so that frame-level action detection is not conducted independently. Frame-level detections are then linked by solving a linking score maximization problem, and temporally trimmed into spatio-temporal action tubes. We demonstrate the effectiveness of our model on the challenging UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
In this paper, we consider the problems of minimizing sum power and maximizing sum rate for multi-cell networks with non-orthogonal multiple access (NOMA). For sum power minimization, we transform the problem into an equivalent linear problem with fewer variables and obtain the optimal power allocation strategy for users in closed form. To solve the nonconvex sum rate maximization problem, we prove that the power allocation problem for a single cell is a convex problem. Further, by analyzing the Karush-Kuhn-Tucker (KKT) conditions, we reveal that the optimal power allocation strategy for each base station (BS) is to allocate additional power to its served user with the highest channel gain, while the other users served by this BS are allocated the minimum power needed to maintain their rate requirements. Based on this observation, the original sum rate maximization problem can be simplified to an equivalent problem whose number of variables equals the total number of BSs. It is shown that the objective function of the simplified problem can be readily rewritten as a minimization of a difference of convex functions (DC). By using this representation, the DC programming approach is applied to transform and approximate the simplified problem by convex optimization problems. By solving this set of convex approximations iteratively, a suboptimal solution to the sum rate maximization problem can be obtained. Numerical results illustrate the theoretical findings, showing the superiority of our solutions compared to orthogonal frequency division multiple access (OFDMA).
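For reference, the DC step used here has the textbook form: writing the objective as $f(\mathbf{p}) = g(\mathbf{p}) - h(\mathbf{p})$ with $g$ and $h$ convex, the convexity of $h$ gives $h(\mathbf{p}) \ge h(\mathbf{p}^{(k)}) + \nabla h(\mathbf{p}^{(k)})^{T}(\mathbf{p} - \mathbf{p}^{(k)})$, so the convex surrogate $g(\mathbf{p}) - h(\mathbf{p}^{(k)}) - \nabla h(\mathbf{p}^{(k)})^{T}(\mathbf{p} - \mathbf{p}^{(k)})$ upper-bounds $f$, and minimizing it iteratively drives $f$ downward (generic notation, not necessarily the paper's).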
Jul 18 2017 cs.CV
Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, the anticipation is based on a single past frame's representation, which ignores the history trend, and it can only anticipate one fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all of them.
In this paper, we consider the problems of minimizing sum power and maximizing sum rate for multi-cell networks with load coupling, where the coupling relation among cells arises from inter-cell interference. This coupling relation is characterized by the signal-to-interference-and-noise-ratio (SINR) coupling model with the cell load vector and cell power vector as the variables. Due to the nonlinear SINR coupling model, the optimization problems for multi-cell networks with load coupling are nonconvex. To solve these nonconvex problems, we first consider the optimization problems for single-cell networks. Through variable transformations, these problems can be equivalently transformed into convex problems. By solving the Karush-Kuhn-Tucker (KKT) conditions, the optimal solutions to the power minimization and rate maximization problems can be obtained in closed form. Based on these theoretical findings for single-cell networks, we develop a distributed time allocation and power control algorithm with low complexity for sum power minimization in multi-cell networks. This algorithm is proved to be convergent and globally optimal by using the properties of the standard interference function. For sum rate optimization in multi-cell networks, we also provide a distributed algorithm that yields a suboptimal solution, and we prove its convergence. Numerical results illustrate the theoretical findings, showing the superiority of our solutions compared to the conventional solution of allocating uniform power to users in the same cell.
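As an illustration of the standard interference function machinery invoked here, the NumPy sketch below runs the classic fixed-point power control iteration $\mathbf{p} \leftarrow I(\mathbf{p})$ for given SINR targets; the linear interference model and parameters are illustrative assumptions, not the paper's load-coupled model.

    import numpy as np

    def power_control(G, gamma, noise, iters=100):
        """Fixed-point iteration for SINR targets gamma.
        G[i, j]: link gain from transmitter j to receiver i."""
        p = np.ones(len(gamma))
        for _ in range(iters):
            interference = G @ p - np.diag(G) * p + noise  # others' power plus noise
            p = gamma * interference / np.diag(G)          # standard interference function
        return p  # monotone + scalable => converges to the unique fixed point if feasible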
Jun 27 2017 cs.DS
We present a novel sparsity-based space-time adaptive processing (STAP) technique that uses the alternating direction method to overcome the severe performance degradation caused by array gain/phase (GP) errors. The proposed algorithm reformulates the STAP problem as a joint optimization of the spatio-Doppler profile and the GP errors in both single and multiple snapshots, and introduces a target detector using the reconstructed spatio-Doppler profiles. Simulations are conducted to illustrate the benefits of the proposed algorithm.
Jun 27 2017 cs.SE
Self-adaptive software is considered the most advanced approach, and its development attracts much attention. Decentralization is an effective way to design and manage the complexity of modern self-adaptive software systems. However, tremendous challenges remain. One major challenge is to unify decentralization with traditional self-adaptive implementation frameworks during design and implementation. Another is to guarantee the required global goals and performance of decentralized self-adaptive systems operating in highly dynamic and uncertain environments. A third challenge is to predict the influence of a system's internal changes on its self-adaptability to the environment. To solve these problems, we combine the mechanism of separation of concerns with a modeling method using timed automata, allowing the system to be analyzed and verified. Timed computation tree logic is used to specify system goals, and stochastic simulations in dynamic environments are run to verify the decentralized self-adaptive system's adaptation properties. In this paper, we extract a motivating example from practical applications in UAV emergency mission scenarios. The whole approach is evaluated and illustrated with this example, and the statistical results can serve as a reference for arrangement planning of UAVs in cyber-physical spaces.
Jun 09 2017 cs.CV
Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method, closely related to the human visual system, that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extracts information about objects and their spatial relationships in images; and a deep recurrent neural network (RNN) based on long short-term memory (LSTM) units with an attention mechanism for sentence generation. Each word of the description is automatically aligned to different objects of the input image when it is generated. This is similar to the attention mechanism of the human visual system. Experimental results on the COCO dataset showcase the merit of the proposed method, which outperforms previous benchmark models.
Deep generative models have achieved impressive success in recent years. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as powerful frameworks for deep generative model learning, have largely been considered two distinct paradigms and have received extensive independent study. This paper establishes formal connections between deep generative modeling approaches through a new formulation of GANs and VAEs. We show that GANs and VAEs essentially minimize KL divergences between respective posterior and inference distributions in opposite directions, extending the two learning phases of the classic wake-sleep algorithm. The unified view provides a powerful tool for analyzing a diverse set of existing model variants and enables exchanging ideas across research lines in a principled way. For example, we transfer the importance weighting method from the VAE literature to improve GAN learning, and enhance VAEs with an adversarial mechanism for leveraging generated samples. Quantitative experiments show the generality and effectiveness of the imported extensions.
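For orientation, the two objectives being unified are, in their textbook forms (not the paper's notation): the VAE evidence lower bound $\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$, whose KL term pulls the inference distribution toward the prior, and the GAN minimax game $\min_G \max_D \, \mathbb{E}_{x\sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z\sim p(z)}[\log(1-D(G(z)))]$, which the paper reinterprets as a KL minimization in the reverse direction.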
In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely the serial nonlinear transformation and the parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed into a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performance for vectors generated from a single Dirichlet distribution and from a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and we propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.
May 25 2017 cs.CV
Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduced four operations that can be inserted into a CNN to learn deep representations equivariant to rotation. However, in their approach, feature maps must be copied and rotated four times in each layer, which incurs substantial running time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speedup of more than 2 times with even less memory overhead. We evaluate DRENs on the Rotated MNIST and CIFAR-10 datasets and demonstrate that they can improve the performance of state-of-the-art architectures. Our code is released on GitHub.
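A minimal PyTorch sketch of the key trick, rotating the filter bank instead of the feature maps, is given below; this is an illustrative use of torch.rot90 on the weight tensor, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def rotation_equivariant_conv(x, weight, bias=None):
        """Convolve x with the filters rotated by 0/90/180/270 degrees.
        weight: (out_ch, in_ch, k, k) with odd k; returns the four responses
        stacked along the channel dimension."""
        outs = []
        for r in range(4):
            w_r = torch.rot90(weight, r, dims=(2, 3))  # rotate filters, not feature maps
            outs.append(F.conv2d(x, w_r, bias, padding=weight.shape[-1] // 2))
        return torch.cat(outs, dim=1)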
Inferring the relations between two images is an important class of tasks in computer vision. Examples of such tasks include computing optical flow and stereo disparity. We treat relation inference as a machine learning problem and tackle it with neural networks. A key to the problem is learning a representation of relations. We propose a new neural network module, the contrast association unit (CAU), which explicitly models the relations between two sets of input variables. Due to the non-negativity of the weights in the CAU, we adopt a multiplicative update algorithm for learning these weights. Experiments show that neural networks with CAUs are more effective in learning five fundamental image transformations than conventional neural networks.
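To illustrate why multiplicative updates suit non-negative weights, here is the classic NMF-style update for a non-negative least-squares factor in NumPy: each step only multiplies by a non-negative ratio, so non-negativity is preserved without projection (shown for a generic objective with non-negative data; the CAU's own update rule is defined in the paper).

    import numpy as np

    def multiplicative_update(W, X, Y, iters=200, eps=1e-12):
        """Minimize ||Y - W @ X||^2 over W >= 0, assuming X, Y >= 0."""
        for _ in range(iters):
            numer = Y @ X.T
            denom = W @ X @ X.T + eps   # eps avoids division by zero
            W = W * (numer / denom)     # elementwise non-negative rescaling
        return W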
May 08 2017 cs.CV
This paper focuses on temporal localization of actions in untrimmed videos. Existing methods typically train classifiers for a pre-defined list of actions and apply them in a sliding window fashion. However, activities in the wild consist of a wide combination of actors, actions and objects; it is difficult to design a proper activity list that meets users' needs. We propose to localize activities by natural language queries. Temporal Activity Localization via Language (TALL) is challenging as it requires: (1) suitable design of text and video representations to allow cross-modal matching of actions and language queries; (2) the ability to locate actions accurately given features from sliding windows of limited granularity. We propose a novel Cross-modal Temporal Regression Localizer (CTRL) to jointly model text queries and video clips, outputting alignment scores and action boundary regression results for candidate clips. For evaluation, we adopt the TaCoS dataset and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA. We also build complex sentence queries in Charades-STA for testing. Experimental results show that CTRL significantly outperforms previous methods on both datasets.
May 04 2017 cs.CV
Temporal action detection in long videos is an important problem. State-of-the-art methods address this problem by applying action classifiers on sliding windows. Although sliding windows may contain an identifiable portion of the actions, they may not necessarily cover the entire action instance, which leads to inferior performance. We adapt a two-stage temporal action detection pipeline with a Cascaded Boundary Regression (CBR) model. Class-agnostic proposals and specific actions are detected in the first and second stages, respectively. CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows. The salient aspect of the refinement process is that, inside each stage, the temporal boundaries are adjusted in a cascaded way by feeding the refined windows back into the system for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and achieve state-of-the-art performance on both datasets. The performance gain is especially remarkable under high IoU thresholds; e.g., mAP@tIoU=0.5 on THUMOS-14 is improved from 19.0% to 31.0%.
Apr 24 2017 cs.CV
The use of color in QR codes brings extra data capacity, but also poses tremendous challenges for the decoding process due to chromatic distortion, cross-channel color interference and illumination variation. In particular, we discover a new type of chromatic distortion in high-density color QR codes, cross-module color interference, caused by the high density, which also makes geometric distortion correction more challenging. To address these problems, we propose two approaches, namely LSVM-CMI and QDA-CMI, which jointly model these different types of chromatic distortion. Extended from SVM and QDA, respectively, both LSVM-CMI and QDA-CMI optimize a particular objective function to learn a color classifier. Furthermore, a robust geometric transformation method is proposed to accurately correct the geometric distortion for high-density color QR codes. We put forth and implement a framework for high-capacity color QR codes equipped with our methods, called HiQ. To evaluate the performance of HiQ, we collect a challenging large-scale color QR code dataset, CUHK-CQRC, which consists of 5390 high-density color QR code samples. The comparison with the baseline method on CUHK-CQRC shows that HiQ outperforms the baseline by at least 188% in decoding success rate and 60% in bit error rate. Our implementation of HiQ in iOS and Android also demonstrates the effectiveness of our framework in real-world applications.
Apr 05 2017 cs.SE
Self-adaptive systems (SASs) are capable of adjusting their behavior in response to meaningful changes in the operational context and in themselves. The adaptation needs to be performed automatically through self-managed reactions and decision-making processes at runtime. To support this kind of automatic behavior, SASs must be endowed with a rich runtime support that can detect requirements violations and reason about adaptation decisions. Requirements Engineering for SASs primarily aims to model adaptation logic and mechanisms. Requirements models will guide the design decisions and runtime behaviors of the system-to-be. This paper proposes a model-driven approach for achieving adaptation against non-functional requirements (NFRs), i.e., reliability and performance. The approach begins with models in the RE stage and provides runtime support for self-adaptation. We capture adaptation mechanisms as graphical elements in the goal model. By assigning reliability and performance attributes to related system tasks, we derive a tagged sequence diagram specifying the reliability and performance of system behaviors. To formalize system behavior, we transform the requirements model into a corresponding behavior model expressed by Labelled Transition Systems (LTS). To analyze the reliability and performance requirements, we merge the sequence diagram and LTS into a variable Discrete-Time Markov Chain (DTMC) and a variable Continuous-Time Markov Chain (CTMC), respectively. Adaptation candidates are characterized by the variable states. The optimal decision is derived by verifying the concerned NFRs and reducing the decision space. Our approach is implemented through the demonstration of a mobile information system.
Self-adaptive software (SAS) is capable of adjusting its behavior in response to meaningful changes in the operational context and in itself. Due to the inherent volatility of the open and changeable environment in which SAS is embedded, the ability to adapt is highly demanded by many software-intensive systems. Two concerns, i.e., requirements uncertainty and context uncertainty, are among the most important at the Requirements Engineering (RE) stage. However, requirements analyzers can hardly figure out the mathematical relation between requirements, system behavior and context, especially for complex and nonlinear systems, due to the existence of the above uncertainties and the misunderstanding and ambiguity of prior knowledge. An essential issue to be addressed is how to model and specify these uncertainties at the RE stage and how to utilize prior knowledge to achieve adaptation. In this paper, we propose a fuzzy-based approach to modeling uncertainty and achieving evolution. The approach introduces specifications to describe fuzziness. Based on the specifications, we derive a series of reasoning rules as a knowledge base for achieving adaptation and evolution. These two targets are implemented through four reasoning schemas from a control theory perspective. Specifically, the forward reasoning schema is used for direct adaptation, and the backward reasoning schema is used for optimal adaptation. The parameter-identified schema implements learning evolution by treating SAS as a gray-box system, while the system-identified reasoning schema implements learning evolution by treating SAS as a black-box system. The former two schemas function as the control group, while the latter two are designed as the experimental groups to illustrate the learning ability. Our approach is implemented under three types of context through the demonstration of a mobile computing application.
Apr 05 2017 cs.NI
The topologies of predictable dynamic networks are continuously changing in terms of node position, network connectivity and link metric. However, their dynamics are largely predictable compared with ad-hoc networks. Existing routing protocols specific to static or ad-hoc networks do not consider this predictability and thus are not very efficient in some cases. We present a topology model based on a Divide-and-Merge methodology to formulate the dynamic topology as a series of static topologies, which reflects the topology dynamics correctly with the least number of static topologies. We then design a dynamic programming algorithm to solve this model and determine the timing of routing updates and the topology to be used. Moreover, in the classic predictable dynamic network, the space Internet, the links in certain regions have shorter delays, which leads most traffic to converge on these links. Meanwhile, the connectivity and metric of these links vary continuously, which results in great end-to-end path variation and many routing updates. In this paper, we propose a stable routing scheme that adds link lifetime to the routing metric to eliminate these dynamics. We then make use of the greedy nature of Dijkstra's algorithm to release some paths from the dynamic links, achieving routing stability. Experimental results show that our method can significantly decrease the number of changed paths and affected network nodes, and thus greatly improve network stability. Interestingly, our method also achieves better network performance, including fewer lost packets, smoother variation of end-to-end delay and higher throughput.
Apr 04 2017 cs.SE
Over the last decade, researchers and engineers have developed a vast body of methodologies and technologies in requirements engineering for self-adaptive systems. Although existing studies have explored various aspects of this topic, few of them have categorized and summarized these areas of research in requirements modeling and analysis. This study aims to investigate the research themes based on the utilized modeling methods and RE activities. We conduct a thematic study within a systematic literature review. The results are derived by synthesizing the extracted data with statistical methods. This paper provides an updated review of the research literature, enabling researchers and practitioners to better understand the research themes in these areas and identify research gaps that need further study.
Apr 04 2017 cs.SE
Self-adaptive systems are capable of adjusting their behavior to cope with changes in the environment and in themselves. These changes may cause runtime uncertainty, i.e., system states in which appropriate reconfigurations fail to be achieved. However, it is often infeasible to exhaustively anticipate all changes. Thus, providing dynamic adaptation mechanisms for mitigating runtime uncertainty is a big challenge. This paper suggests solving this challenge at the requirements phase by presenting REDAPT, short for REquirement-Driven adAPTation. We propose an adaptive goal model (AGM) by introducing adaptive elements, specify dynamic properties of the AGM with a logic-based grammar, derive adaptation mechanisms from AGM specifications, and achieve adaptation by monitoring variables, diagnosing requirements violations, determining reconfigurations and executing them. Our approach is demonstrated with an example from the Intelligent Transportation System domain and evaluated through a series of simulation experiments.
Apr 04 2017 cs.SE
Context: Over the last decade, software researchers and engineers have developed a vast body of methodologies and technologies in requirements engineering for self-adaptive systems. Although existing studies have explored various aspects of this field, no systematic study has been performed to summarize modeling methods and corresponding requirements activities. Objective: This study summarizes the state-of-the-art research trends, details the modeling methods and corresponding requirements activities, identifies relevant quality attributes and application domains, and assesses the quality of each study. Method: We perform a systematic literature review underpinned by a rigorously established and reviewed protocol. To ensure the quality of the study, we choose 21 highly regarded publication venues and 8 popular digital libraries. In addition, we apply text mining to derive search strings and use the Kappa coefficient to mitigate disagreements among researchers. Results: We selected 109 papers published during 2003-2013 and present the research distributions over various factors. We extracted 29 modeling methods classified into 8 categories and identified 14 requirements activities classified into 4 requirements timelines. We captured 8 concerned software quality attributes based on the ISO 9126 standard and 12 application domains. Conclusion: The frequency of application of modeling methods varies greatly. Enterprise models were more widely used, while behavior models were more rigorously evaluated. Requirements-driven runtime adaptation was the most frequently studied requirements activity. Activities at runtime were described in more detail. Finally, we draw further conclusions by discussing how well modeling dimensions were considered in these modeling methods and how well assurance dimensions were conveyed in requirements activities.
A self-adaptive system (SAS) is capable of adjusting its behavior in response to meaningful changes in the operational context and in itself. Due to the inherent volatility of the open and changeable environment in which an SAS is embedded, the ability to adapt is highly demanded by many software-intensive systems. Two concerns, i.e., requirements uncertainty and context uncertainty, are among the most important. An essential issue to be addressed is how to dynamically adapt the non-functional requirements (NFRs) and task configurations of SASs under context uncertainty. In this paper, we propose a model-based fuzzy control approach underpinned by the feedforward-feedback control mechanism. This approach identifies and represents NFR uncertainties, task uncertainties and context uncertainties with linguistic variables, and then designs an inference structure and rules for the fuzzy controller based on the relations between the requirements model and the context model. The adaptation of NFRs and task configurations is achieved through fuzzification, inference, defuzzification and readaptation. Our approach is demonstrated with a mobile computing application and evaluated through a series of simulation experiments.
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.
Mar 27 2017 cs.CV
Recent years have witnessed the great success of convolutional neural networks (CNNs) for various problems in both low- and high-level vision. Especially noteworthy is the residual network, which was originally proposed to handle high-level vision problems and enjoys several merits. This paper aims to extend the merits of the residual network, such as the fast training induced by skip connections, to a typical low-level vision problem, i.e., single image super-resolution. In general, the two main challenges of existing deep CNNs for super-resolution lie in the gradient exploding/vanishing problem and the large number of parameters or computational cost as the CNN goes deeper. Correspondingly, skip connections or identity mapping shortcuts are utilized to avoid the gradient exploding/vanishing problem. To tackle the second problem, a parameter-economic CNN architecture with carefully designed width, depth and skip connections is proposed. Different residual-like architectures for image super-resolution have also been compared. Experimental results demonstrate that the proposed CNN model not only achieves state-of-the-art PSNR and SSIM results for single image super-resolution but also produces visually pleasant results. This paper extends the MMM 2017 paper with more experiments and explanations.
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These gains also lead to improvements over the current state-of-the-art on several well-studied tasks.
Mar 21 2017 cs.CV
Temporal Action Proposal (TAP) generation is an important problem, as the fast and accurate extraction of semantically important segments (e.g., human actions) from untrimmed videos is a key step for large-scale video analysis. We propose a novel Temporal Unit Regression Network (TURN) model. There are two salient aspects of TURN: (1) TURN jointly predicts action proposals and refines the temporal boundaries by temporal coordinate regression; (2) fast computation is enabled by unit feature reuse: a long untrimmed video is decomposed into video units, which are reused as basic building blocks of temporal proposals. TURN outperforms the state-of-the-art methods under average recall (AR) by a large margin on the THUMOS-14 and ActivityNet datasets, and runs at over 880 frames per second (FPS) on a TITAN X GPU. We further apply TURN as the proposal generation stage for existing temporal action localization pipelines, where it surpasses state-of-the-art performance on THUMOS-14 and ActivityNet.
Mar 16 2017 cs.CL
This paper proposes a new route for applying generative adversarial nets (GANs) to NLP tasks (taking neural machine translation as an instance), and the widespread perception that GANs cannot work well in the NLP area turns out to be unfounded. In this work, we build a conditional sequence generative adversarial net comprising two adversarial sub-models: a generative model (generator), which translates the source sentence into the target sentence as traditional NMT models do, and a discriminative model (discriminator), which discriminates the machine-translated target sentence from the human-translated sentence. From the perspective of the Turing test, the proposed model aims to generate translations indistinguishable from human-translated ones. Experiments show that the proposed model achieves significant improvements over the traditional NMT model. In Chinese-English translation tasks, we obtain up to +2.0 BLEU points of improvement. To the best of our knowledge, this is the first time that quantitative results about the application of GANs to a traditional NLP task are reported. Meanwhile, we present detailed strategies for GAN training. In addition, we find that the discriminator of the proposed model shows great capability in data cleaning.
Mar 09 2017 cs.CL
Training recurrent neural networks to model long term dependencies is difficult. Hence, we propose to use external linguistic knowledge as an explicit signal to inform the model which memories it should utilize. Specifically, external knowledge is used to augment a sequence with typed edges between arbitrarily distant elements, and the resulting graph is decomposed into directed acyclic subgraphs. We introduce a model that encodes such graphs as explicit memory in recurrent neural networks, and use it to model coreference relations in text. We apply our model to several text comprehension tasks and achieve new state-of-the-art results on all considered benchmarks, including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out of the 20 tasks with only 1000 training examples per task. Analysis of the learned representations further demonstrates the ability of our model to encode fine-grained entity information across a document.
In this paper, we propose a faster-than-Nyquist (FTN) non-orthogonal frequency-division multiplexing (NOFDM) scheme for visible light communications (VLC), where the multiplexing/demultiplexing employs the inverse fractional cosine transform (IFrCT)/FrCT. Unlike the common fractional Fourier transform-based NOFDM (FrFT-NOFDM) signal, the FrCT-based NOFDM (FrCT-NOFDM) signal is real-valued, so it can be directly applied to VLC systems without expensive up-conversion. Thus, FrCT-NOFDM is more suitable for cost-sensitive VLC systems. Meanwhile, at the same transmission rate, the FrCT-NOFDM signal occupies a smaller bandwidth than the OFDM signal. When the bandwidth compression factor $\alpha$ is set to $0.8$, a $20\%$ bandwidth saving is obtained. Therefore, FrCT-NOFDM has higher spectral efficiency and suffers less high-frequency distortion than OFDM, which benefits bandwidth-limited VLC systems. As the simulation results show, the bit error rate (BER) performance of FrCT-NOFDM with $\alpha$ of $0.9$ or $0.8$ is better than that of OFDM. Moreover, FrCT-NOFDM has superior security performance. In conclusion, FrCT-NOFDM shows great potential for application in future VLC systems.
Generic generation and manipulation of text is challenging and has limited success compared to recent deep generative modeling in visual domain. This paper aims at generating plausible natural language sentences, whose attributes are dynamically controlled by learning disentangled latent representations with designated semantics. We propose a new neural generative model which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures. With differentiable approximation to discrete text samples, explicit constraints on independent attribute controls, and efficient collaborative learning of generator and discriminators, our model learns highly interpretable representations from even only word annotations, and produces realistic sentences with desired attributes. Quantitative evaluation validates the accuracy of sentence and attribute generation.
Mar 03 2017 cs.CL
Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithmic framework that utilizes structured information and encodes it into vector representations to boost the performance of machine comprehension algorithms. We evaluate our approach using a state-of-the-art neural attention model on the SQuAD dataset. Experimental results demonstrate that our model can accurately identify the syntactic boundaries of sentences and extract answers that are more syntactically coherent than those of the baseline methods.
Mar 03 2017 cs.SE
Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms place an almost unbearable burden on software testing engineers, who have to manually analyze the causes of these alarms. The causes are critical because they decide which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve this burden by automating the procedure. Our approach, called the Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes from test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves an accuracy of 58.3% and 65.8%, respectively, outperforming the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1 s per cause analysis. Owing to these attractive experimental results, our industrial partner, a leading information and communication technology company, has deployed the tool; it achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions.
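A minimal sketch of the information-retrieval core, matching a new test log against historical logs with known causes via TF-IDF cosine similarity, is shown below; the sample logs, cause labels and function names are hypothetical, and the Cause Analysis Model itself goes beyond this nearest-neighbor lookup.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    historical_logs = ["connection refused by host", "assertion failed in module"]
    historical_causes = ["environment issue", "product code defect"]

    vectorizer = TfidfVectorizer()
    index = vectorizer.fit_transform(historical_logs)

    def infer_cause(new_log):
        sims = cosine_similarity(vectorizer.transform([new_log]), index)
        return historical_causes[sims.argmax()]  # cause of the most similar past alarm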
Feb 28 2017 cs.AI
We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method obtains state-of-the-art results on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.
Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
We study the extent to which we can infer users' geographical locations from social media. Location inference from social media can benefit many applications, such as disaster management, targeted advertising, and news content tailoring. In recent years, a number of algorithms have been proposed for identifying user locations on social media platforms such as Twitter and Facebook from message contents, friend networks, and interactions between users. In this paper, we propose a novel probabilistic model based on factor graphs for location inference that offers several unique advantages for this task. First, the model generalizes previous methods by incorporating content, network, and deep features learned from social context. The model is also flexible enough to support both supervised learning and semi-supervised learning. Second, we explore several learning algorithms for the proposed model, and present a Two-chain Metropolis-Hastings (MH+) algorithm, which improves the inference accuracy. Third, we validate the proposed model on three different genres of data - Twitter, Weibo, and Facebook - and demonstrate that the proposed model can substantially improve the inference accuracy (+3.3-18.5% by F1-score) over that of several state-of-the-art methods.
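For reference, a single step of the basic Metropolis-Hastings rule that the paper's Two-chain MH+ variant builds on is sketched below in NumPy; the symmetric Gaussian proposal and the stand-in target are illustrative assumptions.

    import numpy as np

    def mh_step(x, log_target, rng, scale=1.0):
        """One Metropolis-Hastings step with a symmetric Gaussian proposal."""
        x_new = x + scale * rng.standard_normal(x.shape)
        log_alpha = log_target(x_new) - log_target(x)  # symmetric proposal terms cancel
        return x_new if np.log(rng.random()) < log_alpha else x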
We study the problem of semi-supervised question answering: utilizing unlabeled text to boost the performance of question answering models. We propose a novel training framework, the Generative Domain-Adaptive Nets. In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models. We develop novel domain adaptation algorithms, based on reinforcement learning, to alleviate the discrepancy between the model-generated data distribution and the human-generated data distribution. Experiments show that our proposed framework obtains substantial improvement from unlabeled text.
Feb 07 2017 cs.AI
With the advent of modern computer networks, fault diagnosis has been a focus of research activity. This paper reviews the history of fault diagnosis in networks and discusses the main methods used in the information gathering, information analysis, and fault diagnosis and resolution stages. Emphasis is placed upon knowledge-based methods, with a discussion of the advantages and shortcomings of the different methods. The survey concludes with a description of some open problems.
In this paper, we examine physical layer security for cooperative wireless networks with multiple intermediate nodes, where the decode-and-forward (DF) protocol is considered. We propose a new joint relay and jammer selection (JRJS) scheme for protecting wireless communications against eavesdropping, in which one intermediate node is selected as the relay to forward the source signal to the destination, while the remaining intermediate nodes act as friendly jammers that broadcast artificial noise to disturb the eavesdropper. We further investigate the power allocation among the source, relay, and friendly jammers for maximizing the secrecy rate of the proposed JRJS scheme, and derive a closed-form suboptimal solution. Specifically, all the intermediate nodes that successfully decode the source signal are considered relay candidates. For each candidate, we derive the suboptimal closed-form power allocation and obtain the secrecy rate of the corresponding JRJS scheme. The candidate achieving the highest secrecy rate is then selected as the relay. Two assumptions about the channel state information (CSI), namely full CSI (FCSI) and partial CSI (PCSI), are considered. Simulation results show that the proposed JRJS scheme outperforms the conventional pure relay selection, pure jamming, and GSVD-based beamforming schemes in terms of secrecy rate. Additionally, the proposed FCSI-based power allocation (FCSI-PA) and PCSI-based power allocation (PCSI-PA) schemes both achieve higher secrecy rates than the equal power allocation (EPA) scheme.
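A hedged sketch of the selection logic described above: evaluate each decoding candidate's achievable secrecy rate and keep the best. The channel values and the power allocation are placeholders, not the paper's closed-form solution.

```python
# Hedged sketch of relay selection by secrecy rate. Candidate SNRs at the
# destination and eavesdropper are invented numbers for illustration.
import math

def secrecy_rate(snr_dest, snr_eve):
    # C_s = [log2(1 + SNR_d) - log2(1 + SNR_e)]^+
    return max(0.0, math.log2(1 + snr_dest) - math.log2(1 + snr_eve))

# candidate -> (SNR at destination, SNR at eavesdropper)
candidates = {"node1": (20.0, 3.0), "node2": (15.0, 0.5), "node3": (25.0, 10.0)}
best = max(candidates, key=lambda n: secrecy_rate(*candidates[n]))
print(best, secrecy_rate(*candidates[best]))  # node2 wins despite lower SNR_d
```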
Modern statistical machine learning (SML) methods share a major limitation with the early approaches to AI: there is no scalable way to adapt them to new domains. Human learning solves this in part by leveraging a rich, shared, updateable world model. Such scalability requires modularity: updating part of the world model should not impact unrelated parts. We have argued that such modularity will require both "correctability" (so that errors can be corrected without introducing new errors) and "interpretability" (so that we can understand which components need correcting). To achieve this, one could attempt to adapt state-of-the-art SML systems to be interpretable and correctable; or one could see how far the simplest possible interpretable, correctable learning methods can take us, and try to control the limitations of SML methods by applying them only where needed. Here we focus on the latter approach and investigate two main ideas: "Teacher Assisted Learning", which leverages crowdsourcing to learn language; and "Factored Dialog Learning", which factors the process of application development into roles in which the required language competencies are isolated, enabling non-experts to quickly create new applications. We test these ideas in an "Automated Personal Assistant" (APA) setting with two scenarios: detecting user intent from a user-APA dialog, and creating a class of event reminder applications where a non-expert "teacher" can then create specific apps. For the intent detection task, we use a dataset of a thousand labeled utterances from user dialogs with Cortana, and we show that our approach matches state-of-the-art SML methods while additionally providing full transparency: the whole (editable) model can be summarized on one human-readable page. For the reminder app task, we ran small user studies to verify the efficacy of the approach.
In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic that aims to ask questions in natural language based on visual input. To the best of our knowledge, automatic methods for generating meaningful questions of various types for the same visual input have been lacking. To address this, we propose a model that automatically generates visually grounded questions of varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and generates the questions sequentially. Experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.
Faster-than-Nyquist (FTN) signaling achieves higher spectral efficiency and capacity than Nyquist signaling owing to its smaller pulse interval or narrower subcarrier spacing. The Shannon limit defines the upper bound on the capacity of Nyquist signaling. To the best of our knowledge, this paper is the first to derive a mathematical expression for the capacity limit of FTN non-orthogonal frequency-division multiplexing (NOFDM) signals. The expression shows that FTN NOFDM signals have the potential to achieve a higher capacity limit than Nyquist signals. In this paper, we demonstrate the principle of FTN NOFDM by taking fractional cosine transform-based NOFDM (FrCT-NOFDM) as an example. FrCT-NOFDM is proposed for the first time and implemented in both simulation and experiment. When the bandwidth compression factor $\alpha$ is set to $0.8$ in FrCT-NOFDM, the subcarrier spacing equals $40\%$ of the symbol rate per subcarrier, so the transmission rate is about $25\%$ faster than the Nyquist rate. FTN NOFDM, with its higher capacity limit, is promising for future communication systems, especially bandwidth-limited applications.
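To make the quoted numbers concrete, assume the cosine-transform subcarrier spacing scales as $\Delta f = \alpha/(2T)$, where $T$ is the symbol period per subcarrier (an assumption consistent with the figures above, not a formula taken from the paper):

```latex
% Consistency check of the quoted 40% / 25% figures under the assumed spacing:
\Delta f = \frac{\alpha}{2T} = \frac{0.8}{2T} = 0.4 \cdot \frac{1}{T}
  \quad \text{(40\% of the per-subcarrier symbol rate $1/T$)},
\qquad
\frac{R_{\mathrm{FTN}}}{R_{\mathrm{Nyquist}}} = \frac{1}{\alpha} = 1.25
  \quad \text{(about 25\% faster).}
```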
Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach can improve the performance on reading comprehension tasks, achieving new state-of-the-art results on the Children's Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task.
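A minimal sketch of such a gate, assuming for brevity that it is conditioned on the word embedding itself (the paper conditions it on word properties such as part-of-speech and frequency):

```python
# Hedged sketch of a fine-grained gate blending word- and character-level
# vectors: one sigmoid gate value per dimension, rather than a single scalar.
import torch
import torch.nn as nn

class FineGrainedGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, word_vec, char_vec):
        g = torch.sigmoid(self.gate(word_vec))    # per-dimension gate in (0, 1)
        return g * char_vec + (1 - g) * word_vec  # fine-grained convex blend

h = FineGrainedGate(100)(torch.randn(8, 100), torch.randn(8, 100))
print(h.shape)  # torch.Size([8, 100])
```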
Nov 08 2016 cs.CL
We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g., language models that are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks show that our model variants outperform models based on deterministic attention.
Nov 01 2016 cs.SY
Series elastic actuators (SEAs) are increasingly important in physical human-robot interaction (HRI) due to their inherent safety and compliance. Cable-driven SEAs additionally allow flexible installation and remote torque transmission. However, challenges remain for the impedance control of cable-driven SEAs, such as the reduced bandwidth caused by the elastic component and the balance between reference tracking performance and robustness. In this paper, a velocity-sourced cable-driven SEA is set up. A stabilizing two-degree-of-freedom (2-DOF) control approach is then designed to pursue the goals of robustness and torque tracking separately. Further, an impedance control structure for human-robot interaction is designed and implemented with a torque compensator. Both simulation and practical experiments validate the efficacy of the 2-DOF method for the control of cable-driven SEAs.
Direction-of-arrival (DOA) estimation refers to the process of retrieving the direction information of several electromagnetic waves/sources from the outputs of a number of receiving antennas that form a sensor array. DOA estimation is a major problem in array signal processing and has wide applications in radar, sonar, wireless communications, etc. With the development of sparse representation and compressed sensing, the last decade has witnessed a tremendous advance in this research topic. The purpose of this article is to provide an overview of these sparse methods for DOA estimation, with a particular highlight on the recently developed gridless sparse methods, e.g., those based on covariance fitting and the atomic norm. Several future research directions are also discussed.
Sep 13 2016 cs.CV
Face detection is challenging because faces in images can appear at arbitrary locations and in different scales. We propose a three-stage cascade structure based on fully convolutional neural networks (FCNs). It first proposes approximate locations where faces may be, then finds the accurate locations by zooming in on the faces. Each level of the FCN cascade is a multi-scale fully convolutional network that generates scores at different locations and in different scales. A score map is generated after each FCN stage; probable face regions are selected and fed to the next stage. The number of proposals decreases after each level, and the regions shrink to fit the faces more precisely. Compared to passing proposals directly between stages, passing probable regions decreases the number of proposals and reduces the cases where the first stage fails to propose good bounding boxes. We show that, by using FCNs and score maps, the FCN cascade face detector achieves strong performance on public datasets.
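A hedged sketch of the cascade control flow: each stage scores its input regions, only regions above a stage-specific threshold survive, and survivors are tightened around the score peak. The stage networks are stubbed out with toy score functions, and the refinement heuristic is invented for illustration.

```python
# Hedged sketch of a score-map cascade. Real stages would be FCNs; here any
# function mapping an image crop to a 2-D score map works as a stand-in.
import numpy as np

def crop(image, r):
    x0, y0, x1, y1 = r
    return image[y0:y1, x0:x1]

def refine(r, score_map):
    # Shrink the region toward the score peak (illustrative heuristic).
    py, px = np.unravel_index(score_map.argmax(), score_map.shape)
    x0, y0, _, _ = r
    return (x0 + max(px - 8, 0), y0 + max(py - 8, 0), x0 + px + 8, y0 + py + 8)

def cascade(image, proposals, stages):
    regions = proposals
    for stage_net, threshold in stages:
        survivors = []
        for r in regions:
            score_map = stage_net(crop(image, r))   # per-region score map
            if score_map.max() > threshold:         # keep probable face regions
                survivors.append(refine(r, score_map))
        regions = survivors                         # fewer, tighter regions
    return regions

stages = [(lambda patch: patch / 255.0, 0.5), (lambda patch: patch / 255.0, 0.9)]
image = np.random.randint(0, 256, (128, 128)).astype(float)
print(len(cascade(image, [(0, 0, 32, 32), (40, 40, 96, 96)], stages)))
```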
Sep 08 2016 cs.LG
Recently, Stochastic Neighbor Embedding (SNE) methods have been widely applied in data visualization. These methods minimize the divergence between the pairwise similarities of high- and low-dimensional data. Despite their popularity, the current SNE methods experience the "crowding problem" when the data include highly imbalanced similarities: data points with higher total similarity tend to get crowded around the display center. To solve this problem, we normalize the similarity matrix to be doubly stochastic, so that all data points have equal total similarities, and propose a fast normalization method. Furthermore, we show empirically and theoretically that the double-stochasticity constraint often leads to approximately spherical embeddings. This suggests replacing the flat embedding space with a sphere. The spherical embedding eliminates the discrepancy between the center and the periphery in the visualization and thus resolves the "crowding problem". We compared the proposed method with the state-of-the-art SNE method on three real-world datasets. The results indicate that our method is more favorable in terms of visualization quality.
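The paper proposes a fast normalization method; the textbook baseline it improves upon, Sinkhorn-style alternating row/column normalization, is easy to sketch:

```python
# Hedged sketch: make a nonnegative similarity matrix (approximately) doubly
# stochastic by alternately normalizing rows and columns (Sinkhorn-Knopp).
import numpy as np

def sinkhorn(S, iters=100):
    P = S.copy()
    for _ in range(iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
        P /= P.sum(axis=0, keepdims=True)  # normalize columns to sum to 1
    return P

S = np.random.rand(5, 5)
S = (S + S.T) / 2                 # symmetric, positive similarities
P = sinkhorn(S)
print(P.sum(axis=0).round(3), P.sum(axis=1).round(3))  # all ~1
```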
This paper provides an analytically tractable framework for investigating the statistical properties of the signal-to-interference power ratio (SIR) with a general distribution in a heterogeneous wireless ad hoc network in which there are K different types of transmitters (TXs) communicating with their unique intended receiver (RX). The TXs of each type form an independent homogeneous Poisson point process. In the first part of this paper, we introduce a novel approach to deriving the Laplace transform of the reciprocal of the SIR and use it to characterize the distribution of the SIR. Our main findings show that a closed-form expression for the distribution of the SIR can be obtained whenever the received signal power has an Erlang distribution, and an almost closed-form expression can be found if the power-law pathloss model has a pathloss exponent of four. In the second part of this paper, we apply the derived distribution of the SIR to find two important performance metrics: the success probability and the ergodic link capacity. For each type of RX, the success probability with (and without) interference cancellation and with (and without) the proposed stochastic power control is found in a compact form. With the aid of the derived Shannon transform identity, the ergodic link capacities of the K types of RXs are derived with low complexity, and they apply to many transmission scenarios, such as multi-antenna communication and stochastic power control. Finally, we analyze the spatial throughput capacity of the heterogeneous network, defined based on the derived K success probabilities and ergodic link capacities, and show the existence of its maximum.
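For concreteness, the two metrics in the second part can be written, for a type-$k$ receiver with decoding threshold $\theta_k$, as below; the reciprocal form indicates why the Laplace transform of $1/\mathrm{SIR}$ characterizes the success probability:

```latex
p_k \triangleq \Pr\!\left[\mathrm{SIR}_k \ge \theta_k\right]
    = \Pr\!\left[\frac{1}{\mathrm{SIR}_k} \le \frac{1}{\theta_k}\right],
\qquad
c_k \triangleq \mathbb{E}\!\left[\log_2\!\left(1 + \mathrm{SIR}_k\right)\right].
```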
Many community detection algorithms have been developed to uncover the mesoscopic properties of complex networks. However, how good an algorithm is, in terms of accuracy and computing time, remains an open question. Testing algorithms on real-world networks has certain restrictions that make the resulting insights potentially biased: the networks are usually small, and the underlying communities are not defined objectively. In this study, we employ the Lancichinetti-Fortunato-Radicchi (LFR) benchmark graph to test eight state-of-the-art algorithms. We quantify accuracy using complementary measures, along with the algorithms' computing time. Based on simple network properties and the aforementioned results, we provide guidelines that help to choose the most adequate community detection algorithm for a given network. Moreover, these rules allow us to uncover limitations in the use of specific algorithms given macroscopic network properties. Our contribution is threefold: first, we provide practical techniques to determine which algorithm is most suited in most circumstances, based on observable properties of the network under consideration. Second, we use the mixing parameter as an easily measurable indicator for finding the ranges of reliability of the different algorithms. Finally, we study the dependence on network size, focusing on both the algorithms' predictive power and the effective computing time.
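A hedged sketch of this evaluation protocol: generate an LFR graph with a chosen mixing parameter, run a detection algorithm, and score it against the planted partition. The parameters are illustrative, and greedy modularity optimization stands in for the eight algorithms actually tested.

```python
# Hedged sketch: LFR benchmark + NMI scoring of a detected partition.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.metrics import normalized_mutual_info_score

G = nx.LFR_benchmark_graph(250, tau1=3, tau2=1.5, mu=0.1,
                           average_degree=5, min_community=20, seed=10)

# Planted ground truth: each node carries its community as a node set.
truth = {n: min(G.nodes[n]["community"]) for n in G}
found = {n: i for i, c in enumerate(greedy_modularity_communities(G)) for n in c}

nodes = sorted(G)
print(normalized_mutual_info_score([truth[n] for n in nodes],
                                   [found[n] for n in nodes]))
```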
Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word. This architecture easily captures informative features, such as fertility and regularities in relative distortion. In experiments, we show our parameterization of attention improves translation quality.
In this paper we study the problem of answering cloze-style questions over documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of tokens in the document for accurate answer selection. The GA Reader obtains state-of-the-art results on three benchmarks for this task: the CNN & Daily Mail news stories and the Who Did What dataset. The effectiveness of multiplicative interaction is demonstrated by an ablation study, and by comparing to alternative compositional operators for implementing the gated-attention. The code is available at https://github.com/bdhingra/ga-reader.
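A minimal sketch of one gated-attention layer as described: each document token attends over the query tokens, and the attended query vector gates the token representation multiplicatively. Dimensions are illustrative.

```python
# Hedged sketch of gated attention: per-token query attention followed by an
# elementwise (multiplicative) gate on the document representation.
import torch

def gated_attention(doc, query):
    # doc: (batch, doc_len, dim); query: (batch, q_len, dim)
    scores = torch.softmax(doc @ query.transpose(1, 2), dim=-1)  # (b, d, q)
    attended = scores @ query                                    # (b, d, dim)
    return doc * attended                                        # elementwise gate

x = gated_attention(torch.randn(2, 50, 128), torch.randn(2, 10, 128))
print(x.shape)  # torch.Size([2, 50, 128])
```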
Jun 07 2016 cs.CY
The usage of recommendation agents (RAs) in the online marketplace can help consumers locate their desired products. RAs can help consumers effectively obtain comprehensive product information and compare their candidate target products. As a result, RAs have affected consumers' shopping behaviour. In this study, we investigate the usage and influence of RAs in the online marketplace. Based on the Stimulus-Organism-Response (SOR) model, we propose that the stimuli of using RAs (informativeness, product search effectiveness, and the lack of sociality stress) can affect consumers' attitudes (perceived control and satisfaction), which in turn affect behavioural outcomes such as impulsive purchases. We validate this research model with survey data from 157 users of RAs. The data largely support the proposed model and indicate that RAs can significantly contribute to impulsive purchase behaviour in online marketplaces. Theoretical and practical contributions are discussed.
We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
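A hedged sketch of the review module, assuming a GRU reviewer and dot-product attention; both are illustrative choices, and the paper's exact parameterization may differ.

```python
# Hedged sketch of a review network: T attention steps over the encoder
# states, each emitting a "thought vector" for the decoder to attend to later.
import torch
import torch.nn as nn

class Reviewer(nn.Module):
    def __init__(self, dim, steps=8):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)
        self.steps = steps

    def forward(self, enc):                      # enc: (batch, seq_len, dim)
        h = enc.mean(dim=1)                      # initial reviewer state
        thoughts = []
        for _ in range(self.steps):
            att = torch.softmax(enc @ h.unsqueeze(2), dim=1)  # (b, seq, 1)
            ctx = (att * enc).sum(dim=1)         # attend over encoder states
            h = self.cell(ctx, h)                # one review step
            thoughts.append(h)
        return torch.stack(thoughts, dim=1)      # (batch, steps, dim)

print(Reviewer(64)(torch.randn(2, 12, 64)).shape)  # torch.Size([2, 8, 64])
```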
The classical result of Vandermonde decomposition of positive semidefinite Toeplitz matrices, which dates back to the early twentieth century, forms the basis of modern subspace and recent atomic norm methods for frequency estimation. In this paper, we study the Vandermonde decomposition in which the frequencies are restricted to lie in a given interval, referred to as frequency-selective Vandermonde decomposition. The existence and uniqueness of the decomposition are studied under explicit conditions on the Toeplitz matrix. The new result is connected by duality to the positive real lemma for trigonometric polynomials nonnegative on the same frequency interval. Its applications in the theory of moments and line spectral estimation are illustrated. In particular, it provides a solution to the truncated trigonometric $K$-moment problem. It is used to derive a primal semidefinite program formulation of the frequency-selective atomic norm in which the frequencies are known a priori to lie in certain frequency bands. Numerical examples are also provided.
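For reference, the classical (frequency-unrestricted) decomposition states that any positive semidefinite Toeplitz matrix $T \in \mathbb{C}^{n \times n}$ of rank $r < n$ admits

```latex
T = \sum_{k=1}^{r} p_k \, a(f_k)\, a(f_k)^{H},
\qquad
a(f) = \left[\, 1,\; e^{i2\pi f},\; \dots,\; e^{i2\pi f (n-1)} \,\right]^{T},
\quad p_k > 0,
```

with distinct frequencies $f_k$; the frequency-selective version studied in this paper additionally constrains each $f_k$ to a given interval.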
Mar 30 2016 cs.LG
We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.
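A hedged sketch of the transductive variant's joint objective: one embedding per node is trained to predict both its class label (when observed) and its graph context. The graph, labels, and the context loss below are toy stand-ins; the actual method samples context via random walks with negative sampling.

```python
# Hedged sketch: joint label + context prediction from node embeddings.
import torch
import torch.nn as nn
import random

n, dim, classes = 100, 16, 3
emb = nn.Embedding(n, dim)
clf = nn.Linear(dim, classes)    # label predictor
ctx = nn.Linear(dim, n)          # scores over nodes as "context" targets
opt = torch.optim.Adam(list(emb.parameters()) + list(clf.parameters())
                       + list(ctx.parameters()), lr=0.01)

edges = [(random.randrange(n), random.randrange(n)) for _ in range(300)]
labels = {i: random.randrange(classes) for i in range(30)}  # few labeled nodes
xent = nn.CrossEntropyLoss()

for step in range(200):
    u, v = random.choice(edges)
    # Context loss: predict a sampled neighbor from the node embedding.
    loss = xent(ctx(emb.weight[u]).unsqueeze(0), torch.tensor([v]))
    if u in labels:  # supervised loss only on labeled nodes
        loss = loss + xent(clf(emb.weight[u]).unsqueeze(0),
                           torch.tensor([labels[u]]))
    opt.zero_grad(); loss.backward(); opt.step()
```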