results for au:Xie_L in:cs

- In the first part of this paper, inspired by the geometric method of Jean-Pierre Marec, we consider the two-impulse Hohmann transfer problem between two coplanar circular orbits as a constrained nonlinear programming problem. By using the Kuhn-Tucker theorem, we analytically prove the global optimality of the Hohmann transfer. Two sets of feasible solutions are found, one of which corresponding to the Hohmann transfer is the global minimum, and the other is a local minimum. In the second part, we formulate the Hohmann transfer problem as two-point and multi-point boundary-value problems by using the calculus of variations. With the help of the Matlab solver bvp4c, two numerical examples are solved successfully, which verifies that the Hohmann transfer is indeed the solution of these boundary-value problems. Via static and dynamic constrained optimization, the solution to the orbit transfer problem proposed by W. Hohmann ninety-two years ago and its global optimality are re-discovered.
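For readers who want to see the quantity being minimised in the entry above, the following is a minimal sketch of the textbook two-impulse Hohmann cost, derived from the vis-viva equation. It is not the paper's nonlinear-programming or bvp4c formulation; the function name and the Earth parameters in the usage note are illustrative assumptions.

```python
import math

def hohmann_delta_v(mu, r1, r2):
    """Return (dv1, dv2, total) for a two-impulse transfer between
    coplanar circular orbits of radii r1 and r2 (hypothetical helper)."""
    a = 0.5 * (r1 + r2)                              # semi-major axis of the transfer ellipse
    v_circ1 = math.sqrt(mu / r1)                     # circular speed on the initial orbit
    v_circ2 = math.sqrt(mu / r2)                     # circular speed on the final orbit
    v_trans1 = math.sqrt(mu * (2.0 / r1 - 1.0 / a))  # transfer-orbit speed at r1 (vis-viva)
    v_trans2 = math.sqrt(mu * (2.0 / r2 - 1.0 / a))  # transfer-orbit speed at r2 (vis-viva)
    dv1 = abs(v_trans1 - v_circ1)                    # first tangential impulse
    dv2 = abs(v_circ2 - v_trans2)                    # second tangential impulse
    return dv1, dv2, dv1 + dv2
```

With `mu = 398600.4418` km^3/s^2 (an assumed value for Earth), `r1 = 6678` km and `r2 = 42164` km, the total comes out a little under 4 km/s.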
- Nov 21 2017 cs.CV arXiv:1711.07183v2: Generating adversarial examples is an intriguing problem and an important way of understanding the working mechanism of deep neural networks. Recently, it has attracted a lot of attention in the computer vision community. Most existing approaches generate perturbations in image space, i.e., each pixel can be modified independently. However, it remains unclear whether these adversarial examples are authentic, in the sense that they correspond to actual changes in physical properties. This paper aims at exploring this topic in the contexts of object classification and visual question answering. The baselines are set to be several state-of-the-art deep neural networks which receive 2D input images. We augment these networks with a differentiable 3D rendering layer in front, so that a 3D scene (in physical space) is rendered into a 2D image (in image space), and then mapped to a prediction (in output space). There are two (direct or indirect) ways of attacking the physical parameters. The former back-propagates the gradients of error signals from output space to physical space directly, while the latter first constructs an adversary in image space, and then attempts to find the best solution in physical space that is rendered into this image. An important finding is that attacking physical space is much more difficult, as the direct method, compared with that used in image space, produces a much lower success rate and requires heavier perturbations to be added. On the other hand, the indirect method does not work out, suggesting that adversaries generated in image space are inauthentic. By interpreting them in physical space, most of these adversaries can be filtered out, showing promise in defending against adversaries.
- Nov 15 2017 cs.CV arXiv:1711.04451v1: It is very attractive to formulate vision in terms of pattern theory \cite{Mumford2010pattern}, where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether by better understanding deep networks we can extract building blocks which can be used to develop pattern theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles. To analyze this we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern theoretical model, which we call compositional voting. In this model several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.
- Nov 07 2017 cs.SI physics.soc-ph arXiv:1711.01679v2: Two of the main frameworks used for modeling information diffusion in the online environment are epidemic models and Hawkes point processes. The former consider information as a viral contagion which spreads into a population of online users, and employ tools initially developed in the field of epidemiology. The latter view individual broadcasts of information as events in a point process and they modulate the event rate according to observed (or assumed) social principles; they have been broadly used in fields such as finance and geophysics. Here, we study for the first time the connection between these two mature frameworks, and we find them to be equivalent. More precisely, the rate of events in the Hawkes model is identical to the rate of new infections in the Susceptible-Infected-Recovered (SIR) model when taking the expectation over recovery events -- which are unobserved in a Hawkes process. This paves the way for applying tools developed for one framework to the other. We make three further contributions in this work. First, we propose HawkesN, an extension of the basic Hawkes model, in which we introduce the notion of a finite maximum number of events that can occur. Second, we show HawkesN to explain real retweet cascades better than the current state-of-the-art Hawkes modeling. The size of the population can be learned while observing the cascade, at the expense of requiring larger amounts of training data. Third, we employ an SIR method based on Markov chains for computing the final size distribution for a partially observed cascade fitted with HawkesN. We propose an explanation for the generally perceived randomness of online popularity: the final size distribution for real diffusion cascades tends to have two maxima, one corresponding to large cascade sizes and another one around zero.
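The idea of a Hawkes intensity capped by a finite population, as described in the entry above, can be sketched as follows. This is a minimal illustration only: the scaling factor `1 - N_t / N`, the exponential kernel, and all parameter names and defaults are assumptions made for the sketch, since the abstract does not give the exact HawkesN formula.

```python
import math

def finite_population_intensity(t, events, population,
                                mu=0.0, kappa=0.8, theta=1.0):
    """Event rate at time t given past event times (hypothetical helper).

    Assumed form: (1 - N_t / N) * (mu + sum_i kappa*theta*exp(-theta*(t - t_i))),
    where N_t counts events strictly before t and N is the population size.
    """
    past = [s for s in events if s < t]
    if len(past) >= population:
        return 0.0  # everyone has already acted; no further events are possible
    excitation = sum(kappa * theta * math.exp(-theta * (t - s)) for s in past)
    return (1.0 - len(past) / population) * (mu + excitation)
```

As `population` grows, the scaling factor tends to one and the rate approaches that of an ordinary Hawkes process with the same kernel.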
- Images in the wild encapsulate rich knowledge about varied abstract concepts and cannot be sufficiently described with models built only using image-caption pairs containing selected objects. We propose to handle such a task with the guidance of a knowledge base that incorporates many abstract concepts. Our method is a two-step process where we first build a multi-entity-label image recognition model to predict abstract concepts as image labels and then leverage them in the second step as an external semantic attention and constrained inference in the caption generation model for describing images that depict unseen/novel objects. Evaluations show that our models outperform most of the prior work for out-of-domain captioning on MSCOCO and are useful for integration of knowledge and vision in general.
- Oct 17 2017 cs.RO arXiv:1710.05502v1: This paper presents a non-iterative method for dense mapping using an inertial sensor and a depth camera. To obtain data correspondence, traditional methods resort to iterative algorithms which are computationally expensive. This paper proposes a novel non-iterative framework with a computationally efficient closed-form solution to the front-end of the dense mapping system. First, 3-D point clouds with 6 degrees of freedom are decoupled into independent subspaces, in which point clouds can be matched respectively. Second, without any prior knowledge, the matching process is carried out by single key-frame training in the frequency domain, which reduces computational requirements dramatically and provides a closed-form solution. Third, 3-D maps are presented and fused in the subspaces directly to further reduce the complexity. In this manner, the complexity of our method is only $\mathcal{O}(n\log{n})$ where $n$ is the number of matched points. Extensive tests show that, compared with the state of the art, the proposed method runs much faster with comparable accuracy.
- Consider a generalized multiterminal source coding system, where $\ell\choose m$ encoders, each observing a distinct size-$m$ subset of $\ell$ ($\ell\geq 2$) zero-mean unit-variance symmetrically correlated Gaussian sources with correlation coefficient $\rho$, compress their observations in such a way that a joint decoder can reconstruct the sources within a prescribed mean squared error distortion based on the compressed data. The optimal rate-distortion performance of this system was previously known only for the two extreme cases $m=\ell$ (the centralized case) and $m=1$ (the distributed case), and except when $\rho=0$, the centralized system can achieve strictly lower compression rates than the distributed system under all non-trivial distortion constraints. Somewhat surprisingly, it is established in the present paper that the optimal rate-distortion performance of the afore-described generalized multiterminal source coding system with $m\geq 2$ coincides with that of the centralized system for all distortions when $\rho\leq 0$ and for distortions below an explicit positive threshold (depending on $m$) when $\rho>0$. Moreover, when $\rho>0$, the minimum achievable rate of generalized multiterminal source coding subject to an arbitrary positive distortion constraint $d$ is shown to be within a finite gap (depending on $m$ and $d$) from its centralized counterpart in the large $\ell$ limit except for possibly the critical distortion $d=1-\rho$.
- This paper deals with a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS). In such systems, robots are sent to carry storage units, so-called "pods", from the inventory and bring them to human operators working at stations. At the stations, the items are picked according to customers' orders. There exist new decision problems in such systems, for example, the reallocation of pods after their visits at work stations or the selection of pods to fulfill orders. In order to analyze decision strategies for these decision problems and relations between them, we develop a simulation framework called "RAWSim-O" in this paper. Moreover, we show a real-world application of our simulation framework by integrating simple robot prototypes based on vacuum cleaning robots.
- Oct 03 2017 cs.RO arXiv:1710.00156v1: This paper proposes an ultra-wideband (UWB) aided localization and mapping system that leverages an inertial sensor and a depth camera. Inspired by the fact that a visual odometry (VO) system, regardless of its accuracy in the short term, still faces challenges with accumulated errors in the long run or under unfavourable environments, the UWB ranging measurements are fused to remove the visual drift and improve the robustness. A general framework is developed which consists of three parallel threads, two of which carry out the visual-inertial odometry (VIO) and UWB localization respectively. The other mapping thread integrates visual tracking constraints into a pose graph with the proposed smooth and virtual range constraints, such that an optimization is performed to provide robust trajectory estimation. Experiments show that the proposed system is able to create dense drift-free maps in real-time even running on an ultra-low power processor in featureless environments.
- Sep 19 2017 cs.CV arXiv:1709.05936v3: The cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations. First, by introducing the kernel trick, the KCC extends the linear cross-correlation to non-linear space, which is more robust to signal noise and distortions. Second, the connection to the existing works shows that KCC provides a unified solution for correlation filters. Third, KCC is applicable to any kernel function and is not limited to circulant structure on training data, thus it is able to predict affine transformations with customized properties. Last, by leveraging the fast Fourier transform (FFT), KCC eliminates direct calculation of kernel vectors, and thus achieves better performance at a reasonable computational cost. Comprehensive experiments on visual tracking and human activity recognition using wearable devices demonstrate its robustness, flexibility, and efficiency. The source code of both experiments is released at https://github.com/wang-chen/KCC.
- Sep 15 2017 cs.CV arXiv:1709.04518v3: We aim at segmenting small organs (e.g., the pancreas) from abdominal CT scans. As the target often occupies a relatively small region in the input image, deep neural networks can be easily confused by the complex and variable background. To alleviate this, researchers proposed a coarse-to-fine approach, which used prediction from the first (coarse) stage to indicate a smaller input region for the second (fine) stage. Despite its effectiveness, this algorithm dealt with the two stages individually, so it did not optimize a global energy function, which limited its ability to incorporate multi-stage visual cues. Missing contextual information led to unsatisfactory convergence over iterations, and the fine stage sometimes produced even lower segmentation accuracy than the coarse stage. This paper presents a Recurrent Saliency Transformation Network. The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration into spatial weights and applies these weights to the current iteration. This brings us two-fold benefits. In training, it allows joint optimization over the deep networks dealing with different input scales. In testing, it propagates multi-stage visual information throughout iterations to improve segmentation accuracy. Experiments on the NIH pancreas segmentation dataset demonstrate state-of-the-art accuracy, which outperforms the previous best by an average of over 2%. Much higher accuracies are also reported on several small organs in a larger dataset collected by ourselves. In addition, our approach enjoys better convergence properties, making it more efficient and reliable in practice.
- Sep 15 2017 cs.CV arXiv:1709.04577v1: In this paper, we study the task of detecting semantic parts of an object. This is very important in computer vision, as it provides the possibility to parse an object as humans do, and helps us better understand object detection algorithms. Also, detecting semantic parts is very challenging, especially when the parts are partially or fully occluded. In this scenario, the popular proposal-based methods like Faster-RCNN often produce unsatisfactory results, because both the proposal extraction and classification stages may be confused by the irrelevant occluders. To this end, we propose a novel detection framework, named DeepVoting, which accumulates local visual cues, called visual concepts (VC), to locate the semantic parts. Our approach involves adding two layers after the intermediate outputs of a deep neural network. The first layer is used to extract VC responses, and the second layer performs a voting mechanism to capture the spatial relationship between VCs and semantic parts. The benefit is that each semantic part is supported by multiple VCs. Even if some of the supporting VCs are missing due to occlusion, we can still infer the presence of the target semantic part using the remaining ones. To avoid generating an exponentially large training set to cover all occlusion cases, we train our model without seeing occlusion and transfer the learned knowledge to deal with occlusions. This setting favors learning the models which are naturally robust and adaptive to occlusions instead of over-fitting the occlusion patterns in the training data. In experiments, DeepVoting shows significantly better performance on semantic part detection in occlusion scenarios, compared with Faster-RCNN, with one order of magnitude fewer parameters and 2.5x testing speed. In addition, DeepVoting is explainable as the detection result can be diagnosed by looking up the voted VCs.
- In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
- This work studies engagement, or time spent watching online videos. Most current work focuses on modeling the number of views, which is known to be an inadequate measure of engagement and video quality and is prone to spam. More broadly, engagement has been studied in reading behavior of news and web pages and in click-through in online ads, but for videos, a robust set of engagement metrics does not exist yet. We study a set of aggregate engagement metrics including watch time and percentage of video watched, and relate them to views and video properties such as length, category and topics. We propose a new metric, relative engagement, which is calibrated over video duration, stable over time, and strongly correlated with video quality. We leverage relative engagement to predict watch percentage before a video gains views, and can explain most of its variance ($R^2=0.77$). We further link daily watch time to external sharing of a video using the self-exciting Hawkes Intensity Processes, and find that we can forecast daily watch time more accurately than daily views. We measure engagement over 5.3 million YouTube videos. This new dataset and benchmarks will be publicly available. This work provides a set of new yardsticks for measuring content including video and other length-constrained media such as songs and podcasts. It opens a new direction for modeling user-specific engagement and making better recommendations.
- This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data - we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
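The event intensity function and the simulation scheme mentioned in the entry above can be illustrated with a short sketch. It uses Ogata's thinning algorithm with an exponential memory kernel; the kernel choice and the parameter names `mu`, `alpha`, `beta` are assumptions for illustration and need not match the chapter's own design.

```python
import math
import random

def intensity(t, events, mu, alpha, beta):
    """Hawkes intensity: base rate plus exponentially decaying
    contributions from every event strictly before t."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning (requires mu > 0)."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The intensity only decays until the next event, so its value just
        # after the current time (including any jump at t) is an upper bound.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)        # candidate arrival from the bounding rate
        if t >= horizon:
            return events
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)                 # accept with probability lambda(t) / lam_bar
```

For example, `simulate_hawkes(0.5, 0.8, 1.2, 50.0)` draws one realisation; with this kernel each event triggers `alpha / beta` further events on average, so the ratio should stay below one for the process to remain stable.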
- Aug 18 2017 cs.LG arXiv:1708.05165v1: Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit. It is strongly desirable for the recommended sequence to avoid loops, as tourists typically would not wish to revisit the same location. Given some learned model that scores sequences, how can we then find the highest-scoring sequence that is loop-free? This paper studies this problem, with three contributions. First, we detail three distinct approaches to the problem -- graph-based heuristics, integer linear programming, and list extensions of the Viterbi algorithm -- and qualitatively summarise their strengths and weaknesses. Second, we explicate how two ostensibly different approaches to the list Viterbi algorithm are in fact fundamentally identical. Third, we conduct experiments on real-world trajectory recommendation datasets to identify the tradeoffs imposed by each of the three approaches. Overall, our results indicate that a greedy graph-based heuristic offers excellent performance and runtime, leading us to recommend its use for removing loops at prediction time.
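A greedy way of building a loop-free sequence, in the spirit of the heuristic recommended in the entry above, might look like the sketch below. The abstract does not specify the heuristic, so the scoring model (per-place `unary` scores plus `pairwise` transition scores) and the function name are assumptions.

```python
def greedy_loop_free(start, length, unary, pairwise):
    """Extend the path one place at a time, always taking the best-scoring
    place that has not been visited yet (hypothetical helper)."""
    path, visited = [start], {start}
    while len(path) < length:
        current = path[-1]
        candidates = [p for p in unary if p not in visited]
        if not candidates:
            break  # ran out of unvisited places before reaching the requested length
        best = max(candidates,
                   key=lambda p: pairwise.get((current, p), 0.0) + unary[p])
        path.append(best)
        visited.add(best)
    return path
```

With `unary = {"a": 0.1, "b": 0.9, "c": 0.5}` and a large transition score for `("b", "a")`, an unconstrained decoder starting at `"a"` would be tempted to return to `"a"` after `"b"`; the visited set rules that out and the path continues to `"c"`.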
- Jul 26 2017 cs.CV arXiv:1707.07819v1: In this paper, we address the task of detecting semantic parts on partially occluded objects. We consider a scenario where the model is trained using non-occluded images but tested on occluded images. The motivation is that there is an infinite number of occlusion patterns in the real world, which cannot be fully covered in the training data. So the models should be inherently robust and adaptive to occlusions instead of fitting / learning the occlusion patterns in the training data. Our approach detects semantic parts by accumulating the confidence of local visual cues. Specifically, the method uses a simple voting method, based on log-likelihood ratio tests and spatial constraints, to combine the evidence of local cues. These cues are called visual concepts, which are derived by clustering the internal states of deep networks. We evaluate our voting scheme on the VehicleSemanticPart dataset with dense part annotations. We randomly place two, three or four irrelevant objects onto the target object to generate testing images with various occlusions. Experiments show that our algorithm outperforms several competitors in semantic part detection when occlusions are present.
- Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on voice search. We propose a smoothing method for the attention mechanism and compare it with content attention and convolutional attention. Moreover, frame skipping is employed for fast training and convergence. On the XiaoMi TV voice search dataset, we achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% without using any lexicon or language model. Together with a trigram language model, we reach 2.81% CER and 5.77% SER.
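The two metrics reported in the entry above are commonly computed as follows: CER as the character-level edit distance divided by the reference length, and SER as the fraction of utterances with at least one error. The sketch below follows those common definitions; the paper's exact scoring conventions (for example, text normalisation) are not stated in the abstract, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def character_error_rate(refs, hyps):
    """Total character edits over total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)

def sentence_error_rate(refs, hyps):
    """Fraction of utterances whose hypothesis differs from the reference."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)
```

Because the distance works on any sequence of characters, the same code applies unchanged to Mandarin strings.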
- Jul 07 2017 cs.SD arXiv:1707.01670v2: In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN). In particular, we propose a novel architecture combining the traditional acoustic loss function and the GAN's discriminative loss under a multi-task learning (MTL) framework. The mean squared error (MSE) is usually used to estimate the parameters of deep neural networks, which only considers the numerical difference between the raw audio and the synthesized one. To mitigate this problem, we introduce the GAN as a second task to determine if the input is natural speech with specific conditions. In this MTL framework, the MSE optimization improves the stability of the GAN, and at the same time the GAN produces samples with a distribution closer to natural speech. Listening tests show that the multi-task architecture can generate more natural speech that satisfies human perception than the conventional methods.
- Jul 07 2017 cs.HC arXiv:1707.01627v2: We present an interactive visualisation tool for recommending travel trajectories. This system is based on new machine learning formulations and algorithms for the sequence recommendation problem. The system starts from a map-based overview, taking an interactive query as the starting point. It then breaks down contributions from different geographical and user behavior features, and those from individual points-of-interest versus pairs of consecutive points on a route. The system also supports detailed quantitative interrogation by comparing a large number of features for multiple points. Effective trajectory visualisations can potentially benefit a large cohort of online map users and assist their decision-making. More broadly, the design of this system can inform visualisations of other structured prediction tasks, such as for sequences or trees.
- Jun 30 2017 cs.RO arXiv:1706.09829v1: Obstacle avoidance is a fundamental requirement for autonomous robots which operate in, and interact with, the real world. When perception is limited to monocular vision, avoiding collision becomes significantly more challenging due to the lack of 3D information. Conventional path planners for obstacle avoidance require tuning a number of parameters and do not have the ability to directly benefit from large datasets and continuous use. In this paper, a dueling architecture based deep double-Q network (D3QN) is proposed for obstacle avoidance, using only monocular RGB vision. Based on the dueling and double-Q mechanisms, D3QN can efficiently learn how to avoid obstacles in a simulator even with very noisy depth information predicted from RGB images. Extensive experiments show that D3QN enables twofold acceleration on learning compared with a normal deep Q network and the models trained solely in virtual environments can be directly transferred to real robots, generalizing well to various new environments with previously unseen dynamic objects.
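The two mechanisms named in the entry above can be written down compactly. The sketch below shows the standard dueling aggregation (state value plus mean-centred advantages) and the standard double-Q target (the online network picks the action, the target network evaluates it) on plain Python lists. It illustrates the generic mechanisms only, not the paper's network or training setup; all names are illustrative.

```python
def dueling_q(value, advantages):
    """Combine a scalar state value with per-action advantages:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Bootstrap target: action chosen by the online network,
    value read from the target network."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Decoupling action selection from evaluation is what reduces the over-estimation that a single max over one network's outputs tends to produce.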
- This paper presents a collection of path planning algorithms for real-time movement of multiple robots across a Robotic Mobile Fulfillment System (RMFS). Robots are assigned to move storage units to pickers at working stations instead of requiring pickers to go to the storage area. Path planning algorithms aim to find paths for the robots to fulfill the requests without collisions or deadlocks. The state-of-the-art path planning algorithms, including WHCA*, FAR, BCP, OD&ID and CBS, were adapted to suit path planning in RMFS and integrated within a simulation tool to guide the robots from their starting points to their destinations during the storage and retrieval processes. Ten different layouts with a variety of numbers of robots, floors, pods, stations and the sizes of storage areas were considered in the simulation study. Performance metrics of throughput, path length and search time were monitored. Simulation results demonstrate the best algorithm based on each performance metric.
- Jun 29 2017 cs.IR arXiv:1706.09067v1: Current recommender systems largely focus on static, unstructured content. In many scenarios, we would like to recommend content that has structure, such as a trajectory of points of interest in a city, or a playlist of songs. Dubbed Structured Recommendation, this problem differs from the typical structured prediction problem in that there are multiple correct answers for a given input. Motivated by trajectory recommendation, we focus on sequential structures but in contrast to classical Viterbi decoding we require that valid predictions are sequences with no repeated elements. We propose an approach to sequence recommendation based on the structured support vector machine. For prediction, we modify the inference procedure to avoid predicting loops in the sequence. For training, we modify the objective function to account for the existence of multiple ground truths for a given input. We also modify the loss-augmented inference procedure to exclude the known ground truths. Experiments on real-world trajectory recommendation datasets show the benefits of our approach over existing, non-structured recommendation approaches.
- Jun 23 2017 cs.CV arXiv:1706.07346v1: Automatic segmentation of an organ and its cystic region is a prerequisite of computer-aided diagnosis. In this paper, we focus on pancreatic cyst segmentation in abdominal CT scans. This task is important and very useful in clinical practice yet challenging due to the low contrast in boundary, the variability in location and shape, and the different stages of the pancreatic cancer. Inspired by the high relevance between the location of a pancreas and its cystic region, we introduce extra deep supervision into the segmentation network, so that cyst segmentation can be improved with the help of relatively easier pancreas segmentation. Under a reasonable transformation function, our approach can be factorized into two stages, and each stage can be efficiently optimized via gradient back-propagation throughout the deep networks. We collect a new dataset with 131 pathological samples, which, to the best of our knowledge, is the largest set for pancreatic cyst segmentation. Without human assistance, our approach reports a 63.44% average accuracy, measured by the Dice-Sørensen coefficient (DSC), which is higher than the number (60.46%) without deep supervision.
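The Dice-Sørensen coefficient used as the accuracy measure in the entry above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch on flattened binary masks follows; returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
def dice_coefficient(pred, truth):
    """DSC between two equal-length binary masks given as flat sequences."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum
```

For `pred = [1, 1, 0, 0]` and `truth = [1, 0, 1, 0]` the masks share one voxel out of two each, giving a DSC of 0.5.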
- Sequential change-point detection when the distribution parameters are unknown is a fundamental problem in statistics and machine learning. When the post-change parameters are unknown, we consider a set of detection procedures based on sequential likelihood ratios with non-anticipating estimators constructed using online convex optimization algorithms such as online mirror descent, which provides a more versatile approach to tackle complex situations where recursive maximum likelihood estimators cannot be found. When the underlying distributions belong to an exponential family and the estimators satisfy the logarithmic regret property, we show that this approach is nearly second-order asymptotically optimal. This means that the upper bound for the false alarm rate of the algorithm (measured by the average-run-length) meets the lower bound asymptotically up to a log-log factor when the threshold tends to infinity. Our proof is achieved by making a connection between sequential change-point detection and online convex optimization and leveraging the logarithmic regret bound property of the online mirror descent algorithm. Numerical and real data examples validate our theory.
- Mar 28 2017 cs.CV arXiv:1703.08603v3: It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, generally exist for deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples, and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.
- Mar 22 2017 cs.CV arXiv:1703.06993v3: In this paper, we reveal the importance and benefits of introducing second-order operations into deep neural networks. We propose a novel approach named Second-Order Response Transform (SORT), which appends an element-wise product transform to the linear sum of a two-branch network module. A direct advantage of SORT is to facilitate cross-branch response propagation, so that each branch can update its weights based on the current status of the other branch. Moreover, SORT augments the family of transform operations and increases the nonlinearity of the network, making it possible to learn flexible functions to fit the complicated distribution of feature space. SORT can be applied to a wide range of network architectures, including a branched variant of a chain-styled network and a residual network, with very lightweight modifications. We observe consistent accuracy gain on both small (CIFAR10, CIFAR100 and SVHN) and big (ILSVRC2012) datasets. In addition, SORT is very efficient, as the extra computation overhead is less than 5%.
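The phrase "appends an element-wise product transform to the linear sum of a two-branch network module" in the entry above can be read as the sketch below, where the module output is the sum of the two branch responses plus their element-wise product. The abstract does not give the exact form used in the paper, so the plain product term and the function names here are assumptions.

```python
def second_order_module(x, branch1, branch2):
    """Two-branch module with an added element-wise product term:
    y = F1(x) + F2(x) + F1(x) * F2(x)   (assumed form)."""
    f1 = branch1(x)
    f2 = branch2(x)
    return [a + b + a * b for a, b in zip(f1, f2)]
```

The derivative of the product term with respect to one branch's output is the other branch's output, which is one way to see how each branch's update can depend on the current state of the other.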
- Deep learning models (DLMs) are state-of-the-art techniques in speech recognition. However, training good DLMs can be time consuming especially for production-size models and corpora. Although several parallel training algorithms have been proposed to improve training efficiency, there is no clear guidance on which one to choose for the task in hand due to lack of systematic and fair comparison among them. In this paper we aim at filling this gap by comparing four popular parallel training algorithms in speech recognition, namely asynchronous stochastic gradient descent (ASGD), blockwise model-update filtering (BMUF), bulk synchronous parallel (BSP) and elastic averaging stochastic gradient descent (EASGD), on 1000-hour LibriSpeech corpora using feed-forward deep neural networks (DNNs) and convolutional, long short-term memory, DNNs (CLDNNs). Based on our experiments, we recommend using BMUF as the top choice to train acoustic models since it is most stable, scales well with number of GPUs, can achieve reproducible results, and in many cases even outperforms single-GPU SGD. ASGD can be used as a substitute in some cases.
- Mar 07 2017 cs.CV arXiv:1703.01513v1 The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing the depth and constructing highway connections, researchers have manually designed a lot of fixed network structures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures automatically. Note that the number of possible network structures increases exponentially with the number of layers in the network, which inspires us to adopt the genetic algorithm to efficiently traverse this large search space. We first propose an encoding method to represent each network structure in a fixed-length binary string, and initialize the genetic algorithm by generating a set of randomized individuals. In each generation, we define standard genetic operations, e.g., selection, mutation and crossover, to eliminate weak individuals and then generate more competitive ones. The competitiveness of each individual is defined as its recognition accuracy, which is obtained via training the network from scratch and evaluating it on a validation set. We run the genetic process on two small datasets, i.e., MNIST and CIFAR10, demonstrating its ability to evolve and find high-quality structures which have been little studied before. These structures are also transferable to the large-scale ILSVRC2012 dataset.
- Mar 06 2017 cs.SI arXiv:1703.01012v3Modeling the popularity dynamics of an online item is an important open problem in computational social science. This paper presents an in-depth study of popularity dynamics under external promotions, especially in predicting popularity jumps of online videos, and determining effective and efficient schedules to promote online content. The recently proposed Hawkes Intensity Process (HIP) models popularity as a non-linear interplay between exogenous stimuli and the endogenous reactions. Here, we propose two novel metrics based on HIP: to describe popularity gain per unit of promotion, and to quantify the time it takes for such effects to unfold. We make increasingly accurate forecasts of future popularity by including information about the intrinsic properties of the video, promotions it receives, and the non-linear effects of popularity ranking. We illustrate by simulation the interplay between the unfolding of popularity over time, and the time-sensitive value of resources. Lastly, our model lends a novel explanation of the commonly adopted periodic and constant promotion strategy in advertising, as increasing the perceived viral potential. This study provides quantitative guidelines about setting promotion schedules considering content virality, timing, and economics.
- Mar 06 2017 cs.CV arXiv:1703.01229v1 Deep neural networks are playing an important role in state-of-the-art visual recognition. To represent high-level visual concepts, modern networks are equipped with large convolutional layers, which use a large number of filters and contribute significantly to model complexity. For example, more than half of the weights of AlexNet are stored in the first fully-connected layer (4,096 filters). We formulate the function of a convolutional layer as learning a large visual vocabulary, and propose an alternative way, namely Deep Collaborative Learning (DCL), to reduce the computational complexity. We replace a convolutional layer with a two-stage DCL module, in which we first construct a couple of smaller convolutional layers individually, and then fuse them at each spatial position to consider feature co-occurrence. Mathematically, DCL can be explained as an efficient way of learning compositional visual concepts, in which the vocabulary size increases exponentially while the model complexity only increases linearly. We evaluate DCL on a wide range of visual recognition tasks, including a series of multi-digit number classification datasets, and some generic image classification datasets such as SVHN, CIFAR and ILSVRC2012. We apply DCL to several state-of-the-art network structures, improving the recognition accuracy while reducing the number of parameters (16.82% fewer in AlexNet).
- Jan 20 2017 cs.RO arXiv:1701.05294v2 The goal of this paper is to create a new framework for dense SLAM that is light enough for micro-robot systems based on depth camera and inertial sensor. Feature-based and direct methods are two mainstreams in visual SLAM. Both methods minimize photometric or reprojection error by iterative solutions, which are computationally expensive. To overcome this problem, we propose a non-iterative framework to reduce computational requirement. First, the attitude and heading reference system (AHRS) and axonometric projection are utilized to decouple the 6 Degree-of-Freedom (DoF) data, so that point clouds can be matched in independent spaces respectively. Second, based on single key-frame training, the matching process is carried out in frequency domain by Fourier transformation, which provides a closed-form non-iterative solution. In this manner, the time complexity is reduced to $\mathcal{O}(n \log{n})$, where $n$ is the number of matched points in each frame. To the best of our knowledge, this method is the first non-iterative and online trainable approach for data association in visual SLAM. Compared with the state of the art, it runs at a faster speed and obtains 3-D maps with higher resolution yet still with comparable accuracy.
- Dec 28 2016 cs.CV arXiv:1612.08230v4 Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy of some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions which occupy a large fraction of the input volume. In this paper, we formulate this problem into a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset, and outperform the state-of-the-art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.
- Nov 28 2016 cs.SY arXiv:1611.08222v2 We consider the problem of multiple sensor scheduling for remote state estimation of multiple processes over a shared link. In this problem, a set of sensors monitor mutually independent dynamical systems in parallel but only one sensor can access the shared channel at each time to transmit the data packet to the estimator. We propose a stochastic event-based sensor scheduling in which each sensor makes transmission decisions based on both channel accessibility and distributed event-triggering conditions. The corresponding minimum mean squared error (MMSE) estimator is explicitly given. Considering information patterns accessed by sensor schedulers, time-based ones can be treated as a special case of the proposed one. By utilizing real-time information, the proposed schedule outperforms the time-based ones in terms of the estimation quality. Resorting to solving a Markov decision process (MDP) problem with average cost criterion, we can find optimal parameters for the proposed schedule. As for practical use, a greedy algorithm is devised for parameter design, which has rather low computational complexity. We also provide a method to quantify the performance gap between the schedule optimized via MDP and any other schedules.
- Nov 22 2016 cs.CV arXiv:1611.06596v3 While recent deep neural networks have achieved a promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) regions of images respectively. Considering human recognition in the same situations, networks trained on the pure background without objects achieve highly reasonable recognition performance that beats humans by a large margin if only given context. However, humans still outperform networks with pure object available, which indicates networks and human beings have different mechanisms in understanding an image. Furthermore, we straightforwardly combine multiple trained networks to explore different visual cues learned by different networks. Experiments show that useful visual hints can be explicitly learned separately and then combined to achieve higher performance, which verifies the advantages of the proposed framework.
- Direction-of-arrival (DOA) estimation refers to the process of retrieving the direction information of several electromagnetic waves/sources from the outputs of a number of receiving antennas that form a sensor array. DOA estimation is a major problem in array signal processing and has wide applications in radar, sonar, wireless communications, etc. With the development of sparse representation and compressed sensing, the last decade has witnessed a tremendous advance in this research topic. The purpose of this article is to provide an overview of these sparse methods for DOA estimation, with a particular highlight on the recently developed gridless sparse methods, e.g., those based on covariance fitting and the atomic norm. Several future research directions are also discussed.
- Distributed consensus with data rate constraint is an important research topic of multi-agent systems. Some results have been obtained for consensus of multi-agent systems with integrator dynamics, but it remains challenging for general high-order systems, especially in the presence of unmeasurable states. In this paper, we study the quantized consensus problem for a special kind of high-order systems and investigate the corresponding data rate required for achieving consensus. The state matrix of each agent is a 2m-th order real Jordan block admitting m identical pairs of conjugate poles on the unit circle; each agent has a single input, and only the first state variable can be measured. The case of harmonic oscillators corresponding to m=1 is first investigated under a directed communication topology which contains a spanning tree, while the general case of m >= 2 is considered for a connected and undirected network. In both cases it is concluded that the sufficient number of communication bits to guarantee the consensus at an exponential convergence rate is an integer between $m$ and $2m$, depending on the location of the poles.
- The problem of recommending tours to travellers is an important and broadly studied area. Suggested solutions include various approaches of points-of-interest (POI) recommendation and route planning. We consider the task of recommending a sequence of POIs, that simultaneously uses information about POIs and routes. Our approach unifies the treatment of various sources of information by representing them as features in machine learning algorithms, enabling us to learn from past behaviour. Information about POIs is used to learn a POI ranking model that accounts for the start and end points of tours. Data about previous trajectories are used for learning transition patterns between POIs that enable us to recommend probable routes. In addition, a probabilistic model is proposed to combine the results of POI ranking and the POI to POI transitions. We propose a new F$_1$ score on pairs of POIs that captures the order of visits. Empirical results show that our approach improves on recent methods, and demonstrate that combining points and routes enables better trajectory recommendations.
- Knowledge graph construction consists of two tasks: extracting information from external resources (knowledge population) and inferring missing information through a statistical analysis on the extracted information (knowledge completion). In many cases, insufficient external resources in the knowledge population hinder the subsequent statistical inference. The gap between these two processes can be reduced by an incremental population approach. We propose a new probabilistic knowledge graph factorisation method that benefits from the path structure of existing knowledge (e.g. syllogism) and enables a common modelling approach to be used for both incremental population and knowledge completion tasks. More specifically, the probabilistic formulation allows us to develop an incremental population algorithm that trades off exploitation-exploration. Experiments on three benchmark datasets show that the balanced exploitation-exploration helps the incremental population, and the additional path structure helps to predict missing information in knowledge completion.
- Aug 18 2016 cs.SI physics.soc-ph arXiv:1608.04862v2 Predicting popularity, or the total volume of information outbreaks, is an important subproblem for understanding collective behavior in networks. Each of the two main types of recent approaches to the problem, feature-driven and generative models, has desired qualities and clear limitations. This paper bridges the gap between these solutions with a new hybrid approach and a new performance benchmark. We model each social cascade with a marked Hawkes self-exciting point process, and estimate the content virality, memory decay, and user influence. We then learn a predictive layer for popularity prediction using a collection of cascade history. To our surprise, the Hawkes process with a predictive overlay outperforms recent feature-driven and generative approaches on existing tweet data [43] and a new public benchmark on news tweets. We also found that a basic set of user features and event time summary statistics performs competitively in both classification and regression tasks, and that adding point process information to the feature set further improves predictions. From these observations, we argue that future work on popularity prediction should compare across feature-driven and generative modeling approaches in both classification and regression tasks.
- Jul 25 2016 cs.CV arXiv:1607.06514v1 Deep Convolutional Neural Networks (CNNs) are playing important roles in state-of-the-art visual recognition. This paper focuses on modeling the spatial co-occurrence of neuron responses, which has been less studied in previous work. For this, we consider the neurons in the hidden layer as neural words, and construct a set of geometric neural phrases on top of them. The idea of grouping neural words into neural phrases is borrowed from the Bag-of-Visual-Words (BoVW) model. Next, the Geometric Neural Phrase Pooling (GNPP) algorithm is proposed to efficiently encode these neural phrases. GNPP acts as a new type of hidden layer, which punishes the isolated neuron responses after convolution, and can be inserted into a CNN model with little extra computational overhead. Experimental results show that GNPP produces significant and consistent accuracy gain in image classification.
- Motivated by the growing importance of demand response in modern power system's operations, we propose an architecture and supporting algorithms for privacy preserving thermal inertial load management as a service provided by the load serving entity (LSE). We focus on an LSE managing a population of its customers' air conditioners, and propose a contractual model where the LSE guarantees quality of service to each customer in terms of keeping their indoor temperature trajectories within respective bands around the desired individual comfort temperatures. We show how the LSE can price the contracts differentiated by the flexibility embodied by the width of the specified bands. We address architectural questions of (i) how the LSE can strategize its energy procurement based on price and ambient temperature forecasts, (ii) how an LSE can close the real time control loop at the aggregate level while providing individual comfort guarantees to loads, without ever measuring the states of an air conditioner for privacy reasons. Control algorithms to enable our proposed architecture are given, and their efficacy is demonstrated on real data.
- May 13 2016 cs.NI arXiv:1605.03678v1 Software Defined Networking (SDN) can effectively improve the performance of traffic engineering and has promising application prospects in backbone networks. Therefore, new energy saving schemes must take SDN into account, which is extremely important considering the rapidly increasing energy consumption from Telecom and ISP networks. At the same time, the introduction of SDN in a current network must be incremental in most cases, for both technical and economic reasons. During this period, operators have to manage hybrid networks, where SDN and traditional protocols coexist. In this paper, we study the energy efficient traffic engineering problem in hybrid SDN/IP networks. We first formulate the mathematical optimization model considering SDN/IP hybrid routing mode. As the problem is NP-hard, we propose a fast heuristic algorithm named HEATE (Hybrid Energy-Aware Traffic Engineering). In our proposed HEATE algorithm, the IP routers perform the shortest path routing using the distributed OSPF link weight optimization. The SDNs perform the multi-path routing with traffic flow splitting by the global SDN controller. The HEATE algorithm finds the optimal setting of OSPF link weight and splitting ratio of SDNs. Thus traffic flow is aggregated onto partial links and the underutilized links can be turned off to save energy. By computer simulation results, we show that our algorithm has a significant improvement in energy efficiency in hybrid SDN/IP networks.
- The classical result of Vandermonde decomposition of positive semidefinite Toeplitz matrices, which dates back to the early twentieth century, forms the basis of modern subspace and recent atomic norm methods for frequency estimation. In this paper, we study the Vandermonde decomposition in which the frequencies are restricted to lie in a given interval, referred to as frequency-selective Vandermonde decomposition. The existence and uniqueness of the decomposition are studied under explicit conditions on the Toeplitz matrix. The new result is connected by duality to the positive real lemma for trigonometric polynomials nonnegative on the same frequency interval. Its applications in the theory of moments and line spectral estimation are illustrated. In particular, it provides a solution to the truncated trigonometric $K$-moment problem. It is used to derive a primal semidefinite program formulation of the frequency-selective atomic norm in which the frequencies are known a priori to lie in certain frequency bands. Numerical examples are also provided.
- May 03 2016 cs.CV arXiv:1605.00052v1 An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network. Despite the astonishing performance, deep features extracted from low-level neurons are still unsatisfactory, arguably because they cannot access the spatial context contained in the higher layers. In this paper, we present InterActive, a novel algorithm which computes the activeness of neurons and network connections. Activeness is propagated through a neural network in a top-down manner, carrying high-level context and improving the descriptive power of low-level and mid-level neurons. Visualization indicates that neuron activeness can be interpreted as spatial-weighted neuron responses. We achieve state-of-the-art classification performance on a wide range of image datasets.
- May 03 2016 cs.CV arXiv:1605.00055v1 For a long time, we have been combating over-fitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc. In this paper, we present DisturbLabel, an extremely simple algorithm which randomly replaces a portion of the labels with incorrect values in each iteration. Although it seems weird to intentionally generate incorrect training labels, we show that DisturbLabel prevents the network training from over-fitting by implicitly averaging over exponentially many networks which are trained with different label sets. To the best of our knowledge, DisturbLabel serves as the first work which adds noise on the loss layer. Meanwhile, DisturbLabel cooperates well with Dropout to provide complementary regularization functions. Experiments demonstrate competitive recognition results on several popular image recognition datasets.
- We consider the problem of resilient state estimation in the presence of integrity attacks. There are m sensors monitoring the state and p of them are under attack. The sensory data collected by the compromised sensors can be manipulated arbitrarily by the attacker. The classical estimators such as the least squares estimator may not provide a reliable estimate under the so-called (p,m)-sparse attack. In this work, we are not restricting our efforts in studying whether any specific estimator is resilient to the attack or not, but instead we aim to present the generic sufficient and necessary conditions for resilience by considering a general class of convex optimization based estimators. The sufficient and necessary conditions are shown to be tight, with a trivial gap. We further specialize our result to scalar sensor measurements case and present some conservative but verifiable results for practical use. Experimental simulations tested on the IEEE 14-bus test system validate the theoretical analysis.
- In this paper, an improved direction-of-arrival (DOA) estimation algorithm for circular and non-circular signals is proposed. Most state-of-the-art algorithms only deal with the DOA estimation problem for the maximal non-circularity rated and circular signals. However, common non-circularity rated signals are not taken into consideration. The proposed algorithm can estimate not only the maximal non-circularity rated and circular signals, but also the common non-circularity rated signals. Based on the property of the non-circularity phase and rate, the incident signals can be divided into three types as mentioned above, which can be estimated separately. The interrelationship among these signals can be reduced significantly, which means the resolution performance among different types of signals is improved. Simulation results illustrate the effectiveness of the proposed method.
- Mar 24 2016 cs.SY arXiv:1603.07276v1This paper investigates the fundamental coupling between loads and locational marginal prices (LMPs) in security-constrained economic dispatch (SCED). Theoretical analysis based on multi-parametric programming theory points out the unique one-to-one mapping between load and LMP vectors. Such one-to-one mapping is depicted by the concept of system pattern region (SPR) and identifying SPRs is the key to understanding the LMP-load coupling. Built upon the characteristics of SPRs, the SPR identification problem is modeled as a classification problem from a market participant's viewpoint, and a Support Vector Machine based data-driven approach is proposed. It is shown that even without the knowledge of system topology and parameters, the SPRs can be estimated by learning from historical load and price data. Visualization and illustration of the proposed data-driven approach are performed on a 3-bus system as well as the IEEE 118-bus system.
- We consider the discrete memoryless symmetric primitive relay channel, where a source $X$ wants to send information to a destination $Y$ with the help of a relay $Z$ and the relay can communicate to the destination via an error-free digital link of rate $R_0$, while $Y$ and $Z$ are conditionally independent and identically distributed given $X$. We develop two new upper bounds on the capacity of this channel that are tighter than existing bounds, including the celebrated cut-set bound. Our approach significantly deviates from the standard information-theoretic approach for proving upper bounds on the capacity of multi-user channels. We build on the blowing-up lemma to analyze the probabilistic geometric relations between the typical sets of the $n$-letter random variables associated with a reliable code for communicating over this channel. These relations translate to new entropy inequalities between the $n$-letter random variables involved. As an application of our bounds, we study an open question posed by (Cover, 1987), namely, what is the minimum needed $Z$-$Y$ link rate $R_0^*$ in order for the capacity of the relay channel to be equal to that of the broadcast cut. We consider the special case when the $X$-$Y$ and $X$-$Z$ links are both binary symmetric channels. Our tighter bounds on the capacity of the relay channel immediately translate to tighter lower bounds for $R_0^*$. More interestingly, we show that when $p\to 1/2$, $R_0^*\geq 0.1803$; even though the broadcast channel becomes completely noisy as $p\to 1/2$ and its capacity, and therefore the capacity of the relay channel, goes to zero, a strictly positive rate $R_0$ is required for the relay channel capacity to be equal to the broadcast bound.
- Feb 22 2016 cs.SI arXiv:1602.06033v8 Modeling and predicting the popularity of online content is a significant problem for the practice of information dissemination, advertising, and consumption. Recent work analyzing massive datasets advances our understanding of popularity, but one major gap remains: to precisely quantify the relationship between the popularity of an online item and the external promotions it receives. This work supplies the missing link between exogenous inputs from public social media platforms, such as Twitter, and endogenous responses within the content platform, such as YouTube. We develop a novel mathematical model, the Hawkes intensity process, which can explain the complex popularity history of each video according to its type of content, network of diffusion, and sensitivity to promotion. Our model supplies a prototypical description of videos, called an endo-exo map. This map explains popularity as the result of an extrinsic factor - the amount of promotions from the outside world that the video receives, acting upon two intrinsic factors - sensitivity to promotion, and inherent virality. We use this model to forecast future popularity given promotions on a large 5-month feed of the most-tweeted videos, and find that it lowers the average error by 28.6% from approaches based on popularity history. Finally, we can identify videos that have a high potential to become viral, as well as those for which promotions will have hardly any effect.
- We propose an achievable rate-region for the two-way multiple-relay channel using decode-and-forward block Markovian coding. We identify a conflict between the information flow in both directions. This conflict leads to an intractable number of decode-forward schemes and achievable rate regions, none of which are universally better than the others. We introduce a new concept in decode-forward coding called ranking, and discover that there is an underlying structure to all of these rate regions expressed in the rank assignment. Through this discovery, we characterize the complete achievable rate region that includes all of the rate regions corresponding to the particular decode-forward schemes. This rate region is an extension of existing results for the two-way one-relay channel and the two-way two-relay channel.
- We consider the problem of robust state estimation in the presence of integrity attacks. There are $m$ sensors monitoring a dynamical process. Subject to the integrity attacks, $p$ out of $m$ measurements can be arbitrarily manipulated. The classical approach such as the MMSE estimation in the literature may not provide a reliable estimate under this so-called $(p,m)$-sparse attack. In this work, we propose a robust estimation framework where distributed local measurements are computed first and fused at the estimator based on a convex optimization problem. We show the sufficient and necessary conditions for robustness of the proposed estimator. The sufficient and necessary conditions are shown to be tight, with a trivial gap. We also present an upper bound on the damage an attacker can cause when the sufficient condition is satisfied. Simulation results are also given to illustrate the effectiveness of the estimator.
- Dec 14 2015 cs.SI arXiv:1512.03523v3 The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
- We consider the problem of robust estimation in the presence of integrity attacks. There are m sensors monitoring the state and p of them are under attack. The malicious measurements collected by the compromised sensors can be manipulated arbitrarily by the attacker. The classical estimators such as the least squares estimator may not provide a reliable estimate under the so-called (p,m)-sparse attack. In this work, we are not restricting our efforts in studying whether any specific estimator is resilient to the attack or not, but instead we aim to present some generic sufficient and necessary conditions for robustness by considering a general class of convex optimization based estimators. The sufficient and necessary conditions are shown to be tight, with a trivial gap.
- Nov 24 2015 cs.CV arXiv:1511.06834v1 We study the problem of evaluating super resolution methods. Traditional evaluation methods usually judge the quality of super resolved images based on a single measure of their difference with the original high resolution images. In this paper, we propose to use both fidelity (the difference with original images) and naturalness (human visual perception of super resolved images) for evaluation. For fidelity evaluation, a new metric is proposed to solve the bias problem of traditional evaluation. For naturalness evaluation, we let humans label their preference among super resolution results using pair-wise comparison, and test the correlation between human labeling results and the outputs of image quality assessment metrics. Experimental results show that our fidelity-naturalness method is better than the traditional evaluation method for super resolution methods, which could help future research on single-image super resolution.
- Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis is still a great challenge and the traditional conditional random fields (CRF) based method always heavily relies on feature engineering. In this paper, we propose to use neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers achieves superior performance over the CRF-based method. The embedding features learned from raw text further enhance the performance.
- This paper studies the mean square stabilization problem of vector LTI systems over power constrained lossy channels. The communication channel is subject to packet dropouts, additive noises and input power constraints. To overcome the difficulty of optimally allocating channel resources among different sub-dynamics, schedulers are designed with time division multiplexing of channels. An adaptive TDMA (Time Division Multiple Access) scheduler is proposed first, which is shown to achieve a larger stabilizability region than the conventional TDMA scheduler, and is optimal in some special cases. In particular, for two-dimensional systems, an optimal scheduler is designed, which provides the necessary and sufficient condition for mean square stabilization.
- The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such style is descriptions with emotions, which is commonplace in everyday communication, and influences decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.
- Oct 07 2015 cs.SD arXiv:1510.01443v1 State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. Magnitude spectrum has been a dominant feature over the years. Although perceptual studies have shown that phase spectrum is essential to the quality of synthesized speech, it is often ignored by using a minimum phase filter during synthesis, and the speech quality suffers. To bypass this bottleneck in vocoded speech, this paper proposes a phase-embedded waveform representation framework and establishes a magnitude-phase joint modeling platform for high-quality SPSS. Our experiments on waveform reconstruction show that the performance is better than that of the widely-used STRAIGHT. Furthermore, the proposed modeling and synthesis platform outperforms a leading-edge, vocoded, deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN)-based baseline system on various objective evaluation metrics.
- Oct 06 2015 cs.SY arXiv:1510.00983v1 We consider a smart-grid connecting several agents, modeled as stochastic dynamical systems, who may be electricity consumers/producers. At each discrete time instant, which may represent a 15 minute interval, each agent may consume/generate some quantity of electrical energy. The Independent System Operator (ISO) is given the task of assigning consumptions/generations to the agents so as to maximize the sum of the utilities accrued to the agents, subject to the constraint that energy generation equals consumption at each time. This task of coordinating generation and demand has to be accomplished by the ISO without the agents revealing their system states, dynamics, or utility/cost functions. We show how and when a simple iterative procedure converges to the optimal solution. The ISO iteratively obtains electricity bids from the agents, and declares the tentative market clearing prices. In response to these prices, the agents submit new bids. On the demand side, the solution yields an optimal demand response for dynamic and stochastic loads. On the generation side, it provides the optimal utilization of stochastically varying renewables such as solar/wind, and of fossil fuel based generation with dynamic constraints such as ramping rates. Thereby we solve a decentralized stochastic control problem, without agents sharing any information about their system models, states or utility functions.
- In this paper, we consider a general distributed estimation problem in relay-assisted sensor networks by taking into account time-varying asymmetric communications, fading channels and intermittent measurements. Motivated by centralized filtering algorithms, we propose a distributed innovation-based estimation algorithm by combining the measurement innovation (assimilation of new measurement) and local data innovation (incorporation of neighboring data). Our algorithm is fully distributed and does not need a fusion center. We establish theoretical results regarding asymptotic unbiasedness and consistency of the proposed algorithm. Specifically, in order to cope with time-varying asymmetric communications, we utilize an ordering technique and the generalized Perron complement to carry out the first and second moment analyses in a tractable framework. Furthermore, we present a performance-oriented design of the proposed algorithm for energy-constrained networks based on the theoretical results. Simulation results corroborate the theoretical findings, thus demonstrating the effectiveness of the proposed algorithm.
- This paper is concerned with the mean square stabilization problem of discrete-time LTI systems over a power constrained fading channel. Different from existing research works, the channel considered in this paper suffers from both fading and additive noises. We allow any form of causal channel encoders/decoders, unlike linear encoders/decoders commonly studied in the literature. Sufficient conditions and necessary conditions for the mean square stabilizability are given in terms of channel parameters such as transmission power and fading and additive noise statistics in relation to the unstable eigenvalues of the open-loop system matrix. The corresponding mean square capacity of the power constrained fading channel under causal encoders/decoders is given. It is proved that this mean square capacity is smaller than the corresponding Shannon channel capacity. In the end, numerical examples are presented, which demonstrate that the causal encoders/decoders render less restrictive stabilizability conditions than those under linear encoders/decoders studied in the existing works.
- Jul 17 2015 cs.SY arXiv:1507.04657v1 In this paper, we address a key issue of designing architectures and algorithms which generate optimal demand response in a decentralized manner for a smart-grid consisting of several stochastic renewables and dynamic loads. By optimal demand response, we refer to the demand response which maximizes the utility of the agents connected to the smart-grid. By decentralized we refer to the desirable case where neither the independent system operator (ISO) needs to know the dynamics/utilities of the agents, nor do the agents need to have a knowledge of the dynamics/utilities of other agents connected to the grid. The communication between the ISO and agents is restricted to the ISO announcing a pricing policy and the agents responding with their energy generation/consumption bids in response to the pricing policy. We provide a complete solution for both the deterministic and stochastic cases. It features a price iteration scheme that results in optimality of social welfare. We also provide an optimal solution for the case where there is a common randomness affecting and observed by all agents. This solution can be computationally complex, and we pose approximations. For the more general partially observed randomness case, we exhibit a relaxation that significantly reduces complexity. We also provide an approximation strategy that leads to a model predictive control (MPC) approach. Simulation results comparing the resulting optimal demand response with the existing architectures employed by the ISO illustrate the benefit in social welfare utility realized by our scheme. To the best of the authors' knowledge, this is the first work of its kind to explicitly mark out the optimal response of dynamic demand.
- The Vandermonde decomposition of Toeplitz matrices, discovered by Carathéodory and Fejér in the 1910s and rediscovered by Pisarenko in the 1970s, forms the basis of modern subspace methods for 1D frequency estimation. Many related numerical tools have also been developed for multidimensional (MD), especially 2D, frequency estimation; however, a fundamental question has remained unresolved as to whether an analog of the Vandermonde decomposition holds for multilevel Toeplitz matrices in the MD case. In this paper, an affirmative answer to this question and a constructive method for finding the decomposition are provided when the matrix rank is lower than the dimension of each Toeplitz block. A numerical method for searching for a decomposition is also proposed when the matrix rank is higher. The new results are applied to studying MD frequency estimation within the recent super-resolution framework. A precise formulation of the atomic $\ell_0$ norm is derived using the Vandermonde decomposition. Practical algorithms for frequency estimation are proposed based on relaxation techniques. Extensive numerical simulations are provided to demonstrate the effectiveness of these algorithms compared to the existing atomic norm and subspace methods.
- Mar 11 2015 cs.GT arXiv:1503.02951v3 We consider the general problem of resource sharing in societal networks, consisting of interconnected communication, transportation, energy and other networks important to the functioning of society. Participants in such networks need to take decisions daily, both on the quantity of resources to use as well as the periods of usage. With this in mind, we discuss the problem of incentivizing users to behave in such a way that society as a whole benefits. In order to perceive societal level impact, such incentives may take the form of rewarding users with lottery tickets based on good behavior, and periodically conducting a lottery to translate these tickets into real rewards. We will pose the user decision problem as a mean field game (MFG), and the incentives question as one of trying to select a good mean field equilibrium (MFE). In such a framework, each agent (a participant in the societal network) takes a decision based on an assumed distribution of actions of his/her competitors, and the incentives provided by the social planner. The system is said to be at MFE if the agent's action is a sample drawn from the assumed distribution. We will show the existence of such an MFE under different settings, and also illustrate how to choose an attractive equilibrium using as an example demand-response in energy networks.
- Jan 23 2015 cs.SY arXiv:1501.05469v1 In this paper, we consider the peak-covariance stability of Kalman filtering subject to packet losses. The length of consecutive packet losses is governed by a time-homogeneous finite-state Markov chain. We establish a sufficient condition for peak-covariance stability and show that this stability check can be recast as a linear matrix inequality (LMI) feasibility problem. Compared with the literature, the stability condition given in this paper is invariant with respect to similarity state transformations; moreover, our condition is proved to be less conservative than the existing results. Numerical examples are provided to demonstrate the effectiveness of our result.
- In this paper, we consider the parameter estimation problem over sensor networks in the presence of quantized data and directed communication links. We propose a two-stage algorithm aiming at achieving the centralized sample mean estimate in a distributed manner. Different from the existing algorithms, a running average technique is utilized in the proposed algorithm to smear out the randomness caused by the probabilistic quantization scheme. With the running average technique, it is shown that the centralized sample mean estimate can be achieved both in the mean square and almost sure senses, which is not observed in the conventional consensus algorithms. In addition, the rates of convergence are given to quantify the mean square and almost sure performances. Finally, simulation results are presented to illustrate the effectiveness of the proposed algorithm and highlight the improvements by using running average technique.
- This paper considers a new bi-objective optimization formulation for robust RGB-D visual odometry. We investigate two methods for solving the proposed bi-objective optimization problem: the weighted sum method (in which the objective functions are combined into a single objective function) and the bounded objective method (in which one of the objective functions is optimized and the value of the other objective function is bounded via a constraint). Our experimental results for the open source TUM RGB-D dataset show that the new bi-objective optimization formulation is superior to several existing RGB-D odometry methods. In particular, the new formulation yields more accurate motion estimates and is more robust when textural or structural features in the image sequence are lacking.
- The super-resolution theory developed recently by Candès and Fernandez-Granda aims to recover fine details of a sparse frequency spectrum from coarse scale information only. The theory was then extended to the cases with compressive samples and/or multiple measurement vectors. However, the existing atomic norm (or total variation norm) techniques succeed only if the frequencies are sufficiently separated, which rules out high resolution in the commonly understood sense. In this paper, a reweighted atomic-norm minimization (RAM) approach is proposed which iteratively carries out atomic norm minimization (ANM) with a sound reweighting strategy that enhances sparsity and resolution. It is demonstrated analytically and via numerical simulations that the proposed method achieves high resolution with application to DOA estimation.
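
Several entries above concern 1D frequency estimation, where a low-rank positive semidefinite Toeplitz matrix encodes a sparse set of frequencies. As a rough illustration only, the following Python sketch builds such a matrix from known frequencies and recovers them with an ESPRIT-style shift-invariance step. It is not the RAM or atomic norm algorithm from any of the listed papers; the function names, the test frequencies, and the assumption that the rank is known and smaller than the matrix size are all choices made for this example.

```python
import numpy as np

def steering(freqs, n):
    """Columns a(f) = [1, e^{j2*pi*f}, ..., e^{j2*pi*f*(n-1)}]^T, one per frequency."""
    k = np.arange(n)[:, None]
    return np.exp(2j * np.pi * k * np.asarray(freqs, dtype=float)[None, :])

def toeplitz_from_spectrum(freqs, powers, n):
    """Build T = A diag(p) A^H, a PSD Toeplitz matrix of rank len(freqs)."""
    A = steering(freqs, n)
    return (A * np.asarray(powers, dtype=float)[None, :]) @ A.conj().T

def vandermonde_decompose(T, rank):
    """Recover (freqs, powers) from a PSD Toeplitz T, assuming rank < T.shape[0]."""
    n = T.shape[0]
    # Signal subspace: eigenvectors belonging to the `rank` largest eigenvalues.
    _, V = np.linalg.eigh(T)
    U = V[:, -rank:]
    # Shift invariance: U[1:] = U[:-1] @ Psi, and eig(Psi) = e^{j2*pi*f_k}.
    Psi = np.linalg.lstsq(U[:-1], U[1:], rcond=None)[0]
    z = np.linalg.eigvals(Psi)
    freqs = np.sort(np.mod(np.angle(z) / (2 * np.pi), 1.0))
    # The first row of A is all ones, so the first column of T equals A @ p.
    A = steering(freqs, n)
    powers = np.linalg.lstsq(A, T[:, 0], rcond=None)[0].real
    return freqs, powers

if __name__ == "__main__":
    # Hypothetical test case: three frequencies in [0, 1) with positive powers.
    T = toeplitz_from_spectrum([0.10, 0.25, 0.40], [1.0, 2.0, 0.5], n=8)
    f_hat, p_hat = vandermonde_decompose(T, rank=3)
    print("frequencies:", np.round(f_hat, 6))
    print("powers:", np.round(p_hat, 6))
```

The sketch works on an exact, noise-free Toeplitz matrix; with noisy or incomplete data the matrix itself would first have to be estimated, which is the part the optimization-based methods in the abstracts address.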