Predicting fine-grained interests of users with temporal behavior is important to personalization and information filtering applications. However, existing interest prediction methods are incapable of capturing the subtle degrees of user interest towards particular items, and the internal time-varying drift of individuals' attention has not yet been studied. Moreover, the prediction process can also be affected by inter-personal influence, known as behavioral mutual infectivity. Inspired by the use of point processes in modeling temporal event sequences, in this paper we present a deep prediction method based on two recurrent neural networks (RNNs) to jointly model each user's continuous browsing history and asynchronous event sequences in the context of inter-user behavioral mutual infectivity. Our model is able to predict a user's fine-grained interest in a particular item and the corresponding timestamp at which an event occurs. The proposed approach captures the dynamic characteristics of event sequences more flexibly by using a temporal point process to model event data and updating its intensity function in a timely manner with RNNs. Furthermore, to improve the interpretability of the model, an attention mechanism is introduced to emphasize both intra-personal and inter-personal behavioral influence over time. Experiments on real datasets demonstrate that our model outperforms state-of-the-art methods in fine-grained user interest prediction.
In most macro-scale robotics systems, propulsion and controls are enabled through a physical tether or complex on-board electronics and batteries. A tether simplifies the design process but limits the range of motion of the robot, while on-board controls and power supplies are heavy and complicate the design process. Here we present a simple design principle for an untethered, entirely soft, swimming robot with the ability to achieve preprogrammed, directional propulsion without a battery or on-board electronics. Locomotion is achieved by employing actuators that harness the large displacements of bistable elements, triggered by surrounding temperature changes. Powered by shape memory polymer (SMP) muscles, the bistable elements in turn actuate the robot's fins. Our robots are fabricated entirely using a commercially available 3D printer with no post-processing. As a proof-of-concept, we demonstrate the ability to program a vessel, which can autonomously deliver a cargo and navigate back to the deployment point.
In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multi-agent navigation problem as an action-selection problem, and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a state-of-the-art collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model.
Oct 12 2017 cs.SY
The stable spline (SS) kernel and the diagonal correlated (DC) kernel are two kernels that have been applied and studied extensively for kernel-based regularized LTI system identification. In this note, we show that similar to the derivation of the SS kernel, the continuous-time DC kernel can be derived by applying the same "stable" coordinate change to a "generalized" first-order spline kernel, and thus can be interpreted as a stable generalized first-order spline kernel. This interpretation provides new facets to understand the properties of the DC kernel. In particular, we derive a new orthonormal basis expansion of the DC kernel, and the explicit expression of the norm of the RKHS associated with the DC kernel. Moreover, for the non-uniformly sampled DC kernel, we derive its maximum entropy property and show that its kernel matrix has tridiagonal inverse.
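The tridiagonal-inverse property mentioned above can be checked numerically. The following is a minimal sketch, not the authors' derivation: it assumes the DC kernel in the commonly used form $K(t,s) = c\,\lambda^{(t+s)/2}\rho^{|t-s|}$ with $0<\lambda<1$ and $0<\rho<1$ (the abstract itself does not state the formula), and the sampling instants, the function names `dc_kernel`, `invert`, and `band_magnitudes`, and the parameter values are illustrative choices.

```python
def dc_kernel(times, c=1.0, lam=0.8, rho=0.6):
    """Kernel matrix for the assumed DC form c * lam**((t+s)/2) * rho**|t-s|."""
    return [[c * lam ** ((t + s) / 2) * rho ** abs(t - s) for s in times]
            for t in times]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def band_magnitudes(matrix):
    """Return (largest |entry| within the tridiagonal band, largest outside it)."""
    n = len(matrix)
    inside = max(abs(matrix[i][j]) for i in range(n) for j in range(n)
                 if abs(i - j) <= 1)
    outside = max(abs(matrix[i][j]) for i in range(n) for j in range(n)
                  if abs(i - j) > 1)
    return inside, outside


# Non-uniformly spaced sampling instants (illustrative values).
times = [0.5, 1.0, 2.5, 3.0, 4.2, 6.0]
inside, outside = band_magnitudes(invert(dc_kernel(times)))
print(f"largest in-band entry: {inside:.3e}, largest off-band entry: {outside:.3e}")
```

Under the assumed form, the off-band entries of the inverse vanish up to floating-point round-off, because the matrix factors as a diagonal scaling of the exponential correlation matrix $\rho^{|t_i-t_j|}$, whose inverse is tridiagonal.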
Oct 05 2017 cs.CV
Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although considerable effort has been dedicated to this task, several issues remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, quite a few artifacts may exist in synthesizing hairpins and glasses, and textural details may be lost in the regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited since they usually require elaborately collecting a dictionary of examples or carefully tuning features/components. In this paper, we present a novel representation learning framework that generates an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational contents (i.e., structural and textural parts) by using a pre-trained Convolutional Neural Network (CNN). Then, we utilize a Branched Fully Convolutional Neural Network (BFCN) for learning structural and textural representations, respectively. In addition, we design a Sorted Matching Mean Square Error (SM-MSE) metric to measure texture patterns in the loss function. In the stage of sketch rendering, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method also has better generalization ability across datasets without additional training.
Sep 12 2017 cs.CV
Existing image captioning approaches typically train a one-stage sentence decoder, which struggles to generate rich fine-grained descriptions. On the other hand, multi-stage image captioning models are hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervision. In particular, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve state-of-the-art performance.
Modularity, since its introduction, has remained one of the most widely used metrics to assess the quality of community structure in a complex network. However the resolution limit problem associated with modularity limits its applicability to networks with community sizes smaller than a certain scale. In the past various attempts have been made to solve this problem. More recently a new metric, modularity density, was introduced for the quality of community structure in networks in order to solve some of the known problems with modularity, particularly the resolution limit problem. Modularity density resolves some communities which are otherwise undetectable using modularity. However, we find that it does not solve the resolution limit problem completely by investigating some cases where it fails to detect expected community structures. To address this problem, we introduce a variant of this metric and show that it further reduces the resolution limit problem, effectively eliminating the problem in a wide range of networks.
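The resolution limit of standard modularity discussed above can be reproduced on the classic ring-of-cliques benchmark. The sketch below is only an illustration using the standard Newman-Girvan modularity $Q = \sum_c \left[ m_c/m - (d_c/2m)^2 \right]$; it does not implement modularity density or the variant proposed in the abstract, and the helper names `modularity` and `ring_of_cliques` as well as the benchmark size (30 cliques of 5 nodes) are illustrative choices.

```python
from itertools import combinations


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # number of edges inside each community (m_c)
    degree = {}    # total degree of each community (d_c)
    for u, v in edges:
        cu, cv = community_of(u), community_of(v)
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def ring_of_cliques(num_cliques, clique_size):
    """Cliques arranged in a ring, neighbouring cliques joined by a single edge."""
    edges = []
    for k in range(num_cliques):
        nodes = [k * clique_size + i for i in range(clique_size)]
        edges.extend(combinations(nodes, 2))
        first_of_next = ((k + 1) % num_cliques) * clique_size
        edges.append((nodes[-1], first_of_next))
    return edges


edges = ring_of_cliques(num_cliques=30, clique_size=5)
q_single = modularity(edges, lambda node: node // 5)   # one community per clique
q_pairs = modularity(edges, lambda node: node // 10)   # adjacent cliques merged in pairs
print(f"one clique per community: Q = {q_single:.4f}")
print(f"cliques merged in pairs:  Q = {q_pairs:.4f}")
```

The merged partition scores higher even though each clique is the intuitively expected community, which is the behaviour that metrics such as modularity density aim to reduce.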
Aug 21 2017 cs.SY
Input design is an important issue for classical system identification methods but has not been investigated for the kernel-based regularization method (KRM) until very recently. In this paper, we consider in the time domain the input design problem of KRMs for LTI system identification. Different from the recent result, we adopt a Bayesian perspective and in particular make use of scalar measures (e.g., the $A$-optimality, $D$-optimality, and $E$-optimality) of the Bayesian mean square error matrix as the design criteria subject to a power constraint on the input. Instead of solving the optimization problem directly, we propose a two-step procedure. In the first step, by making suitable assumptions on the unknown input, we construct a quadratic map (transformation) of the input such that the transformed input design problems are convex, the number of optimization variables is independent of the number of input data, and their global minima can be found efficiently by applying well-developed convex optimization software packages. In the second step, we derive the expression of the optimal input based on the global minima found in the first step by solving the inverse image of the quadratic map. In addition, we derive analytic results for some special types of fixed kernels, which provide insights on the input design and also its dependence on the kernel structure.
The present paper deals with online convex optimization involving both time-varying loss functions, and time-varying constraints. The loss functions are not fully accessible to the learner, and instead only the function values (a.k.a. bandit feedback) are revealed at queried points. The constraints are revealed after making decisions, and can be instantaneously violated, yet they must be satisfied in the long term. This setting fits nicely the emerging online network tasks such as fog computing in the Internet-of-Things (IoT), where online decisions must flexibly adapt to the changing user preferences (loss functions), and the temporally unpredictable availability of resources (constraints). Tailored for such human-in-the-loop systems where the loss functions are hard to model, a family of bandit online saddle-point (BanSaP) schemes are developed, which adaptively adjust the online operations based on (possibly multiple) bandit feedback of the loss functions, and the changing environment. Performance here is assessed by: i) dynamic regret that generalizes the widely used static regret; and, ii) fit that captures the accumulated amount of constraint violations. Specifically, BanSaP is proved to simultaneously yield sub-linear dynamic regret and fit, provided that the best dynamic solutions vary slowly over time. Numerical tests in fog computation offloading tasks corroborate that our proposed BanSaP approach offers competitive performance relative to existing approaches that are based on gradient feedback.
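To make the notion of bandit feedback concrete, here is a minimal sketch of a standard two-point gradient estimator that uses only function values. It is a generic construction under stated assumptions, not necessarily the estimator used by BanSaP: the function names, the test loss `quadratic_loss`, and the parameters `delta` and `num_samples` are all illustrative.

```python
import math
import random


def two_point_gradient_estimate(loss, x, delta, rng):
    """Estimate the gradient of `loss` at `x` from two function evaluations."""
    d = len(x)
    # Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector.
    raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in raw))
    u = [v / norm for v in raw]
    perturbed = [xi + delta * ui for xi, ui in zip(x, u)]
    # Only loss values are queried; no gradient of `loss` is ever accessed.
    scale = d * (loss(perturbed) - loss(x)) / delta
    return [scale * ui for ui in u]


def averaged_estimate(loss, x, delta, num_samples, seed=0):
    """Average many independent estimates to reduce the estimator's variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_gradient_estimate(loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


def quadratic_loss(x):
    """Illustrative loss with true gradient 2 * (x - center)."""
    center = [1.0, -1.0, 0.5]
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center))


estimate = averaged_estimate(quadratic_loss, [0.0, 0.0, 0.0], delta=0.05, num_samples=20000)
print("estimated gradient:", [round(g, 2) for g in estimate])
print("true gradient:     ", [-2.0, 2.0, -1.0])
```

A single estimate is noisy, so an online scheme would typically use it directly in a gradient-type update and rely on averaging over time; the explicit averaging here is only to show that the estimator points in the right direction.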
Deep neural networks have shown effectiveness in many challenging tasks and proved their strong capability in automatically learning good feature representations from raw input. Nonetheless, designing their architectures still requires much human effort. Techniques for automatically designing neural network architectures, such as reinforcement learning based approaches, have recently shown promising results on benchmarks. However, these methods still train each network from scratch while exploring the architecture space, which results in extremely high computational cost. In this paper, we propose a novel reinforcement learning framework for automatic architecture design, where the action is to grow the network depth or layer width based on the current network architecture while preserving its function. As such, the previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. Experiments on image benchmark datasets demonstrate the efficiency and effectiveness of our proposed solution compared to existing automatic architecture design methods.
Jul 18 2017 cs.CV
This paper aims at task-oriented action prediction, i.e., predicting a sequence of actions towards accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it in the learning procedure. In this work, we propose to train a recurrent long-short term memory (LSTM) network for handling this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples for covering the semantic space (e.g., diverse action decomposition and ordering). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically represents a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according with common sense) by training another auxiliary LSTM network with a small set of annotated samples. And these generated samples (i.e., task-oriented action sequences) effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach.
Jul 04 2017 cs.SY
The kernel-based regularization method has two core issues: kernel design and hyperparameter estimation. In this paper, we focus on the second issue and study the properties of several hyperparameter estimators including the empirical Bayes (EB) estimator, two Stein's unbiased risk estimators (SURE) and their corresponding Oracle counterparts, with an emphasis on the asymptotic properties of these hyperparameter estimators. To this goal, we first derive and then rewrite the first order optimality conditions of these hyperparameter estimators, leading to several insights on these hyperparameter estimators. Then we show that as the number of data goes to infinity, the two SUREs converge to the best hyperparameter minimizing the corresponding mean square error, respectively, while the more widely used EB estimator converges to another best hyperparameter minimizing the expectation of the EB estimation criterion. This indicates that the two SUREs are asymptotically optimal but the EB estimator is not. Surprisingly, the convergence rate of two SUREs is slower than that of the EB estimator, and moreover, unlike the two SUREs, the EB estimator is independent of the convergence rate of $\Phi^T\Phi/N$ to its limit, where $\Phi$ is the regression matrix and $N$ is the number of data. A Monte Carlo simulation is provided to demonstrate the theoretical results.
Recent advances in neural networks have inspired people to design hybrid recommendation algorithms that can incorporate both (1) user-item interaction information and (2) content information including image, audio, and text. Despite their promising results, neural network-based recommendation algorithms incur extensive computational costs, making them challenging to scale and improve upon. In this paper, we propose a general neural network-based recommendation framework, which subsumes several existing state-of-the-art recommendation algorithms, and address the efficiency issue by investigating sampling strategies in the stochastic gradient descent training for the framework. We tackle this issue by first establishing a connection between the loss functions and the user-item interaction bipartite graph, where the loss function terms are defined on links while major computation burdens are located at nodes. We call this type of loss function a "graph-based" loss function, for which varied mini-batch sampling strategies can have different computational costs. Based on this insight, three novel sampling strategies are proposed, which can significantly improve the training efficiency of the proposed framework (up to a $30\times$ speedup in our experiments), as well as improving the recommendation performance. Theoretical analysis is also provided for both the computational cost and the convergence. We believe the study of sampling strategies has further implications for general graph-based loss functions, and would also enable more research under the neural network-based recommendation framework.
Jun 09 2017 cs.CR
Android is designed with a number of built-in security features such as app sandboxing and permission-based access controls. Android also supports multiple communication methods for apps to cooperate, which creates a security risk of app collusion. For instance, a sandboxed app with permission to access sensitive data might leak that data to another sandboxed app with access to the internet. In this paper, we present a method to detect potential collusion between apps. First, we extract from apps all information about their accesses to protected resources and communications. Then we identify sets of apps that might be colluding by using rules in first-order logic codified in Prolog. After this, more computationally demanding approaches like taint analysis can focus on the identified sets that show collusion potential. This "filtering" approach is validated against a dataset of manually crafted colluding apps. We also demonstrate that our tool scales by running it on a set of more than 50,000 apps collected in the wild. Our tool allowed us to detect a large set of real apps that used collusion as a synchronization method to maximize the effects of a payload that was injected into all of them via the same SDK.
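The rule-based filtering step can be illustrated with a small sketch. This is a hypothetical, simplified rule written in Python rather than the Prolog rules of the paper: it flags an ordered pair of apps when the first holds a sensitive permission and sends on a channel that the second, which holds the `INTERNET` permission, receives on. The app records, channel names, the permission sets `SENSITIVE` and `EXFILTRATION`, and the function `potential_collusions` are all assumptions made for illustration.

```python
# Permissions treated as "sensitive source" and "exfiltration sink" in this sketch.
SENSITIVE = {"READ_CONTACTS", "ACCESS_FINE_LOCATION"}
EXFILTRATION = {"INTERNET"}


def potential_collusions(apps):
    """Return (source, sink, shared_channels) triples matching the sketch rule."""
    pairs = []
    for src_name, src in apps.items():
        if not (src["permissions"] & SENSITIVE):
            continue  # source must be able to read sensitive data
        for dst_name, dst in apps.items():
            if src_name == dst_name:
                continue
            shared = src["sends"] & dst["receives"]
            if shared and (dst["permissions"] & EXFILTRATION):
                pairs.append((src_name, dst_name, sorted(shared)))
    return sorted(pairs)


# Hypothetical extracted facts about three apps.
apps = {
    "contacts_backup": {
        "permissions": {"READ_CONTACTS"},
        "sends": {"intent:com.example.SYNC"},
        "receives": set(),
    },
    "weather_widget": {
        "permissions": {"INTERNET"},
        "sends": set(),
        "receives": {"intent:com.example.SYNC"},
    },
    "flashlight": {
        "permissions": set(),
        "sends": set(),
        "receives": {"intent:com.example.SYNC"},
    },
}

for source, sink, channels in potential_collusions(apps):
    print(f"{source} -> {sink} via {channels}")
```

A cheap rule of this kind over-approximates: it only narrows the candidate set, so that a heavier analysis such as taint tracking can be run on the flagged pairs alone.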
Learning a good representation of text is key to many recommendation applications. Examples include news recommendation, where texts to be recommended are constantly published every day. However, most existing recommendation techniques, such as matrix factorization based methods, mainly rely on interaction histories to learn representations of items. While latent factors of items can be learned effectively from user interaction data, in many cases, such data is not available, especially for newly emerged items. In this work, we aim to address the problem of personalized recommendation for completely new items with text information available. We cast the problem as a personalized text ranking problem and propose a general framework that combines text embedding with personalized recommendation. Users and textual content are embedded into a latent feature space. The text embedding function can be learned end-to-end by predicting user interactions with items. To alleviate sparsity in interaction data, and to leverage the large amount of text data with little or no user interactions, we further propose a joint text embedding model that incorporates unsupervised text embedding with a combination module. Experimental results show that our model can significantly improve the effectiveness of recommendation systems on real-world datasets.
As a cutting-edge technology, microgrids feature intelligent EMSs and sophisticated control, which will dramatically change our energy infrastructure. The modern microgrids are a relatively recent development with high potential to bring distributed generation, DES devices, controllable loads, communication infrastructure, and many new technologies into the mainstream. As a more controllable and intelligent entity, a microgrid has more growth potential than ever before. However, there are still many open questions, such as the future business models and economics. What is the cost-benefit to the end-user? How should we systematically evaluate the potential benefits and costs of control and energy management in a microgrid?
Impressive image captioning results are achieved in domains with plenty of training image and sentence pairs (e.g., MSCOCO). However, transferring to a target domain with significant domain shifts but no paired training data (referred to as cross-domain image captioning) remains largely unexplored. We propose a novel adversarial training procedure to leverage unpaired data in the target domain. Two critic networks are introduced to guide the captioner, namely domain critic and multi-modal critic. The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain. The multi-modal critic assesses whether an image and its generated sentence are a valid pair. During training, the critics and captioner act as adversaries -- captioner aims to generate indistinguishable sentences, whereas critics aim at distinguishing them. The assessment improves the captioner through policy gradient updates. During inference, we further propose a novel critic-based planning method to select high-quality sentences without additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k) as the target domains. Our method consistently performs well on all datasets. In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after adaptation. Utilizing critics during inference further gives another 4.5% boost.
The proliferation of social media in communication and information dissemination has made it an ideal platform for spreading rumors. Automatically debunking rumors at their stage of diffusion is known as \textit{early rumor detection}, which refers to dealing with sequential posts regarding disputed factual claims with certain variations and high textual duplication over time. Thus, identifying trending rumors demands an efficient yet flexible model that is able to capture long-range dependencies among postings and produce distinct representations for accurate early detection. However, it is challenging to apply conventional classification algorithms to early rumor detection since they rely on hand-crafted features, which require intensive manual effort when the number of posts is large. This paper presents a deep attention model on the basis of recurrent neural networks (RNNs) to learn \textit{selectively} temporal hidden representations of sequential posts for identifying rumors. The proposed model embeds soft attention into the recurrence to simultaneously pool out distinct features with particular focus and produce hidden representations that capture contextual variations of relevant posts over time. Extensive experiments on real datasets collected from social media websites demonstrate that (1) the deep attention based RNN model outperforms state-of-the-art methods that rely on hand-crafted features; (2) the introduction of the soft attention mechanism can effectively distill the parts relevant to rumors from original posts in advance; (3) the proposed method detects rumors more quickly and accurately than competitors.
Mar 30 2017 cs.CV
In this paper, we propose a novel approach for learning multi-label classifiers with the help of privileged information. Specifically, we use similarity constraints to capture the relationship between available information and privileged information, and use ranking constraints to capture the dependencies among multiple labels. By integrating similarity constraints and ranking constraints into the learning process of classifiers, the privileged information and the dependencies among multiple labels are exploited to construct better classifiers during training. A maximum margin classifier is adopted, and an efficient learning algorithm of the proposed method is also developed. We evaluate the proposed method on two applications: multiple object recognition from images with the help of implicit information about object importance conveyed by the list of manually annotated image tags; and multiple facial action unit detection from low-resolution images augmented by high-resolution images. Experimental results demonstrate that the proposed method can effectively take full advantage of privileged information and dependencies among multiple labels for better object recognition and better facial action unit detection.
Mar 14 2017 cs.SE
Smart contracts are full-fledged programs that run on blockchains (e.g., Ethereum, one of the most popular blockchains). In Ethereum, gas (in Ether, a cryptographic currency like Bitcoin) is the execution fee compensating the computing resources of miners for running smart contracts. However, we find that under-optimized smart contracts cost more gas than necessary, and therefore the creators or users will be overcharged. In this work, we conduct the first investigation on Solidity, the recommended compiler, and reveal that it fails to optimize gas-costly programming patterns. In particular, we identify 7 gas-costly patterns and group them into 2 categories. Then, we propose and develop GASPER, a new tool for automatically locating gas-costly patterns by analyzing smart contracts' bytecodes. The preliminary results on discovering 3 representative patterns from 4,240 real smart contracts show that 93.5%, 90.1% and 80% of contracts suffer from these 3 patterns, respectively.
Network resource allocation shows revived popularity in the era of data deluge and information explosion. Existing stochastic optimization approaches fall short in attaining a desirable cost-delay tradeoff. Recognizing the central role of Lagrange multipliers in network resource allocation, a novel learn-and-adapt stochastic dual gradient (LA-SDG) method is developed in this paper to learn the empirical optimal Lagrange multiplier from historical data, and adapt to the upcoming resource allocation strategy. Remarkably, it only requires one more sample (gradient) evaluation than the celebrated stochastic dual gradient (SDG) method. LA-SDG can be interpreted as a foresighted learning approach with an eye on the future, or, a modified heavy-ball approach from an optimization viewpoint. It is established - both theoretically and empirically - that LA-SDG markedly improves the cost-delay tradeoff over state-of-the-art allocation schemes.
Mar 01 2017 cs.CV
Recent progress in computer vision has been dominated by deep neural networks trained with large amounts of labeled data. Collecting and annotating such datasets is, however, a tedious and in some contexts impossible task; hence a recent surge in approaches that rely solely on synthetically generated data from 3D models for their training. For depth images, however, the discrepancies with real scans noticeably affect the performance of such methods. In this paper, we propose an innovative end-to-end framework which simulates the whole mechanism of these devices, synthetically generating realistic depth data from 3D CAD models by comprehensively modeling vital factors such as sensor noise, material reflectance, surface geometry, etc. Besides covering a wider range of sensors than state-of-the-art methods, the proposed one also results in more realistic data. Going further than previous works, we not only qualitatively evaluate the generated scans, but also quantitatively measure through extensive experiments and comparisons how they impact the training of neural network algorithms for different 3D recognition tasks, demonstrating how our pipeline seamlessly integrates such architectures, and how it consistently and significantly enhances their performance, irrespective of the selected feature space or intermediate representations.
Feb 28 2017 cs.SI
Selfies have become increasingly fashionable in the social media era. People are willing to share their selfies on various social media platforms such as Facebook, Instagram and Flickr. The popularity of selfies has caught researchers' attention, especially that of psychologists. In the computer vision and machine learning areas, little attention has been paid to this phenomenon as a valuable data source. In this paper, we focus on exploring the deeper personal patterns behind people's different kinds of selfie-posting behaviours. We develop this work based on a dataset from WeChat, one of the most extensively used instant messaging platforms in China. In particular, we first propose an unsupervised approach to classify the images posted by users. Based on the classification result, we construct three types of user-level features that reflect user preference, activity and posting habit. Based on these features, for a series of selfie-related tasks, we build classifiers that can accurately distinguish two sets of users with opposite selfie-posting behaviours. We have found that people's interests, activity and posting habits have a great influence on their selfie-posting behaviours. For example, the classification accuracy between selfie-posting addicts and non-addicts reaches 89.36%. We also show that using users' image information to predict these behaviours achieves better performance than using text information. More importantly, for each set of users with a specific selfie-posting behaviour, we extract and visualize significant personal patterns about them. In addition, we cluster users and extract their high-level attributes, revealing the correlation between these attributes and users' selfie-posting behaviours. In the end, we demonstrate that users' selfie-posting behaviour, as a good predictor, could predict their different preferences toward these high-level attributes accurately.
Jan 25 2017 cs.LO
Behaviour distances to measure the resemblance of two states in a (nondeterministic) fuzzy transition system have been proposed recently in the literature. Such a distance, defined as a pseudo-ultrametric over the state space of the model, provides a quantitative analogue of bisimilarity. In this paper, we focus on the problem of computing these distances. We first extend the definition of the pseudo-ultrametric by introducing discount such that the discounting factor being equal to 1 captures the original definition. We then provide polynomial-time algorithms to calculate the behavioural distances, in both the non-discounted and the discounted setting. The algorithm is strongly polynomial in the former case. Furthermore, we give a polynomial-time algorithm to compute bisimulation over fuzzy transition systems which captures the distance being equal to 0.
Existing approaches to online convex optimization (OCO) make sequential one-slot-ahead decisions, which lead to (possibly adversarial) losses that drive subsequent decision iterates. Their performance is evaluated by the so-called regret that measures the difference of losses between the online solution and the best yet fixed overall solution in hindsight. The present paper deals with online convex optimization involving adversarial loss functions and adversarial constraints, where the constraints are revealed after making decisions, and can tolerate instantaneous violations but must be satisfied in the long term. Performance of an online algorithm in this setting is assessed by: i) the difference of its losses relative to the best dynamic solution with one-slot-ahead information of the loss function and the constraint (that is here termed dynamic regret); and, ii) the accumulated amount of constraint violations (that is here termed dynamic fit). In this context, a modified online saddle-point (MOSP) scheme is developed, and proved to simultaneously yield sub-linear dynamic regret and fit, provided that the accumulated variations of per-slot minimizers and constraints are sub-linearly growing with time. MOSP is also applied to the dynamic network resource allocation task, and it is compared with the well-known stochastic dual gradient method. Under various scenarios, numerical experiments demonstrate the performance gain of MOSP relative to the state-of-the-art.
Dec 28 2016 cs.SI
The problem of ideology detection is to study the latent (political) placement for people, which is traditionally studied on politicians according to their voting behaviors. Recently, a growing number of studies have begun to address the ideology detection problem for ordinary users based on their online behaviors that can be captured by social media, e.g., Twitter. As far as we are concerned, however, the vast majority of the existing methods on ideology detection on social media have oversimplified the problem as a binary classification problem (i.e., liberal vs. conservative). Moreover, though social links can play a critical role in deciding one's ideology, most of the existing work ignores the heterogeneous types of links in social media. In this paper we propose to detect numerical ideology positions for Twitter users, according to their follow, mention, and retweet links to a selected set of politicians. A unified probabilistic model is proposed that can (1) explain the reasons why links are built among people in terms of their ideology, (2) integrate heterogeneous types of links together in determining people's ideology, and (3) automatically learn the quality of each type of links in deciding one's ideology. Experiments have demonstrated the advantages of our model in terms of both ranking and political leaning classification accuracy. It is shown that (1) using multiple types of links is better than using any single type of links alone to determine one's ideology, and (2) the advantage of our model over the baselines is even greater when dealing with people that are sparsely linked in one type of links. We also show that the detected ideology for Twitter users aligns with our intuition quite well.
Language Models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a Language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models which predict next word based on one previous word and hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies of history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets MS COCO and Flickr30K. Our extensive experimental results show that our method outperforms the vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods.
Artistic style transfer is an image synthesis problem where the content of an image is reproduced with the style of another. Recent works show that a visually appealing style transfer can be achieved by using the hidden activations of a pretrained convolutional neural network. However, existing methods either apply (i) an optimization procedure that works for any style image but is very expensive, or (ii) an efficient feedforward network that only allows a limited number of trained styles. In this work we propose a simpler optimization objective based on local matching that combines the content structure and style textures in a single layer of the pretrained network. We show that our objective has desirable properties such as a simpler optimization landscape, intuitive parameter tuning, and consistent frame-by-frame performance on video. Furthermore, we use 80,000 natural images and 80,000 paintings to train an inverse network that approximates the result of the optimization. This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images.
Dec 13 2016 cs.SY
There are two key issues for the kernel-based regularization method: one is how to design a suitable kernel to embed in the kernel the prior knowledge of the LTI system to be identified, and the other one is how to tune the kernel such that the resulting regularized impulse response estimator can achieve a good bias-variance tradeoff. In this paper, we focus on the issue of kernel design. Depending on the type of the prior knowledge, we propose two methods to design kernels: one is from a machine learning perspective and the other one is from a system theory perspective. We also provide analysis results for both methods, which not only enhances our understanding for the existing kernels but also directs the design of new kernels.
In this paper, we study the problem of author identification under the double-blind review setting, which is to identify potential authors given information of an anonymized paper. Different from existing approaches that rely heavily on feature engineering, we propose to use a network embedding approach to address the problem, which can automatically represent nodes as lower dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information due to the heterogeneity of the network. To address the challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend the existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model can obtain significantly better accuracy at identifying the true authors compared to existing methods.
Dec 06 2016 cs.CV
To avoid the exhaustive search over locations and scales, current state-of-the-art object detection systems usually involve a crucial component generating a batch of candidate object proposals from images. In this paper, we present a simple yet effective approach for segmenting object proposals via a deep architecture of recursive neural networks (RNNs), which hierarchically groups regions for detecting object candidates over scales. Unlike traditional methods that mainly adopt fixed similarity measures for merging regions or finding object proposals, our approach adaptively learns the region merging similarity and the objectness measure during the process of hierarchical region grouping. Specifically, guided by a structured loss, the RNN model jointly optimizes the cross-region similarity metric with the region merging process as well as the objectness prediction. During inference of the object proposal generation, we introduce randomness into the greedy search to cope with the ambiguity of grouping regions. Extensive experiments on standard benchmarks, e.g., PASCAL VOC and ImageNet, suggest that our approach is capable of producing object proposals with high recall while well preserving the object boundaries and outperforms other existing methods in both accuracy and efficiency.
Sentiment analysis is crucial for extracting social signals from social media content. Due to the prevalence of images in social media, image sentiment analysis has received increasing attention in recent years. However, most existing systems are black boxes that do not provide insight into how image content invokes sentiment and emotion in the viewers. Psychological studies have confirmed that salient objects in an image often invoke emotions. In this work, we investigate more fine-grained and more comprehensive interaction between visual saliency and visual sentiment. In particular, we partition images in several primary scene-type dimensions, including: open-closed, natural-manmade, indoor-outdoor, and face-noface. Using state-of-the-art saliency detection and sentiment classification algorithms, we examine how the sentiment of the salient region(s) in an image relates to the overall sentiment of the image. The experiments on a representative image emotion dataset have shown interesting correlation between saliency and sentiment in different scene types and in turn shed light on the mechanism of visual sentiment evocation.
We propose a scalable approach to learn video-based question answering (QA): answer a "free-form natural language question" about a video content. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use these candidate QA pairs to train a number of video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). In order to handle non-perfect candidate QA pairs, we propose a self-paced learning procedure to iteratively identify them and mitigate their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines.
Nov 10 2016 cs.CL
Word embeddings are now ubiquitous forms of word representation in natural language processing. There have been applications of word embeddings for monolingual word sense disambiguation (WSD) in English, but few comparisons have been done. This paper attempts to bridge that gap by examining popular embeddings for the task of monolingual English WSD. Our simplified method leads to comparable state-of-the-art performance without expensive retraining. Cross-Lingual WSD - where the word senses of a word in a source language e come from a separate target translation language f - can also assist in language learning; for example, when providing translations of target vocabulary for learners. Thus we have also applied word embeddings to the novel task of cross-lingual WSD for Chinese and provide a public dataset for further benchmarking. We have also experimented with using word embeddings for LSTM networks and found surprisingly that a basic LSTM network does not work well. We discuss the ramifications of this outcome.
Oct 14 2016 cs.NI
Wireless object tracking applications are gaining popularity and will soon utilize emerging ultra-low-power device-to-device communication. However, severe energy constraints require much more careful accounting of energy usage than what prior art provides. In particular, the available energy, the differing power consumption levels for listening, receiving, and transmitting, as well as the limited control bandwidth must all be considered. Therefore, we formulate the problem of maximizing the throughput among a set of heterogeneous broadcasting nodes with differing power consumption levels, each subject to a strict ultra-low-power budget. We obtain the oracle throughput (i.e., maximum throughput achieved by an oracle) and use Lagrangian methods to design EconCast - a simple asynchronous distributed protocol in which nodes transition between sleep, listen, and transmit states, and dynamically change the transition rates. EconCast can operate in groupput or anyput modes to respectively maximize two alternative throughput measures. We show that EconCast approaches the oracle throughput. The performance is also evaluated numerically and via extensive simulations and it is shown that EconCast outperforms prior art by 6x - 17x under realistic assumptions. Moreover, we evaluate EconCast's latency performance and consider design tradeoffs when operating in groupput and anyput modes. Finally, we implement EconCast using the TI eZ430-RF2500-SEH energy harvesting nodes and experimentally show that in realistic environments it obtains 57% - 77% of the achievable throughput.
Existing approaches to resource allocation for today's stochastic networks are challenged to meet fast convergence and tolerable delay requirements. The present paper leverages online learning advances to facilitate stochastic resource allocation tasks. By recognizing the central role of Lagrange multipliers, the underlying constrained optimization problem is formulated as a machine learning task involving both training and operational modes, with the goal of learning the sought multipliers in a fast and efficient manner. To this end, an order-optimal offline learning approach is developed first for batch training, and it is then generalized to the online setting with a procedure termed learn-and-adapt. The novel resource allocation protocol combines the benefits of stochastic approximation and statistical learning to obtain low-complexity online updates with learning errors close to the statistical accuracy limits, while still preserving adaptation performance, which in the stochastic network optimization context guarantees queue stability. Analysis and simulated tests demonstrate that the proposed data-driven approach improves the delay and convergence performance of existing resource allocation schemes.
Oct 07 2016 cs.IR
We propose a framework for discriminative Information Retrieval (IR) atop linguistic features, trained to improve the recall of tasks such as answer candidate passage retrieval, the initial step in text-based Question Answering (QA). We formalize this as an instance of linear feature-based IR (Metzler and Croft, 2007), illustrating how a variety of knowledge discovery tasks are captured under this approach, leading to a 44% improvement in recall for candidate triage for QA.
Oct 04 2016 cs.LO
Subtyping in concurrency has been extensively studied since the early 1990s as one of the most interesting issues in type theory. The correctness of subtyping relations has usually been provided as the soundness for type safety. The converse direction, the completeness, has been largely ignored in spite of its usefulness to define the largest subtyping relation ensuring type safety. This paper formalises preciseness (i.e. both soundness and completeness) of subtyping for mobile processes and studies it for the synchronous and the asynchronous session calculi. We first prove that the well-known session subtyping, the branching-selection subtyping, is sound and complete for the synchronous calculus. Next we show that in the asynchronous calculus, this subtyping is incomplete for type-safety: that is, there exist session types T and S such that T can safely be considered as a subtype of S, but T < S is not derivable by the subtyping. We then propose an asynchronous subtyping system which is sound and complete for the asynchronous calculus. The method gives a general guidance to design rigorous channel-based subtypings respecting desired safety properties. Both the synchronous and the asynchronous calculus are first considered with linear channels only, and then they are extended with session initialisations and communications of expressions (including shared channels).
An autoscaling system can reconfigure cloud-based applications and services, through various cloud software configurations and hardware provisioning, to adapt to the changing environment at runtime. Such behaviour offers the foundation to achieve elasticity in the modern cloud computing paradigm. Given the importance of autoscaling in the cloud, computational intelligence has been widely applied for engineering autoscaling systems, leading to self-aware, self-adaptive and more dependable runtime scaling. In this paper, we present a brief background and history of autoscaling in the cloud, as well as its associations with self-awareness and self-adaptivity of a system. Subsequently, we conduct a detailed survey and taxonomy of the key related work and identify the gaps in this area of research.
The read channel of a Flash memory cell degrades after repetitive program and erase (P/E) operations. This degradation is often modeled as a function of the number of P/E cycles. In contrast, this paper models the degradation as a function of the cumulative effect of the charge written and erased from the cell. Based on this modeling approach, this paper dynamically allocates voltage using lower-voltage write thresholds at the beginning of the device lifetime and increasing the thresholds as needed to maintain the mutual information of the read channel in the face of degradation. The paper introduces the technique in an idealized setting and then removes ideal assumptions about channel knowledge and available voltage resolution to conclude with a practical scheme with performance close to that of the idealized setting.
Sep 01 2016 cs.SE
Self-adaptive software (SAS) can reconfigure itself to adapt to the changing environment at runtime. Such a behavior permits continual optimization on various conflicting non-functional objectives, e.g., response time and energy consumption. In this paper, we present FEMOSAA, a novel framework that automatically synergizes the feature model and Multi-Objective Evolutionary Algorithm (MOEA), to optimize SAS at runtime. At design time, FEMOSAA automatically transposes the design of SAS, which is expressed as a feature model, to the chromosome representation and the reproduction operators (mutation and crossover) in MOEA. At runtime, the feature model serves as the domain knowledge to guide the search, providing more chances to find better solutions. In addition, we have designed a new method to search for the knee solutions, which can achieve balanced trade-off. We experimentally compare FEMOSAA with different variants and state-of-the-art approaches on a real world SAS. The results reveal its effectiveness and superiority over the others.
Anomaly detection plays an important role in modern data-driven security applications, such as detecting suspicious access to a socket from a process. In many cases, such events can be described as a collection of categorical values that are considered as entities of different types, which we call heterogeneous categorical events. Due to the lack of intrinsic distance measures among entities, and the exponentially large event space, most existing work relies heavily on heuristics to calculate abnormal scores for events. Different from previous work, we propose a principled and unified probabilistic model APE (Anomaly detection via Probabilistic pairwise interaction and Entity embedding) that directly models the likelihood of events. In this model, we embed entities into a common latent space using their observed co-occurrence in different events. More specifically, we first model the compatibility of each pair of entities according to their embeddings. Then we utilize the weighted pairwise interactions of different entity types to define the event probability. Using Noise-Contrastive Estimation with "context-dependent" noise distribution, our model can be learned efficiently regardless of the large event space. Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques.
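The scoring idea in this abstract (pairwise compatibility of entity embeddings, combined through per-type-pair weights) can be sketched as follows. This is a minimal illustration, not the APE model itself: the dot-product compatibility, the toy embeddings, the weight table `pair_weights`, and the function name `event_score` are assumptions, and the noise-contrastive training step is omitted entirely.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def event_score(event, embeddings, pair_weights):
    """Weighted sum of pairwise embedding compatibilities for one event.

    event:        list of entity ids, one per entity type (position = type)
    embeddings:   dict mapping entity id -> embedding vector (hypothetical values)
    pair_weights: dict mapping (type_i, type_j) with i < j -> weight
    A lower score marks the event as less likely, i.e. more anomalous.
    """
    score = 0.0
    for i in range(len(event)):
        for j in range(i + 1, len(event)):
            compat = dot(embeddings[event[i]], embeddings[event[j]])
            score += pair_weights[(i, j)] * compat
    return score


# Toy data: entity types are (user, process, port). All values are made up.
embeddings = {
    "alice": [1.0, 0.0],
    "bash": [0.9, 0.1],
    "miner": [-1.0, 0.5],
    "port22": [1.0, 0.2],
}
pair_weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}

usual = event_score(["alice", "bash", "port22"], embeddings, pair_weights)
odd = event_score(["alice", "miner", "port22"], embeddings, pair_weights)
```

Here the event containing the entity whose embedding points away from the others receives the lower score. In a trained model the embeddings and weights would be learned from observed co-occurrences rather than set by hand.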
A great video title describes the most salient event compactly and captures the viewer's attention. In contrast, video captioning tends to generate sentences that describe the video as a whole. Although generating a video title automatically is a very useful task, it is much less addressed than video captioning. We address video title generation for the first time by proposing two methods that extend state-of-the-art video captioners to this new task. First, we make video captioners highlight sensitive by priming them with a highlight detector. Our framework allows for jointly training a model for title generation and video highlight localization. Second, we induce high sentence diversity in video captioners, so that the generated titles are also diverse and catchy. This means that a large number of sentences might be required to learn the sentence structure of titles. Hence, we propose a novel sentence augmentation method to train a captioner with additional sentence-only examples that come without corresponding videos. We collected a large-scale Video Titles in the Wild (VTW) dataset of 18100 automatically crawled user-generated videos and titles. On VTW, our methods consistently improve title prediction accuracy, and achieve the best performance in both automatic and human evaluation. Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.
Elasticity in the cloud is often achieved by on-demand autoscaling. In such a context, the goal is to optimize the Quality of Service (QoS) and cost objectives for the cloud-based services. However, the difficulty lies in the fact that these objectives, e.g., throughput and cost, can naturally conflict, and the QoS of cloud-based services often interferes due to the shared infrastructure in the cloud. Consequently, dynamic and effective trade-off decision making of autoscaling in the cloud is necessary, yet challenging. In particular, it is even harder to achieve well-compromised trade-offs, where the decision largely improves the majority of the objectives, while causing relatively small degradations to others. In this paper, we present a self-adaptive decision making approach for autoscaling in the cloud. It is capable of adaptively producing autoscaling decisions that lead to well-compromised trade-offs without heavy human intervention. We leverage ant colony inspired multi-objective optimization to search and optimize the trade-off decisions; the result is then filtered by compromise-dominance, a mechanism that extracts the decisions with balanced improvements in the trade-offs. We experimentally compare our approach to four state-of-the-art autoscaling approaches: rule, heuristic, randomized and multi-objective genetic algorithm based solutions. The results reveal the effectiveness of our approach over the others, including better quality of trade-offs and significantly smaller violation of the requirements.
Aug 23 2016 cs.SE
Elastic autoscaling is the fundamental mechanism that enables the cloud-based services to continually evolve themselves - through changing the related software configurations and hardware resource provisions - under time-varying workloads. However, given the increasingly complex dynamics, uncertainty and trade-offs related to the runtime QoS and cost/energy of services, the cloud autoscaling system is becoming one of the most complex artifacts constructed by humans, and thus its effectiveness is difficult to preserve. In this article, we present novel ideas for facilitating cloud autoscaling. Our hypothesis is that the cloud ecosystem, represented by a collection of cloud-based services, bears many similarities with the natural ecosystem. As such, we intend to investigate how an ecological view can be adopted to better explain how the cloud-based services evolve, and to explore the key factors that drive stable and sustainable cloud-based services in the cloud. To achieve this goal, we aim to transpose ecological principles, theories and models into cloud autoscaling analogues and spontaneously improve long-term stability and sustainability of the cloud ecosystem.
Modern Internet services increasingly leverage cloud computing for flexible, elastic and on-demand provision. Typically, Quality of Service (QoS) of cloud-based services can be tuned using different underlying cloud configurations and resources, e.g., number of threads, CPU and memory etc., which are shared, leased and priced as utilities. This benefit is fundamentally grounded in autoscaling: an automatic and elastic process that adapts cloud configurations on-demand according to time-varying workloads. This thesis proposes a holistic cloud autoscaling framework to effectively and seamlessly address existing challenges related to different logical aspects of autoscaling, including architecting the autoscaling system, modelling the QoS of cloud-based services, determining the granularity of control and deciding trade-off autoscaling decisions. The framework takes advantage of the principles of self-awareness and the related algorithms to adaptively handle the dynamics, uncertainties, QoS interference and trade-offs on objectives that are exhibited in the cloud. The major benefit is that, by leveraging the framework, cloud autoscaling can be effectively achieved without heavy human analysis and design time knowledge. Through various experiments using the RUBiS benchmark and realistic workloads in a real cloud setting, this thesis evaluates the effectiveness of the framework based on various quality indicators and compares it with other state-of-the-art approaches.
Classical multiuser information theory studies the fundamental limits of models with a fixed (often small) number of users as the coding blocklength goes to infinity. This work proposes a new paradigm, referred to as many-user information theory, where the number of users is allowed to grow with the blocklength. This paradigm is motivated by emerging systems with a massive number of users in an area, such as machine-to-machine communication systems and sensor networks. The focus of the current paper is the many-access channel model, which consists of a single receiver and many transmitters, whose number increases unboundedly with the blocklength. Moreover, an unknown subset of transmitters may transmit in a given block and need to be identified. A new notion of capacity is introduced and characterized for the Gaussian many-access channel with random user activities. The capacity can be achieved by first detecting the set of active users and then decoding their messages.
Jun 16 2016 cs.CV
We present a new algorithm for multi-region segmentation of 2D images with objects that may partially occlude each other. Our algorithm is based on the observation that human performance on this task is based both on prior knowledge about plausible shapes and taking into account the presence of occluding objects whose shape is already known - once an occluded region is identified, the shape prior can be used to guess the shape of the missing part. We capture the former aspect using a deep learning model of shape; for the latter, we simultaneously minimize the energy of all regions and consider only unoccluded pixels for data agreement. Existing algorithms incorporating object shape priors consider every object separately in turn and can't distinguish genuine deviation from the expected shape from parts missing due to occlusion. We show that our method significantly improves on the performance of a representative algorithm, as evaluated on both preprocessed natural and synthetic images. Furthermore, on the synthetic images, we recover the ground truth segmentation with good accuracy.
Subsurface applications including geothermal, geological carbon sequestration, oil and gas, etc., typically involve maximizing either the extraction of energy or the storage of fluids. Characterizing the subsurface is extremely complex due to heterogeneity and anisotropy. Due to this complexity, there are uncertainties in the subsurface parameters, which need to be estimated from multiple diverse as well as fragmented data streams. In this paper, we present a non-intrusive sequential inversion framework for integrating data from geophysical and flow sources to constrain subsurface Discrete Fracture Networks (DFN). In this approach, we first estimate bounds on the statistics for the DFN fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained based on the flow data. The efficacy of this multi-physics based sequential inversion is demonstrated through a representative synthetic example.
The impact of culture in visual emotion perception has recently captured the attention of multimedia research. In this study, we provide powerful computational linguistics tools to explore, retrieve and browse a dataset of 16K multilingual affective visual concepts and 7.3M Flickr images. First, we design an effective crowdsourcing experiment to collect human judgements of sentiment connected to the visual concepts. We then use word embeddings to represent these concepts in a low dimensional vector space, allowing us to expand the meaning around concepts, and thus enabling insight about commonalities and differences among different languages. We compare a variety of concept representations through a novel evaluation task based on the notion of visual semantic relatedness. Based on these representations, we design clustering schemes to group multilingual visual concepts, and evaluate them with novel metrics based on the crowdsourced sentiment annotations as well as visual semantic relatedness. The proposed clustering framework enables us to analyze the full multilingual dataset in-depth and also show an application on a facial data subset, exploring cultural insights of portrait-related affective visual concepts.
May 27 2016 cs.CV
Palm vein recognition is a novel biometric identification technology. However, obtaining a good vein extraction result from the raw palm image remains a challenging problem, especially when the raw data are collected under asymmetric illumination. This paper proposes a method based on the single-scale Retinex algorithm to extract the palm vein image when strong shadows are present due to asymmetric illumination and the uneven geometry of the palm. We test our method on a multispectral palm image. The experimental results show that the proposed method is robust to the influence of illumination angle and shadow. Compared to the traditional extraction methods, the proposed method can obtain palm vein lines with better visualization performance (the contrast ratio increases by 18.4%, entropy increases by 1.07%, and definition increases by 18.8%).
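For readers unfamiliar with single-scale Retinex, the core operation is to subtract, in the log domain, a Gaussian-smoothed estimate of the illumination from the image itself. The sketch below shows that operation in plain Python on a list-of-lists grayscale image. It is not the paper's extraction pipeline: the kernel radius of about three sigma, the edge-replicating border, the `eps` offset inside the logarithm, and all function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with radius of about 3 * sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_line(values, kernel):
    """Convolve one row or column, replicating the border pixels."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)   # clamp index at the borders
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_line(row, kernel) for row in image]
    cols = [blur_line(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def single_scale_retinex(image, sigma, eps=1.0):
    """R = log(I + eps) - log(G_sigma * I + eps), computed per pixel."""
    blurred = gaussian_blur(image, sigma)
    return [[math.log(p + eps) - math.log(b + eps) for p, b in zip(prow, brow)]
            for prow, brow in zip(image, blurred)]


# Toy image: uniform background of 100 with a darker vertical line at column 4.
image = [[50.0 if c == 4 else 100.0 for c in range(9)] for _ in range(9)]
response = single_scale_retinex(image, sigma=1.0)
```

In the toy image the dark line gets a negative response while pixels far from it stay near zero, which is the sense in which the operation separates local structure from slowly varying illumination. A larger `sigma` would remove broader shading at the cost of a wider border effect.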
Apr 22 2016 cs.LG
We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper bound of the GPU memory, our algorithm allows deeper and more complex models to be explored, and helps advance the innovations in deep learning research. We focus on reducing the memory cost to store the intermediate feature maps and gradients during training. Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. We show that it is possible to trade computation for memory - giving a more memory efficient training algorithm with a little extra computation cost. In the extreme case, our analysis also shows that the memory consumption can be reduced to O(log n) with as little as O(n log n) extra cost for forward computation. Our experiments show that we can reduce the memory cost of a 1,000-layer deep residual network from 48G to 7G with only 30 percent additional running time cost on ImageNet problems. Similarly, significant memory cost reduction is observed in training complex recurrent neural networks on very long sequences.
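The trade of computation for memory can be illustrated with a toy chain of scalar layers: keep only every k-th activation during the forward pass, then recompute each segment's activations just before backpropagating through it. With k close to sqrt(n), both the number of checkpoints and the segment length are about sqrt(n). The sketch below is a simplified illustration under stated assumptions, not the paper's implementation: the scalar `tanh` layers, the function names, and the way stored values are counted are all hypothetical.

```python
import math


def layer(w, x):
    """One scalar 'layer': y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x, grad_out):
    """Backward pass of one layer: returns (dL/dx, dL/dw) given dL/dy."""
    y = math.tanh(w * x)
    local = (1.0 - y * y) * grad_out
    return local * w, local * x


def full_backprop(weights, x0):
    """Baseline: store every activation, so O(n) values are kept."""
    inputs = [x0]
    for w in weights:
        inputs.append(layer(w, inputs[-1]))
    grad, grads_w = 1.0, [0.0] * len(weights)   # loss = final output
    for i in reversed(range(len(weights))):
        grad, grads_w[i] = layer_grad(weights[i], inputs[i], grad)
    return grads_w, len(inputs)


def checkpointed_backprop(weights, x0, segment):
    """Keep every `segment`-th layer input; recompute the rest per segment."""
    n = len(weights)
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x                  # input to layer i
        x = layer(w, x)
    grad, grads_w, peak = 1.0, [0.0] * n, 0
    for start in reversed(range(0, n, segment)):
        end = min(start + segment, n)
        # Recompute the inputs of layers start..end-1 from the checkpoint.
        seg_inputs = [checkpoints[start]]
        for i in range(start, end - 1):
            seg_inputs.append(layer(weights[i], seg_inputs[-1]))
        # Slightly conservative count: the checkpoint is counted twice.
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for i in reversed(range(start, end)):
            grad, grads_w[i] = layer_grad(weights[i], seg_inputs[i - start], grad)
    return grads_w, peak


weights = [0.5 + 0.01 * i for i in range(100)]
g_full, mem_full = full_backprop(weights, 0.3)
g_ckpt, mem_ckpt = checkpointed_backprop(weights, 0.3, segment=10)
```

For 100 layers and a segment length of 10, the baseline keeps 101 values while the checkpointed version peaks at 20, and both return the same gradients. The recomputation amounts to roughly one additional forward pass, consistent with the cost stated in the abstract.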
In this paper, we discuss the outer-synchronization of the asymmetrically connected recurrent time-varying neural networks. By both centralized and decentralized discretization data sampling principles, we derive several sufficient conditions based on diverse vector norms that guarantee that any two trajectories from different initial values of the identical neural network system converge together. The lower bounds of the common time intervals between data samples in centralized and decentralized principles are proved to be positive, which guarantees exclusion of Zeno behavior. A numerical example is provided to illustrate the efficiency of the theoretical results.
In this paper, we investigate the stability of a class of analytic neural networks with synaptic feedback via event-triggered rules. This model is general and includes the Hopfield neural network as a special case. These event-triggered rules can efficiently reduce the loads of computation and information transmission at the synapses of the neurons. The synaptic feedback of each neuron keeps a constant value based on the outputs of the other neurons at its latest triggering time but changes at its next triggering time, which is determined by a certain criterion. It is proved that every trajectory of the analytic neural network converges to a certain equilibrium under this event-triggered rule for all initial values except a set of zero measure. The main technique of the proof is the Lojasiewicz inequality, which is used to prove the finiteness of the trajectory length. The realization of this event-triggered rule is verified by the exclusion of Zeno behaviors. Numerical examples are provided to illustrate the efficiency of the theoretical results.
Boolean satisfiability (SAT) has an extensive application domain in computer science, especially in electronic design automation applications. Circuit synthesis, optimization, and verification problems can be solved by transforming the original problems into SAT problems. However, the SAT problem is known to be NP-complete, which means that no efficient general method to solve it is known. Therefore, an efficient SAT solver to enhance the performance is always desired. We propose a hardware acceleration method for SAT problems. By surveying the properties of SAT problems and the decoding of low-density parity-check (LDPC) codes, a special class of error-correcting codes, we discover that both of them are constraint satisfaction problems. The belief propagation algorithm has been successfully applied to the decoding of LDPC codes, and the corresponding decoder hardware designs have been extensively studied. Therefore, we propose a belief propagation based algorithm to solve SAT problems. With this algorithm, the SAT solver can be accelerated by hardware. A software simulator is implemented to verify the proposed algorithm, and the performance improvement is estimated. Our experimental results show that time complexity does not increase with the size of SAT problems and the proposed method can achieve at least 30x speedup compared to MiniSat.
Mar 10 2016 cs.LG
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Android OS supports multiple communication methods between apps. This opens the possibility of carrying out threats in a collaborative fashion, cf. the Soundcomber example from 2011. In this paper we provide a concise definition of collusion and report on a number of automated detection approaches, developed in co-operation with Intel Security.
Increasing threats of global warming and climate changes call for an energy-efficient and sustainable design of future wireless communication systems. To this end, a novel two-scale stochastic control framework is put forth for smart-grid powered coordinated multi-point (CoMP) systems. Taking into account renewable energy sources (RES), dynamic pricing, two-way energy trading facilities and imperfect energy storage devices, the energy management task is formulated as an infinite-horizon optimization problem minimizing the time-average energy transaction cost, subject to the users' quality of service (QoS) requirements. Leveraging the Lyapunov optimization approach as well as the stochastic subgradient method, a two-scale online control (TS-OC) approach is developed for the resultant smart-grid powered CoMP systems. Using only historical data, the proposed TS-OC makes online control decisions at two timescales, and features a provably feasible and asymptotically near-optimal solution. Numerical tests further corroborate the theoretical analysis, and demonstrate the merits of the proposed approach.
Feb 16 2016 cs.CV
Maximally stable extremal regions (MSER), which is a popular method to generate character proposals/candidates, has shown superior performance in scene text detection. However, the pixel-level operation limits its capability for handling some challenging cases (e.g., multiple connected characters, separated parts of one character and non-uniform illumination). To better tackle these cases, we design a character proposal network (CPN) by taking advantage of the high capacity and fast computing of fully convolutional network (FCN). Specifically, the network simultaneously predicts characterness scores and refines the corresponding locations. The characterness scores can be used for proposal ranking to reject non-character proposals and the refining process aims to obtain the more accurate locations. Furthermore, considering the situation that different characters have different aspect ratios, we propose a multi-template strategy, designing a refiner for each aspect ratio. The extensive experiments indicate our method achieves recall rates of 93.88%, 93.60% and 96.46% on ICDAR 2013, SVT and Chinese2k datasets respectively using less than 1000 proposals, demonstrating promising performance of our character proposal network.
Jan 26 2016 cs.NI
Object tracking applications are gaining popularity and will soon utilize Energy Harvesting (EH) low-power nodes that will consume power mostly for Neighbor Discovery (ND) (i.e., identifying nodes within communication range). Although ND protocols were developed for sensor networks, the challenges posed by emerging EH low-power transceivers were not addressed. Therefore, we design an ND protocol tailored for the characteristics of a representative EH prototype: the TI eZ430-RF2500-SEH. We present a generalized model of ND accounting for unique prototype characteristics (i.e., energy costs for transmission/reception, and transceiver state switching times/costs). Then, we present the Power Aware Neighbor Discovery Asynchronously (Panda) protocol in which nodes transition between the sleep, receive, and transmit states. We analyze Panda and select its parameters to maximize the ND rate subject to a homogeneous power budget. We also present Panda-D, designed for non-homogeneous EH nodes. We perform extensive testbed evaluations using the prototypes and study various design tradeoffs. We demonstrate a small difference (less than 2%) between experimental and analytical results, thereby confirming the modeling assumptions. Moreover, we show that Panda improves the ND rate by up to 3x compared to related protocols. Finally, we show that Panda-D operates well under non-homogeneous power harvesting.
One common trend in image tagging research is to focus on visually relevant tags, and this tends to ignore the personal and social aspect of tags, especially on photoblogging websites such as Flickr. Previous work has correctly identified that many of the tags that users provide on images are not visually relevant (i.e. representative of the salient content in the image) and they go on to treat such tags as noise, ignoring that the users chose to provide those tags over others that could have been more visually relevant. Another common assumption about user generated tags for images is that the order of these tags provides no useful information for the prediction of tags on future images. This assumption also tends to define usefulness in terms of what is visually relevant to the image. For general tagging or labeling applications that focus on providing visual information about image content, these assumptions are reasonable, but when considering personalized image tagging applications, these assumptions are at best too rigid, ignoring user choice and preferences. We challenge the aforementioned assumptions, and provide a machine learning approach to the problem of personalized image tagging with the following contributions: 1.) We reformulate the personalized image tagging problem as a search/retrieval ranking problem, 2.) We leverage the order of tags, which does not always reflect visual relevance, provided by the user in the past as a cue to their tag preferences, similar to click data, 3.) We propose a technique to augment sparse user tag data (semi-supervision), and 4.) We demonstrate the efficacy of our method on a subset of Flickr images, showing improvement over previous state-of-art methods.
What makes a person pick certain tags over others when tagging an image? Does the order that a person presents tags for a given image follow an implicit bias that is personal? Can these biases be used to improve existing automated image tagging systems? We show that tag ordering, which has been largely overlooked by the image tagging community, is an important cue in understanding user tagging behavior and can be used to improve auto-tagging systems. Inspired by the assumption that people order their tags, we propose a new way of measuring tag preferences, and also propose a new personalized tagging objective function that explicitly considers a user's preferred tag orderings. We also provide a (partially) greedy algorithm that produces good solutions to our new objective and under certain conditions produces an optimal solution. We validate our method on a subset of Flickr images that spans 5000 users, over 5200 tags, and over 90,000 images. Our experiments show that exploiting personalized tag orders improves the average performance of state-of-art approaches both on per-image and per-user bases.
A large amount of research activity in power systems areas has focused on developing computational methods to solve load flow equations, where a key question is the maximum number of isolated solutions. Though several concrete upper bounds exist, recent studies have hinted that much sharper upper bounds that depend on the topology of the underlying power networks may exist. This paper establishes such a topology-dependent solution bound, which is actually the best possible bound in the sense that it is always attainable. We also develop a geometric construction called the adjacency polytope, which accurately captures the topology of the underlying power network and is immensely useful in the computation of the solution bound. Finally, we highlight the significant implications of the development of such a solution bound for solving load flow equations.
MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion. Our preliminary experiments reveal promising results on large scale deep neural network applications using multiple GPU machines.
Nov 19 2015 cs.LG
We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state of the art accuracy rating on the ImageNet dataset.
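One simple instance of a function-preserving transformation is widening a hidden layer by duplicating a unit and splitting its outgoing weight between the two copies. The sketch below shows this on a tiny one-hidden-layer ReLU network. It is an illustration under stated assumptions rather than the paper's exact procedure: the network shape, the names `mlp` and `widen`, and the equal split of the outgoing weight are choices made for this example.

```python
def relu(z):
    return max(0.0, z)

def mlp(x, W, v):
    """One hidden ReLU layer: output = sum_j v[j] * relu(W[j] . x)."""
    hidden = [relu(sum(w_i * x_i for w_i, x_i in zip(row, x))) for row in W]
    return sum(v_j * h_j for v_j, h_j in zip(v, hidden))

def widen(W, v, j):
    """Add a copy of hidden unit j and halve the two outgoing weights.

    The copy computes the same activation as unit j, so splitting v[j]
    into two equal parts leaves the network output unchanged.
    """
    W_new = [row[:] for row in W] + [W[j][:]]
    v_new = v[:]
    v_new[j] = v[j] / 2.0
    v_new.append(v[j] / 2.0)
    return W_new, v_new

# Hypothetical small network: 3 inputs, 2 hidden units.
W = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.3]]
v = [1.5, -2.0]
x = [1.0, 2.0, 0.5]

W_wide, v_wide = widen(W, v, 1)
before = mlp(x, W, v)
after = mlp(x, W_wide, v_wide)
```

The wider network starts from the same function as the original (`before` and `after` match), so further training continues from the smaller network's behavior instead of starting from scratch.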
Nov 17 2015 cs.CV
Autonomous indoor navigation of Micro Aerial Vehicles (MAVs) poses many challenges. One main reason is that GPS has limited precision in indoor environments. The additional fact that MAVs are not able to carry heavy or power-consuming sensors, such as range finders, makes indoor autonomous navigation a challenging task. In this paper, we propose a practical system in which a quadcopter autonomously navigates indoors and finds a specific target, i.e., a book bag, by using a single camera. A deep learning model, a Convolutional Neural Network (ConvNet), is used to learn a controller strategy that mimics an expert pilot's choice of action. We show our system's performance through real-time experiments in diverse indoor locations. To understand more about our trained network, we use several visualization techniques.
Nov 16 2015 cs.CV
Salient object detection increasingly receives attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we provide a novel Deep Image Saliency Computing (DISC) framework for fine-grained image saliency computing. In particular, we model the image saliency from both the coarse- and fine-level observations, and utilize the deep convolutional neural network (CNN) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as the input, roughly identifying saliency regions in the global context. Furthermore, we integrate superpixel-based local context information in the first CNN to refine the coarse-level saliency map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a testing image, the two CNNs collaboratively conduct the saliency computing in one shot. Our DISC framework is capable of uniformly highlighting the objects of interest against complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and that it also generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.
Nov 02 2015 cs.SY
This paper proposes a new time-scaling approach for computational optimal control of a distributed parameter system governed by the Saint-Venant PDEs. The time-scaling approach can change a uniform time partition into a nonuniform one. We also derive the gradient formulas by using the variational method. Then the method of lines (MOL) is applied to compute the Saint-Venant PDEs after implementing the time-scaling transformation and the associated costate PDEs. Finally, we compare the optimization results obtained using the proposed time-scaling approach with those obtained without it. The simulation result demonstrates the effectiveness of the proposed time-scaling method.
Using a SAT-solver on top of a partial previously-known solution we improve the upper bound of the packing chromatic number of the infinite square lattice from 17 to 15. We discuss the merits of SAT-solving for this kind of problem as well as compare the performance of different encodings. Further, we improve the lower bound from 12 to 13 again using a SAT-solver, demonstrating the versatility of this technology for our approach.
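To make the SAT formulation concrete, the sketch below builds a direct CNF encoding of a packing coloring on a small finite patch of the square lattice, where two cells may share color i only if their lattice (Manhattan) distance is greater than i. This is a toy under stated assumptions: the variable numbering, the function names, and the brute-force check standing in for a real SAT solver are invented for this example, and the paper's encodings for the infinite lattice are certainly more elaborate.

```python
from itertools import combinations, product

def var(cell_index, color, k):
    """DIMACS-style positive integer for 'cell has this color' (colors 1..k)."""
    return cell_index * k + color

def packing_cnf(rows, cols, k):
    """Clauses: every cell gets a color; close cells never share a color."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    clauses = [[var(i, color, k) for color in range(1, k + 1)]
               for i in range(len(cells))]
    for (i, (r1, c1)), (j, (r2, c2)) in combinations(enumerate(cells), 2):
        dist = abs(r1 - r2) + abs(c1 - c2)
        for color in range(1, k + 1):
            if dist <= color:           # too close to share this color
                clauses.append([-var(i, color, k), -var(j, color, k)])
    return cells, clauses

def satisfiable(rows, cols, k):
    """Brute-force stand-in for a SAT solver; only viable for tiny patches."""
    cells, clauses = packing_cnf(rows, cols, k)
    for colors in product(range(1, k + 1), repeat=len(cells)):
        true_vars = {var(i, col, k) for i, col in enumerate(colors)}
        if all(any((lit > 0 and lit in true_vars) or
                   (lit < 0 and -lit not in true_vars) for lit in clause)
               for clause in clauses):
            return True
    return False

two_colors_ok = satisfiable(2, 2, 2)
three_colors_ok = satisfiable(2, 2, 3)
```

On the 2x2 patch, two colors do not suffice but three do (for example colors 1 and 1 on one diagonal, 2 and 3 on the other), so the smallest k for which the formula is satisfiable gives the packing chromatic number of that patch.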
In this paper, the fixed-time cluster synchronization problem for complex networks via pinning control is discussed. Fixed-time synchronization, which has been a hot topic in recent years, means that the network can achieve synchronization in finite time and that the settling time is bounded by a constant for any initial values. To realize fixed-time cluster synchronization, a simple distributed protocol based on the pinning control technique is designed, whose validity is rigorously proved, and some sufficient criteria for fixed-time cluster synchronization are also obtained. In particular, when the cluster number is one, cluster synchronization becomes the complete synchronization problem; when the intrinsic dynamics of each node are absent, fixed-time cluster synchronization becomes the fixed-time cluster (or complete) consensus problem; when the network has only one node, the coupling term between nodes disappears, and the synchronization problem becomes the simplest master-slave case, which also includes the stability problem for nonlinear systems like neural networks. All these cases are also discussed. Finally, numerical simulations are presented to demonstrate the correctness of the obtained theoretical results.