Apr 20 2018 cs.RO
In this paper, we studied a line-based SLAM method for road structure mapping using multi-beam LiDAR. We propose to use the polyline as the basic mapping element instead of grid cell or point cloud, because the line-based representation is precise and lightweight, and it can directly generate vector-based HD map as demanded by autonomous driving systems. We explored: 1) The extraction and vectorization of road structures based on local probabilistic fusion. 2) The efficient line-based matching between frames of vectorized road structures. A specified road structure, the road boundary, is taken as an example. The results testified the feasibility and effectiveness of the proposed method. We applied our proposed mapping system in three different scenes and achieved the average absolute matching error of 0.07m, the average relative matching error of 8.64%.
The automatic parking is being massively developed by car manufacturers and providers. The key to this system is the robust detection of parking slots and road structures, such as lane markings. In this paper, we proposed an HFCN-based segmentation method for parking slot and lane markings in a panoramic surround view (PSV) dataset. A surround view image is made of four calibrated images captured from four fisheye cameras. We collect and label more than 4,200 surround view images for this task, which contain various illuminated scenes of different types of parking slots. A VH-HFCN network is proposed, which adopts a highly fused convolutional network (HFCN) as the base, with an extra efficient VH-stage for better segmenting various markings. The VH-stage consists of two independent linear convolution paths with vertical and horizontal convolution kernels respectively. This modification enables the network to robustly and precisely extract linear features. We evaluate our model on the PSV dataset and the results show outstanding performance in ground markings segmentation. Based on the segmented markings, parking slots and lanes are acquired by skeletonization, hough line transform and line arrangement.
We introduce a new benchmark, WinoBias, for coreference resolution focused on gender bias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing coreference benchmark datasets. Our dataset and code are available at http://winobias.org.
Apr 18 2018 cs.RO
TiEV is an autonomous driving platform implemented by Tongji University of China. The vehicle is drive-by-wire and is fully powered by electricity. We devised the software system of TiEV from scratch, which is capable of driving the vehicle autonomously in urban paths as well as on fast express roads. We describe our whole system, especially novel modules of probabilistic perception fusion, incremental mapping, the 1st and the 2nd planning and the overall safety concern. TiEV finished 2016 and 2017 Intelligent Vehicle Future Challenge of China held at Changshu. We show our experiences on the development of autonomous vehicles and future trends.
Apr 18 2018 cs.CV
We propose DoubleFusion, a new real-time system that combines volumetric dynamic reconstruction with data-driven template fitting to simultaneously reconstruct detailed geometry, non-rigid motion and the inner human body shape from a single depth camera. One of the key contributions of this method is a double layer representation consisting of a complete parametric body shape inside, and a gradually fused outer surface layer. A pre-defined node graph on the body surface parameterizes the non-rigid deformations near the body, and a free-form dynamically changing graph parameterizes the outer surface layer far from the body, which allows more general reconstruction. We further propose a joint motion tracking method based on the double layer representation to enable robust and fast motion tracking performance. Moreover, the inner body shape is optimized online and forced to fit inside the outer surface layer. Overall, our method enables increasingly denoised, detailed and complete surface reconstructions, fast motion tracking performance and plausible inner body shape reconstruction in real-time. In particular, experiments show improved fast motion tracking and loop closure performance on more challenging scenarios.
Apr 13 2018 cs.CV
Modern object detectors usually suffer from low accuracy issues, as foregrounds always drown in tons of backgrounds and become hard examples during training. Compared with those proposal-based ones, real-time detectors are in far more serious trouble since they renounce the use of region-proposing stage which is used to filter a majority of backgrounds for achieving real-time rates. Though foregrounds as hard examples are in urgent need of being mined from tons of backgrounds, a considerable number of state-of-the-art real-time detectors, like YOLO series, have yet to profit from existing hard example mining methods, as using these methods need detectors fit series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it utilizes the final feature map which exists in all real-time detectors to mine hard examples. By using LRM, some elements representing easy examples in final feature map are filtered and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the improvements of YOLOv2 detector on auto-driving related dataset KITTI and more general dataset PASCAL VOC are over 5% and 2% mAP, respectively. In addition, LRM is the first hard example mining strategy which could fit YOLOv2 perfectly and make it better applied in series of real scenarios where both real-time rates and accurate detection are strongly demanded.
Apr 11 2018 cs.CY
Third party tracking allows companies to identify users and track their behaviour across multiple digital services. This paper presents an empirical study of the prevalence of third-party trackers on 959,000 apps from the US and UK Google Play stores. We find that most apps contain third party tracking, and the distribution of trackers is long-tailed with several highly dominant trackers accounting for a large portion of the coverage. The extent of tracking also differs between categories of apps; in particular, news apps and apps targeted at children appear to be amongst the worst in terms of the number of third party trackers associated with them. Third party tracking is also revealed to be a highly trans-national phenomenon, with many trackers operating in jurisdictions outside the EU. Based on these findings, we draw out some significant legal compliance challenges facing the tracking industry.
Apr 11 2018 cs.CV
Despite the noticeable progress in perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily on visually understanding humans in crowded scenes, such as group behavior analysis, person re-identification and autonomous driving, etc. To this end, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, which is recently defined as the multi-human parsing task. In this paper, we present a new large-scale database "Multi-Human Parsing (MHP)" for algorithm development and evaluation, and advances the state-of-the-art in understanding humans in crowded scenes. MHP contains 25,403 elaborately annotated images with 58 fine-grained semantic category labels, involving 2-26 persons per image and captured in real-world scenes from various viewpoints, poses, occlusion, interactions and background. We further propose a novel deep Nested Adversarial Network (NAN) model for multi-human parsing. NAN consists of three Generative Adversarial Network (GAN)-like sub-nets, respectively performing semantic saliency prediction, instance-agnostic parsing and instance-aware clustering. These sub-nets form a nested structure and are carefully designed to learn jointly in an end-to-end way. NAN consistently outperforms existing state-of-the-art solutions on our MHP and several other datasets, and serves as a strong baseline to drive the future research for multi-human parsing.
The workforce remains the most basic element of social production, even in modern societies. Its migration, especially for developing economies such as China, plays a strong role in the reallocation of productive resources and offers a promising proxy for understanding socio-economic issues. Nevertheless, due to long cycle, expensive cost and coarse granularity, conventional surveys face challenges in comprehensively profiling it. With the permeation of smart and mobile devices in recent decades, booming social media has essentially broken spatio-temporal constraints and enabled the continuous sensing of the real-time mobility of massive numbers of individuals. In this study, we demonstrate that similar to a natural shock, the Spring Festival culturally drives workforce travel between workplaces and hometowns, and the trajectory footprints from social media therefore open a window with unparalleled richness and fine granularity to explore laws in national-level workforce migration. To understand the core driving forces of workforce migration flux between cities in China, various indicators reflecting the benefits and costs of migration are introduced into our prediction model. We find that urban GDP (gross domestic product) and travel time are two excellent indicators to help make predictions. Diverse migration patterns are then revealed by clustering the trajectories, which give important clues to help understand the different roles of Chinese cities in their own development and in regional economic development. These patterns are further explained as a joint effect of the subjective will to seek personal benefits and the capacity requirements of local labour markets. Our study implies that the non-negligible entanglement between social media and macroeconomic behaviours can be insightful for policymaking in social-economic issues.
Mar 21 2018 cs.CV
Segmentation of histological images is one of the most crucial tasks for many biomedical analyses including quantification of certain tissue type. However, challenges are posed by high variability and complexity of structural features in such images, in addition to imaging artifacts. Further, the conventional approach of manual thresholding is labor-intensive, and highly sensitive to inter- and intra-image intensity variations. An accurate and robust automated segmentation method is of high interest. We propose and evaluate an elegant convolutional neural network (CNN) designed for segmentation of histological images, particularly those with Masson's trichrome stain. The network comprises of 11 successive convolutional - rectified linear unit - batch normalization layers, and outperformed state-of-the-art CNNs on a dataset of cardiac histological images (labeling fibrosis, myocytes, and background) with a Dice similarity coefficient of 0.947. With 100 times fewer (only 300 thousand) trainable parameters, our CNN is less susceptible to overfitting, and is efficient. Additionally, it retains image resolution from input to output, captures fine-grained details, and can be trained end-to-end smoothly. To the best of our knowledge, this is the first deep CNN tailored for the problem of concern, and may be extended to solve similar segmentation tasks to facilitate investigations into pathology and clinical treatment.
Deep learning defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. Deep learning (DL) has been widely adopted in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the robustness of a DL system against adversarial attacks is usually measured by the accuracy of test data. Considering the limitation of accessible test data, good performance on test data can hardly guarantee the robustness and generality of DL systems. Different from traditional software systems which have clear and controllable logic and functionality, a DL system is trained with data and lacks thorough understanding. This makes it difficult for system analysis and defect detection, which could potentially hinder its real-world deployment without safety guarantees. In this paper, we propose DeepGauge, a comprehensive and multi-granularity testing criteria for DL systems, which renders a complete and multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, with four state-of-the-art adversarial data generation techniques. The effectiveness of DeepGauge sheds light on the construction of robust DL systems.
Social media can be a double-edged sword for modern communications, either a convenient channel exchanging ideas or an unexpected conduit circulating fake news through a large population. Existing studies of fake news focus on efforts on theoretical modelling of propagation or identification methods based on black-box machine learning, neglecting the possibility of identifying fake news using only structural features of propagation of fake news compared to those of real news and in particular the ability to identify fake news at early stages of propagation. Here we track large databases of fake news and real news in both, Twitter in Japan and its counterpart Weibo in China, and accumulate their complete traces of re-posting. It is consistently revealed in both media that fake news spreads distinctively, even at early stages of spreading, in a structure that resembles multiple broadcasters, while real news circulates with a dominant source. A novel predictability feature emerges from this difference in their propagation networks, offering new paths of early detection of fake news in social media. Instead of commonly used features like texts or users for fake news identification, our finding demonstrates collective structural signals that could be useful for filtering out fake news at early stages of their propagation evolution.
Mar 02 2018 cs.AI
Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auction, due to the randomness of user query behavior and platform nature, usually adopts keyword-level bidding strategies. In contrast, the display advertising (DA), as a relatively simpler scenario for auction, has taken advantage of real-time bidding (RTB) to boost the performance for advertisers. In this paper, we consider the RTB problem in sponsored search auction, named SS-RTB. SS-RTB has a much more complex dynamic environment, due to stochastic user query behavior and more complex bidding policies based on multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling the complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary between two days. Motivated by the observation that auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide a bidding model for impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system in the e-commerce search auction platform of Alibaba. Empirical experiments of offline evaluation and online A/B test demonstrate the effectiveness of our method.
In this paper, we propose a channel tracking method for massive multi-input and multi-output systems under both time-varying and spatial-varying circumstance. Exploiting the characteristics of massive antenna array, a spatial-temporal basis expansion model is designed to reduce the effective dimensions of up-link and down-link channel, which decomposes channel state information into the time-varying spatial information and gain information. We firstly model the users movements as a one-order unknown Markov process, which is blindly learned by the expectation and maximization (EM) approach. Then, the up-link time varying spatial information can be blindly tracked by Taylor series expansion of the steering vector, while the rest up-link channel gain information can be trained by only a few pilot symbols. Due to angle reciprocity (spatial reciprocity), the spatial information of the down-link channel can be immediately obtained from the up-link counterpart, which greatly reduces the complexity of down-link channel tracking. Various numerical results are provided to demonstrate the effectiveness of the proposed method.
Feb 27 2018 cs.LG
We propose a retrieval-augmented convolutional network and propose to train it with local mixup, a novel variant of the recently proposed mixup algorithm. The proposed hybrid architecture combining a convolutional network and an off-the-shelf retrieval engine was designed to mitigate the adverse effect of off-manifold adversarial examples, while the proposed local mixup addresses on-manifold ones by explicitly encouraging the classifier to locally behave linearly on the data manifold. Our evaluation of the proposed approach against five readily-available adversarial attacks on three datasets--CIFAR-10, SVHN and ImageNet--demonstrate the improved robustness compared to the vanilla convolutional network.
We study the problem of maximizing a monotone set function subject to a cardinality constraint $k$ in the setting where some number of elements $\tau$ is deleted from the returned set. The focus of this work is on the worst-case adversarial setting. While there exist constant-factor guarantees when the function is submodular, there are no guarantees for non-submodular objectives. In this work, we present a new algorithm Oblivious-Greedy and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives. The obtained theoretical bounds are the first constant-factor bounds that also hold in the linear regime, i.e. when the number of deletions $\tau$ is linear in $k$. Our bounds depend on established parameters such as the submodularity ratio and some novel ones such as the inverse curvature. We bound these parameters for two important objectives including support selection and variance reduction. Finally, we numerically demonstrate the robust performance of Oblivious-Greedy for these two objectives on various datasets.
Deep convolutional neural networks have achieved great success in various applications. However, training an effective DNN model for a specific task is rather challenging because it requires a prior knowledge or experience to design the network architecture, repeated trial-and-error process to tune the parameters, and a large set of labeled data to train the model. In this paper, we propose to overcome these challenges by actively adapting a pre-trained model to a new task with less labeled examples. Specifically, the pre-trained model is iteratively fine tuned based on the most useful examples. The examples are actively selected based on a novel criterion, which jointly estimates the potential contribution of an instance on optimizing the feature representation as well as improving the classification model for the target task. On one hand, the pre-trained model brings plentiful information from its original task, avoiding redesign of the network architecture or training from scratch; and on the other hand, the labeling cost can be significantly reduced by active label querying. Experiments on multiple datasets and different pre-trained models demonstrate that the proposed approach can achieve cost-effective training of DNNs.
Feb 08 2018 cs.CY
Third-party networks collect vast amounts of data about users via web sites and mobile applications. Consolidations among tracker companies can significantly increase their individual tracking capabilities, prompting scrutiny by competition regulators. Traditional measures of market share, based on revenue or sales, fail to represent the tracking capability of a tracker, especially if it spans both web and mobile. This paper proposes a new approach to measure the concentration of tracking capability, based on the reach of a tracker on popular websites and apps. Our results reveal that tracker prominence and parent-subsidiary relationships have significant impact on accurately measuring concentration.
Data-driven decision-making consequential to individuals raises important questions of accountability and justice. Indeed, European law provides individuals limited rights to 'meaningful information about the logic' behind significant, autonomous decisions such as loan approvals, insurance quotes, and CV filtering. We undertake three experimental studies examining people's perceptions of justice in algorithmic decision-making under different scenarios and explanation styles. Dimensions of justice previously observed in response to human decision-making appear similarly engaged in response to algorithmic decisions. Qualitative analysis identified several concerns and heuristics involved in justice perceptions including arbitrariness, generalisation, and (in)dignity. Quantitative analysis indicates that explanation styles primarily matter to justice perceptions only when subjects are exposed to multiple different styles---under repeated exposure of one style, scenario effects obscure any explanation effects. Our results suggests there may be no 'best' approach to explaining algorithmic decisions, and that reflection on their automated nature both implicates and mitigates justice dimensions.
Sense and avoid capability enables insects to fly versatilely and robustly in dynamic complex environment. Their biological principles are so practical and efficient that inspired we human imitating them in our flying machines. In this paper, we studied a novel bio-inspired collision detector and its application on a quadcopter. The detector is inspired from LGMD neurons in the locusts, and modeled into an STM32F407 MCU. Compared to other collision detecting methods applied on quadcopters, we focused on enhancing the collision selectivity in a bio-inspired way that can considerably increase the computing efficiency during an obstacle detecting task even in complex dynamic environment. We designed the quadcopter's responding operation imminent collisions and tested this bio-inspired system in an indoor arena. The observed results from the experiments demonstrated that the LGMD collision detector is feasible to work as a vision module for the quadcopter's collision avoidance task.
Jan 08 2018 cs.CV
Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset
Dec 27 2017 cs.CV
As a special type of object detection, pedestrian detection in generic scenes has made a significant progress trained with large amounts of labeled training data manually. While the models trained with generic dataset work bad when they are directly used in specific scenes. With special viewpoints, flow light and backgrounds, datasets from specific scenes are much different from the datasets from generic scenes. In order to make the generic scene pedestrian detectors work well in specific scenes, the labeled data from specific scenes are needed to adapt the models to the specific scenes. While labeling the data manually spends much time and money, especially for specific scenes, each time with a new specific scene, large amounts of images must be labeled. What's more, the labeling information is not so accurate in the pixels manually and different people make different labeling information. In this paper, we propose an ACP-based method, with augmented reality's help, we build the virtual world of specific scenes, and make people walking in the virtual scenes where it is possible for them to appear to solve this problem of lacking labeled data and the results show that data from virtual world is helpful to adapt generic pedestrian detectors to specific scenes.
Nov 28 2017 cs.CV
As two fundamental problems, graph cuts and graph matching have been investigated over decades, resulting in vast literature in these two topics respectively. However the way of jointly applying and solving graph cuts and matching receives few attention. In this paper, we first formalize the problem of simultaneously cutting a graph into two partitions i.e. graph cuts and establishing their correspondence i.e. graph matching. Then we develop an optimization algorithm by updating matching and cutting alternatively, provided with theoretical analysis. The efficacy of our algorithm is verified on both synthetic dataset and real-world images containing similar regions or structures.
In this paper, we design robust beamforming to guarantee the physical layer security for a multiuser beam division multiple access (BDMA) massive multiple-input multiple-output (MIMO) system, when the channel estimation errors are taken into consideration. With the aid of artificial noise (AN), the proposed design are formulated as minimizing the transmit power of the base station (BS), while providing legal users and the eavesdropper (Eve) with different signal-to-interference-plus-noise ratio (SINR). It is strictly proved that, under BDMA massive MIMO scheme, the initial non-convex optimization can be equivalently converted to a convex semi-definite programming (SDP) problem and the optimal rank-one beamforming solutions can be guaranteed. In stead of directly resorting to the convex tool, we make one step further by deriving the optimal beamforming direction and the optimal beamforming power allocation in closed-form, which greatly reduces the computational complexity and makes the proposed design practical for real world applications. Simulation results are then provided to verify the efficiency of the proposed algorithm.
Nov 17 2017 cs.CV
Face analytics benefits many multimedia applications. It consists of a number of tasks, such as facial emotion recognition and face parsing, and most existing approaches generally treat these tasks independently, which limits their deployment in real scenarios. In this paper we propose an integrated Face Analytics Network (iFAN), which is able to perform multiple tasks jointly for face analytics with a novel carefully designed network architecture to fully facilitate the informative interaction among different tasks. The proposed integrated network explicitly models the interactions between tasks so that the correlations between tasks can be fully exploited for performance boost. In addition, to solve the bottleneck of the absence of datasets with comprehensive training data for various tasks, we propose a novel cross-dataset hybrid training strategy. It allows "plug-in and play" of multiple datasets annotated for different tasks without the requirement of a fully labeled common dataset for all the tasks. We experimentally show that the proposed iFAN achieves state-of-the-art performance on multiple face analytics tasks using a single integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81% on the MTFL dataset for facial landmark localization and an accuracy of 45.73% on the BNU dataset for emotion recognition with a single model.
Nov 16 2017 cs.AI
In this work we introduce a new framework for performing temporal predictions in the presence of uncertainty. It is based on a simple idea of disentangling components of the future state which are predictable from those which are inherently unpredictable, and encoding the unpredictable components into a low-dimensional latent variable which is fed into a forward model. Our method uses a supervised training objective which is fast and easy to train. We evaluate it in the context of video prediction on multiple datasets and show that it is able to consistently generate diverse predictions without the need for alternating minimization over a latent space or adversarial training.
Rapid growth of modern technologies such as internet and mobile computing are bringing dramatically increased e-commerce payments, as well as the explosion in transaction fraud. Meanwhile, fraudsters are continually refining their tricks, making rule-based fraud detection systems difficult to handle the ever-changing fraud patterns. Many data mining and artificial intelligence methods have been proposed for identifying small anomalies in large transaction data sets, increasing detecting efficiency to some extent. Nevertheless, there is always a contradiction that most methods are irrelevant to transaction sequence, yet sequence-related methods usually cannot learn information at single-transaction level well. In this paper, a new "within->between->within" sandwich-structured sequence learning architecture has been proposed by stacking an ensemble method, a deep sequential learning method and another top-layer ensemble classifier in proper order. Moreover, attention mechanism has also been introduced in to further improve performance. Models in this structure have been manifested to be very efficient in scenarios like fraud detection, where the information sequence is made up of vectors with complex interconnected features.
Homophily, ranging from demographics to sentiments, breeds connections in social networks, either offline or online. However, with the prosperous growth of music streaming service, whether homophily exists in online music listening remains unclear. In this study, two online social networks of a same group of active users are established respectively in Netease Music and Weibo. Through presented multiple similarity measures, it is evidently demonstrated that homophily does exist in music listening of both online social networks. The unexpected music similarity in Weibo also implies that knowledge from generic social networks can be confidently transfered to domain-oriented networks for context enrichment and algorithm enhancement. Comprehensive factors that might function in formation of homophily are further probed and many interesting patterns are profoundly revealed. It is found that female friends are more homogeneous in music listening and positive and energetic songs significantly pull users close. Our methodology and findings would shed lights on realistic applications in online music services.
Machine Learning (ML) techniques, such as Neural Network, are widely used in today's applications. However, there is still a big gap between the current ML systems and users' requirements. ML systems focus on improving the performance of models in training, while individual users cares more about response time and expressiveness of the tool. Many existing research and product begin to move computation towards edge devices. Based on the numerical computing system Owl, we propose to build the Zoo system to support construction, compose, and deployment of ML models on edge and local devices.
Finding hot topics in scholarly fields can help researchers to keep up with the latest concepts, trends, and inventions in their field of interest. Due to the rarity of complete large-scale scholarly data, earlier studies target this problem based on manual topic extraction from a limited number of domains, with their focus solely on a single feature such as coauthorship, citation relations, and etc. Given the compromised effectiveness of such predictions, in this paper we use a real scholarly dataset from Microsoft Academic Graph, which provides more than 12000 topics in the field of Computer Science (CS), including 1200 venues, 14.4 million authors, 30 million papers and their citation relations over the period of 1950 till now. Aiming to find the topics that will trend in CS area, we innovatively formalize a hot topic prediction problem where, with joint consideration of both inter- and intra-topical influence, 17 different scientific features are extracted for comprehensive description of topic status. By leveraging all those 17 features, we observe good accuracy of topic scale forecasting after 5 and 10 years with R2 values of 0.9893 and 0.9646, respectively. Interestingly, our prediction suggests that the maximum value matters in finding hot topics in scholarly fields, primarily from three aspects: (1) the maximum value of each factor, such as authors' maximum h-index and largest citation number, provides three times the amount of information than the average value in prediction; (2) the mutual influence between the most correlated topics serve as the most telling factor in long-term topic trend prediction, interpreting that those currently exhibiting the maximum growth rates will drive the correlated topics to be hot in the future; (3) we predict in the next 5 years the top 100 fastest growing (maximum growth rate) topics that will potentially get the major attention in CS area.
Oct 04 2017 cs.CV
Fine-grained image classification is to recognize hundreds of subcategories in each basic-level category. Existing methods employ discriminative localization to find the key distinctions among subcategories. However, they generally have two limitations: (1) Discriminative localization relies on region proposal methods to hypothesize the locations of discriminative regions, which are time-consuming. (2) The training of discriminative localization depends on object or part annotations, which are heavily labor-consuming. It is highly challenging to address the two key limitations simultaneously, and existing methods only focus on one of them. Therefore, we propose a weakly supervised discriminative localization approach (WSDL) for fast fine-grained image classification to address the two limitations at the same time, and its main advantages are: (1) n-pathway end-to-end discriminative localization network is designed to improve classification speed, which simultaneously localizes multiple different discriminative regions for one image to boost classification accuracy, and shares full-image convolutional features generated by region proposal network to accelerate the process of generating region proposals as well as reduce the computation of convolutional operation. (2) Multi-level attention guided localization learning is proposed to localize discriminative regions with different focuses automatically, without using object and part annotations, avoiding the labor consumption. Different level attentions focus on different characteristics of the image, which are complementary and boost the classification accuracy. Both are jointly employed to simultaneously improve classification speed and eliminate dependence on object and part annotations. Compared with state-of-the-art methods on 2 widely-used fine-grained image classification datasets, our WSDL approach achieves the best performance.
Oct 04 2017 cs.CL
Knowledge of users' emotion states helps improve human-computer interaction. In this work, we presented EmoNet, an emotion detector of Chinese daily dialogues based on deep convolutional neural networks. In order to maintain the original linguistic features, such as the order, commonly used methods like segmentation and keywords extraction were not adopted, instead we increased the depth of CNN and tried to let CNN learn inner linguistic relationships. Our main contribution is that we presented a new model and a new pipeline which can be used in multi-language environment to solve sentimental problems. Experimental results shows EmoNet has a great capacity in learning the emotion of dialogues and achieves a better result than other state of art detectors do.
Sep 26 2017 cs.SY
Unmanned aerial vehicle (UAV)-satellite communication has drawn dramatic attention for its potential to build the integrated space-air-ground network and the seamless wide-area coverage. The key challenge to UAV-satellite communication is its unstable beam pointing due to the UAV navigation, which is a typical SatCom on-the-move scenario. In this paper, we propose a blind beam tracking approach for Ka-band UAVsatellite communication system, where UAV is equipped with a large-scale antenna array. The effects of UAV navigation are firstly released through the mechanical adjustment, which could approximately point the beam towards the target satellite through beam stabilization and dynamic isolation. Specially, the attitude information can be realtimely derived from data fusion of lowcost sensors. Then, the precision of the beam pointing is blindly refined through electrically adjusting the weight of the massive antennas, where an array structure based simultaneous perturbation algorithm is designed. Simulation results are provided to demonstrate the superiority of the proposed method over the existing ones.
Sep 26 2017 cs.CV
Discriminative localization is essential for fine-grained image classification task, which devotes to recognizing hundreds of subcategories in the same basic-level category. Reflecting on discriminative regions of objects, key differences among different subcategories are subtle and local. Existing methods generally adopt a two-stage learning framework: The first stage is to localize the discriminative regions of objects, and the second is to encode the discriminative features for training classifiers. However, these methods generally have two limitations: (1) Separation of the two-stage learning is time-consuming. (2) Dependence on object and parts annotations for discriminative localization learning leads to heavily labor-consuming labeling. It is highly challenging to address these two important limitations simultaneously. Existing methods only focus on one of them. Therefore, this paper proposes the discriminative localization approach via saliency-guided Faster R-CNN to address the above two limitations at the same time, and our main novelties and advantages are: (1) End-to-end network based on Faster R-CNN is designed to simultaneously localize discriminative regions and encode discriminative features, which accelerates classification speed. (2) Saliency-guided localization learning is proposed to localize the discriminative region automatically, avoiding labor-consuming labeling. Both are jointly employed to simultaneously accelerate classification speed and eliminate dependence on object and parts annotations. Comparing with the state-of-the-art methods on the widely-used CUB-200-2011 dataset, our approach achieves both the best classification accuracy and efficiency.
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without the necessity of UNI but still enjoy the similar benefits as RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and they together form a better connected \em two-layered network structure. This new viewpoint brings extra benefits to graph sampling: if directly sampling a target graph is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques by exploiting such a two-layered network structure to estimate target graph characteristics. Experiments conducted on both synthetic and real-world networks demonstrate the effectiveness and usefulness of these new techniques.
Aug 31 2017 cs.SI
In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., people send/receive more greetings to/from their friends on holidays than regular days. We also observe that some videos or hashtags suddenly go viral through people's sharing on online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with sudden surges of user interactions in networks, which we call "bursts" in this work. We uncover that the emergence of a burst is accompanied with the formation of triangles in some properly defined networks. This finding motivates us to propose a new and robust method to detect bursts on OSNs. We first introduce a new measure, "triadic cardinality distribution", corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, in a network. We show that this distribution not only changes when a burst occurs, but it also has a robustness property that it is immunized against common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can more reliably detect bursts than simply counting interactions on OSNs. To avoid handling massive activity data generated by OSN users during the triadic tracking, we design an efficient "sample-estimate" framework to provide maximum likelihood estimate on the triadic cardinality distribution. We propose several sampling methods, and provide insights about their performance difference, through both theoretical analysis and empirical experiments on real world networks.
Aug 29 2017 cs.CR
Algorithmic complexity vulnerabilities occur when the worst-case time/space complexity of an application is significantly higher than the respective average case for particular user-controlled inputs. When such conditions are met, an attacker can launch Denial-of-Service attacks against a vulnerable application by providing inputs that trigger the worst-case behavior. Such attacks have been known to have serious effects on production systems, take down entire websites, or lead to bypasses of Web Application Firewalls. Unfortunately, existing detection mechanisms for algorithmic complexity vulnerabilities are domain-specific and often require significant manual effort. In this paper, we design, implement, and evaluate SlowFuzz, a domain-independent framework for automatically finding algorithmic complexity vulnerabilities. SlowFuzz automatically finds inputs that trigger worst-case algorithmic behavior in the tested binary. SlowFuzz uses resource-usage-guided evolutionary search techniques to automatically find inputs that maximize computational resource utilization for a given application.
Random key graphs have received considerable attention and been used in various applications including secure sensor networks, social networks, the study of epidemics, cryptanalysis, and recommender systems. In this paper, we investigate a $q$-composite random key graph, whose construction on $n$ nodes is as follows: each node independently selects a set of $K_n$ different keys uniformly at random from the same pool of $P_n$ distinct keys, and two nodes establish an undirected edge in between if and only if they share at least $q$ key(s). Such graph denoted by $G_q(n,K_n,P_n)$ models a secure sensor network employing the well-known $q$-composite key predistribution. For $G_q(n,K_n,P_n)$, we analyze the probabilities of $G_q(n,K_n,P_n)$ having $k$-connectivity, $k$-robustness, a Hamilton cycle and a perfect matching, respectively. Our studies of these four properties are motivated by a detailed discussion of their applications to networked control. Our results reveal that $G_q(n,K_n,P_n)$ exhibits a sharp transition for each property: as $K_n$ increases, the probability that $G_q(n,K_n,P_n)$ has the property sharply increases from $0$ to $1$. These results provide fundamental guidelines to design secure sensor networks for different control-related applications: distributed in-network parameter estimation, fault-tolerant consensus, and resilient data backup.
Drug repositioning, that is finding new uses for existing drugs to treat more patients. Cumulative studies demonstrate that the mature miRNAs as well as their precursors can be targeted by small molecular drugs. At the same time, human diseases result from the disordered interplay of tissue- and cell lineage-specific processes. However, few computational researches predict drug-disease potential relationships based on miRNA data and tissue specificity. Therefore, based on miRNA data and the tissue specificity of diseases, we propose a new method named as miTS to predict the potential treatments for diseases. Firstly, based on miRNAs data, target genes and information of FDA approved drugs, we evaluate the relationships between miRNAs and drugs in the tissue-specific PPI network. Then, we construct a tripartite network: drug-miRNA-disease Finally, we obtain the potential drug-disease associations based on the tripartite network. In this paper, we take breast cancer as case study and focus on the top-30 predicted drugs. 25 of them (83.3%) are found having known connections with breast cancer in CTD benchmark and the other 5 drugs are potential drugs for breast cancer. We further evaluate the 5 newly predicted drugs from clinical records, literature mining, KEGG pathways enrichment analysis and overlapping genes between enriched pathways. For each of the 5 new drugs, strongly supported evidences can be found in three or more aspects. In particular, Regorafenib has 15 overlapping KEGG pathways with breast cancer and their p-values are all very small. In addition, whether in the literature curation or clinical validation, Regorafenib has a strong correlation with breast cancer. All the facts show that Regorafenib is likely to be a truly effective drug, worthy of our further study. It further follows that our method miTS is effective and practical for predicting new drug indications.
Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.
Jul 24 2017 cs.HC
Head gesture is a natural means of face-to-face communication between people but the recognition of head gestures in the context of virtual reality and use of head gesture as an interface for interacting with virtual avatars and virtual environments have been rarely investigated. In the current study, we present an approach for real-time head gesture recognition on head-mounted displays using Cascaded Hidden Markov Models. We conducted two experiments to evaluate our proposed approach. In experiment 1, we trained the Cascaded Hidden Markov Models and assessed the offline classification performance using collected head motion data. In experiment 2, we characterized the real-time performance of the approach by estimating the latency to recognize a head gesture with recorded real-time classification data. Our results show that the proposed approach is effective in recognizing head gestures. The method can be integrated into a virtual reality system as a head gesture interface for interacting with virtual worlds.
Jun 20 2017 cs.SY
The output impedance matrix of a grid-connected converter plays an important role in analyzing system stability. Due to the dynamics of the DC-link control and the phase locked loop (PLL), the output impedance matrices of the converter and grid are difficult to be diagonally decoupled simultaneously, neither in the dq domain nor in the phase domain. It weakens the effectiveness of impedance-based stability criterion (ISC) in system oscillation analysis. To this end, this paper innovatively proposes the generalized-impedance based stability criterion (GISC) to reduce the dimension of the transfer function matrix and simplify system small-signal stability analysis. Firstly, the impedances of the converter and the grid in polar coordinates are formulated, and the concept of generalized-impedance of the converter and the grid is put forward. Secondly, through strict mathematical derivation, the equation that implies the dynamic interaction between the converter and the grid is then extracted from the characteristic equation of the grid-connected converter system. Using the proposed method, the small-signal instability of system can be interpreted as the resonance of the generalized-impedances of the converter and the grid. Besides, the GISC is equivalent to ISC when the dynamics of the outer-loop control and PLL are not considered. Finally, the effectiveness of the proposed method is further verified using the MATLAB based digital simulation and RT-LAB based hardware-in-the-loop (HIL) simulation.
While autoencoders are a key technique in representation learning for continuous structures, such as images or wave forms, developing general-purpose autoencoders for discrete structures, such as text sequence or discretized images, has proven to be more challenging. In particular, discrete inputs make it more difficult to learn a smooth encoder that preserves the complex local relationships in the input space. In this work, we propose an adversarially regularized autoencoder (ARAE) with the goal of learning more robust discrete-space representations. ARAE jointly trains both a rich discrete-space encoder, such as an RNN, and a simpler continuous space generator function, while using generative adversarial network (GAN) training to constrain the distributions to be similar. This method yields a smoother contracted code space that maps similar inputs to nearby codes, and also an implicit latent variable GAN model for generation. Experiments on text and discretized images demonstrate that the GAN model produces clean interpolations and captures the multimodality of the original space, and that the autoencoder produces improve- ments in semi-supervised learning as well as state-of-the-art results in unaligned text style transfer task using only a shared continuous-space representation.
Jun 09 2017 cs.SI
Recent studies suggest that human emotions diffuse in not only real-world communities but also online social media. More and more mechanisms beyond emotion contagion are revealed, including emotion correlations which indicate their influence and the coupling of emotion diffusion and network structure such as tie strength. Besides, different emotions might even compete in shaping the public opinion. However, a comprehensive model that considers up-to-date findings to replicate the patterns of emotion contagion in online social media is still missing. In this paper, to bridge this vital gap, we propose an agent-based emotion contagion model which combines features of emotion influence and tie strength preference in the dissemination process. The simulation results indicate that anger-dominated users have higher vitality than joy-dominated ones, and anger prefers weaker ties than joy in diffusion, which could make it easier to spread between online groups. Moreover, anger's high influence makes it competitive and easily to dominate the community, especially when negative public events occur. It is also surprisingly revealed that as the ratio of anger approaches joy with a gap less than 10%, angry tweets and users will eventually dominate the online social media and arrives the collective outrage in the cyber space. The critical gap disclosed here can be indeed warning signals at early stages for outrage controlling in online social media. All the parameters of the presented model can be easily estimated from the empirical observations and their values from historical data could help reproduce the emotion contagion of different circumstances. Our model would shed lights on the study of multiple issues like forecasting of emotion contagion in terms of computer simulations.
May 23 2017 cs.CV
Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-human parsing problem, we introduce a new multi-human parsing (MHP) dataset and a novel multi-human parsing model named MH-Parser. The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting. The MH-Parser generates global parsing maps and person instance masks simultaneously in a bottom-up fashion with the help of a new Graph-GAN model. We envision that the MHP dataset will serve as a valuable data resource to develop new multi-human parsing models, and the MH-Parser offers a strong baseline to drive future research for multi-human parsing in the wild.
How the online social media, like Twitter or its variant Weibo, interacts with the stock market and whether it can be a convincing proxy to predict the stock market have been debated for years, especially for China. As the traditional theory in behavioral finance states, the individual emotions can influence decision-makings of investors, it is reasonable to further explore these controversial topics systematically from the perspective of online emotions, which are richly carried by massive tweets in social media. Through thorough studies on over 10 million stock-relevant tweets and 3 million investors from Weibo, it is revealed that inexperienced investors with high emotional volatility are more sensible to the market fluctuations than the experienced or institutional ones, and their dominant occupation also indicates that the Chinese market might be more emotional as compared to its western counterparts. Then both correlation analysis and causality test demonstrate that five attributes of the stock market in China can be competently predicted by various online emotions, like disgust, joy, sadness and fear. Specifically, the presented prediction model significantly outperforms the baseline model, including the one taking purely financial time series as input features, on predicting five attributes of the stock market under the $K$-means discretization. We also employ this prediction model in the scenario of realistic online application and its performance is further testified.
Apr 07 2017 cs.CV
Fine-grained image classification is to recognize hundreds of subcategories belonging to the same basic-level category, such as 200 subcategories belonging to the bird, which is highly challenging due to large variance in the same subcategory and small variance among different subcategories. Existing methods generally first locate the objects or parts and then discriminate which subcategory the image belongs to. However, they mainly have two limitations: (1) Relying on object or part annotations which are heavily labor consuming. (2) Ignoring the spatial relationships between the object and its parts as well as among these parts, both of which are significantly helpful for finding discriminative parts. Therefore, this paper proposes the object-part attention model (OPAM) for weakly supervised fine-grained image classification, and the main novelties are: (1) Object-part attention model integrates two level attentions: object-level attention localizes objects of images, and part-level attention selects discriminative parts of object. Both are jointly employed to learn multi-view and multi-scale features to enhance their mutual promotions. (2) Object-part spatial constraint model combines two spatial constraints: object spatial constraint ensures selected parts highly representative, and part spatial constraint eliminates redundancy and enhances discrimination of selected parts. Both are jointly employed to exploit the subtle and local differences for distinguishing the subcategories. Importantly, neither object nor part annotations are used in our proposed approach, which avoids the heavy labor consumption of labeling. Comparing with more than 10 state-of-the-art methods on 4 widely-used datasets, our OPAM approach achieves the best performance.
Apr 04 2017 cs.CV
Unconstrained face recognition performance evaluations have traditionally focused on Labeled Faces in the Wild (LFW) dataset for imagery and the YouTubeFaces (YTF) dataset for videos in the last couple of years. Spectacular progress in this field has resulted in saturation on verification and identification accuracies for those benchmark datasets. In this paper, we propose a unified learning framework named Transferred Deep Feature Fusion (TDFF) targeting at the new IARPA Janus Benchmark A (IJB-A) face recognition dataset released by NIST face challenge. The IJB-A dataset includes real-world unconstrained faces from 500 subjects with full pose and illumination variations which are much harder than the LFW and YTF datasets. Inspired by transfer learning, we train two advanced deep convolutional neural networks (DCNN) with two different large datasets in source domain, respectively. By exploring the complementarity of two distinct DCNNs, deep feature fusion is utilized after feature extraction in target domain. Then, template specific linear SVMs is adopted to enhance the discrimination of framework. Finally, multiple matching scores corresponding different templates are merged as the final results. This simple unified framework exhibits excellent performance on IJB-A dataset. Based on the proposed approach, we have submitted our IJB-A results to National Institute of Standards and Technology (NIST) for official evaluation. Moreover, by introducing new data and advanced neural architecture, our method outperforms the state-of-the-art by a wide margin on IJB-A dataset.
Being dominant factors driving the human actions, personalities can be excellent indicators in predicting the offline and online behavior of different individuals. However, because of the great expense and inevitable subjectivity in questionnaires and surveys, it is challenging for conventional studies to explore the connection between personality and behavior and gain insights in the context of large amount individuals. Considering the more and more important role of the online social media in daily communications, we argue that the footprint of massive individuals, like tweets in Weibo, can be the inspiring proxy to infer the personality and further understand its functions in shaping the online human behavior. In this study, a map from self-reports of personalities to online profiles of 293 active users in Weibo is established to train a competent machine learning model, which then successfully identifies over 7,000 users as extroverts or introverts. Systematical comparisons from perspectives of tempo-spatial patterns, online activities, emotion expressions and attitudes to virtual honor surprisingly disclose that the extrovert indeed behaves differently from the introvert in Weibo. Our findings provide solid evidence to justify the methodology of employing machine learning to objectively study personalities of massive individuals and shed lights on applications of probing personalities and corresponding behaviors solely through online profiles.
Mar 20 2017 cs.SY
This paper is the second of a two-part series that discusses the implementation issues and test results of a robust Unscented Kalman Filter (UKF) for power system dynamic state estimation with non-Gaussian synchrophasor measurement noise. The tuning of the parameters of our Generalized Maximum-Likelihood-type robust UKF (GM-UKF) is presented and discussed in a systematic way. Using simulations carried out on the IEEE 39-bus system, its performance is evaluated under different scenarios, including i) the occurrence of two different types of noises following thick-tailed distributions, namely the Laplace or Cauchy probability distributions for real and reactive power measurements; ii) the occurrence of observation and innovation outliers; iii) the occurrence of PMU measurement losses due to communication failures; iv) cyber attacks; and v) strong system nonlinearities. It is also compared to the UKF and the Generalized Maximum-Likelihood-type robust iterated EKF (GM-IEKF). Simulation results reveal that the GM-UKF outperforms the GM-IEKF and the UKF in all scenarios considered. In particular, when the system is operating under stressed conditions, inducing system nonlinearities, the GM-IEKF and the UKF diverge while our GM-UKF does converge. In addition, when the power measurement noises obey a Cauchy distribution, our GM-UKF converges to a state estimate vector that exhibits a much higher statistical efficiency than that of the GM-IEKF; by contrast, the UKF fails to converge. Finally, potential applications and future work of the proposed GM-UKF are discussed in concluding remarks section.
Mar 14 2017 cs.SI
Many people dream to become famous, YouTube video makers also wish their videos to have a large audience, and product retailers always hope to expose their products to customers as many as possible. Do these seemingly different phenomena share a common structure? We find that fame, popularity, or exposure, could be modeled as a node's discoverability on some properly defined network, and all of the previously mentioned phenomena can be commonly stated as a target node wants to be discovered easily by the other nodes in the network. In this work, we explicitly define a node's discoverability in a network, and formulate a general node discoverability optimization problem, where the goal is to create a budgeted set of incoming edges to the target node so as to optimize the target node's discoverability in the network. Although the optimization problem is proven to be NP-hard, we find that the defined discoverability measures have good properties that enable us to use a greedy algorithm to find provably near-optimal solutions. The computational complexity of a greedy algorithm is dominated by the time cost of an oracle call, i.e., calculating the marginal gain of a node. To scale up the oracle call over large networks, we propose an estimation-and-refinement approach, that provides a good trade-off between estimation accuracy and computational efficiency. Experiments conducted on real-world networks demonstrate that our method is thousands of times faster than an exact method using dynamic programming, thereby allowing us to solve the node discoverability optimization problem on large networks.
Mar 08 2017 cs.IR
Entity information network is used to describe structural relationships between entities. Taking advantage of its extension and heterogeneity, entity information network is more and more widely applied to relationship modeling. Recent years, lots of researches about entity information network modeling have been proposed, while seldom of them concentrate on equipment-standard system with properties of multi-layer, multi-dimension and multi-scale. In order to efficiently deal with some complex issues in equipment-standard system such as standard revising, standard controlling, and production designing, a heterogeneous information network model for equipment-standard system is proposed in this paper. Three types of entities and six types of relationships are considered in the proposed model. Correspondingly, several different similarity-measuring methods are used in the modeling process. The experiments show that the heterogeneous information network model established in this paper can reflect relationships between entities accurately. Meanwhile, the modeling process has a good performance on time consumption.
Mar 02 2017 cs.LG
Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is pushed to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users' current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also analyse the robustness of our approach against adversarial attacks, as well as its feasibility by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).
Recent decades have witnessed the tremendous development of network science, which indeed brings a new and insightful language to model real systems of different domains. Betweenness, a widely employed centrality in network science, is a decent proxy in investigating network loads and rankings. However, the extremely high computational cost greatly prevents its applying on large networks. Though several parallel algorithms have been presented to reduce its calculation cost on unweighted networks, a fast solution for weighted networks, which are in fact more ubiquitous than unweighted ones in reality, is still missing. In this study, we develop an efficient parallel GPU-based approach to boost the calculation of betweenness centrality on very large and weighted networks. Comprehensive and systematic evaluations on both synthetic and real-world networks demonstrate that our solution can arrive the performance of 30x to 150x speedup over the CPU implementation by integrating the work-efficient and warp-centric strategies. Our algorithm is completely open-sourced and free to the community and it is public available through https://dx.doi.org/10.6084/m9.figshare.4542405. Considering the pervasive deployment and declining price of GPU on personal computers and servers, our solution will indeed offer unprecedented opportunities for exploring the betweenness related problems in network science.
Short text clustering is a challenging problem due to its sparseness of text representation. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC^2), which can flexibly and successfully incorporate more useful semantic features and learn non-biased deep text representation in an unsupervised manner. In our framework, the original raw text features are firstly embedded into compact binary codes by using one existing unsupervised dimensionality reduction methods. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, meanwhile the output units are used to fit the pre-trained binary codes in the training process. Finally, we get the optimal clusters by employing K-means to cluster the learned representations. Extensive experimental results demonstrate that the proposed framework is effective, flexible and outperform several popular clustering methods when tested on three public short text datasets.
Dec 28 2016 cs.CV
Face recognition techniques have been developed significantly in recent years. However, recognizing faces with partial occlusion is still challenging for existing face recognizers which is heavily desired in real-world applications concerning surveillance and security. Although much research effort has been devoted to developing face de-occlusion methods, most of them can only work well under constrained conditions, such as all the faces are from a pre-defined closed set. In this paper, we propose a robust LSTM-Autoencoders (RLA) model to effectively restore partially occluded faces even in the wild. The RLA model consists of two LSTM components, which aims at occlusion-robust face encoding and recurrent occlusion removal respectively. The first one, named multi-scale spatial LSTM encoder, reads facial patches of various scales sequentially to output a latent representation, and occlusion-robustness is achieved owing to the fact that the influence of occlusion is only upon some of the patches. Receiving the representation learned by the encoder, the LSTM decoder with a dual channel architecture reconstructs the overall face and detects occlusion simultaneously, and by feat of LSTM, the decoder breaks down the task of face de-occlusion into restoring the occluded part step by step. Moreover, to minimize identify information loss and guarantee face recognition accuracy over recovered faces, we introduce an identity-preserving adversarial training scheme to further improve RLA. Extensive experiments on both synthetic and real datasets of faces with occlusion clearly demonstrate the effectiveness of our proposed RLA in removing different types of facial occlusion at various locations. The proposed method also provides significantly larger performance gain than other de-occlusion methods in promoting recognition performance over partially-occluded faces.
Directed networks such as gene regulation networks and neural networks are connected by arcs (directed links). The nodes in a directed network are often strongly interwound by a huge number of directed cycles, which lead to complex information-processing dynamics in the network and make it highly challenging to infer the intrinsic direction of information flow. In this theoretical paper, based on the principle of minimum-feedback, we explore the node hierarchy of directed networks and distinguish feedforward and feedback arcs. Nearly optimal node hierarchy solutions, which minimize the number of feedback arcs from lower-level nodes to higher-level nodes, are constructed by belief-propagation and simulated-annealing methods. For real-world networks, we quantify the extent of feedback scarcity by comparison with the ensemble of direction-randomized networks and identify the most important feedback arcs. Our methods are also useful for visualizing directed networks.
Dec 07 2016 cs.CV
Benefiting from its high efficiency and simplicity, Simple Linear Iterative Clustering (SLIC) remains one of the most popular over-segmentation tools. However, due to explicit enforcement of spatial similarity for region continuity, the boundary adaptation of SLIC is sub-optimal. It also has drawbacks on convergence rate as a result of both the fixed search region and separately doing the assignment step and the update step. In this paper, we propose an alternative approach to fix the inherent limitations of SLIC. In our approach, each pixel actively searches its corresponding segment under the help of its neighboring pixels, which naturally enables region coherence without being harmful to boundary adaptation. We also jointly perform the assignment and update steps, allowing high convergence rate. Extensive evaluations on Berkeley segmentation benchmark verify that our method outperforms competitive methods under various evaluation metrics. It also has the lowest time cost among existing methods (approximately 30fps for a 481x321 image on a single CPU core).
We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class. Examples of such observations include images of a set of labeled objects captured at different viewpoints, or recordings of set of speakers dictating multiple phrases. In both instances, the intra-class diversity is the source of the unspecified factors of variation: each object is observed at multiple viewpoints, and each speaker dictates multiple phrases. Learning to disentangle the specified factors from the unspecified ones becomes easier when strong supervision is possible. Suppose that during training, we have access to pairs of images, where each pair shows two different objects captured from the same viewpoint. This source of alignment allows us to solve our task using existing methods. However, labels for the unspecified factors are usually unavailable in realistic scenarios where data acquisition is not strictly controlled. We address the problem of disentanglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of generalizing to unseen classes and intra-class variabilities.
Growth in leisure travel has become increasingly significant economically, socially, and environmentally. However, flexible but uncoordinated travel behaviors exacerbate traffic congestion. Mobile phone records not only reveal human mobility patterns, but also enable us to manage travel demand for system efficiency. In this paper, we propose a location recommendation system that infers personal preferences while accounting for constraints imposed by road capacity in order to manage travel demand. We first infer unobserved preferences using a machine learning technique from phone records. We then formulate an optimization method to improve system efficiency. Coupling mobile phone data with traffic counts and road network infrastructures collected in Andorra, this study shows that uncoordinated travel behaviors lead to longer average travel delay, implying the opportunities in managing travel demand by collective decisions. The interplay between congestion relief and overall satisfied location preferences observed in extensive simulations indicate that moderate sacrifices of individual utility lead to significant travel time savings. Specifically, the results show that under full compliance rate, travel delay fell by 52% at a cost of 31% less satisfaction. Under 60% compliance rate, 41% travel delay is saved with a 17% reduction in satisfaction. This paper highlights the effectiveness of the synergy among collective behaviors in increasing system efficiency.
Oct 04 2016 cs.HC
The movement of a user's face, easily detected by a smartphone's front camera, is an underexploited input modality for mobile interactions. We introduce three sets of face-engaged interaction techniques for augmenting the traditional mobile inputs, which leverages the combination of the head movements with touch gestures and device motions, all sensed via the phone's built-in sensors. We systematically present the space of design considerations for mobile interactions using one or more of the three input modalities (i.e., touch, motion, and head). The additional affordances of the proposed techniques expand the mobile interaction vocabulary, and can facilitate unique usage scenarios such as one-hand or touch-free interaction. An initial evaluation was conducted and users had positive reactions to the new techniques, indicating the promise of an intuitive and convenient user experience.
We introduce the "Energy-based Generative Adversarial Network" model (EBGAN) which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of EBGAN framework as using an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images.
Increasing evidence suggests that, similar to face-to-face communications, human emotions also spread in online social media. However, the mechanisms underlying this emotional contagion, for example, whether different feelings spread in unlikely ways or how the spread of emotions relates to the social network, is rarely investigated. Indeed, because of high costs and spatio-temporal limitations, explorations of this topic are challenging using conventional questionnaires or controlled experiments. Because they are collection points for natural affective responses of massive individuals, online social media sites offer an ideal proxy for tackling this issue from the perspective of computational social science. In this paper, based on the analysis of millions of tweets in Weibo, surprisingly, we find that anger is more contagious than joy, indicating that it can spark more angry follow-up tweets. Moreover, regarding dissemination in social networks, anger travels easily along weaker ties than joy, meaning that it can infiltrate different communities and break free of local traps because strangers share such content more often. Through a simple diffusion model, we reveal that greater contagion and weaker ties function cooperatively to speed up anger's spread. The diffusion of real-world events with different dominant emotions provides further testimony to the findings. To the best of our knowledge, this is the first time that quantitative long-term evidence has been presented that reveals a difference in the mechanism by which joy and anger are disseminated. Our findings shed light on both personal anger management in human communications and on controlling collective outrage in cyberspace.
Jul 21 2016 cs.CV
Natural images are generated under many factors, including shape, pose, illumination etc. Most existing ConvNets formulate object recognition from natural images as a single task classification problem, and attempt to learn features useful for object categories, but invariant to other factors of variation as much as possible. These architectures do not explicitly learn other factors, like pose and lighting, instead, they usually discard them by pooling and normalization. In this work, we take the opposite approach: we train ConvNets for object recognition by retaining other factors (pose in our case) and learn them jointly with object category. We design a new multi-task leaning (MTL) ConvNet, named disentangling CNN (disCNN), which explicitly enforces the disentangled representations of object identity and pose, and is trained to predict object categories and pose transformations. We show that disCNN achieves significantly better object recognition accuracies than AlexNet trained solely to predict object categories on the iLab-20M dataset, which is a large scale turntable dataset with detailed object pose and lighting information. We further show that the pretrained disCNN/AlexNet features on iLab- 20M generalize to object recognition on both Washington RGB-D and ImageNet datasets, and the pretrained disCNN features are significantly better than the pretrained AlexNet features for fine-tuning object recognition on the ImageNet dataset.
Jul 21 2016 cs.CV
Despite significant recent progress, the best available computer vision algorithms still lag far behind human capabilities, even for recognizing individual discrete objects under various poses, illuminations, and backgrounds. Here we present a new approach to using object pose information to improve deep network learning. While existing large-scale datasets, e.g. ImageNet, do not have pose information, we leverage the newly published turntable dataset, iLab-20M, which has ~22M images of 704 object instances shot under different lightings, camera viewpoints and turntable rotations, to do more controlled object recognition experiments. We introduce a new convolutional neural network architecture, what/where CNN (2W-CNN), built on a linear-chain feedforward CNN (e.g., AlexNet), augmented by hierarchical layers regularized by object poses. Pose information is only used as feedback signal during training, in addition to category information; during test, the feedforward network only predicts category. To validate the approach, we train both 2W-CNN and AlexNet using a fraction of the dataset, and 2W-CNN achieves 6% performance improvement in category prediction. We show mathematically that 2W-CNN has inherent advantages over AlexNet under the stochastic gradient descent (SGD) optimization procedure. Further more, we fine-tune object recognition on ImageNet by using the pretrained 2W-CNN and AlexNet features on iLab-20M, results show that significant improvements have been achieved, compared with training AlexNet from scratch. Moreover, fine-tuning 2W-CNN features performs even better than fine-tuning the pretrained AlexNet features. These results show pretrained features on iLab- 20M generalizes well to natural image datasets, and 2WCNN learns even better features for object recognition than AlexNet.
Jun 14 2016 cs.LG
We propose to learn multiple local Mahalanobis distance metrics to perform k-nearest neighbor (kNN) classification of temporal sequences. Temporal sequences are first aligned by dynamic time warping (DTW); given the alignment path, similarity between two sequences is measured by the DTW distance, which is computed as the accumulated distance between matched temporal point pairs along the alignment path. Traditionally, Euclidean metric is used for distance computation between matched pairs, which ignores the data regularities and might not be optimal for applications at hand. Here we propose to learn multiple Mahalanobis metrics, such that DTW distance becomes the sum of Mahalanobis distances. We adapt the large margin nearest neighbor (LMNN) framework to our case, and formulate multiple metric learning as a linear programming problem. Extensive sequence classification results show that our proposed multiple metrics learning approach is effective, insensitive to the preceding alignment qualities, and reaches the state-of-the-art performances on UCR time series datasets.
Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (A-LTM), a model of sequential multi-task deep learning that is able to maintain previously learned association between sensory input and behavioral output while acquiring knew knowledge. A-LTM exploits the non-convex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned input-output map are penalized but hidden layers are free to transverse towards new local optima that are more favorable for the multi-task objective. We re-frame the McClelland's seminal Hippocampal theory with respect to Catastrophic Inference (CI) behavior exhibited by modern deep architectures trained with back-propagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of non-trivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors and during domain adaptation from simple to complex environments. We present results of the A-LTM model's ability to maintain viewpoint recognition learned in the highly controlled iLab-20M dataset with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of Imagenet with 1,000 object categories.
Jun 07 2016 cs.CV
Dynamic Time Warping (DTW) is an algorithm to align temporal sequences with possible local non-linear distortions, and has been widely applied to audio, video and graphics data alignments. DTW is essentially a point-to-point matching method under some boundary and temporal consistency constraints. Although DTW obtains a global optimal solution, it does not necessarily achieve locally sensible matchings. Concretely, two temporal points with entirely dissimilar local structures may be matched by DTW. To address this problem, we propose an improved alignment algorithm, named shape Dynamic Time Warping (shapeDTW), which enhances DTW by taking point-wise local structural information into consideration. shapeDTW is inherently a DTW algorithm, but additionally attempts to pair locally similar structures and to avoid matching points with distinct neighborhood structures. We apply shapeDTW to align audio signal pairs having ground-truth alignments, as well as artificially simulated pairs of aligned sequences, and obtain quantitatively much lower alignment errors than DTW and its two variants. When shapeDTW is used as a distance measure in a nearest neighbor classifier (NN-shapeDTW) to classify time series, it beats DTW on 64 out of 84 UCR time series datasets, with significantly improved classification accuracies. By using a properly designed local structure descriptor, shapeDTW improves accuracies by more than 10% on 18 datasets. To the best of our knowledge, shapeDTW is the first distance measure under the nearest neighbor classifier scheme to significantly outperform DTW, which had been widely recognized as the best distance measure to date. Our code is publicly accessible at: https://github.com/jiapingz/shapeDTW.
With the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Knowledge base-based question answering (KB-QA) is one of the most promising approaches to access the substantial knowledge. Meantime, as the neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put emphasis on question representation, and the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy is unable to express the proper information of the question. Hence, we present a neural attention-based model to represent the questions dynamically according to the different focuses of various candidate answer aspects. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. And it also alleviates the out of vocabulary (OOV) problem, which helps the attention model to represent the question more precisely. The experimental results on WEBQUESTIONS demonstrate the effectiveness of the proposed approach.
May 27 2016 cs.AI
Nowadays, metro systems play an important role in meeting the urban transportation demand in large cities. The understanding of passenger route choice is critical for public transit management. The wide deployment of Automated Fare Collection(AFC) systems opens up a new opportunity. However, only each trip's tap-in and tap-out timestamp and stations can be directly obtained from AFC system records; the train and route chosen by a passenger are unknown, which are necessary to solve our problem. While existing methods work well in some specific situations, they don't work for complicated situations. In this paper, we propose a solution that needs no additional equipment or human involvement than the AFC systems. We develop a probabilistic model that can estimate from empirical analysis how the passenger flows are dispatched to different routes and trains. We validate our approach using a large scale data set collected from the Shenzhen metro system. The measured results provide us with useful inputs when building the passenger path choice model.