- In a streaming environment, there is often a need for statistical prediction models to detect and adapt to concept drifts (i.e., changes in the underlying relationship between the response and predictor data streams being modeled) so as to mitigate deteriorating predictive performance over time. Various concept drift detection approaches have been proposed in the past decades. However, they do not perform well across different concept drift types (e.g., gradual or abrupt, recurrent or irregular) and different data stream distributions (e.g., balanced and imbalanced labels). This paper presents a novel framework for statistical prediction models (such as a classifier) that detects and also adapts to the various concept drift types, even in the presence of imbalanced data labels. The framework leverages a hierarchical set of hypothesis tests in an online fashion to detect concept drifts and employs an adaptive training strategy to significantly boost its adaptation capability. The performance of the proposed concept drift detection and adaptation framework is compared to benchmark approaches using both simulated and real-world datasets spanning the breadth of concept drift types. The proposed approaches significantly outperform benchmark solutions in terms of precision, delay of detection as well as the adaptability across different concepts, regardless of data characteristics.
- Jul 25 2017 cs.NI arXiv:1707.07514v1 Device-free localization plays an important role in many ubiquitous applications. Among the different technologies proposed, Wi-Fi based technology using commercial devices has attracted much attention due to its low cost, ease of deployment, and high potential for accurate localization. Existing solutions use either fingerprints that require labor-intensive radio-map surveys and updates, or models constructed from empirical studies with dense deployment of Wi-Fi transceivers. In this work, we explore the Fresnel Zone Theory in physics and propose a generic Fresnel Penetration Model (FPM), which reveals the linear relationship between specific Fresnel zones and multicarrier Fresnel phase difference, along with the Fresnel phase offset caused by static multipath environments. We validate FPM in both outdoor and complex indoor environments. Furthermore, we design a multicarrier FPM based device-free localization system (MFDL), which overcomes a number of practical challenges, particularly the Fresnel phase difference estimation and phase offset calibration in multipath-rich indoor environments. Extensive experimental results show that, compared with the state-of-the-art work (LiFS), our MFDL system achieves better localization accuracy with far fewer Wi-Fi transceivers. Specifically, using only three transceivers, the median localization error of MFDL is as low as 45 cm in an outdoor environment of 36 m$^2$, and 55 cm in indoor settings of 25 m$^2$. Increasing the number of transceivers to four allows us to achieve 75 cm median localization error in a 72 m$^2$ indoor area, compared with the 1.1 m median localization error achieved by LiFS using 11 transceivers in a 70 m$^2$ area.
- Jul 25 2017 cs.CL arXiv:1707.07279v1 We study the problem of identifying helpful product reviews. We observe that evidence-conclusion discourse relations, also known as arguments, often appear in product reviews, and we hypothesise that some argument-based features, e.g. the percentage of argumentative sentences and the evidence-conclusion ratios, are good indicators of helpful reviews. To validate this hypothesis, we manually annotate arguments in 110 hotel reviews and investigate the effectiveness of several combinations of argument-based features. Experiments suggest that, when used together with the argument-based features, the state-of-the-art baseline features enjoy a performance boost (in terms of F1) of 11.01\% on average.
- Jul 18 2017 cs.SI physics.soc-ph arXiv:1707.05150v1 In this paper, we are interested in modeling the diffusion of information in a multilayer network using a thermodynamic diffusion approach. The state of each agent is viewed as a topic mixture represented by a distribution over multiple topics. We have observed and learned diffusion-related thermodynamical patterns in the training data set, and we have used the estimated diffusion structure to predict the future states of the agents. A priori knowledge of a fraction of the state of all agents turns the problem into a Kalman predictor problem that refines the predicted system state using the error in estimation of the agents. A real-world Twitter data set is then used to evaluate and validate our information diffusion model.
- Jul 13 2017 cs.SY arXiv:1707.03657v1 This paper investigates the consensus problem of multiple uncertain Lagrangian systems. Due to the discontinuity resulting from the switching topology, achieving consensus in the context of uncertain Lagrangian systems is challenging. We propose a new adaptive controller based on dynamic feedback to resolve this problem and additionally propose a new analysis tool for rigorously demonstrating the stability and convergence of the networked systems. The newly introduced analysis tool is referred to as uniform integral-$L_p$ stability, which is motivated by the need to address integral-input-output properties of linear time-varying systems. It is then shown that the consensus errors between the systems converge to zero so long as the union of the graphs contains a directed spanning tree. It is also shown that the proposed controller is robust with respect to constant communication delays. The performance of the proposed adaptive controllers is shown by numerical simulations.
- Jul 12 2017 cs.NI arXiv:1707.03203v1 In this paper, we consider a wireless powered communication network (WPCN) consisting of a multi-antenna hybrid access point (HAP) that transfers wireless energy to and receives sensing data from a cluster of low-power wireless devices (WDs). To enhance the throughput performance of some far-away WDs, we allow one of the WDs to act as the cluster head (CH) that helps forward the messages of the other cluster members (CMs). However, the performance of the proposed cluster-based cooperation is fundamentally limited by the high energy consumption of the CH, which needs to transmit all the WDs' messages including its own. To tackle this issue, we exploit the capability of multi-antenna energy beamforming (EB) at the HAP, which can focus more transferred power on the CH to balance its energy consumption in assisting the other WDs. Specifically, we first derive the throughput performance of each individual WD under the proposed scheme. Then, we jointly optimize the EB design, the transmit time allocation among the HAP and the WDs, and the transmit power allocation of the CH to maximize the minimum data rate achievable among all the WDs (the max-min throughput) for improved throughput fairness among the WDs. An efficient optimal algorithm is proposed to solve the joint optimization problem. Moreover, we simulate under practical network setups and show that the proposed multi-antenna enabled cluster-based cooperation can effectively improve the throughput fairness of WPCN.
- Jul 11 2017 cs.CV arXiv:1707.02785v3 Identity alignment models assume precisely, manually annotated images. Human labelling is unrealistic for large-scale image data. Detection models introduce varying amounts of noise and hamper identity alignment performance. In this work, we propose to refine images by removing the undesired pixels. This is achieved by learning to eliminate less informative pixels in identity alignment. To this end, we formulate a method for automatically detecting and removing identity-class-irrelevant pixels in auto-detected bounding boxes. Experiments validate the benefits of our model in improving identity alignment.
- Jun 30 2017 cs.CV arXiv:1706.09579v2 In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrarily oriented text in natural scene images. The framework is based on the Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose text with different orientations. Second, for each axis-aligned text box proposed by the RPN, we extract its pooled features with different pooled sizes, and the concatenated features are used to simultaneously predict the text/non-text score, the axis-aligned box, and the inclined minimum-area box. Finally, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.
- Jun 28 2017 cs.GR arXiv:1706.08891v1 Wayfinding signs play an important role in guiding users to navigate in a virtual environment and in helping pedestrians to find their way in a real-world architectural site. Conventionally, the wayfinding design of a virtual environment is created manually, as is the wayfinding design of a real-world architectural site. The many possible navigation scenarios, as well as the interplay between signs and human navigation, can make the manual design process overwhelming and non-trivial. As a result, creating a wayfinding design for a typical layout can take months to several years. In this paper, we introduce the Way to Go! approach for automatically generating a wayfinding design for a given layout. The designer simply has to specify some navigation scenarios; our approach will automatically generate an optimized wayfinding design with signs properly placed, considering human agents' visibility and the possibility of making mistakes during navigation. We demonstrate the effectiveness of our approach in generating wayfinding designs for different layouts such as a train station, a downtown and a canyon. We evaluate our results by comparing different wayfinding designs and show that our optimized wayfinding design can guide pedestrians to their destinations effectively and efficiently. Our approach can also help the designer visualize the accessibility of a destination from different locations, and correct any "blind zone" with additional signs.
- Jun 23 2017 cs.CV arXiv:1706.07157v1 This thesis describes a study of change detection on Very High Resolution satellite images using image fusion based on the 2D Discrete Wavelet Transform and the Fuzzy C-Means clustering algorithm. Multiple other methods are also quantitatively and qualitatively compared in this study.
- We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1--4\% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.
- Spatial item recommendation has become an important means to help people discover interesting locations, especially when people visit unfamiliar regions. Some current studies focus on modelling individual and collective geographical preferences for spatial item recommendation based on users' check-in records, but they fail to explore the phenomenon of user interest drift across geographical regions, i.e., users show different interests when they travel to different regions. Besides, they ignore the influence of public comments on subsequent users' check-in behaviors. Specifically, it is intuitive that users would refuse to check in to a spatial item whose historical reviews seem negative overall, even though it might fit their interests. Therefore, it is necessary to recommend the right item to the right user at the right location. In this paper, we propose a latent probabilistic generative model called LSARS to mimic the decision-making process of users' check-in activities in both home-town and out-of-town scenarios by adapting to user interest drift and crowd sentiments; it can learn location-aware and sentiment-aware individual interests from the contents of spatial items and user reviews. Due to the sparsity of user activities in out-of-town regions, LSARS is further designed to incorporate the public preferences learned from local users' check-in behaviors. Finally, we deploy LSARS in two practical application scenarios: spatial item recommendation and target user discovery. Extensive experiments on two large-scale location-based social network (LBSN) datasets show that LSARS achieves better performance than existing state-of-the-art methods.
- Jun 20 2017 cs.SY arXiv:1706.06027v1 In this paper, we develop two zonotope-based set-membership estimation algorithms for identification of time-varying parameters in linear models, where both additive and multiplicative uncertainties are treated explicitly. The two recursive algorithms can be differentiated by their ways of processing the data and required computations. The first algorithm, which is referred to as Cone And Zonotope Intersection (CAZI), requires solving linear programming problems at each iteration. The second algorithm, referred to as the Polyhedron And Zonotope Intersection (PAZI), involves linear programming as well as an optimization subject to linear matrix inequalities (LMIs). Both algorithms are capable of providing tight overbounds of the feasible solution set (FSS) in our numerical case studies. Furthermore, PAZI provides an additional opportunity for further analyzing the relation between the estimation results at different iterations. An application to health monitoring of marine engines is considered to demonstrate the utility and effectiveness of the algorithms.
- Jun 19 2017 cs.CV arXiv:1706.05150v1 This article describes the final solution of team monkeytyping, who finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video classification. We extend the work in [1] and propose several improvements for frame sequence modeling. We propose a network structure called Chaining that can better capture the interactions between labels. Also, we report our approaches to dealing with multi-scale information and attention pooling. In addition, we find that using the output of a model ensemble as a side target in training can boost single-model performance. We report our experiments in bagging, boosting, cascade, and stacking, and propose a stacking algorithm called attention weighted stacking. Our final submission is an ensemble that consists of 74 sub-models, all of which are listed in the appendix.
- Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.
- Jun 13 2017 cs.CV arXiv:1706.03458v1 With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.
- Jun 13 2017 cs.NI arXiv:1706.03695v1 Information Centric Networking (ICN) is a new networking paradigm that treats content as a first-class entity. It provides content to users without regard to the current location of the content. Publish/subscribe (pub/sub) systems have gained popularity on the Internet. Pub/sub systems remove the need for users to request every piece of content of interest. Instead, the content is supplied to interested users (subscribers) as and when it is published. CCN/NDN are popular ICN proposals widely accepted in the ICN community; however, they do not provide an efficient pub/sub mechanism. COPSS enhances CCN/NDN with an efficient pub/sub capability. The Internet of Things (IoT) is a growing topic of interest in both academia and industry. The current designs for IoT rely on IP. However, IoT devices are constrained in their available resources, and IP is heavy for their operation. We observe that IoT environments are information centric in nature and hence ICN is a more suitable candidate to support them. Although NDN and COPSS work well for the Internet, their current full-fledged implementations cannot be used by resource-constrained IoT devices. CCN-lite is a lightweight, interoperable version of the CCNx protocol for supporting IoT devices. However, CCN-lite, like its ancestors, lacks support for an efficient pub/sub mechanism. In this paper, we develop COPSS-lite, an efficient and lightweight implementation of pub/sub for IoT. COPSS-lite is developed to enhance CCN-lite and also supports multi-hop connections by incorporating the well-known RPL protocol for low-power and lossy networks. We provide a preliminary evaluation to show proof of operability with real-world sensor devices in IoT lab. Our results show that COPSS-lite is compact and operates on all platforms that support CCN-lite, and we observe significant performance benefits with COPSS-lite in IoT environments.
- In daily investment decisions in a security market, the price-earnings (PE) ratio is one of the most widely applied firm valuation tools used by investment experts. Unfortunately, recent academic developments in financial econometrics and machine learning rarely look at this tool. In practice, fundamental PE ratios are often estimated only by subjective expert opinions. The purpose of this research is to formalize a process of fundamental PE estimation by employing advanced dynamic Bayesian network (DBN) methodology. The estimated PE ratio from our model can be used either as information support for an expert making investment decisions, or in an automatic trading system, as illustrated in the experiments. Forward-backward inference and EM parameter estimation algorithms are derived with respect to the proposed DBN structure. Unlike existing works in the literature, the economic interpretation of our DBN model is well justified by behavioral finance evidence on volatility. A simple but practical trading strategy is devised based on the result of Bayesian inference. Extensive experiments show that our trading strategy equipped with the inferred PE ratios consistently outperforms standard investment benchmarks.
- Jun 09 2017 cs.CL arXiv:1706.02459v1 Current Chinese social media text summarization models are based on an encoder-decoder framework. Although their generated summaries are literally similar to the source texts, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.
- Jun 06 2017 cs.CV arXiv:1706.01061v1 Faster R-CNN is one of the most representative and successful methods for object detection, and has become increasingly popular in various object detection applications. In this report, we propose a robust deep face detection approach based on Faster R-CNN. In our approach, we exploit several new techniques, including a new multi-task loss function design, online hard example mining, and a multi-scale training strategy, to improve Faster R-CNN in multiple aspects. The proposed approach is well suited for face detection, so we call it Face R-CNN. Extensive experiments are conducted on the two most popular and challenging face detection benchmarks, FDDB and WIDER FACE, to demonstrate the superiority of the proposed approach over state-of-the-art methods.
- In recent years, RTB (Real Time Bidding) has become a popular online advertisement trading method. During the auction, each DSP is supposed to evaluate the opportunity and respond with an ad and a corresponding bid price. Generally speaking, this is a kind of assignment problem for the DSP. However, unlike the traditional one, this assignment problem has the bid price as an additional variable. It is essential to find an optimal ad selection and bid price determination strategy. In this document, two major steps are taken to tackle it. First, the augmented GAP (Generalized Assignment Problem) is proposed and a general bidding strategy is correspondingly provided. Second, we show that the DSP problem is a special case of the augmented GAP and that the general bidding strategy applies. To the best of our knowledge, our solution is the first DSP bidding framework that is derived from a strict second-price auction assumption and is generally applicable to the multiple-ads scenario with various objectives and constraints. Our strategy is verified through simulation and outperforms state-of-the-art strategies in real application.
- May 23 2017 cs.CV arXiv:1705.07594v1 Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of the internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we proposed that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to significant improvement of object recognition under occlusion for our network relative to the original network trained only on un-occluded images. In addition to providing practical benefits in object recognition under occlusion, this work demonstrates that the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data and become more flexible in dealing with novel situations.
- May 22 2017 cs.CV arXiv:1705.06861v1 This letter adopts long short-term memory (LSTM) to predict sea surface temperature (SST), which is, to our knowledge, the first attempt to use a recurrent neural network to solve the problem of SST prediction, and to make one-week and one-month daily predictions. We formulate the SST prediction problem as a time series regression problem. LSTM is a special kind of recurrent neural network that introduces a gate mechanism into the vanilla RNN to prevent the vanishing or exploding gradient problem. It has a strong ability to model the temporal relationships of time series data and can handle the long-term dependency problem well. The proposed network architecture is composed of two kinds of layers: an LSTM layer and a fully connected dense layer. The LSTM layer is used to model the time series relationship. The fully connected layer is used to map the output of the LSTM layer to a final prediction. We explore the optimal setting of this architecture by experiments and report the accuracy for the coastal seas of China to confirm the effectiveness of the proposed method. In addition, we also show its online updating characteristics.
- May 19 2017 cs.CL arXiv:1705.06463v1 Singlish can be interesting to the ACL community both linguistically, as a major creole based on English, and computationally, for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to a 25% relative error reduction, resulting in a parser with 84.47% accuracy. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research.
- In this paper, we study power-efficient resource allocation for multicarrier non-orthogonal multiple access (MC-NOMA) systems. The resource allocation algorithm design is formulated as a non-convex optimization problem which jointly designs the power allocation, rate allocation, user scheduling, and successive interference cancellation (SIC) decoding policy for minimizing the total transmit power. The proposed framework takes into account the imperfection of channel state information at the transmitter (CSIT) and the quality of service (QoS) requirements of users. To facilitate the design of the optimal SIC decoding policy on each subcarrier, we define a channel-to-noise ratio outage threshold. Subsequently, the considered non-convex optimization problem is recast as a generalized linear multiplicative programming problem, for which a globally optimal solution is obtained by employing the branch-and-bound approach. The optimal resource allocation policy serves as a system performance benchmark due to its high computational complexity. To strike a balance between system performance and computational complexity, we propose a suboptimal iterative resource allocation algorithm based on difference of convex programming. Simulation results demonstrate that the suboptimal scheme achieves close-to-optimal performance. Also, both proposed schemes provide significant transmit power savings compared with conventional orthogonal multiple access (OMA) schemes.
- Many robotic tasks require heavy computation, which can easily exceed the capability of the robot's onboard computer. A promising solution to this challenge is outsourcing the computation to the cloud. However, exploiting the potential of cloud resources in robotic software is difficult, because it involves complex code modification and extensive (re)configuration procedures. Moreover, quality of service (QoS) such as timeliness, which is critical to the robot's behavior, has to be considered. In this paper, we propose a transparent and QoS-aware software framework called Cloudroid for cloud robotic applications. This framework supports direct deployment of existing robotic software packages to the cloud, transparently transforming them into Internet-accessible cloud services. With the automatically generated service stubs, robotic applications can outsource their computation to the cloud without any code modification. Furthermore, the robot and the cloud can cooperate to maintain a specific QoS property such as request response time, even in a highly dynamic and resource-competitive environment. We evaluated Cloudroid based on a group of typical robotic scenarios and a set of software packages widely adopted in real-world robot practice. Results show that the robot's capability can be enhanced significantly without code modification and that specific QoS objectives can be guaranteed. In certain tasks, the "cloud + robot" setup improves performance by orders of magnitude compared with the robot-native setup.
- Massive public resume data emerging on the WWW reveals individual-related characteristics in terms of profile and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as talent seeking and evaluation. Existing RA studies based on statistical analysis have primarily focused on talent recruitment by identifying explicit attributes. However, they fail to discover the implicit semantic information, i.e., individual career progress patterns and social relations, which are vital to a comprehensive understanding of career development. Besides, how to visualize them for better human cognition is also challenging. To tackle these issues, we propose a visual analytics system, ResumeVis, to mine and visualize resume data. First, a text-mining based approach is presented to extract semantic information. Then, a set of visualizations is devised to represent the semantic information from multiple perspectives. Through interactive exploration of ResumeVis by domain experts, the following tasks can be accomplished: tracing an individual's career trajectory; mining latent social relations among individuals; and capturing the full picture of the collective mobility of massive resumes. Case studies with over 2500 online officer resumes demonstrate the effectiveness of our system. We provide a demonstration video.
- We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth-of-field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with remarkably better resolution, matching the performance of higher numerical aperture lenses while also significantly surpassing their limited field-of-view and depth-of-field. These results are transformative for various fields that use microscopy tools, including, e.g., the life sciences, where optical microscopy is considered one of the most widely used and deployed techniques. Beyond such applications, our presented approach is broadly applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better and better as they continue to image specimens and establish new transformations among different modes of imaging.
- P2P lending presents itself as an innovative and flexible alternative to conventional lending institutions like banks, where lenders and borrowers transact directly and benefit each other without complicated verifications. However, due to the lack of specialized laws, delegated monitoring and effective management, P2P platforms may spawn potential risks, such as withdrawal failures, involvement in investigations and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although abundant public information and data related to P2P platforms are available on the Internet, multi-sourcing and heterogeneity pose challenges. In this paper, we propose a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain large amounts of multi-source heterogeneous data on domestic P2P platforms since 2007 from the Internet. Cleaning steps such as duplicate and noise removal, null handling, format unification and fusion are applied to improve data quality. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graphs and sentiment analysis, which are delivered as inputs to OMNIRank. Finally, according to the rankings generated by OMNIRank, we provide rich data visualizations and interactions, offering lenders comprehensive information support, decision suggestions and safety guarantees.
- This work shows how to efficiently construct binary de Bruijn sequences, even those with large orders, using the cycle joining method. The cycles are generated by an LFSR with a chosen period $e$ whose irreducible characteristic polynomial can be derived from any primitive polynomial of degree $n$ satisfying $e = \frac{2^n-1}{t}$ by $t$-decimation. The crux is our proof that determining Zech's logarithms is equivalent to identifying conjugate pairs shared by any pair of cycles. The approach quickly finds enough conjugate pairs between any two cycles to ensure the existence of trees containing all vertices in the adjacency graph of the LFSR. When the characteristic polynomial $f(x)$ is a product of distinct irreducible polynomials, we combine the approach via Zech's logarithms with a recently proposed method to determine the conjugate pairs. This allows us to efficiently generate de Bruijn sequences with larger orders. Along the way, we establish new properties of Zech's logarithms.
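As a small illustration of the $t$-decimation relation $e = \frac{2^n-1}{t}$ mentioned in the abstract above, the following sketch uses assumed toy values that do not come from the paper: $n = 4$, the primitive polynomial $x^4 + x + 1$ over GF(2), and $t = 3$, so that $e = 5$. The helper names `lfsr_sequence` and `least_period` are hypothetical. The sketch only checks periods; it does not implement the cycle joining method or Zech's logarithms.

```python
def lfsr_sequence(taps, state, length):
    """Return `length` bits of the binary linear recurrence
    s[i + n] = XOR of s[i + j] for j in taps, where n = len(state)."""
    seq = list(state)
    n = len(state)
    while len(seq) < length:
        i = len(seq) - n
        bit = 0
        for j in taps:
            bit ^= seq[i + j]
        seq.append(bit)
    return seq[:length]


def least_period(seq):
    """Smallest shift p such that seq[i] == seq[i + p] over the whole window."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)


# Assumed toy parameters (not taken from the paper).
n, t = 4, 3
# x^4 + x + 1 corresponds to the recurrence s[i+4] = s[i+1] XOR s[i].
m_sequence = lfsr_sequence(taps=(0, 1), state=(0, 0, 0, 1), length=60)
# t-decimation keeps every t-th bit of the m-sequence.
decimated = m_sequence[::t]

period_full = least_period(m_sequence)
period_decimated = least_period(decimated)
```

Under these assumptions the full sequence has period $2^4 - 1 = 15$ and the decimated sequence has period $e = 15 / 3 = 5$.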
- We consider a classical k-center problem in trees. Let T be a tree of n vertices and every vertex has a nonnegative weight. The problem is to find k centers on the edges of T such that the maximum weighted distance from all vertices to their closest centers is minimized. Megiddo and Tamir (SIAM J. Comput., 1983) gave an algorithm that can solve the problem in $O(n\log^2 n)$ time by using Cole's parametric search. Since then it has been open for over three decades whether the problem can be solved in $O(n\log n)$ time. In this paper, we present an $O(n\log n)$ time algorithm for the problem and thus settle the open problem affirmatively.
- May 08 2017 cs.CL arXiv:1705.02131v1 Argument Component Boundary Detection (ACBD) is an important sub-task in argumentation mining; it aims at identifying the word sequences that constitute argument components, and is usually considered the first sub-task in the argumentation mining pipeline. Existing ACBD methods depend heavily on task-specific knowledge and require considerable human effort on feature engineering. To tackle these problems, in this work, we formulate ACBD as a sequence labeling problem and propose a variety of Recurrent Neural Network (RNN) based methods, which do not use domain-specific or handcrafted features beyond the relative position of the sentence in the document. In particular, we propose a novel joint RNN model that can predict whether sentences are argumentative or not, and use the predicted results to more precisely detect the argument component boundaries. We evaluate our techniques on two corpora from two different genres; results suggest that our joint RNN model obtains state-of-the-art performance on both datasets.
- May 08 2017 cs.CL arXiv:1705.02077v1 Argumentation mining aims at automatically extracting the premise-claim discourse structures in natural language texts. There is a great demand for argumentation corpora for customer reviews. However, due to the controversial nature of the argumentation annotation task, there exist very few large-scale argumentation corpora for customer reviews. In this work, we take the novel approach of using crowdsourcing to collect argumentation annotations in Chinese hotel reviews. As the first Chinese argumentation dataset, our corpus includes 4814 argument component annotations and 411 argument relation annotations, and its annotation quality is comparable to that of some widely used argumentation corpora in other languages.
- May 05 2017 cs.CV arXiv:1705.01908v2 Recently, realistic image generation using deep neural networks has become a hot topic in machine learning and computer vision. Images can be generated at the pixel level by learning from a large collection of images. Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a potential application in digital entertainment. In this paper, we investigate the sketch-to-image synthesis problem by using conditional generative adversarial networks (cGAN). We propose the auto-painter model, which can automatically generate compatible colors for a sketch. The new model is not only capable of painting hand-drawn sketches with proper colors, but also allows users to indicate preferred colors. Experimental results on two sketch datasets show that the auto-painter performs better than existing image-to-image methods.
- May 01 2017 cs.CV arXiv:1704.08944v1 Color and intensity are two important components in an image. Usually, groups of image pixels, which are similar in color or intensity, are an informative representation for an object. They are therefore particularly suitable for computer vision tasks, such as saliency detection and object proposal generation. However, image pixels, which share a similar real-world color, may be quite different since colors are often distorted by intensity. In this paper, we reinvestigate the affinity matrices originally used in image segmentation methods based on spectral clustering. A new affinity matrix, which is robust to color distortions, is formulated for object discovery. Moreover, a Cohesion Measurement (CM) for object regions is also derived based on the formulated affinity matrix. Based on the new Cohesion Measurement, a novel object discovery method is proposed to discover objects latent in an image by utilizing the eigenvectors of the affinity matrix. Then we apply the proposed method to both saliency detection and object proposal generation. Experimental results on several evaluation benchmarks demonstrate that the proposed CM based method has achieved promising performance for these two tasks.
- In this paper, we consider a coverage problem for uncertain points in a tree. Let $T$ be a tree containing a set $P$ of $n$ (weighted) demand points, and the location of each demand point $P_i\in P$ is uncertain but is known to appear in one of $m_i$ points on $T$, each associated with a probability. Given a covering range $\lambda$, the problem is to find a minimum number of points (called centers) on $T$ to build facilities for serving (or covering) these demand points in the sense that for each uncertain point $P_i\in P$, the expected distance from $P_i$ to at least one center is no more than $\lambda$. The problem has not been studied before. We present an $O(|T|+M\log^2 M)$ time algorithm for the problem, where $|T|$ is the number of vertices of $T$ and $M$ is the total number of locations of all uncertain points of $P$, i.e., $M=\sum_{P_i\in P}m_i$. In addition, by using this algorithm, we solve a $k$-center problem on $T$ for the uncertain points of $P$.
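The covering condition in the abstract above (expected distance from an uncertain point to at least one center is at most the covering range) can be checked for a given candidate solution with a brute-force sketch like the one below. This is not the paper's algorithm. It assumes, for simplicity, that centers and possible locations are tree vertices and that demand-point weights are ignored; the names `tree_distances` and `is_covered` and the toy path `a - b - c` are hypothetical.

```python
def tree_distances(adj, source):
    """Distances from `source` to every vertex of a weighted tree.
    `adj` maps a vertex to a list of (neighbor, edge_length) pairs."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def is_covered(adj, uncertain_points, centers, lam):
    """True if every uncertain point has expected distance <= lam
    to at least one center. Each uncertain point is a list of
    (vertex, probability) pairs whose probabilities sum to 1."""
    if not centers:
        return not uncertain_points
    dists = {c: tree_distances(adj, c) for c in centers}
    for locations in uncertain_points:
        best = min(
            sum(p * dists[c][v] for v, p in locations) for c in centers
        )
        if best > lam:
            return False
    return True


# Toy instance (assumed): a path a - b - c with unit edge lengths.
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0)],
}
# One uncertain point that sits at a or at c with equal probability.
point = [("a", 0.5), ("c", 0.5)]

covered_wide = is_covered(adj, [point], ["b"], lam=1.0)
covered_narrow = is_covered(adj, [point], ["b"], lam=0.5)
```

With the center at `b`, the expected distance is $0.5 \cdot 1 + 0.5 \cdot 1 = 1$, so the point is covered for a range of 1 but not for a range of 0.5.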
- In this paper, we consider the problems for covering multiple intervals on a line. Given a set B of m line segments (called "barriers") on a horizontal line L and another set S of n horizontal line segments of the same length in the plane, we want to move all segments of S to L so that their union covers all barriers and the maximum movement of all segments of S is minimized. Previously, an O(n^3 log n)-time algorithm was given for the problem but only for the special case m = 1. In this paper, we propose an O(n^2 log n log log n + nm log m)-time algorithm for any m, which improves the previous work even for m = 1. We then consider a line-constrained version of the problem in which the segments of S are all initially on the line L. Previously, an O(n log n)-time algorithm was known for the case m = 1. We present an algorithm of O((n + m) log(n + m)) time for any m. These problems may have applications in mobile sensor barrier coverage in wireless sensor networks.
- Orthogonal frequency division multiplexing (OFDM) has been widely used in communication systems operating in the millimeter wave (mmWave) band to combat frequency-selective fading and achieve multi-Gbps transmissions, such as IEEE 802.15.3c and IEEE 802.11ad. For mmWave systems with ultra high sampling rate requirements, the use of low-resolution analog-to-digital converters (ADCs) (i.e., 1-3 bits) ensures an acceptable level of power consumption and system costs. However, orthogonality among sub-channels in the OFDM system cannot be maintained because of the severe non-linearity caused by low-resolution ADC, which renders the design of data detector challenging. In this study, we develop an efficient algorithm for optimal data detection in the mmWave OFDM system with low-resolution ADCs. The analytical performance of the proposed detector is derived and verified to achieve the fundamental limit of the Bayesian optimal design. On the basis of the derived analytical expression, we further propose a power allocation (PA) scheme that seeks to minimize the average symbol error rate. In addition to the optimal data detector, we also develop a feasible channel estimation method, which can provide high-quality channel state information without significant pilot overhead. Simulation results confirm the accuracy of our analysis and illustrate that the performance of the proposed detector in conjunction with the proposed PA scheme is close to the optimal performance of the OFDM system with infinite-resolution ADC.
- Apr 12 2017 cs.SI physics.soc-ph arXiv:1704.03261v1 Traces of user activities recorded in online social networks such as the creation, viewing and forwarding/sharing of information over time open new possibilities to quantitatively and systematically understand the information diffusion process on social networks. From an online social network like WeChat, we could collect a large number of information cascade trees, each of which tells the spreading trajectory of a message/information such as which user creates the information and which users view or forward the information shared by which neighbors. In this work, we propose two heterogeneous non-linear models. Both models are validated by the WeChat data in reproducing and explaining key features of cascade trees. Specifically, we firstly apply the Random Recursive Tree (RRT) to model the cascade tree topologies, capturing key features, i.e. the average path length and degree variance of a cascade tree in relation to the number of nodes (size) of the tree. The RRT model with a single parameter $\theta$ describes the growth mechanism of a tree, where a node in the existing tree has a probability $d_i^{\theta}$ of being connected to a newly added node that depends on the degree $d_i$ of the existing node. The identified parameter $\theta$ quantifies the relative depth or broadness of the cascade trees, indicating that information propagates via a star-like broadcasting or viral-like hop by hop spreading. The RRT model explains the appearance of hubs, thus a possibly smaller average path length as the cascade size increases, as observed in WeChat. We further propose the stochastic Susceptible View Forward Removed (SVFR) model to depict the dynamic user behaviors including creating, viewing, forwarding and ignoring a message on a given social network.
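The RRT growth mechanism described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the attachment probability is normalized, i.e. proportional to $d_i^{\theta}$ over all existing nodes, and that the tree starts from two connected nodes so that every degree is at least 1. The function name `grow_tree` and the `seed` parameter are hypothetical.

```python
import random


def grow_tree(num_nodes, theta, seed=0):
    """Grow a tree in which each new node attaches to an existing node i
    with probability proportional to degree[i] ** theta (an assumption).
    Returns (parent, degree) lists indexed by node id; num_nodes >= 2."""
    rng = random.Random(seed)
    parent = [None, 0]   # node 1 is attached to node 0
    degree = [1, 1]
    for new in range(2, num_nodes):
        weights = [d ** theta for d in degree]
        target = rng.choices(range(new), weights=weights, k=1)[0]
        parent.append(target)
        degree[target] += 1
        degree.append(1)
    return parent, degree


# theta = 0 gives uniform attachment; a large positive theta strongly
# favors high-degree nodes and produces a star-like tree.
parent_uniform, degree_uniform = grow_tree(50, theta=0.0, seed=1)
parent_star, degree_star = grow_tree(50, theta=50.0, seed=1)
```

Comparing `max(degree_uniform)` with `max(degree_star)` shows how $\theta$ shifts the tree between deeper, hop-by-hop shapes and broad, hub-dominated ones.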
- The phase retrieval (PR) problem is an ill-conditioned inverse problem that arises in a variety of applications. Exploiting a sparsity prior, this paper proposes an algorithm called SWF (Sparse Wirtinger Flow), based on the Wirtinger flow method, for the sparse PR problem. SWF first recovers the support of the signal and then updates the estimate by hard thresholding, starting from an elaborate initialization. Theoretical analyses show that SWF converges geometrically for any $k$-sparse signal of length $n$ with sampling complexity $\mathcal{O}(k^2\log n)$. To reach $\varepsilon$ accuracy, the computational complexity of SWF is $\mathcal{O}(k^3n\log n\log\frac{1}{\varepsilon})$. Numerical tests also demonstrate that SWF performs better than state-of-the-art methods, especially when there is no a priori knowledge of the sparsity $k$. Moreover, SWF is also robust to noise.
- Apr 11 2017 cs.CV arXiv:1704.02581v2 Recently, skeleton based action recognition has gained popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in their ability to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNN) to handle raw skeletons only focus on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve generalization of our model, we further exploit 3D transformation based data augmentation techniques including rotation and scaling transformation to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, i.e., generic actions, interaction activities and gestures.
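The rotation and scaling augmentation of 3D skeleton coordinates mentioned in the abstract above can be sketched as below. The abstract does not specify rotation axes, angle ranges or scale ranges, so the choices here are assumptions: rotation about the vertical y-axis only, a uniform scale factor, and the hypothetical function name `augment_skeleton`.

```python
import math


def augment_skeleton(joints, angle_y, scale):
    """Rotate each (x, y, z) joint about the y-axis by `angle_y` radians,
    then scale all coordinates uniformly by `scale`."""
    c, s = math.cos(angle_y), math.sin(angle_y)
    augmented = []
    for x, y, z in joints:
        x_rot = c * x + s * z
        z_rot = -s * x + c * z
        augmented.append((scale * x_rot, scale * y, scale * z_rot))
    return augmented


# Toy skeleton with two joints (assumed values, not from any dataset).
skeleton = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.5)]
rotated = augment_skeleton(skeleton, angle_y=math.pi / 2, scale=2.0)
```

During training, `angle_y` and `scale` would typically be drawn at random for each sample, so the network sees the same action from different viewpoints and body sizes.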
- Apr 10 2017 cs.CV arXiv:1704.02224v1 We propose a novel 3D neural network architecture for 3D hand pose estimation from a single depth image. Different from previous works that mostly operate in the 2D depth image domain and require intermediate or post-processing steps to bring in supervision from 3D space, we convert the depth map to a 3D volumetric representation and feed it into a 3D convolutional neural network (CNN) to directly produce the pose in 3D, requiring no further processing. Our system does not require the ground truth reference point for initialization, and our network architecture naturally integrates both local features and global context in 3D space. To increase the coverage of the hand pose space of the training data, we render synthetic depth images by transferring hand poses from existing real image datasets. We evaluate our algorithm on two public benchmarks and achieve state-of-the-art performance. The synthetic hand pose dataset will be made available.
- Apr 07 2017 cs.AI arXiv:1704.01815v1 The key issues pertaining to collection of epidemic disease data for our analysis purposes are that it is a labour intensive, time consuming and expensive process resulting in availability of sparse sample data which we use to develop prediction models. To address this sparse data issue, we present novel Incremental Transductive methods to circumvent the data collection process by applying previously acquired data to provide consistent, confidence-based labelling alternatives to field survey research. We investigated various reasoning approaches for semisupervised machine learning including Bayesian models for labelling data. The results show that using the proposed methods, we can label instances of data with a class of vector density at a high level of confidence. By applying the Liberal and Strict Training Approaches, we provide a labelling and classification alternative to standalone algorithms. The methods in this paper are components in the process of reducing the proliferation of the Schistosomiasis disease and its effects.
- Apr 05 2017 cs.CL arXiv:1704.00849v3 Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) that explicitly considers a VC objective when building the speech model. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.
- Apr 04 2017 cs.CV arXiv:1704.00033v1 We develop a model of perceptual similarity judgment based on re-training a deep convolutional neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience. The re-training process effectively performs distance metric learning under the object persistency constraints, to modify the view-manifold of object representations. It reduces the effective distance between the representations of different views of the same object without compromising the distance between those of the views of different objects, resulting in the untangling of the view-manifolds between individual objects within the same category and across categories. This untangling enables the model to discriminate and recognize objects within the same category, independent of viewpoints. We found that this ability is not limited to the trained objects, but transfers to novel objects in both trained and untrained categories, as well as to a variety of completely novel artificial synthetic objects. This transfer in learning suggests the modification of distance metrics in view-manifolds is more general and abstract, likely at the levels of parts, and independent of the specific objects or categories experienced during training. Interestingly, the resulting transformation of feature representation in the deep networks is found to significantly better match human perceptual similarity judgment than AlexNet, suggesting that object persistence could be an important constraint in the development of perceptual similarity judgment in biological neural networks.
- Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus on addressing audio information only. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNN (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model. In the proposed AVDCNN SE model, audio and visual features are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech at an output layer. The AVDCNN model is trained in an end-to-end manner, and parameters are jointly learned through back-propagation. We evaluate enhanced speech using five objective criteria. Results show that the AVDCNN yields notably better performance as compared to an audio-only CNN-based SE model, confirming the effectiveness of integrating visual information into the SE process.
- Mar 28 2017 cs.CV arXiv:1703.08912v1 In this paper, we investigate the contribution of color names to salient object detection. Each input image is first converted to the color name space, which consists of 11 probabilistic channels. By exploring the topological structure relationship between the figure and the ground, we obtain a saliency map through a linear combination of a set of sequential attention maps. To overcome the limitation of only exploiting the surroundedness cue, two global cues with respect to color names are invoked for guiding the computation of another weighted saliency map. Finally, we integrate the two saliency maps into a unified framework to infer the saliency result. In addition, an improved post-processing procedure is introduced to effectively suppress the background while uniformly highlighting the salient objects. Experimental results show that the proposed model produces more accurate saliency maps and performs well against 23 saliency models in terms of three evaluation metrics on three public datasets.
- This paper studies physical layer security in a wireless ad hoc network with numerous legitimate transmitter-receiver pairs and eavesdroppers. A hybrid full-/half-duplex receiver deployment strategy is proposed to secure legitimate transmissions, by letting a fraction of legitimate receivers work in the full-duplex (FD) mode sending jamming signals to confuse eavesdroppers upon their information receptions, and letting the other receivers work in the half-duplex mode just receiving their desired signals. The objective of this paper is to choose properly the fraction of FD receivers for achieving the optimal network security performance. Both accurate expressions and tractable approximations for the connection outage probability and the secrecy outage probability of an arbitrary legitimate link are derived, based on which the area secure link number, network-wide secrecy throughput and network-wide secrecy energy efficiency are optimized respectively. Various insights into the optimal fraction are further developed and its closed-form expressions are also derived under perfect self-interference cancellation or in a dense network. It is concluded that the fraction of FD receivers triggers a non-trivial trade-off between reliability and secrecy, and the proposed strategy can significantly enhance the network security performance.
- Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either inefficient to simultaneously process multiple signals (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.
- Mar 20 2017 cs.CV arXiv:1703.05870v2 Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR.
- Mar 14 2017 cs.CE arXiv:1703.03930v1 Multiscale optimization has recently become an attractive research field. For most optimization tools, design parameters must be updated in a closed loop. Therefore, a simple Python code is programmed to obtain the effective properties of a Representative Volume Element (RVE) under Periodic Boundary Conditions (PBCs). It can compute the mechanical properties of a composite with a periodic structure, in two or three dimensions. The computation method is based on the Asymptotic Homogenization Theory (AHT). With simple modifications, the basic Python code may be extended to compute the effective properties of more complex microstructures. Moreover, the code provides a convenient platform for optimizing the material and geometric design of composites. The user may experiment with various algorithms and tackle a wide range of problems. To verify the effectiveness and reliability of the code, a three-dimensional case is used to illustrate it. Finally, numerical results obtained by the code agree well with the available theoretical and experimental results.
- Given a rectilinear domain $\mathcal{P}$ of $h$ pairwise-disjoint rectilinear obstacles with a total of $n$ vertices in the plane, we study the problem of computing bicriteria rectilinear shortest paths between two points $s$ and $t$ in $\mathcal{P}$. Three types of bicriteria rectilinear paths are considered: minimum-link shortest paths, shortest minimum-link paths, and minimum-cost paths where the cost of a path is a non-decreasing function of both the number of edges and the length of the path. The one-point and two-point path queries are also considered. Algorithms for these problems have been given previously. Our contributions are threefold. First, we find a critical error in all previous algorithms. Second, we correct the error in a not-so-trivial way. Third, we further improve the algorithms so that they are even faster than the previous (incorrect) algorithms when $h$ is relatively small. For example, for the minimum-link shortest paths, we obtain the following results. Our algorithm computes a minimum-link shortest $s$-$t$ path in $O(n+h\log^{3/2} h)$ time. For the one-point queries, we build a data structure of size $O(n+ h\log h)$ in $O(n+h\log^{3/2} h)$ time for a source point $s$, such that given any query point $t$, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n)$ time. For the two-point queries, with $O(n+h^2\log^2 h)$ time and space preprocessing, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n+\log^2 h)$ time for any two query points $s$ and $t$; alternatively, with $O(n+h^2\cdot \log^{2} h \cdot 4^{\sqrt{\log h}})$ time and $O(n+h^2\cdot \log h \cdot 4^{\sqrt{\log h}})$ space preprocessing, we can answer each two-point query in $O(\log n)$ time.
- Mar 14 2017 cs.CE arXiv:1703.04355v1 This study presents a meshless-based local reanalysis (MLR) method. The purpose of this study is to extend reanalysis methods to the Kriging interpolation meshless method due to its high efficiency. In this study, two reanalysis methods, the combined approximations (CA) and indirect factorization updating (IFU) methods, are utilized. Considering the computational cost of meshless methods, the reanalysis method improves the efficiency of the full meshless method significantly. Compared with finite element method (FEM)-based reanalysis methods, the main advantage of the meshless-based reanalysis method is that it removes the limitation of mesh connectivity. With meshless-based reanalysis, the stiffness matrix is much easier to obtain, even when solving mesh distortion problems. However, compared with the FEM-based reanalysis method, the critical challenge is that many more nodes are needed in the influence domain due to high order interpolation. Therefore, a local reanalysis method which only needs to calculate the local stiffness matrix in the influence domain is suggested to improve the efficiency further. Several typical numerical examples are tested and the performance of the suggested method is verified.
- Let $s$ be a point in a polygonal domain $\mathcal{P}$ of $h-1$ holes and $n$ vertices. We consider a quickest visibility query problem. Given a query point $q$ in $\mathcal{P}$, the goal is to find a shortest path in $\mathcal{P}$ to move from $s$ to see $q$ as quickly as possible. Previously, Arkin et al. (SoCG 2015) built a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(K\log^2 n)$ time, where $\alpha(n)$ is the inverse Ackermann function and $K$ is the size of the visibility polygon of $q$ in $\mathcal{P}$ (and $K$ can be $\Theta(n)$ in the worst case). In this paper, we present a new data structure of size $O(n\log h + h^2)$ that can answer each query in $O(h\log h\log n)$ time. Our result improves the previous work when $h$ is relatively small. In particular, if $h$ is a constant, then our result even matches the best result for the simple polygon case (i.e., $h=1$), which is optimal. As a by-product, we also have a new algorithm for a shortest-path-to-segment query problem. Given a query line segment $\tau$ in $\mathcal{P}$, the query seeks a shortest path from $s$ to all points of $\tau$. Previously, Arkin et al. gave a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(\log^2 n)$ time, and another data structure of size $O(n^3\log n)$ with $O(\log n)$ query time. We present a data structure of size $O(n)$ with query time $O(h\log \frac{n}{h})$, which also favors small values of $h$ and is optimal when $h=O(1)$.
- Mar 08 2017 cs.CY arXiv:1703.02497v1 Despite being popularly referred to as the ultimate solution for all problems of our current electric power system, smart grid is still a growing and unstable concept. It is usually considered as a set of advanced features powered by promising technological solutions. In this paper, we describe smart grid as a socio-technical transition and illustrate the evolutionary path on which a smart grid can be realized. Through this conceptual lens, we reveal the role of big data, and how it can fuel the organic growth of smart grid. We also provide a rough estimate of how much data will be potentially generated from different data sources, which helps clarify the big data challenges during the evolutionary process.
- Mar 06 2017 cs.CV arXiv:1703.01086v2 This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which are designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals fit the text region more accurately in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals to the feature map for a text region classifier. The whole framework is built upon a region-proposal-based architecture, which ensures the computational efficiency of arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets, and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
- Mar 03 2017 physics.med-ph cs.CV arXiv:1703.00797v1 Brain CT has become a standard imaging tool for emergent evaluation of brain condition, and measurement of midline shift (MLS) is one of the most important features to address in brain CT assessment. We present a simple method to estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS over the maximal width of the intracranial region (MLS/ICWMAX). Three neurosurgeons and our automated system were asked to measure MLS and MLS/ICWMAX in the same sets of axial CT images obtained from 41 patients admitted to the ICU under neurosurgical service. A weighted midline (WML) was plotted based on individual pixel intensities, with higher weights given to the darker portions. The MLS could then be measured as the distance between the WML and the ideal midline (IML) near the foramen of Monro. The average processing time to output an automatic MLS measurement was around 10 seconds. Our automated system achieved an overall accuracy of 90.24% when the CT images were calibrated automatically, and performed better when the calibrations of head rotation were done manually (accuracy: 92.68%). MLS/ICWMAX and MLS both yielded the same confusion matrices and produced similar ROC curves. We demonstrated a simple, fast and accurate automated system for MLS measurement and introduced a new parameter (MLS/ICWMAX) as a good alternative to MLS for estimating the degree of brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily accessed.
- This paper is a review of the evolutionary history of deep learning models. It covers the period from the genesis of neural networks, when associationist modeling of the brain was studied, to the models that have dominated the last decade of research in deep learning, such as convolutional neural networks, deep belief networks, and recurrent neural networks. In addition to a review of these models, this paper primarily focuses on the precedents of the models above, examining how the initial ideas were assembled to construct the early models and how these preliminary models developed into their current forms. Many of these evolutionary paths span more than half a century and take a diversity of directions. For example, CNN is built on prior knowledge of the biological vision system; DBN evolved from a trade-off between the modeling power and computational complexity of graphical models; and many present-day models are neural counterparts of ancient linear models. This paper reviews these evolutionary paths, offers a concise account of how these models were developed, and aims to provide a thorough background for deep learning. More importantly, along the way, this paper summarizes the gist behind these milestones and proposes many directions to guide future research in deep learning.
- Feb 22 2017 cs.SY arXiv:1702.06265v1 In this paper, we investigate the task-space consensus problem for multiple robotic systems with both uncertain kinematics and dynamics in the presence of constant communication delays. We propose an observer-based adaptive controller to achieve the manipulable consensus without relying on the measurement of task-space velocities, and also formalize the concept of manipulability to quantify the degree of adjustability of the consensus value. The proposed control scheme employs a new distributed observer that does not rely on the joint velocity, and a new kinematic parameter adaptation law with a distributed adaptive kinematic regressor matrix that is driven by both the observation and consensus errors. In addition, it is shown that the proposed controller has the separation property, which yields an adaptive kinematic controller that is applicable to most industrial/commercial robots. The performance of the proposed observer-based adaptive schemes is shown by numerical simulations.
- Feb 22 2017 cs.CL arXiv:1702.06239v1 Argument component detection (ACD) is an important sub-task in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts. Historical annotations (HAs) are important features that human annotators consider when they manually perform the ACD task. However, HAs are largely ignored by existing automatic ACD techniques. Reinforcement learning (RL) has proven to be an effective method for using HAs in some natural language processing tasks. In this work, we propose an RL-based ACD technique, and evaluate its performance on two well-annotated corpora. Results suggest that, in terms of classification accuracy, HAs-augmented RL outperforms plain RL by at most 17.85%, and outperforms the state-of-the-art supervised learning algorithm by at most 11.94%.
- We study a demand response problem from the operator's perspective with realistic settings, in which the operator faces uncertainty and limited communication. Specifically, the operator does not know the cost functions of consumers and cannot have multiple rounds of information exchange with consumers. We formulate an optimization problem for the operator to minimize its operational cost considering time-varying demand response targets and responses of consumers. We develop a joint online learning and pricing algorithm. In each time slot, the operator sends out a price signal to all consumers and estimates the cost functions of consumers based on their noisy responses. We measure the performance of our algorithm using regret analysis and show that our online algorithm achieves logarithmic regret with respect to the operating horizon. In addition, our algorithm employs linear regression to estimate the aggregate response of consumers, making it easy to implement in practice. Simulation experiments validate the theoretical results and show that the performance gap between our algorithm and the offline optimum decays quickly.
- Feb 09 2017 cs.CV physics.med-ph arXiv:1702.02223v1 The present study shows that the performance of CNN is not significantly different from the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images. Because CNN does not need tumor segmentation or feature calculation, it is more convenient and more objective than the classical methods. However, CNN does not make use of the important diagnostic features, which have been proved more discriminative than the texture features for classifying small-sized lymph nodes. Therefore, incorporating the diagnostic features into CNN is a promising direction for future research.
- Feb 08 2017 cs.CG arXiv:1702.01836v1 We study approximation algorithms for the following geometric version of the maximum coverage problem: Let $\mathcal{P}$ be a set of $n$ weighted points in the plane. Let $D$ represent a planar object, such as a rectangle, or a disk. We want to place $m$ copies of $D$ such that the sum of the weights of the points in $\mathcal{P}$ covered by these copies is maximized. For any fixed $\varepsilon>0$, we present efficient approximation schemes that can find a $(1-\varepsilon)$-approximation to the optimal solution. In particular, for $m=1$ and for the special case where $D$ is a rectangle, our algorithm runs in time $O(n\log (\frac{1}{\varepsilon}))$, improving on the previous result. For $m>1$ and the rectangular case, our algorithm runs in $O(\frac{n}{\varepsilon}\log (\frac{1}{\varepsilon})+\frac{m}{\varepsilon}\log m +m(\frac{1}{\varepsilon})^{O(\min(\sqrt{m},\frac{1}{\varepsilon}))})$ time. For a more general class of shapes (including disks, polygons with $O(1)$ edges), our algorithm runs in $O(n(\frac{1}{\varepsilon})^{O(1)}+\frac{m}{\varepsilon}\log m + m(\frac{1}{\varepsilon})^{O(\min(m,\frac{1}{\varepsilon^2}))})$ time.
- Kriging, or Gaussian Process Regression, is applied in many fields as a non-linear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, which are cubic and quadratic in the number of data points respectively, become a major bottleneck as more and more data become available. In this paper, we propose a general methodology for complexity reduction, called cluster Kriging, in which the whole data set is partitioned into smaller clusters and multiple Kriging models are built on top of them. In addition, four Kriging approximation algorithms are proposed as candidate algorithms within the new framework. Each of these algorithms can be applied to much larger data sets while maintaining the advantages and power of Kriging. The proposed algorithms are explained in detail and compared empirically against a broad set of existing state-of-the-art Kriging approximation methods on a well-defined testing framework. According to the empirical study, the proposed algorithms consistently outperform the existing algorithms. Moreover, some practical suggestions are provided for using the proposed algorithms.
- Feb 02 2017 cs.CV arXiv:1702.00254v3 We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
- It is known that certain structures of the signal in addition to the standard notion of sparsity (called structured sparsity) can improve the sample complexity in several compressive sensing applications. Recently, Hegde et al. proposed a framework, called approximation-tolerant model-based compressive sensing, for recovering signals with structured sparsity. Their framework requires two oracles, the head- and the tail-approximation projection oracles. The two oracles should return approximate solutions in the model which is closest to the query signal. In this paper, we consider two structured sparsity models and obtain improved projection algorithms. The first one is the tree sparsity model, which captures the support structure in the wavelet decomposition of piecewise-smooth signals. We propose a linear time $(1-\epsilon)$-approximation algorithm for head-approximation projection and a linear time $(1+\epsilon)$-approximation algorithm for tail-approximation projection. The best previous result is an $\tilde{O}(n\log n)$ time bicriterion approximation algorithm (meaning that their algorithm may return a solution of sparsity larger than $k$) by Hegde et al. Our result provides an affirmative answer to the open problem mentioned in the survey of Hegde and Indyk. As a corollary, we can recover a constant approximate $k$-sparse signal. The other is the Constrained Earth Mover Distance (CEMD) model, which is useful to model the situation where the positions of the nonzero coefficients of a signal do not change significantly as a function of spatial (or temporal) locations. We obtain the first single criterion constant factor approximation algorithm for the head-approximation projection. The previous best known algorithm is a bicriterion approximation. Using this result, we can get a faster constant approximation algorithm with fewer measurements for the recovery problem in CEMD model.
- Jan 16 2017 cs.LG arXiv:1701.03647v1 Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but the training procedure is an unsupervised learning approach, without any guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper we propose a pairwise constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM) model, in which the learning procedure is guided by pairwise constraints and the process of encoding is conducted under this guidance. The pairwise constraints are encoded in the hidden layer features of pcGRBM. Under this guidance, some pairwise hidden features of pcGRBM flock together while others are separated. In order to deal with real-valued data, the binary visible units are replaced by linear units with Gaussian noise in the pcGRBM model. In the learning process of pcGRBM, the pairwise constraints are iterated in the transitions between visible and hidden units during the CD learning procedure. Then, the proposed model is inferred by an approximate gradient descent method, and the corresponding learning algorithm is designed in this paper. In order to compare the availability of pcGRBM and traditional RBMs with Gaussian visible units, the features of the pcGRBM and RBM hidden layers are used as input 'data' for K-means, spectral clustering (SP) and affinity propagation (AP) algorithms, respectively. A thorough experimental evaluation is performed with sixteen image datasets of Microsoft Research Asia Multimedia (MSRA-MM). The experimental results show that the clustering performance of K-means, SP and AP algorithms based on the pcGRBM model is significantly better than that based on traditional RBMs. In addition, the pcGRBM model for the clustering task shows better performance than some semi-supervised clustering algorithms.
- Jan 09 2017 cs.MM arXiv:1701.01500v2 A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.
- The phase retrieval (PR) problem is an ill-conditioned inverse problem that arises in a variety of applications. Based on the Wirtinger flow (WF) method, a reweighted Wirtinger flow (RWF) method is proposed to deal with the PR problem. RWF finds the global optimum by solving a series of sub-PR problems with changing weights. Theoretical analysis shows that RWF converges geometrically from a deliberate initialization when the weights are bounded by 1 and $\frac{10}{9}$. Numerical tests show that RWF has a lower sampling complexity than WF. As an essentially adaptive truncated Wirtinger flow (TWF) method, RWF performs better than TWF, especially when the ratio between the sampling number $m$ and the signal length $n$ is small.
- We determine the cycle structure of linear feedback shift registers with an arbitrary monic characteristic polynomial over any finite field. For each cycle, a method to find a state and a new way to represent the state are proposed.
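
To make the notion of "cycle structure" in the last abstract concrete, the following is a minimal brute-force sketch and not the method of that paper. It assumes a prime field GF(p) rather than an arbitrary finite field, and the function name `lfsr_cycle_structure` and its coefficient convention are hypothetical, chosen only for illustration. For the characteristic polynomial $x^n + c_{n-1}x^{n-1} + \cdots + c_0$, the register state $(s_0, \ldots, s_{n-1})$ is shifted and the feedback symbol $-(c_0 s_0 + \cdots + c_{n-1} s_{n-1}) \bmod p$ is appended; the sketch then enumerates all $p^n$ states and counts the cycles of each length.

```python
from collections import Counter
from itertools import product


def lfsr_cycle_structure(coeffs, p=2):
    """Return {cycle_length: number_of_cycles} for an LFSR over GF(p), p prime.

    coeffs = [c_0, ..., c_{n-1}] are the non-leading coefficients of the monic
    characteristic polynomial x^n + c_{n-1} x^{n-1} + ... + c_0 (assumed convention).
    """
    n = len(coeffs)
    if coeffs[0] % p == 0:
        # With c_0 = 0 the state map is not invertible, so states do not
        # partition into pure cycles; this sketch does not handle that case.
        raise ValueError("c_0 must be nonzero modulo p")

    def step(state):
        # Recurrence s_{k+n} = -(c_0 s_k + ... + c_{n-1} s_{k+n-1}) mod p.
        feedback = (-sum(c * s for c, s in zip(coeffs, state))) % p
        return state[1:] + (feedback,)

    seen = set()
    lengths = Counter()
    for start in product(range(p), repeat=n):
        if start in seen:
            continue
        # The state map is a permutation, so iterating from an unseen state
        # walks exactly one cycle and stops when it returns to the start.
        length, state = 0, start
        while state not in seen:
            seen.add(state)
            state = step(state)
            length += 1
        lengths[length] += 1
    return dict(lengths)


# x^3 + x + 1 is primitive over GF(2): the zero state plus one cycle of length 7.
print(lfsr_cycle_structure([1, 1, 0]))      # {1: 1, 7: 1}
# x^4 + x^3 + x^2 + x + 1 is irreducible of order 5: three cycles of length 5.
print(lfsr_cycle_structure([1, 1, 1, 1]))   # {1: 1, 5: 3}
```

Exhaustive enumeration costs time and memory proportional to $p^n$, so it is only useful for checking small cases by hand; a closed-form determination such as the one the abstract describes is what makes large registers tractable.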