results for au:Wang_H in:cs

- In recent years, RTB(Real Time Bidding) becomes a popular online advertisement trading method. During the auction, each DSP is supposed to evaluate this opportunity and respond with an ad and corresponding bid price. Generally speaking, this is a kind of assginment problem for DSP. However, unlike traditional one, this assginment problem has bid price as additional variable. It's essential to find an optimal ad selection and bid price determination strategy. In this document, two major steps are taken to tackle it. First, the augmented GAP(Generalized Assignment Problem) is proposed and a general bidding strategy is correspondingly provided. Second, we show that DSP problem is a special case of the augmented GAP and the general bidding strategy applies. To the best of our knowledge, our solution is the first DSP bidding framework that is derived from strict second price auction assumption and is generally applicable to the multiple ads scenario with various objectives and constraints. Our strategy is verified through simulation and outperforms state-of-the-art strategies in real application.
- May 23 2017 cs.CV arXiv:1705.07594v1Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we proposed that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios, that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to significant improvement of object recognition under occlusion for our network relative to the original network trained only on un-occluded images. In addition to providing practical benefits in object recognition under occlusion, this work demonstrates the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data, and become more flexible in dealing with novel situations.
- May 22 2017 cs.CV arXiv:1705.06861v1This letter adopts long short-term memory(LSTM) to predict sea surface temperature(SST), which is the first attempt, to our knowledge, to use recurrent neural network to solve the problem of SST prediction, and to make one week and one month daily prediction. We formulate the SST prediction problem as a time series regression problem. LSTM is a special kind of recurrent neural network, which introduces gate mechanism into vanilla RNN to prevent the vanished or exploding gradient problem. It has strong ability to model the temporal relationship of time series data and can handle the long-term dependency problem well. The proposed network architecture is composed of two kinds of layers: LSTM layer and full-connected dense layer. LSTM layer is utilized to model the time series relationship. Full-connected layer is utilized to map the output of LSTM layer to a final prediction. We explore the optimal setting of this architecture by experiments and report the accuracy of coastal seas of China to confirm the effectiveness of the proposed method. In addition, we also show its online updated characteristics.
- May 19 2017 cs.CL arXiv:1705.06463v1Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to 25% relative error reduction, resulting in a parser of 84.47% accuracies. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research.
- In this paper, we study power-efficient resource allocation for multicarrier non-orthogonal multiple access (MC-NOMA) systems. The resource allocation algorithm design is formulated as a non-convex optimization problem which jointly designs the power allocation, rate allocation, user scheduling, and successive interference cancellation (SIC) decoding policy for minimizing the total transmit power. The proposed framework takes into account the imperfection of channel state information at transmitter (CSIT) and quality of service (QoS) requirements of users. To facilitate the design of optimal SIC decoding policy on each subcarrier, we define a channel-to-noise ratio outage threshold. Subsequently, the considered non-convex optimization problem is recast as a generalized linear multiplicative programming problem, for which a globally optimal solution is obtained via employing the branch-and-bound approach. The optimal resource allocation policy serves as a system performance benchmark due to its high computational complexity. To strike a balance between system performance and computational complexity, we propose a suboptimal iterative resource allocation algorithm based on difference of convex programming. Simulation results demonstrate that the suboptimal scheme achieves a close-to-optimal performance. Also, both proposed schemes provide significant transmit power savings than that of conventional orthogonal multiple access (OMA) schemes.
- Many robotic tasks require heavy computation, which can easily exceed the robot's onboard computer capability. A promising solution to address this challenge is outsourcing the computation to the cloud. However, exploiting the potential of cloud resources in robotic software is difficult, because it involves complex code modification and extensive (re)configuration procedures. Moreover, quality of service (QoS) such as timeliness, which is critical to robot's behavior, have to be considered. In this paper, we propose a transparent and QoS-aware software framework called Cloudroid for cloud robotic applications. This framework supports direct deployment of existing robotic software packages to the cloud, transparently transforming them into Internet-accessible cloud services. And with the automatically generated service stubs, robotic applications can outsource their computation to the cloud without any code modification. Furthermore, the robot and the cloud can cooperate to maintain the specific QoS property such as request response time, even in a highly dynamic and resource-competitive environment. We evaluated Cloudroid based on a group of typical robotic scenarios and a set of software packages widely adopted in real-world robot practices. Results show that robot's capability can be enhanced significantly without code modification and specific QoS objectives can be guaranteed. In certain tasks, the "cloud + robot" setup shows improved performance in orders of magnitude compared with the robot native setup.
- Massive public resume data emerging on the WWW indicates individual-related characteristics in terms of profile and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as talent seeking and evaluation. Existing RA studies based on statistical analyzing have primarily focused on talent recruitment by identifying explicit attributes. However, they failed to discover the implicit semantic information, i.e., individual career progress patterns and social-relations, which are vital to comprehensive understanding of career development. Besides, how to visualize them for better human cognition is also challenging. To tackle these issues, we propose a visual analytics system ResumeVis to mine and visualize resume data. Firstly, a text-mining based approach is presented to extract semantic information. Then, a set of visualizations are devised to represent the semantic information in multiple perspectives. By interactive exploration on ResumeVis performed by domain experts, the following tasks can be accomplished: to trace individual career evolving trajectory; to mine latent social-relations among individuals; and to hold the full picture of massive resumes' collective mobility. Case studies with over 2500 online officer resumes demonstrate the effectiveness of our system. We provide a demonstration video.
- We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth-of-field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with remarkably better resolution, matching the performance of higher numerical aperture lenses, also significantly surpassing their limited field-of-view and depth-of-field. These results are transformative for various fields that use microscopy tools, including e.g., life sciences, where optical microscopy is considered as one of the most widely used and deployed techniques. Beyond such applications, our presented approach is broadly applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better and better as they continue to image specimen and establish new transformations among different modes of imaging.
- P2P lending presents as an innovative and flexible alternative for conventional lending institutions like banks, where lenders and borrowers directly make transactions and benefit each other without complicated verifications. However, due to lack of specialized laws, delegated monitoring and effective managements, P2P platforms may spawn potential risks, such as withdraw failures, investigation involvements and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although there are abundant public information and data available on the Internet related to P2P platforms, challenges of multi-sourcing and heterogeneity matter. In this paper, we promote a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain great amounts of multi-source heterogeneous data of domestic P2P platforms since 2007 from the Internet. Purifications like duplication and noise removal, null handing, format unification and fusion are applied to improve data qualities. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graph and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to rankings generated by OMNIRank, we conduct flourish data visualizations and interactions, providing lenders with comprehensive information supports, decision suggestions and safety guarantees.
- This work shows how to efficiently construct binary de Bruijn sequences, even those with large orders, using the cycle joining method. The cycles are generated by an LFSR with a chosen period $e$ whose irreducible characteristic polynomial can be derived from any primitive polynomial of degree $n$ satisfying $e = \frac{2^n-1}{t}$ by $t$-decimation. The crux is our proof that determining Zech's logarithms is equivalent to identifying conjugate pairs shared by any pair of cycles. The approach quickly finds enough number of conjugate pairs between any two cycles to ensure the existence of trees containing all vertices in the adjacency graph of the LFSR. When the characteristic polynomial $f(x)$ is a product of distinct irreducible polynomials, we combine the approach via Zech's logarithms and a recently proposed method to determine the conjugate pairs. This allows us to efficiently generate de Bruijn sequences with larger orders. Along the way, we establish new properties of Zech's logarithms.
- We consider a classical k-center problem in trees. Let T be a tree of n vertices and every vertex has a nonnegative weight. The problem is to find k centers on the edges of T such that the maximum weighted distance from all vertices to their closest centers is minimized. Megiddo and Tamir (SIAM J. Comput., 1983) gave an algorithm that can solve the problem in O(n\log^2 n) time by using Cole's parametric search. Since then it has been open for over three decades whether the problem can be solved in O(n\log n) time. In this paper, we present an O(n\log n) time algorithm for the problem and thus settle the open problem affirmatively.
- May 08 2017 cs.CL arXiv:1705.02131v1Argument Component Boundary Detection (ACBD) is an important sub-task in argumentation mining; it aims at identifying the word sequences that constitute argument components, and is usually considered as the first sub-task in the argumentation mining pipeline. Existing ACBD methods heavily depend on task-specific knowledge, and require considerable human efforts on feature-engineering. To tackle these problems, in this work, we formulate ACBD as a sequence labeling problem and propose a variety of Recurrent Neural Network (RNN) based methods, which do not use domain specific or handcrafted features beyond the relative position of the sentence in the document. In particular, we propose a novel joint RNN model that can predict whether sentences are argumentative or not, and use the predicted results to more precisely detect the argument component boundaries. We evaluate our techniques on two corpora from two different genres; results suggest that our joint RNN model obtain the state-of-the-art performance on both datasets.
- May 08 2017 cs.CL arXiv:1705.02077v1Argumentation mining aims at automatically extracting the premises-claim discourse structures in natural language texts. There is a great demand for argumentation corpora for customer reviews. However, due to the controversial nature of the argumentation annotation task, there exist very few large-scale argumentation corpora for customer reviews. In this work, we novelly use the crowdsourcing technique to collect argumentation annotations in Chinese hotel reviews. As the first Chinese argumentation dataset, our corpus includes 4814 argument component annotations and 411 argument relation annotations, and its annotations qualities are comparable to some widely used argumentation corpora in other languages.
- May 05 2017 cs.CV arXiv:1705.01908v2Recently, realistic image generation using deep neural networks has become a hot topic in machine learning and computer vision. Images can be generated at the pixel level by learning from a large collection of images. Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a potential application in digital entertainment. In this paper, we investigate the sketch-to-image synthesis problem by using conditional generative adversarial networks (cGAN). We propose the auto-painter model which can automatically generate compatible colors for a sketch. The new model is not only capable of painting hand-draw sketch with proper colors, but also allowing users to indicate preferred colors. Experimental results on two sketch datasets show that the auto-painter performs better that existing image-to-image methods.
- May 01 2017 cs.CV arXiv:1704.08944v1Color and intensity are two important components in an image. Usually, groups of image pixels, which are similar in color or intensity, are an informative representation for an object. They are therefore particularly suitable for computer vision tasks, such as saliency detection and object proposal generation. However, image pixels, which share a similar real-world color, may be quite different since colors are often distorted by intensity. In this paper, we reinvestigate the affinity matrices originally used in image segmentation methods based on spectral clustering. A new affinity matrix, which is robust to color distortions, is formulated for object discovery. Moreover, a Cohesion Measurement (CM) for object regions is also derived based on the formulated affinity matrix. Based on the new Cohesion Measurement, a novel object discovery method is proposed to discover objects latent in an image by utilizing the eigenvectors of the affinity matrix. Then we apply the proposed method to both saliency detection and object proposal generation. Experimental results on several evaluation benchmarks demonstrate that the proposed CM based method has achieved promising performance for these two tasks.
- In this paper, we consider a coverage problem for uncertain points in a tree. Let T be a tree containing a set P of n (weighted) demand points, and the location of each demand point P_i∈P is uncertain but is known to appear in one of m_i points on T each associated with a probability. Given a covering range \lambda, the problem is to find a minimum number of points (called centers) on T to build facilities for serving (or covering) these demand points in the sense that for each uncertain point P_i∈P, the expected distance from P_i to at least one center is no more than $\lambda$. The problem has not been studied before. We present an O(|T|+M\log^2 M) time algorithm for the problem, where |T| is the number of vertices of T and M is the total number of locations of all uncertain points of P, i.e., M=\sum_P_i∈Pm_i. In addition, by using this algorithm, we solve a k-center problem on T for the uncertain points of P.
- In this paper, we consider the problems for covering multiple intervals on a line. Given a set B of m line segments (called "barriers") on a horizontal line L and another set S of n horizontal line segments of the same length in the plane, we want to move all segments of S to L so that their union covers all barriers and the maximum movement of all segments of S is minimized. Previously, an O(n^3 log n)-time algorithm was given for the problem but only for the special case m = 1. In this paper, we propose an O(n^2 log n log log n + nm log m)-time algorithm for any m, which improves the previous work even for m = 1. We then consider a line-constrained version of the problem in which the segments of S are all initially on the line L. Previously, an O(n log n)-time algorithm was known for the case m = 1. We present an algorithm of O((n + m) log(n + m)) time for any m. These problems may have applications in mobile sensor barrier coverage in wireless sensor networks.
- Orthogonal frequency division multiplexing (OFDM) has been widely used in communication systems operating in the millimeter wave (mmWave) band to combat frequency-selective fading and achieve multi-Gbps transmissions, such as IEEE 802.15.3c and IEEE 802.11ad. For mmWave systems with ultra high sampling rate requirements, the use of low-resolution analog-to-digital converters (ADCs) (i.e., 1-3 bits) ensures an acceptable level of power consumption and system costs. However, orthogonality among sub-channels in the OFDM system cannot be maintained because of the severe non-linearity caused by low-resolution ADC, which renders the design of data detector challenging. In this study, we develop an efficient algorithm for optimal data detection in the mmWave OFDM system with low-resolution ADCs. The analytical performance of the proposed detector is derived and verified to achieve the fundamental limit of the Bayesian optimal design. On the basis of the derived analytical expression, we further propose a power allocation (PA) scheme that seeks to minimize the average symbol error rate. In addition to the optimal data detector, we also develop a feasible channel estimation method, which can provide high-quality channel state information without significant pilot overhead. Simulation results confirm the accuracy of our analysis and illustrate that the performance of the proposed detector in conjunction with the proposed PA scheme is close to the optimal performance of the OFDM system with infinite-resolution ADC.
- Apr 12 2017 cs.SI physics.soc-ph arXiv:1704.03261v1Traces of user activities recorded in online social networks such as the creation, viewing and forwarding/sharing of information over time open new possibilities to quantitatively and systematically understand the information diffusion process on social networks. From an online social network like WeChat, we could collect a large number of information cascade trees, each of which tells the spreading trajectory of a message/information such as which user creates the information and which users view or forward the information shared by which neighbors. In this work, we propose two heterogeneous non-linear models. Both models are validated by the WeChat data in reproducing and explaining key features of cascade trees. Specifically, we firstly apply the Random Recursive Tree (RRT) to model the cascade tree topologies, capturing key features, i.e. the average path length and degree variance of a cascade tree in relation to the number of nodes (size) of the tree. The RRT model with a single parameter $\theta$ describes the growth mechanism of a tree, where a node in the existing tree has a probability $d_i^{\theta}$ of being connected to a newly added node that depends on the degree $d_i$ of the existing node. The identified parameter $\theta$ quantifies the relative depth or broadness of the cascade trees, indicating that information propagates via a star-like broadcasting or viral-like hop by hop spreading. The RRT model explains the appearance of hubs, thus a possibly smaller average path length as the cascade size increases, as observed in WeChat. We further propose the stochastic Susceptible View Forward Removed (SVFR) model to depict the dynamic user behaviors including creating, viewing, forwarding and ignoring a message on a given social network.
- Phase retrieval(PR) problem is a kind of ill-condition inverse problem which can be found in various of applications. Utilizing the sparse priority, an algorithm called SWF(Sparse Wirtinger Flow) is proposed in this paper to deal with sparse PR problem based on the Wirtinger flow method. SWF firstly recovers the support of the signal and then updates the evaluation by hard thresholding method with an elaborate initialization. Theoretical analyses show that SWF has a geometric convergence for any $k$ sparse $n$ length signal with the sampling complexity $\mathcal{O}(k^2\mathrm{log}n)$. To get $\varepsilon$ accuracy, the computational complexity of SWF is $\mathcal{O}(k^3n\mathrm{log}n\mathrm{log}\frac{1}{\varepsilon})$. Numerical tests also demonstrate that SWF performs better than state-of-the-art methods especially when we have no priori knowledge about sparsity $k$. Moreover, SWF is also robust to the noise
- Apr 11 2017 cs.CV arXiv:1704.02581v2Recently, skeleton based action recognition gains more popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNN) to handle raw skeletons only focus on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve generalization of our model, we further exploit 3D transformation based data augmentation techniques including rotation and scaling transformation to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, i.e., generic actions, interaction activities and gestures.
- Apr 10 2017 cs.CV arXiv:1704.02224v1We propose a novel 3D neural network architecture for 3D hand pose estimation from a single depth image. Different from previous works that mostly run on 2D depth image domain and require intermediate or post process to bring in the supervision from 3D space, we convert the depth map to a 3D volumetric representation, and feed it into a 3D convolutional neural network(CNN) to directly produce the pose in 3D requiring no further process. Our system does not require the ground truth reference point for initialization, and our network architecture naturally integrates both local feature and global context in 3D space. To increase the coverage of the hand pose space of the training data, we render synthetic depth image by transferring hand pose from existing real image datasets. We evaluation our algorithm on two public benchmarks and achieve the state-of-the-art performance. The synthetic hand pose dataset will be available.
- Apr 07 2017 cs.AI arXiv:1704.01815v1The key issues pertaining to collection of epidemic disease data for our analysis purposes are that it is a labour intensive, time consuming and expensive process resulting in availability of sparse sample data which we use to develop prediction models. To address this sparse data issue, we present novel Incremental Transductive methods to circumvent the data collection process by applying previously acquired data to provide consistent, confidence-based labelling alternatives to field survey research. We investigated various reasoning approaches for semisupervised machine learning including Bayesian models for labelling data. The results show that using the proposed methods, we can label instances of data with a class of vector density at a high level of confidence. By applying the Liberal and Strict Training Approaches, we provide a labelling and classification alternative to standalone algorithms. The methods in this paper are components in the process of reducing the proliferation of the Schistosomiasis disease and its effects.
- Apr 05 2017 cs.CL arXiv:1704.00849v2Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a Wasserstein generative adversarial network (W-GAN) that explicitly takes a VC-related objective into account. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.
- Apr 04 2017 cs.CV arXiv:1704.00033v1We develop a model of perceptual similarity judgment based on re-training a deep convolution neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience. The re-training process effectively performs distance metric learning under the object persistency constraints, to modify the view-manifold of object representations. It reduces the effective distance between the representations of different views of the same object without compromising the distance between those of the views of different objects, resulting in the untangling of the view-manifolds between individual objects within the same category and across categories. This untangling enables the model to discriminate and recognize objects within the same category, independent of viewpoints. We found that this ability is not limited to the trained objects, but transfers to novel objects in both trained and untrained categories, as well as to a variety of completely novel artificial synthetic objects. This transfer in learning suggests the modification of distance metrics in view- manifolds is more general and abstract, likely at the levels of parts, and independent of the specific objects or categories experienced during training. Interestingly, the resulting transformation of feature representation in the deep networks is found to significantly better match human perceptual similarity judgment than AlexNet, suggesting that object persistence could be an important constraint in the development of perceptual similarity judgment in biological neural networks.
- Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus on addressing audio information only.In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNN (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model.In the proposed AVDCNN SE model,audio and visual features are first processed using individual CNNs, and then, fused into a joint network to generate enhanced speech at an output layer. The AVDCNN model is trained in an end-to-end manner, and parameters are jointly learned through back-propagation. We evaluate enhanced speech using five objective criteria. Results show that the AVDCNN yields notably better performance as compared to an audio-only CNN-based SE model, confirming the effectiveness of integrating visual information into the SE process.
- Mar 28 2017 cs.CV arXiv:1703.08912v1In this paper, we will investigate the contribution of color names for salient object detection. Each input image is first converted to the color name space, which is consisted of 11 probabilistic channels. By exploring the topological structure relationship between the figure and the ground, we obtain a saliency map through a linear combination of a set of sequential attention maps. To overcome the limitation of only exploiting the surroundedness cue, two global cues with respect to color names are invoked for guiding the computation of another weighted saliency map. Finally, we integrate the two saliency maps into a unified framework to infer the saliency result. In addition, an improved post-processing procedure is introduced to effectively suppress the background while uniformly highlight the salient objects. Experimental results show that the proposed model produces more accurate saliency maps and performs well against 23 saliency models in terms of three evaluation metrics on three public datasets.
- This paper studies physical layer security in a wireless ad hoc network with numerous legitimate transmitter-receiver pairs and eavesdroppers. A hybrid full-/half-duplex receiver deployment strategy is proposed to secure legitimate transmissions, by letting a fraction of legitimate receivers work in the full-duplex (FD) mode sending jamming signals to confuse eavesdroppers upon their information receptions, and letting the other receivers work in the half-duplex mode just receiving their desired signals. The objective of this paper is to choose properly the fraction of FD receivers for achieving the optimal network security performance. Both accurate expressions and tractable approximations for the connection outage probability and the secrecy outage probability of an arbitrary legitimate link are derived, based on which the area secure link number, network-wide secrecy throughput and network-wide secrecy energy efficiency are optimized respectively. Various insights into the optimal fraction are further developed and its closed-form expressions are also derived under perfect self-interference cancellation or in a dense network. It is concluded that the fraction of FD receivers triggers a non-trivial trade-off between reliability and secrecy, and the proposed strategy can significantly enhance the network security performance.
- Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either inefficient to simultaneously process multiple signals (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.
- Mar 20 2017 cs.CV arXiv:1703.05870v2Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR.
- Mar 14 2017 cs.CE arXiv:1703.03930v1Multiscale optimization is an attractive research field recently. For the most of optimization tools, design parameters should be updated during a close loop. Therefore, a simple Python code is programmed to obtain effective properties of Representative Volume Element (RVE) under Periodic Boundary Conditions (PBCs). It can compute the mechanical properties of a composite with a periodic structure, in two or three dimensions. The computation method is based on the Asymptotic Homogenization Theory (AHT). With simple modifications, the basic Python code may be extended to the computation of the effective properties of more complex microstructure. Moreover, the code provides a convenient platform upon the optimization for the material and geometric composite design. The user may experiment with various algorithms and tackle a wide range of problems. To verify the effectiveness and reliability of the code, a three-dimensional case is employed to illuminate the code. Finally numerical results obtained by the code agree well with the available theoretical and experimental results
- Given a rectilinear domain $\mathcal{P}$ of $h$ pairwise-disjoint rectilinear obstacles with a total of $n$ vertices in the plane, we study the problem of computing bicriteria rectilinear shortest paths between two points $s$ and $t$ in $\mathcal{P}$. Three types of bicriteria rectilinear paths are considered: minimum-link shortest paths, shortest minimum-link paths, and minimum-cost paths where the cost of a path is a non-decreasing function of both the number of edges and the length of the path. The one-point and two-point path queries are also considered. Algorithms for these problems have been given previously. Our contributions are threefold. First, we find a critical error in all previous algorithms. Second, we correct the error in a not-so-trivial way. Third, we further improve the algorithms so that they are even faster than the previous (incorrect) algorithms when $h$ is relatively small. For example, for the minimum-link shortest paths, we obtain the following results. Our algorithm computes a minimum-link shortest $s$-$t$ path in $O(n+h\log^{3/2} h)$ time. For the one-point queries, we build a data structure of size $O(n+ h\log h)$ in $O(n+h\log^{3/2} h)$ time for a source point $s$, such that given any query point $t$, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n)$ time. For the two-point queries, with $O(n+h^2\log^2 h)$ time and space preprocessing, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n+\log^2 h)$ time for any two query points $s$ and $t$; alternatively, with $O(n+h^2\cdot \log^{2} h \cdot 4^{\sqrt{\log h}})$ time and $O(n+h^2\cdot \log h \cdot 4^{\sqrt{\log h}})$ space preprocessing, we can answer each two-point query in $O(\log n)$ time.
- Mar 14 2017 cs.CE arXiv:1703.04355v1This study presents a meshless-based local reanalysis (MLR) method. The purpose of this study is to extend reanalysis methods to the Kriging interpolation meshless method due to its high efficiency. In this study, two reanalysis methods: combined approximations CA) and indirect factorization updating (IFU) methods are utilized. Considering the computational cost of meshless methods, the reanalysis method improves the efficiency of the full meshless method significantly. Compared with finite element method (FEM)-based reanalysis methods, the main superiority of meshless-based reanalysis method is to break the limitation of mesh connection. The meshless-based reanalysis is much easier to obtain the stiffness matrix even for solving the mesh distortion problems. However, compared with the FEM-based reanalysis method, the critical challenge is to use much more nodes in the influence domain due to high order interpolation. Therefore, a local reanalysis method which only needs to calculate the local stiffness matrix in the influence domain is suggested to improve the efficiency further. Several typical numerical examples are tested and the performance of the suggested method is verified.
- Let $s$ be a point in a polygonal domain $\mathcal{P}$ of $h-1$ holes and $n$ vertices. We consider a quickest visibility query problem. Given a query point $q$ in $\mathcal{P}$, the goal is to find a shortest path in $\mathcal{P}$ to move from $s$ to see $q$ as quickly as possible. Previously, Arkin et al. (SoCG 2015) built a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(K\log^2 n)$ time, where $\alpha(n)$ is the inverse Ackermann function and $K$ is the size of the visibility polygon of $q$ in $\mathcal{P}$ (and $K$ can be $\Theta(n)$ in the worst case). In this paper, we present a new data structure of size $O(n\log h + h^2)$ that can answer each query in $O(h\log h\log n)$ time. Our result improves the previous work when $h$ is relatively small. In particular, if $h$ is a constant, then our result even matches the best result for the simple polygon case (i.e., $h=1$), which is optimal. As a by-product, we also have a new algorithm for a shortest-path-to-segment query problem. Given a query line segment $\tau$ in $\mathcal{P}$, the query seeks a shortest path from $s$ to all points of $\tau$. Previously, Arkin et al. gave a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(\log^2 n)$ time, and another data structure of size $O(n^3\log n)$ with $O(\log n)$ query time. We present a data structure of size $O(n)$ with query time $O(h\log \frac{n}{h})$, which also favors small values of $h$ and is optimal when $h=O(1)$.
- Mar 08 2017 cs.CY arXiv:1703.02497v1Despite being popularly referred to as the ultimate solution for all problems of our current electric power system, smart grid is still a growing and unstable concept. It is usually considered as a set of advanced features powered by promising technological solutions. In this paper, we describe smart grid as a socio-technical transition and illustrate the evolutionary path on which a smart grid can be realized. Through this conceptual lens, we reveal the role of big data, and how it can fuel the organic growth of smart grid. We also provide a rough estimate of how much data will be potentially generated from different data sources, which helps clarify the big data challenges during the evolutionary process.
- Mar 06 2017 cs.CV arXiv:1703.01086v2This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which is designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit into the text region in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals to the feature map for a text region classifier. The whole framework is built upon region proposal based architecture, which ensures the computational efficiency of the arbitrary-oriented text detection comparing with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets, and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
- Mar 03 2017 physics.med-ph cs.CV arXiv:1703.00797v1Brain CT has become a standard imaging tool for emergent evaluation of brain condition, and measurement of midline shift (MLS) is one of the most important features to address for brain CT assessment. We present a simple method to estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS over the maximal width of intracranial region (MLS/ICWMAX). Three neurosurgeons and our automated system were asked to measure MLS and MLS/ICWMAX in the same sets of axial CT images obtained from 41 patients admitted to ICU under neurosurgical service. A weighted midline (WML) was plotted based on individual pixel intensities, with higher weighted given to the darker portions. The MLS could then be measured as the distance between the WML and ideal midline (IML) near the foramen of Monro. The average processing time to output an automatic MLS measurement was around 10 seconds. Our automated system achieved an overall accuracy of 90.24% when the CT images were calibrated automatically, and performed better when the calibrations of head rotation were done manually (accuracy: 92.68%). MLS/ICWMAX and MLS both gave results in same confusion matrices and produced similar ROC curve results. We demonstrated a simple, fast and accurate automated system of MLS measurement and introduced a new parameter (MLS/ICWMAX) as a good alternative to MLS in terms of estimating the degree of brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily accessed.
- This paper is a review of the evolutionary history of deep learning models. It covers from the genesis of neural networks when associationism modeling of the brain is studied, to the models that dominate the last decade of research in deep learning like convolutional neural networks, deep belief networks, and recurrent neural networks. In addition to a review of these models, this paper primarily focuses on the precedents of the models above, examining how the initial ideas are assembled to construct the early models and how these preliminary models are developed into their current forms. Many of these evolutionary paths last more than half a century and have a diversity of directions. For example, CNN is built on prior knowledge of biological vision system; DBN is evolved from a trade-off of modeling power and computation complexity of graphical models and many nowadays models are neural counterparts of ancient linear models. This paper reviews these evolutionary paths and offers a concise thought flow of how these models are developed, and aims to provide a thorough background for deep learning. More importantly, along with the path, this paper summarizes the gist behind these milestones and proposes many directions to guide the future research of deep learning.
- Feb 22 2017 cs.SY arXiv:1702.06265v1In this paper, we investigate the task-space consensus problem for multiple robotic systems with both the uncertain kinematics and dynamics in the case of existence of constant communication delays. We propose an observer-based adaptive controller to achieve the manipulable consensus without relying on the measurement of task-space velocities, and also formalize the concept of manipulability to quantify the degree of adjustability of the consensus value. The proposed new control scheme employs a new distributed observer that does not rely on the joint velocity, and a new kinematic parameter adaptation law with a distributed adaptive kinematic regressor matrix that is driven by both the observation and consensus errors. In addition, it is shown that the proposed controller has the separation property, which yields an adaptive kinematic controller that is applicable to most industrial/commercial robots. The performance of the proposed observer-based adaptive schemes are shown by numerical simulations.
- Feb 22 2017 cs.CL arXiv:1702.06239v1Argument component detection (ACD) is an important sub-task in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts. Historical annotations (HAs) are important features the human annotators consider when they manually perform the ACD task. However, HAs are largely ignored by existing automatic ACD techniques. Reinforcement learning (RL) has proven to be an effective method for using HAs in some natural language processing tasks. In this work, we propose a RL-based ACD technique, and evaluate its performance on two well-annotated corpora. Results suggest that, in terms of classification accuracy, HAs-augmented RL outperforms plain RL by at most 17.85%, and outperforms the state-of-the-art supervised learning algorithm by at most 11.94%.
- We study a demand response problem from operator's perspective with realistic settings, in which the operator faces uncertainty and limited communication. Specifically, the operator does not know the cost function of consumers and cannot have multiple rounds of information exchange with consumers. We formulate an optimization problem for the operator to minimize its operational cost considering time-varying demand response targets and responses of consumers. We develop a joint online learning and pricing algorithm. In each time slot, the operator sends out a price signal to all consumers and estimates the cost functions of consumers based on their noisy responses. We measure the performance of our algorithm using regret analysis and show that our online algorithm achieves logarithmic regret with respect to the operating horizon. In addition, our algorithm employs linear regression to estimate the aggregate response of consumers, making it easy to implement in practice. Simulation experiments validate the theoretic results and show that the performance gap between our algorithm and the offline optimality decays quickly.
- Feb 09 2017 cs.CV physics.med-ph arXiv:1702.02223v1The present study shows that the performance of CNN is not significantly different from the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images. Because CNN does not need tumor segmentation or feature calculation, it is more convenient and more objective than the classical methods. However, CNN does not make use of the import diagnostic features, which have been proved more discriminative than the texture features for classifying small-sized lymph nodes. Therefore, incorporating the diagnostic features into CNN is a promising direction for future research.
- Feb 08 2017 cs.CG arXiv:1702.01836v1We study approximation algorithms for the following geometric version of the maximum coverage problem: Let $\mathcal{P}$ be a set of $n$ weighted points in the plane. Let $D$ represent a planar object, such as a rectangle, or a disk. We want to place $m$ copies of $D$ such that the sum of the weights of the points in $\mathcal{P}$ covered by these copies is maximized. For any fixed $\varepsilon>0$, we present efficient approximation schemes that can find a $(1-\varepsilon)$-approximation to the optimal solution. In particular, for $m=1$ and for the special case where $D$ is a rectangle, our algorithm runs in time $O(n\log (\frac{1}{\varepsilon}))$, improving on the previous result. For $m>1$ and the rectangular case, our algorithm runs in $O(\frac{n}{\varepsilon}\log (\frac{1}{\varepsilon})+\frac{m}{\varepsilon}\log m +m(\frac{1}{\varepsilon})^{O(\min(\sqrt{m},\frac{1}{\varepsilon}))})$ time. For a more general class of shapes (including disks, polygons with $O(1)$ edges), our algorithm runs in $O(n(\frac{1}{\varepsilon})^{O(1)}+\frac{m}{\epsilon}\log m + m(\frac{1}{\varepsilon})^{O(\min(m,\frac{1}{\varepsilon^2}))})$ time.
- Kriging or Gaussian Process Regression is applied in many fields as a non-linear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, that is cubic and quadratic in the number of data points respectively, becomes a major bottleneck with more and more data available nowadays. In this paper, we propose a general methodology for the complexity reduction, called cluster Kriging, where the whole data set is partitioned into smaller clusters and multiple Kriging models are built on top of them. In addition, four Kriging approximation algorithms are proposed as candidate algorithms within the new framework. Each of these algorithms can be applied to much larger data sets while maintaining the advantages and power of Kriging. The proposed algorithms are explained in detail and compared empirically against a broad set of existing state-of-the-art Kriging approximation methods on a well-defined testing framework. According to the empirical study, the proposed algorithms consistently outperform the existing algorithms. Moreover, some practical suggestions are provided for using the proposed algorithms.
- Feb 02 2017 cs.CV arXiv:1702.00254v3We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to early discard unlikely regions; a fine-turning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
- It is known that certain structures of the signal in addition to the standard notion of sparsity (called structured sparsity) can improve the sample complexity in several compressive sensing applications. Recently, Hegde et al. proposed a framework, called approximation-tolerant model-based compressive sensing, for recovering signals with structured sparsity. Their framework requires two oracles, the head- and the tail-approximation projection oracles. The two oracles should return approximate solutions in the model which is closest to the query signal. In this paper, we consider two structured sparsity models and obtain improved projection algorithms. The first one is the tree sparsity model, which captures the support structure in the wavelet decomposition of piecewise-smooth signals. We propose a linear time $(1-\epsilon)$-approximation algorithm for head-approximation projection and a linear time $(1+\epsilon)$-approximation algorithm for tail-approximation projection. The best previous result is an $\tilde{O}(n\log n)$ time bicriterion approximation algorithm (meaning that their algorithm may return a solution of sparsity larger than $k$) by Hegde et al. Our result provides an affirmative answer to the open problem mentioned in the survey of Hegde and Indyk. As a corollary, we can recover a constant approximate $k$-sparse signal. The other is the Constrained Earth Mover Distance (CEMD) model, which is useful to model the situation where the positions of the nonzero coefficients of a signal do not change significantly as a function of spatial (or temporal) locations. We obtain the first single criterion constant factor approximation algorithm for the head-approximation projection. The previous best known algorithm is a bicriterion approximation. Using this result, we can get a faster constant approximation algorithm with fewer measurements for the recovery problem in CEMD model.
- Jan 16 2017 cs.LG arXiv:1701.03647v1Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but the training procedure is an unsupervised learning approach, without any guidances of the background knowledge. To enhance the expression ability of traditional RBMs, in this paper, we propose pairwise constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM) model, in which the learning procedure is guided by pairwise constraints and the process of encoding is conducted under these guidances. The pairwise constraints are encoded in hidden layer features of pcGRBM. Then, some pairwise hidden features of pcGRBM flock together and another part of them are separated by the guidances. In order to deal with real-valued data, the binary visible units are replaced by linear units with Gausian noise in the pcGRBM model. In the learning process of pcGRBM, the pairwise constraints are iterated transitions between visible and hidden units during CD learning procedure. Then, the proposed model is inferred by approximative gradient descent method and the corresponding learning algorithm is designed in this paper. In order to compare the availability of pcGRBM and traditional RBMs with Gaussian visible units, the features of the pcGRBM and RBMs hidden layer are used as input 'data' for K-means, spectral clustering (SP) and affinity propagation (AP) algorithms, respectively. A thorough experimental evaluation is performed with sixteen image datasets of Microsoft Research Asia Multimedia (MSRA-MM). The experimental results show that the clustering performance of K-means, SP and AP algorithms based on pcGRBM model are significantly better than traditional RBMs. In addition, the pcGRBM model for clustering task shows better performance than some semi-supervised clustering algorithms.
- Jan 09 2017 cs.MM arXiv:1701.01500v2A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.
- Phase retrieval(PR) problem is a kind of ill-condition inverse problem which is arising in various of applications. Based on the Wirtinger flow(WF) method, a reweighted Wirtinger flow(RWF) method is proposed to deal with PR problem. RWF finds the global optimum by solving a series of sub-PR problems with changing weights. Theoretical analyses illustrate that the RWF has a geometric convergence from a deliberate initialization when the weights are bounded by 1 and $\frac{10}{9}$. Numerical testing shows RWF has a lower sampling complexity compared with WF. As an essentially adaptive truncated Wirtinger flow(TWF) method, RWF performs better than TWF especially when the ratio between sampling number $m$ and length of signal $n$ is small.
- We determine the cycle structure of linear feedback shift register with arbitrary monic characteristic polynomial over any finite field. For each cycle, a method to find a state and a new way to represent the state are proposed.
- Dec 22 2016 cs.LG arXiv:1612.06856v2This paper formulates the problem of learning discriminative features (\textiti.e., segments) from networked time series data considering the linked information among time series. For example, social network users are considered to be social sensors that continuously generate social signals (tweets) represented as a time series. The discriminative segments are often referred to as \emphshapelets in a time series. Extracting shapelets for time series classification has been widely studied. However, existing works on shapelet selection assume that the time series are independent and identically distributed (i.i.d.). This assumption restricts their applications to social networked time series analysis, since a user's actions can be correlated to his/her social affiliations. In this paper we propose a new Network Regularized Least Squares (NetRLS) feature selection model that combines typical time series data and user network data for analysis. Experiments on real-world networked time series Twitter and DBLP data demonstrate the performance of the proposed method. NetRLS performs better than LTS, the state-of-the-art time series feature selection approach, on real-world data.
- Dec 15 2016 cs.CV arXiv:1612.04755v1In this article, we propose a super-resolution method to resolve the problem of image low spatial because of the limitation of imaging devices. We make use of the strong non-linearity mapped ability of the back-propagation neural networks(BPNN). Training sample images are got by undersampled method. The elements chose as the inputs of the BPNN are pixels referred to Non-local means(NL-Means). Making use of the self-similarity of the images, those inputs are the pixels which are pixels gained from modified NL-means which is specific for super-resolution. Besides, small change on core function of NL-means has been applied in the method we use in this article so that we can have a clearer edge in the shrunk image. Experimental results gained from the Peak Signal to Noise Ratio(PSNR) and the Equivalent Number of Look(ENL), indicate that adding the similar pixels as inputs will increase the results than not taking them into consideration.
- Dec 09 2016 cs.CV arXiv:1612.02742v1Hand detection is essential for many hand related tasks, e.g. parsing hand pose, understanding gesture, which are extremely useful for robotics and human-computer interaction. However, hand detection in uncontrolled environments is challenging due to the flexibility of wrist joint and cluttered background. We propose a deep learning based approach which detects hands and calibrates in-plane rotation under supervision at the same time. To guarantee the recall, we propose a context aware proposal generation algorithm which significantly outperforms the selective search. We then design a convolutional neural network(CNN) which handles object rotation explicitly to jointly solve the object detection and rotation estimation tasks. Experiments show that our method achieves better results than state-of-the-art detection models on widely-used benchmarks such as Oxford and Egohands database. We further show that rotation estimation and classification can mutually benefit each other.
- Dec 06 2016 cs.CV arXiv:1612.01345v1Current person re-identification (re-id) methods assume that (1) pre-labelled training data are available for every camera pair, (2) the gallery size for re-identification is moderate. Both assumptions scale poorly to real-world applications when camera network size increases and gallery size becomes large. Human verification of automatic model ranked re-id results becomes inevitable. In this work, a novel human-in-the-loop re-id model based on Human Verification Incremental Learning (HVIL) is formulated which does not require any pre-labelled training data to learn a model, therefore readily scalable to new camera pairs. This HVIL model learns cumulatively from human feedback to provide instant improvement to re-id ranking of each probe on-the-fly enabling the model scalable to large gallery sizes. We further formulate a Regularised Metric Ensemble Learning (RMEL) model to combine a series of incrementally learned HVIL models into a single ensemble model to be used when human feedback becomes unavailable.
- Dec 06 2016 cs.CV arXiv:1612.01341v1Existing person re-identification models are poor for scaling up to large data required in real-world applications due to: (1) Complexity: They employ complex models for optimal performance resulting in high computational cost for training at a large scale; (2) Inadaptability: Once trained, they are unsuitable for incremental update to incorporate any new data available. This work proposes a truly scalable solution to re-id by addressing both problems. Specifically, a Highly Efficient Regression (HER) model is formulated by embedding the Fisher's criterion to a ridge regression model for very fast re-id model learning with scalable memory/storage usage. Importantly, this new HER model supports faster than real-time incremental model updates therefore making real-time active learning feasible in re-id with human-in-the-loop. Extensive experiments show that such a simple and fast model not only outperforms notably the state-of-the-art re-id methods, but also is more scalable to large data with additional benefits to active learning for reducing human labelling effort in re-id deployment.
- We propose a construction of de Bruijn sequences by the cycle joining method from linear feedback shift registers (LFSRs) with arbitrary characteristic polynomial $f(x)$. We study in detail the cycle structure of the set $\Omega(f(x))$ that contains all sequences produced by a specific LFSR on distinct inputs and provide an efficient way to find a state of each cycle. Our structural results lead to an efficient algorithm to find all conjugate pairs between any two cycles, yielding the adjacency graph. The approach provides a practical method to generate a large class of de Bruijn sequences. Many recently-proposed constructions of de Bruijn sequences are shown to be special cases of our construction.
- Understanding how brain functions has been an intriguing topic for years. With the recent progress on collecting massive data and developing advanced technology, people have become interested in addressing the challenge of decoding brain wave data into meaningful mind states, with many machine learning models and algorithms being revisited and developed, especially the ones that handle time series data because of the nature of brain waves. However, many of these time series models, like HMM with hidden state in discrete space or State Space Model with hidden state in continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of State Space Model to work with different sources of information together with its learning and inference algorithms. We apply this model to decode the mind state of students during lectures based on their brain waves and reach a significant better results compared to traditional methods.
- Nov 30 2016 cs.CR arXiv:1611.09418v1We propose a novel hierarchical online intrusion detection system (HOIDS) for supervisory control and data acquisition (SCADA) networks based on machine learning algorithms. By utilizing the server-client topology while keeping clients distributed for global protection, high detection rate is achieved with minimum network impact. We implement accurate models of normal-abnormal binary detection and multi-attack identification based on logistic regression and quasi-Newton optimization algorithm using the Broyden-Fletcher-Goldfarb-Shanno approach. The detection system is capable of accelerating detection by information gain based feature selection or principle component analysis based dimension reduction. By evaluating our system using the KDD99 dataset and the industrial control system dataset, we demonstrate that HOIDS is highly scalable, efficient and cost effective for securing SCADA infrastructures.
- Nov 30 2016 cs.CG arXiv:1611.09485v1We consider a problem of dispersing points on disjoint intervals on a line. Given n pairwise disjoint intervals sorted on a line, we want to find a point in each interval such that the minimum pairwise distance of these points is maximized. Based on a greedy strategy, we present a linear time algorithm for the problem. Further, we also solve in linear time the cycle version of the problem where the intervals are given on a cycle.
- Nov 24 2016 cs.CL arXiv:1611.07954v1Reading comprehension is a question answering task where the answer is to be found in a given passage about entities and events not mentioned in general knowledge sources. A significant number of neural architectures for this task (neural readers) have recently been developed and evaluated on large cloze-style datasets. We present experiments supporting the existence of logical structure in the hidden state vectors of "aggregation readers" such as the Attentive Reader and Stanford Reader. The logical structure of aggregation readers reflects the architecture of "explicit reference readers" such as the Attention-Sum Reader, the Gated Attention Reader and the Attention-over-Attention Reader. This relationship between aggregation readers and explicit reference readers presents a case study in emergent logical structure. In an independent contribution, we show that the addition of linguistics features to the input to existing neural readers significantly boosts performance yielding the best results to date on the Who-did-What datasets.
- Nov 23 2016 cs.CL arXiv:1611.07206v1In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.
- Nov 22 2016 cs.SI physics.soc-ph arXiv:1611.06645v1Network representation learning (NRL) aims to build low-dimensional vectors for vertices in a network. Most existing NRL methods focus on learning representations from local context of vertices (such as their neighbors). Nevertheless, vertices in many complex networks also exhibit significant global patterns widely known as communities. It's a common sense that vertices in the same community tend to connect densely, and usually share common attributes. These patterns are expected to improve NRL and benefit relevant evaluation tasks, such as link prediction and vertex classification. In this work, we propose a novel NRL model by introducing community information of vertices to learn more discriminative network representations, named as Community-enhanced Network Representation Learning (CNRL). CNRL simultaneously detects community distribution of each vertex and learns embeddings of both vertices and communities. In this way, we can obtain more informative representation of a vertex accompanying with its community information. In experiments, we evaluate the proposed CNRL model on vertex classification, link prediction, and community detection using several real-world datasets. The results demonstrate that CNRL significantly and consistently outperforms other state-of-the-art methods. Meanwhile, the detected meaningful communities verify our assumptions on the correlations among vertices, sequences, and communities.
- Nov 16 2016 cs.DB arXiv:1611.04689v2We study the similarity search problem which aims to find the similar query results according to a set of given data and a query string. To balance the result number and result quality, we combine query result diversity with query relaxation. Relaxation guarantees the number of the query results, returning more relevant elements to the query if the results are too few, while the diversity tries to reduce the similarity among the returned results. By making a trade-off of similarity and diversity, we improve the user experience. To achieve this goal, we define a novel goal function combining similarity and diversity. Aiming at this goal, we propose three algorithms. Among them, algorithms genGreedy and genCluster perform relaxation first and select part of the candidates to diversify. The third algorithm CB2S splits the dataset into smaller pieces using the clustering algorithm of k-means and processes queries in several small sets to retrieve more diverse results. The balance of similarity and diversity is determined through setting a threshold, which has a default value and can be adjusted according to users' preference. The performance and efficiency of our system are demonstrated through extensive experiments based on various datasets.
- Nov 15 2016 cs.DB arXiv:1611.04288v1A challenge for data imputation is the lack of knowledge. In this paper, we attempt to address this challenge by involving extra knowledge from web. To achieve high-performance web-based imputation, we use the dependency, i.e.FDs and CFDs, to impute as many as possible values automatically and fill in the other missing values with the minimal access of web, whose cost is relatively large. To make sufficient use of dependencies, We model the dependency set on the data as a graph and perform automatical imputation and keywords generation for web-based imputation based on such graph model. With the generated keywords, we design two algorithms to extract values for imputation from the search results. Extensive experimental results based on real-world data collections show that the proposed approach could impute missing values efficiently and effectively compared to existing approach.
- In the Fifth generation (5G) wireless communication systems, a majority of the traffic demands is contributed by various multimedia applications. To support the future 5G multimedia communication systems, the massive multiple-input multiple-output (MIMO) technique is recognized as a key enabler due to its high spectral efficiency. The massive antennas and radio frequency (RF) chains not only improve the implementation cost of 5G wireless communication systems but also result in an intense mutual coupling effect among antennas because of the limited space for deploying antennas. To reduce the cost, an optimal equivalent precoding matrix with the minimum number of RF chains is proposed for 5G multimedia massive MIMO communication systems considering the mutual coupling effect. Moreover, an upper bound of the effective capacity is derived for 5G multimedia massive MIMO communication systems. Two antenna receive diversity gain models are built and analyzed. The impacts of the antenna spacing, the number of antennas, the quality of service (QoS) statistical exponent, and the number of independent incident directions on the effective capacity of 5G multimedia massive MIMO communication systems are analyzed. Comparing with the conventional zero-forcing precoding matrix, simulation results demonstrate that the proposed optimal equivalent precoding matrix can achieve a higher achievable rate for 5G multimedia massive MIMO communication systems.
- Nov 10 2016 cs.CV arXiv:1611.02767v1Neural networks have shown to be a practical way of building a very complex mapping between a pre-specified input space and output space. For example, a convolutional neural network (CNN) mapping an image into one of a thousand object labels is approaching human performance in this particular task. However the mapping (neural network) does not automatically lend itself to other forms of queries, for example, to detect/reconstruct object instances, to enforce top-down signal on ambiguous inputs, or to recover object instances from occlusion. One way to address these queries is a backward pass through the network that fuses top-down and bottom-up information. In this paper, we show a way of building such a backward pass by defining a generative model of the neural network's activations. Approximate inference of the model would naturally take the form of a backward pass through the CNN layers, and it addresses the aforementioned queries in a unified framework.
- Nov 04 2016 cs.CC arXiv:1611.00975v2Holant problem is a general framework to study the computational complexity of counting problems. We prove a complexity dichotomy theorem for Holant problems over Boolean domain with non-negative weights. It is the first complete Holant dichotomy where constraint functions are not necessarily symmetric. Holant problems are indeed read-twice $\#$CSPs. Intuitively, some $\#$CSPs that are $\#$P-hard become tractable when restricting to read-twice instances. To capture them, we introduce the Block-rank-one condition. It turns out that the condition leads to a clear separation. If a function set $\mathcal{F}$ satisfies the condition, then $\mathcal{F}$ is of affine type or product type. Otherwise (a) $\mathrm{Holant}(\mathcal{F})$ is $\#$P-hard; or (b) every function in $\mathcal{F}$ is a tensor product of functions of arity at most 2; or (c) $\mathcal{F}$ is transformable to a product type by some real orthogonal matrix. Holographic transformations play an important role in both the hardness proof and the characterization of tractability.
- Hybrid methods that utilize both content and rating information are commonly used in many recommender systems. However, most of them use either handcrafted features or the bag-of-words representation as a surrogate for the content information but they are neither effective nor natural enough. To address this problem, we develop a collaborative recurrent autoencoder (CRAE) which is a denoising recurrent autoencoder (DRAE) that models the generation of content sequences in the collaborative filtering (CF) setting. The model generalizes recent advances in recurrent deep learning from i.i.d. input to non-i.i.d. (CF-based) input and provides a new denoising scheme along with a novel learnable pooling scheme for the recurrent autoencoder. To do this, we first develop a hierarchical Bayesian model for the DRAE and then generalize it to the CF setting. The synergy between denoising and CF enables CRAE to make accurate recommendations while learning to fill in the blanks in sequences. Experiments on real-world datasets from different domains (CiteULike and Netflix) show that, by jointly modeling the order-aware generation of sequences for the content information and performing CF for the ratings, CRAE is able to significantly outperform the state of the art on both the recommendation task based on ratings and the sequence generation task based on content information.
- Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.
- Nov 01 2016 cs.DB arXiv:1610.09506v1In Big data era, information integration often requires abundant data extracted from massive data sources. Due to a large number of data sources, data source selection plays a crucial role in information integration, since it is costly and even impossible to access all data sources. Data Source selection should consider both efficiency and effectiveness issues. For efficiency, the approach should achieve high performance and be scalability to fit large data source amount. From effectiveness aspect, data quality and overlapping of sources are to be considered, since data quality varies much from data sources, with significant differences in the accuracy and coverage of the data provided, and the overlapping of sources can even lower the quality of data integrated from selected data sources. In this paper, we study source selection problem in \textitBig Data Era and propose methods which can scale to datasets with up to millions of data sources and guarantee the quality of results. Motivated by this, we propose a new object function taking the expected number of true values a source can provide as a criteria to evaluate the contribution of a data source. Based on our proposed index we present a scalable algorithm and two pruning strategies to improve the efficiency without sacrificing precision. Experimental results on both real world and synthetic data sets show that our methods can select sources providing a large proportion of true values efficiently and can scale to massive data sources.