results for au:Wang_H in:cs

- Mar 28 2017 cs.CV arXiv:1703.08912v1 In this paper, we investigate the contribution of color names to salient object detection. Each input image is first converted to the color name space, which consists of 11 probabilistic channels. By exploring the topological structure relationship between the figure and the ground, we obtain a saliency map through a linear combination of a set of sequential attention maps. To overcome the limitation of only exploiting the surroundedness cue, two global cues with respect to color names are invoked to guide the computation of another weighted saliency map. Finally, we integrate the two saliency maps into a unified framework to infer the saliency result. In addition, an improved post-processing procedure is introduced to effectively suppress the background while uniformly highlighting the salient objects. Experimental results show that the proposed model produces more accurate saliency maps and performs well against 23 saliency models in terms of three evaluation metrics on three public datasets.
- This paper studies physical layer security in a wireless ad hoc network with numerous legitimate transmitter-receiver pairs and eavesdroppers. A hybrid full-/half-duplex receiver deployment strategy is proposed to secure legitimate transmissions, by letting a fraction of legitimate receivers work in the full-duplex (FD) mode, sending jamming signals to confuse eavesdroppers while receiving their information, and letting the other receivers work in the half-duplex mode, simply receiving their desired signals. The objective of this paper is to properly choose the fraction of FD receivers to achieve the optimal network security performance. Both accurate expressions and tractable approximations for the connection outage probability and the secrecy outage probability of an arbitrary legitimate link are derived, based on which the area secure link number, network-wide secrecy throughput and network-wide secrecy energy efficiency are optimized respectively. Various insights into the optimal fraction are further developed and its closed-form expressions are also derived under perfect self-interference cancellation or in a dense network. It is concluded that the fraction of FD receivers triggers a non-trivial trade-off between reliability and secrecy, and the proposed strategy can significantly enhance the network security performance.
- Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either inefficient to simultaneously process multiple signals (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.
- Mar 20 2017 cs.CV arXiv:1703.05870v2 Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR.
- Mar 14 2017 cs.CE arXiv:1703.03930v1 Multiscale optimization has recently become an attractive research field. For most optimization tools, design parameters must be updated within a closed loop. Therefore, a simple Python code is programmed to obtain the effective properties of a Representative Volume Element (RVE) under Periodic Boundary Conditions (PBCs). It can compute the mechanical properties of a composite with a periodic structure, in two or three dimensions. The computation method is based on the Asymptotic Homogenization Theory (AHT). With simple modifications, the basic Python code may be extended to compute the effective properties of more complex microstructures. Moreover, the code provides a convenient platform for optimization in material and geometric composite design. The user may experiment with various algorithms and tackle a wide range of problems. To verify the effectiveness and reliability of the code, a three-dimensional case is employed as an illustration. Finally, the numerical results obtained by the code agree well with the available theoretical and experimental results.
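  A quick sanity check for such a homogenization code can be sketched as follows: the effective stiffness returned by any RVE computation should fall between the classical Voigt and Reuss bounds. The two-phase moduli and volume fraction below are made-up inputs, not values from the paper.

  ```python
  # Hedged sketch: Voigt (parallel) and Reuss (series) bounds on the
  # effective Young's modulus of a two-phase composite. A homogenization
  # code such as the one described should return a value between these
  # bounds; E1, E2 and the volume fraction v1 are illustrative numbers.
  def voigt_reuss_bounds(E1, E2, v1):
      v2 = 1.0 - v1
      upper = v1 * E1 + v2 * E2          # Voigt: uniform-strain assumption
      lower = 1.0 / (v1 / E1 + v2 / E2)  # Reuss: uniform-stress assumption
      return lower, upper

  lo, hi = voigt_reuss_bounds(E1=200.0, E2=70.0, v1=0.4)
  ```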
- Given a rectilinear domain $\mathcal{P}$ of $h$ pairwise-disjoint rectilinear obstacles with a total of $n$ vertices in the plane, we study the problem of computing bicriteria rectilinear shortest paths between two points $s$ and $t$ in $\mathcal{P}$. Three types of bicriteria rectilinear paths are considered: minimum-link shortest paths, shortest minimum-link paths, and minimum-cost paths where the cost of a path is a non-decreasing function of both the number of edges and the length of the path. The one-point and two-point path queries are also considered. Algorithms for these problems have been given previously. Our contributions are threefold. First, we find a critical error in all previous algorithms. Second, we correct the error in a not-so-trivial way. Third, we further improve the algorithms so that they are even faster than the previous (incorrect) algorithms when $h$ is relatively small. For example, for the minimum-link shortest paths, we obtain the following results. Our algorithm computes a minimum-link shortest $s$-$t$ path in $O(n+h\log^{3/2} h)$ time. For the one-point queries, we build a data structure of size $O(n+ h\log h)$ in $O(n+h\log^{3/2} h)$ time for a source point $s$, such that given any query point $t$, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n)$ time. For the two-point queries, with $O(n+h^2\log^2 h)$ time and space preprocessing, a minimum-link shortest $s$-$t$ path can be determined in $O(\log n+\log^2 h)$ time for any two query points $s$ and $t$; alternatively, with $O(n+h^2\cdot \log^{2} h \cdot 4^{\sqrt{\log h}})$ time and $O(n+h^2\cdot \log h \cdot 4^{\sqrt{\log h}})$ space preprocessing, we can answer each two-point query in $O(\log n)$ time.
- Mar 14 2017 cs.CE arXiv:1703.04355v1 This study presents a meshless-based local reanalysis (MLR) method. The purpose of this study is to extend reanalysis methods to the Kriging interpolation meshless method due to its high efficiency. Two reanalysis methods are utilized: the combined approximations (CA) method and the indirect factorization updating (IFU) method. Considering the computational cost of meshless methods, reanalysis improves the efficiency of the full meshless method significantly. Compared with finite element method (FEM)-based reanalysis methods, the main advantage of meshless-based reanalysis is that it breaks the limitation of mesh connectivity, making the stiffness matrix much easier to obtain, even for mesh-distortion problems. However, compared with FEM-based reanalysis, the critical challenge is that many more nodes are needed in the influence domain due to the high-order interpolation. Therefore, a local reanalysis method which only needs to calculate the local stiffness matrix in the influence domain is suggested to improve the efficiency further. Several typical numerical examples are tested and the performance of the suggested method is verified.
- Let $s$ be a point in a polygonal domain $\mathcal{P}$ of $h-1$ holes and $n$ vertices. We consider a quickest visibility query problem. Given a query point $q$ in $\mathcal{P}$, the goal is to find a shortest path in $\mathcal{P}$ to move from $s$ to see $q$ as quickly as possible. Previously, Arkin et al. (SoCG 2015) built a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(K\log^2 n)$ time, where $\alpha(n)$ is the inverse Ackermann function and $K$ is the size of the visibility polygon of $q$ in $\mathcal{P}$ (and $K$ can be $\Theta(n)$ in the worst case). In this paper, we present a new data structure of size $O(n\log h + h^2)$ that can answer each query in $O(h\log h\log n)$ time. Our result improves the previous work when $h$ is relatively small. In particular, if $h$ is a constant, then our result even matches the best result for the simple polygon case (i.e., $h=1$), which is optimal. As a by-product, we also have a new algorithm for a shortest-path-to-segment query problem. Given a query line segment $\tau$ in $\mathcal{P}$, the query seeks a shortest path from $s$ to all points of $\tau$. Previously, Arkin et al. gave a data structure of size $O(n^22^{\alpha(n)}\log n)$ that can answer each query in $O(\log^2 n)$ time, and another data structure of size $O(n^3\log n)$ with $O(\log n)$ query time. We present a data structure of size $O(n)$ with query time $O(h\log \frac{n}{h})$, which also favors small values of $h$ and is optimal when $h=O(1)$.
- Mar 08 2017 cs.CY arXiv:1703.02497v1 Despite being popularly referred to as the ultimate solution for all problems of our current electric power system, smart grid is still a growing and unstable concept. It is usually considered as a set of advanced features powered by promising technological solutions. In this paper, we describe smart grid as a socio-technical transition and illustrate the evolutionary path on which a smart grid can be realized. Through this conceptual lens, we reveal the role of big data, and how it can fuel the organic growth of smart grid. We also provide a rough estimate of how much data will be potentially generated from different data sources, which helps clarify the big data challenges during the evolutionary process.
- Mar 06 2017 cs.CV arXiv:1703.01086v2 This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which is designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit the text region in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals to the feature map for a text region classifier. The whole framework is built upon a region proposal based architecture, which ensures the computational efficiency of arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets, and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
- Mar 03 2017 physics.med-ph cs.CV arXiv:1703.00797v1 Brain CT has become a standard imaging tool for emergent evaluation of brain condition, and measurement of midline shift (MLS) is one of the most important features to address for brain CT assessment. We present a simple method to estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS over the maximal width of the intracranial region (MLS/ICWMAX). Three neurosurgeons and our automated system were asked to measure MLS and MLS/ICWMAX in the same sets of axial CT images obtained from 41 patients admitted to the ICU under neurosurgical service. A weighted midline (WML) was plotted based on individual pixel intensities, with higher weight given to the darker portions. The MLS could then be measured as the distance between the WML and the ideal midline (IML) near the foramen of Monro. The average processing time to output an automatic MLS measurement was around 10 seconds. Our automated system achieved an overall accuracy of 90.24% when the CT images were calibrated automatically, and performed better when the calibrations of head rotation were done manually (accuracy: 92.68%). MLS/ICWMAX and MLS yielded the same confusion matrices and produced similar ROC curve results. We demonstrated a simple, fast and accurate automated system of MLS measurement and introduced a new parameter (MLS/ICWMAX) as a good alternative to MLS in terms of estimating the degree of brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily accessed.
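  The weighted-midline idea can be sketched as follows. This is a hedged, toy illustration: for each image row, take an intensity-weighted average of column positions, weighting darker pixels more heavily; the calibration and anatomy-aware steps of the real pipeline are omitted, and the input image is invented.

  ```python
  import numpy as np

  # Hedged sketch of a weighted midline: per row, an intensity-weighted
  # average of column indices, with darker pixels weighted more heavily.
  def weighted_midline(img):
      img = np.asarray(img, dtype=float)
      weights = img.max() - img            # darker pixel -> larger weight
      cols = np.arange(img.shape[1])
      denom = weights.sum(axis=1)
      denom[denom == 0] = 1.0              # guard flat (uniform) rows
      return (weights * cols).sum(axis=1) / denom

  # A symmetric dark band centred at column 2 of a 5-column toy image.
  toy = np.array([[9, 5, 0, 5, 9],
                  [9, 5, 0, 5, 9]])
  midline = weighted_midline(toy)
  ```

  On the symmetric toy image the estimated midline sits at column 2 in every row, as expected.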
- This paper is a review of the evolutionary history of deep learning models. It covers from the genesis of neural networks, when associationist modeling of the brain was studied, to the models that dominate the last decade of research in deep learning, such as convolutional neural networks, deep belief networks, and recurrent neural networks. In addition to a review of these models, this paper primarily focuses on their precedents, examining how the initial ideas were assembled to construct the early models and how these preliminary models developed into their current forms. Many of these evolutionary paths last more than half a century and have a diversity of directions. For example, CNN is built on prior knowledge of the biological vision system; DBN evolved from a trade-off between the modeling power and the computational complexity of graphical models; and many of today's models are neural counterparts of classical linear models. This paper reviews these evolutionary paths, offers a concise account of how these models were developed, and aims to provide a thorough background for deep learning. More importantly, along the way, this paper summarizes the gist behind these milestones and proposes many directions to guide the future research of deep learning.
- Feb 22 2017 cs.SY arXiv:1702.06265v1 In this paper, we investigate the task-space consensus problem for multiple robotic systems with both uncertain kinematics and dynamics in the presence of constant communication delays. We propose an observer-based adaptive controller to achieve manipulable consensus without relying on the measurement of task-space velocities, and also formalize the concept of manipulability to quantify the degree of adjustability of the consensus value. The proposed control scheme employs a new distributed observer that does not rely on the joint velocity, and a new kinematic parameter adaptation law with a distributed adaptive kinematic regressor matrix that is driven by both the observation and consensus errors. In addition, it is shown that the proposed controller has the separation property, which yields an adaptive kinematic controller that is applicable to most industrial/commercial robots. The performance of the proposed observer-based adaptive schemes is demonstrated by numerical simulations.
- Feb 22 2017 cs.CL arXiv:1702.06239v1 Argument component detection (ACD) is an important sub-task in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts. Historical annotations (HAs) are important features the human annotators consider when they manually perform the ACD task. However, HAs are largely ignored by existing automatic ACD techniques. Reinforcement learning (RL) has proven to be an effective method for using HAs in some natural language processing tasks. In this work, we propose a RL-based ACD technique, and evaluate its performance on two well-annotated corpora. Results suggest that, in terms of classification accuracy, HAs-augmented RL outperforms plain RL by at most 17.85%, and outperforms the state-of-the-art supervised learning algorithm by at most 11.94%.
- We study a demand response problem from the operator's perspective with realistic settings, in which the operator faces uncertainty and limited communication. Specifically, the operator does not know the cost function of consumers and cannot have multiple rounds of information exchange with consumers. We formulate an optimization problem for the operator to minimize its operational cost considering time-varying demand response targets and responses of consumers. We develop a joint online learning and pricing algorithm. In each time slot, the operator sends out a price signal to all consumers and estimates the cost functions of consumers based on their noisy responses. We measure the performance of our algorithm using regret analysis and show that our online algorithm achieves logarithmic regret with respect to the operating horizon. In addition, our algorithm employs linear regression to estimate the aggregate response of consumers, making it easy to implement in practice. Simulation experiments validate the theoretical results and show that the performance gap between our algorithm and the offline optimum decays quickly.
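  A minimal sketch of the learn-then-price loop, assuming a linear demand model d(p) = a − b·p with invented parameters, noise level, and response target; the paper's regret guarantees are not reproduced here.

  ```python
  import numpy as np

  # Hedged sketch: post a price, observe the consumers' noisy aggregate
  # response, refit a linear demand model by least squares, and set the
  # next price so the fitted model hits a target response.
  rng = np.random.default_rng(0)
  a_true, b_true = 10.0, 2.0                # unknown to the operator
  target = 4.0                              # desired aggregate response
  prices, demands = [], []
  price = 1.0
  for t in range(200):
      demand = a_true - b_true * price + rng.normal(0, 0.1)
      prices.append(price)
      demands.append(demand)
      if t >= 2:                            # refit once two prices differ
          X = np.column_stack([np.ones(len(prices)), prices])
          coef, *_ = np.linalg.lstsq(X, np.array(demands), rcond=None)
          a_hat, b_hat = coef[0], -coef[1]
          price = (a_hat - target) / b_hat  # model-implied target price
      else:
          price = 1.0 + 0.5 * t             # probe with varied prices first
  ```

  After a short probing phase the posted price settles near the price that the true demand curve maps to the target.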
- Feb 09 2017 cs.CV physics.med-ph arXiv:1702.02223v1 The present study shows that the performance of CNN is not significantly different from that of the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images. Because CNN does not need tumor segmentation or feature calculation, it is more convenient and more objective than the classical methods. However, CNN does not make use of the important diagnostic features, which have been proven more discriminative than the texture features for classifying small-sized lymph nodes. Therefore, incorporating the diagnostic features into CNN is a promising direction for future research.
- Feb 08 2017 cs.CG arXiv:1702.01836v1 We study approximation algorithms for the following geometric version of the maximum coverage problem: Let $\mathcal{P}$ be a set of $n$ weighted points in the plane. Let $D$ represent a planar object, such as a rectangle, or a disk. We want to place $m$ copies of $D$ such that the sum of the weights of the points in $\mathcal{P}$ covered by these copies is maximized. For any fixed $\varepsilon>0$, we present efficient approximation schemes that can find a $(1-\varepsilon)$-approximation to the optimal solution. In particular, for $m=1$ and for the special case where $D$ is a rectangle, our algorithm runs in time $O(n\log (\frac{1}{\varepsilon}))$, improving on the previous result. For $m>1$ and the rectangular case, our algorithm runs in $O(\frac{n}{\varepsilon}\log (\frac{1}{\varepsilon})+\frac{m}{\varepsilon}\log m +m(\frac{1}{\varepsilon})^{O(\min(\sqrt{m},\frac{1}{\varepsilon}))})$ time. For a more general class of shapes (including disks, polygons with $O(1)$ edges), our algorithm runs in $O(n(\frac{1}{\varepsilon})^{O(1)}+\frac{m}{\varepsilon}\log m + m(\frac{1}{\varepsilon})^{O(\min(m,\frac{1}{\varepsilon^2}))})$ time.
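  For intuition, the $m=1$ rectangular case admits a simple exact brute force: some optimal axis-parallel placement can be assumed to have an input point on its left edge and one on its bottom edge, so it suffices to try all coordinate pairs. This is only a correctness baseline on an invented instance, not the paper's near-linear approximation scheme.

  ```python
  # Hedged brute-force baseline for placing one w-by-h axis-parallel
  # rectangle to maximize covered point weight. O(n^3); reference only.
  def best_rectangle(points, weights, w, h):
      best = 0.0
      xs = [p[0] for p in points]
      ys = [p[1] for p in points]
      for x0 in xs:                 # candidate left edge at a point's x
          for y0 in ys:             # candidate bottom edge at a point's y
              cov = sum(wt for (px, py), wt in zip(points, weights)
                        if x0 <= px <= x0 + w and y0 <= py <= y0 + h)
              best = max(best, cov)
      return best

  pts = [(0, 0), (1, 0.5), (3, 3), (3.2, 2.8)]
  wts = [2.0, 1.0, 1.5, 1.5]
  ```

  On this instance a unit square can cover either the two left points (weight 3.0) or the two right points (also 3.0), so the optimum is 3.0.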
- Kriging or Gaussian Process Regression is applied in many fields as a non-linear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, that is cubic and quadratic in the number of data points respectively, becomes a major bottleneck with more and more data available nowadays. In this paper, we propose a general methodology for the complexity reduction, called cluster Kriging, where the whole data set is partitioned into smaller clusters and multiple Kriging models are built on top of them. In addition, four Kriging approximation algorithms are proposed as candidate algorithms within the new framework. Each of these algorithms can be applied to much larger data sets while maintaining the advantages and power of Kriging. The proposed algorithms are explained in detail and compared empirically against a broad set of existing state-of-the-art Kriging approximation methods on a well-defined testing framework. According to the empirical study, the proposed algorithms consistently outperform the existing algorithms. Moreover, some practical suggestions are provided for using the proposed algorithms.
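  The core partition-then-fit idea can be sketched as follows. The RBF kernel, the coordinate split into two clusters, and the nearest-center dispatch are illustrative choices, not any of the paper's four specific approximation algorithms.

  ```python
  import numpy as np

  # Hedged sketch of cluster Kriging: split the data into clusters, fit an
  # independent Kriging (Gaussian-process) model on each, and answer a
  # query with the model of the nearest cluster.
  def rbf(a, b, ell=0.5):
      return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

  def fit_cluster(x, y, noise=1e-6):
      K = rbf(x, x) + noise * np.eye(len(x))   # regularized Gram matrix
      return x, np.linalg.solve(K, y)          # store x and alpha = K^-1 y

  def predict(models, centers, xq):
      i = int(np.argmin(np.abs(centers - xq))) # dispatch to nearest cluster
      x, alpha = models[i]
      return (rbf(np.array([xq]), x) @ alpha)[0]

  x = np.linspace(0, 4, 40)
  y = np.sin(x)
  halves = [(x[:20], y[:20]), (x[20:], y[20:])]   # two coordinate clusters
  models = [fit_cluster(xi, yi) for xi, yi in halves]
  centers = np.array([xi.mean() for xi, _ in halves])
  pred = predict(models, centers, 1.0)
  ```

  Each sub-model only inverts a 20-by-20 Gram matrix instead of the full 40-by-40 one, which is the source of the cubic-cost reduction.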
- Feb 02 2017 cs.CV arXiv:1702.00254v2 We perform fast vehicle detection from traffic surveillance cameras. The classic cascade object detection is revisited and a novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to early discard unlikely regions; a fine-tuning network produces detailed features for these candidate boxes. We show, intriguingly, that by applying different feature fusion techniques, the initial boxes can be refined in terms of both localization and recognition, leading to evolved boxes. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
- It is known that certain structures of the signal in addition to the standard notion of sparsity (called structured sparsity) can improve the sample complexity in several compressive sensing applications. Recently, Hegde et al. proposed a framework, called approximation-tolerant model-based compressive sensing, for recovering signals with structured sparsity. Their framework requires two oracles, the head- and the tail-approximation projection oracles. The two oracles should return approximate solutions in the model which is closest to the query signal. In this paper, we consider two structured sparsity models and obtain improved projection algorithms. The first one is the tree sparsity model, which captures the support structure in the wavelet decomposition of piecewise-smooth signals. We propose a linear time $(1-\epsilon)$-approximation algorithm for head-approximation projection and a linear time $(1+\epsilon)$-approximation algorithm for tail-approximation projection. The best previous result is an $\tilde{O}(n\log n)$ time bicriterion approximation algorithm (meaning that their algorithm may return a solution of sparsity larger than $k$) by Hegde et al. Our result provides an affirmative answer to the open problem mentioned in the survey of Hegde and Indyk. As a corollary, we can recover a constant approximate $k$-sparse signal. The other is the Constrained Earth Mover Distance (CEMD) model, which is useful to model the situation where the positions of the nonzero coefficients of a signal do not change significantly as a function of spatial (or temporal) locations. We obtain the first single criterion constant factor approximation algorithm for the head-approximation projection. The previous best known algorithm is a bicriterion approximation. Using this result, we can get a faster constant approximation algorithm with fewer measurements for the recovery problem in CEMD model.
- Jan 16 2017 cs.LG arXiv:1701.03647v1 Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but the training procedure is an unsupervised learning approach, without any guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper, we propose a pairwise constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM) model, in which the learning procedure is guided by pairwise constraints and the encoding process is conducted under this guidance. The pairwise constraints are encoded in the hidden layer features of pcGRBM: some pairwise hidden features of pcGRBM flock together while others are separated by the guidance. In order to deal with real-valued data, the binary visible units are replaced by linear units with Gaussian noise in the pcGRBM model. In the learning process of pcGRBM, the pairwise constraints are iterated through transitions between the visible and hidden units during the CD learning procedure. The proposed model is then inferred by an approximate gradient descent method and the corresponding learning algorithm is designed in this paper. In order to compare the utility of pcGRBM and traditional RBMs with Gaussian visible units, the hidden layer features of pcGRBM and RBMs are used as input 'data' for the K-means, spectral clustering (SP) and affinity propagation (AP) algorithms, respectively. A thorough experimental evaluation is performed with sixteen image datasets of Microsoft Research Asia Multimedia (MSRA-MM). The experimental results show that the clustering performance of the K-means, SP and AP algorithms based on the pcGRBM model is significantly better than with traditional RBMs. In addition, the pcGRBM model for the clustering task performs better than some semi-supervised clustering algorithms.
- Jan 09 2017 cs.MM arXiv:1701.01500v2 A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.
- The phase retrieval (PR) problem is an ill-conditioned inverse problem arising in a variety of applications. Based on the Wirtinger flow (WF) method, a reweighted Wirtinger flow (RWF) method is proposed to solve the PR problem. RWF finds the global optimum by solving a series of sub-PR problems with changing weights. Theoretical analyses show that RWF enjoys geometric convergence from a deliberate initialization when the weights are bounded between 1 and $\frac{10}{9}$. Numerical tests show that RWF has a lower sampling complexity than WF. As an essentially adaptive version of the truncated Wirtinger flow (TWF) method, RWF performs better than TWF, especially when the ratio between the number of samples $m$ and the signal length $n$ is small.
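  A hedged, real-valued sketch of a weighted Wirtinger-flow iteration: uniform weights, a warm start in place of the spectral initialization, and an invented step size, so it illustrates the gradient structure of the weighted loss rather than the RWF procedure itself.

  ```python
  import numpy as np

  # Hedged sketch: minimize sum_i w_i * ((a_i^T x)^2 - y_i)^2 by gradient
  # descent, the real-valued analogue of a weighted Wirtinger-flow step.
  rng = np.random.default_rng(1)
  n, m = 5, 50
  x_true = rng.normal(size=n)
  A = rng.normal(size=(m, n))
  y = (A @ x_true) ** 2                     # phaseless measurements
  w = np.ones(m)                            # uniform weights in this sketch

  x = x_true + 0.3 * rng.normal(size=n)     # warm start near the signal
  for _ in range(1000):
      Ax = A @ x
      grad = A.T @ (w * (Ax ** 2 - y) * Ax) / m   # weighted-loss gradient
      x = x - 0.01 * grad

  # The sign of x is unrecoverable from phaseless data, so measure error
  # up to a global sign flip.
  err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
  ```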
- We determine the cycle structure of a linear feedback shift register with an arbitrary monic characteristic polynomial over any finite field. For each cycle, a method to find a state and a new way to represent the state are proposed.
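  For the binary case, the cycle decomposition itself is easy to enumerate by brute force. The sketch below uses the primitive polynomial $x^3 + x + 1$ as a made-up example and does not reproduce the paper's state-representation results.

  ```python
  # Hedged sketch: enumerate the cycle structure of the binary LFSR with
  # characteristic polynomial f(x) = x^3 + x + 1, i.e. the recurrence
  # s_{t+3} = s_{t+1} + s_t (mod 2).
  def lfsr_next(state):
      new = state[0] ^ state[1]
      return (state[1], state[2], new)

  def cycle_structure(n=3):
      seen, cycles = set(), []
      for s0 in range(2 ** n):
          state = tuple((s0 >> i) & 1 for i in range(n))
          if state in seen:
              continue
          length = 0
          while state not in seen:   # the LFSR map is invertible, so the
              seen.add(state)        # trajectory is exactly one cycle
              length += 1
              state = lfsr_next(state)
          cycles.append(length)
      return sorted(cycles)
  ```

  Since $x^3 + x + 1$ is primitive, the decomposition consists of the all-zero fixed point plus one cycle through all 7 nonzero states.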
- Dec 22 2016 cs.LG arXiv:1612.06856v2 This paper formulates the problem of learning discriminative features (i.e., segments) from networked time series data, considering the linked information among time series. For example, social network users are considered to be social sensors that continuously generate social signals (tweets) represented as a time series. The discriminative segments are often referred to as shapelets in a time series. Extracting shapelets for time series classification has been widely studied. However, existing works on shapelet selection assume that the time series are independent and identically distributed (i.i.d.). This assumption restricts their applications to social networked time series analysis, since a user's actions can be correlated to his/her social affiliations. In this paper we propose a new Network Regularized Least Squares (NetRLS) feature selection model that combines typical time series data and user network data for analysis. Experiments on real-world networked time series Twitter and DBLP data demonstrate the performance of the proposed method. NetRLS performs better than LTS, the state-of-the-art time series feature selection approach, on real-world data.
- Dec 15 2016 cs.CV arXiv:1612.04755v1 In this article, we propose a super-resolution method to address the low spatial resolution of images caused by the limitations of imaging devices. We exploit the strong non-linear mapping ability of back-propagation neural networks (BPNN). Training sample images are obtained by undersampling. The inputs of the BPNN are pixels selected by non-local means (NL-means): exploiting the self-similarity of images, these inputs are pixels obtained from a modified NL-means tailored for super-resolution. In addition, a small change to the NL-means kernel function is applied in our method so that edges in the downscaled image remain sharper. Experimental results, measured by the peak signal-to-noise ratio (PSNR) and the equivalent number of looks (ENL), indicate that adding similar pixels as inputs improves the results over not taking them into consideration.
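  The NL-means weighting the method builds on can be sketched in one dimension: a pixel estimate is a weighted average of other pixels, with weights decaying in the squared distance between their surrounding patches. This is plain NL-means denoising with invented parameters, not the modified super-resolution variant.

  ```python
  import numpy as np

  # Hedged 1-D NL-means sketch: out[i] is a patch-similarity-weighted
  # average over all interior pixels j; h controls the weight decay.
  def nl_means_1d(sig, patch=1, h=0.5):
      sig = np.asarray(sig, dtype=float)
      n = len(sig)
      out = np.empty(n)
      for i in range(patch, n - patch):
          pi = sig[i - patch:i + patch + 1]
          ws, acc = 0.0, 0.0
          for j in range(patch, n - patch):
              pj = sig[j - patch:j + patch + 1]
              w = np.exp(-np.sum((pi - pj) ** 2) / h ** 2)
              ws += w
              acc += w * sig[j]
          out[i] = acc / ws
      out[:patch], out[-patch:] = sig[:patch], sig[-patch:]  # copy borders
      return out
  ```

  A constant signal is a fixed point: every patch matches every other, so the weighted average reproduces the input.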
- Dec 09 2016 cs.CV arXiv:1612.02742v1 Hand detection is essential for many hand related tasks, e.g. parsing hand pose and understanding gesture, which are extremely useful for robotics and human-computer interaction. However, hand detection in uncontrolled environments is challenging due to the flexibility of the wrist joint and cluttered backgrounds. We propose a deep learning based approach which detects hands and calibrates in-plane rotation under supervision at the same time. To guarantee the recall, we propose a context-aware proposal generation algorithm which significantly outperforms selective search. We then design a convolutional neural network (CNN) which handles object rotation explicitly to jointly solve the object detection and rotation estimation tasks. Experiments show that our method achieves better results than state-of-the-art detection models on widely-used benchmarks such as the Oxford and EgoHands databases. We further show that rotation estimation and classification can mutually benefit each other.
- Dec 06 2016 cs.CV arXiv:1612.01345v1 Current person re-identification (re-id) methods assume that (1) pre-labelled training data are available for every camera pair, (2) the gallery size for re-identification is moderate. Both assumptions scale poorly to real-world applications when camera network size increases and gallery size becomes large. Human verification of automatic model ranked re-id results becomes inevitable. In this work, a novel human-in-the-loop re-id model based on Human Verification Incremental Learning (HVIL) is formulated which does not require any pre-labelled training data to learn a model, therefore readily scalable to new camera pairs. This HVIL model learns cumulatively from human feedback to provide instant improvement to re-id ranking of each probe on-the-fly enabling the model scalable to large gallery sizes. We further formulate a Regularised Metric Ensemble Learning (RMEL) model to combine a series of incrementally learned HVIL models into a single ensemble model to be used when human feedback becomes unavailable.
- Dec 06 2016 cs.CV arXiv:1612.01341v1 Existing person re-identification models scale poorly to the large data required in real-world applications due to: (1) Complexity: they employ complex models for optimal performance, resulting in high computational cost for training at a large scale; (2) Inadaptability: once trained, they are unsuitable for incremental updates to incorporate any new data available. This work proposes a truly scalable solution to re-id by addressing both problems. Specifically, a Highly Efficient Regression (HER) model is formulated by embedding Fisher's criterion into a ridge regression model for very fast re-id model learning with scalable memory/storage usage. Importantly, this new HER model supports faster-than-real-time incremental model updates, making real-time active learning with a human in the loop feasible in re-id. Extensive experiments show that such a simple and fast model not only notably outperforms state-of-the-art re-id methods, but is also more scalable to large data, with additional benefits to active learning for reducing human labelling effort in re-id deployment.
- We propose a construction of de Bruijn sequences by the cycle joining method from linear feedback shift registers (LFSRs) with arbitrary characteristic polynomial $f(x)$. We study in detail the cycle structure of the set $\Omega(f(x))$ that contains all sequences produced by a specific LFSR on distinct inputs and provide an efficient way to find a state of each cycle. Our structural results lead to an efficient algorithm to find all conjugate pairs between any two cycles, yielding the adjacency graph. The approach provides a practical method to generate a large class of de Bruijn sequences. Many recently-proposed constructions of de Bruijn sequences are shown to be special cases of our construction.
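The cycle-joining construction from arbitrary LFSRs is beyond a short sketch, but the flavor of de Bruijn generation can be illustrated with the classic prefer-one greedy construction, which is known to produce a binary de Bruijn sequence. This is an illustrative textbook special case, not the paper's LFSR-based method:

```python
def de_bruijn(n):
    """Prefer-one greedy construction of a binary de Bruijn sequence B(2, n):
    start from n zeros and repeatedly append the largest bit that does not
    repeat an n-window; the cyclic sequence is the first 2**n symbols."""
    seq = [0] * n
    seen = {tuple(seq)}
    while True:
        for b in (1, 0):  # prefer appending a 1
            w = tuple(seq[-(n - 1):] + [b]) if n > 1 else (b,)
            if w not in seen:
                seen.add(w)
                seq.append(b)
                break
        else:
            break  # no extension possible: every window has appeared
    return seq[:2 ** n]
```

For n = 2 this yields [0, 0, 1, 1], whose four cyclic windows 00, 01, 11, 10 are exactly all binary strings of length 2.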
- Understanding how the brain functions has been an intriguing topic for years. With recent progress on collecting massive data and developing advanced technology, people have become interested in the challenge of decoding brain wave data into meaningful mind states, and many machine learning models and algorithms are being revisited and developed, especially ones that handle time series data, given the nature of brain waves. However, many of these time series models, like the HMM, whose hidden states lie in a discrete space, or the state space model, whose hidden states lie in a continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of the state space model that works with different sources of information, together with its learning and inference algorithms. We apply this model to decode the mind states of students during lectures based on their brain waves and achieve significantly better results than traditional methods.
- Nov 30 2016 cs.CR arXiv:1611.09418v1 We propose a novel hierarchical online intrusion detection system (HOIDS) for supervisory control and data acquisition (SCADA) networks based on machine learning algorithms. By utilizing a server-client topology while keeping clients distributed for global protection, a high detection rate is achieved with minimum network impact. We implement accurate models for normal-abnormal binary detection and multi-attack identification based on logistic regression and a quasi-Newton optimization algorithm using the Broyden-Fletcher-Goldfarb-Shanno approach. The detection system can accelerate detection via information gain based feature selection or principal component analysis based dimension reduction. By evaluating our system on the KDD99 dataset and an industrial control system dataset, we demonstrate that HOIDS is highly scalable, efficient and cost-effective for securing SCADA infrastructures.
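The information-gain feature selection mentioned above is a standard entropy computation; a minimal generic sketch (not HOIDS code, and assuming categorical feature values) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a categorical feature: H(Y) - H(Y | feature)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

Ranking features by this score and keeping the top ones is the usual filter-style selection step such a system would perform before training the classifier.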
- Nov 30 2016 cs.CG arXiv:1611.09485v1 We consider a problem of dispersing points on disjoint intervals on a line. Given n pairwise disjoint intervals sorted on a line, we want to find a point in each interval such that the minimum pairwise distance of these points is maximized. Based on a greedy strategy, we present a linear time algorithm for the problem. Further, we also solve in linear time the cycle version of the problem where the intervals are given on a cycle.
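The paper's algorithm runs in linear time; as a simpler baseline, the same optimum can be found by binary search on the answer with a greedy feasibility check (place each point as far left as possible). This sketch is my own illustration, not the paper's algorithm:

```python
def feasible(intervals, d):
    """Can we pick one point per sorted disjoint interval so that
    consecutive points are at least d apart?  Greedy: leftmost placement."""
    prev = None
    for a, b in intervals:
        p = a if prev is None else max(a, prev + d)
        if p > b:
            return False
        prev = p
    return True

def max_min_distance(intervals, iters=60):
    """Binary-search the largest feasible minimum pairwise distance."""
    lo, hi = 0.0, intervals[-1][1] - intervals[0][0]
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(intervals, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For intervals [0,0], [1,3], [4,4] the optimum is 2, achieved by placing the middle point at 2.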
- Nov 24 2016 cs.CL arXiv:1611.07954v1 Reading comprehension is a question answering task where the answer is to be found in a given passage about entities and events not mentioned in general knowledge sources. A significant number of neural architectures for this task (neural readers) have recently been developed and evaluated on large cloze-style datasets. We present experiments supporting the existence of logical structure in the hidden state vectors of "aggregation readers" such as the Attentive Reader and Stanford Reader. The logical structure of aggregation readers reflects the architecture of "explicit reference readers" such as the Attention-Sum Reader, the Gated-Attention Reader and the Attention-over-Attention Reader. This relationship between aggregation readers and explicit reference readers presents a case study in emergent logical structure. In an independent contribution, we show that adding linguistic features to the input of existing neural readers significantly boosts performance, yielding the best results to date on the Who-did-What datasets.
- Nov 23 2016 cs.CL arXiv:1611.07206v1 In natural language processing, representation learning has emerged as an active research subject because of its excellent performance in many applications. Learning representations of words was a pioneering study in this line of research. However, paragraph (or sentence and document) embedding learning is more suitable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, relatively little work has focused on developing unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in it. Consequently, frequently occurring stop or function words may mislead the embedding learning process into producing a muddled paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims not only to distill the most representative information from a paragraph but also to exclude general background information, producing a more informative low-dimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but can also infer a more robust representation for a given spoken paragraph in the face of imperfect speech recognition.
- Nov 22 2016 cs.SI physics.soc-ph arXiv:1611.06645v1 Network representation learning (NRL) aims to build low-dimensional vectors for vertices in a network. Most existing NRL methods focus on learning representations from the local context of vertices (such as their neighbors). Nevertheless, vertices in many complex networks also exhibit significant global patterns widely known as communities. It is common knowledge that vertices in the same community tend to connect densely and usually share common attributes. These patterns are expected to improve NRL and benefit relevant evaluation tasks, such as link prediction and vertex classification. In this work, we propose a novel NRL model, Community-enhanced Network Representation Learning (CNRL), which introduces community information of vertices to learn more discriminative network representations. CNRL simultaneously detects the community distribution of each vertex and learns embeddings of both vertices and communities. In this way, we obtain a more informative representation of a vertex along with its community information. In experiments, we evaluate the proposed CNRL model on vertex classification, link prediction, and community detection using several real-world datasets. The results demonstrate that CNRL significantly and consistently outperforms other state-of-the-art methods. Meanwhile, the detected meaningful communities verify our assumptions on the correlations among vertices, sequences, and communities.
- Nov 16 2016 cs.DB arXiv:1611.04689v2 We study the similarity search problem, which aims to find results similar to a given query string in a dataset. To balance result number and result quality, we combine query result diversity with query relaxation. Relaxation guarantees a sufficient number of query results, returning more elements relevant to the query if the results are too few, while diversity reduces the similarity among the returned results. By trading off similarity and diversity, we improve the user experience. To achieve this goal, we define a novel goal function combining similarity and diversity. Aiming at this goal, we propose three algorithms. Among them, algorithms genGreedy and genCluster perform relaxation first and select part of the candidates to diversify. The third algorithm, CB2S, splits the dataset into smaller pieces using k-means clustering and processes queries in several small sets to retrieve more diverse results. The balance of similarity and diversity is determined by a threshold, which has a default value and can be adjusted according to users' preferences. The performance and efficiency of our system are demonstrated through extensive experiments on various datasets.
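The balance between similarity and diversity in the goal function resembles maximal-marginal-relevance selection; a generic greedy sketch (not the paper's genGreedy/genCluster/CB2S algorithms, and with a toy numeric similarity in place of string similarity) looks like:

```python
def mmr_select(query, candidates, sim, k, lam=0.5):
    """Greedily pick k items, trading similarity to the query against
    similarity to items already selected (lam weights the diversity term)."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return (1 - lam) * sim(query, c) - lam * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# toy similarity on numbers: closer values are more similar
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
```

With query 0.0 and candidates [0.1, 0.15, 5.0], the first pick is the nearest candidate 0.1, after which the diversity penalty makes the far-away 5.0 beat the near-duplicate 0.15.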
- Nov 15 2016 cs.DB arXiv:1611.04288v1 A central challenge for data imputation is the lack of knowledge. In this paper, we attempt to address this challenge by involving extra knowledge from the web. To achieve high-performance web-based imputation, we use dependencies, i.e., functional dependencies (FDs) and conditional functional dependencies (CFDs), to automatically impute as many values as possible, and fill in the remaining missing values with minimal access to the web, whose cost is relatively high. To make sufficient use of dependencies, we model the dependency set on the data as a graph and perform automatic imputation and keyword generation for web-based imputation based on this graph model. With the generated keywords, we design two algorithms to extract values for imputation from the search results. Extensive experimental results on real-world data collections show that the proposed approach imputes missing values efficiently and effectively compared to existing approaches.
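A minimal sketch of the FD-based automatic imputation step, ignoring CFDs, the graph model, and the web fallback (the attribute names and the single-pass lookup are my own simplifications, not the paper's algorithm):

```python
def impute_with_fds(rows, fds):
    """rows: list of dicts with None for missing values.
    fds: list of (lhs_attrs, rhs_attr) functional dependencies.
    Fill a missing rhs value when some complete row shares the lhs values."""
    # build lhs-values -> rhs-value lookup from complete rows
    # (a real system would resolve conflicts; this sketch keeps the last seen)
    lookup = {}
    for row in rows:
        for lhs, rhs in fds:
            if row.get(rhs) is not None and all(row.get(a) is not None for a in lhs):
                lookup[(rhs, tuple(row[a] for a in lhs))] = row[rhs]
    filled = []
    for row in rows:
        row = dict(row)
        for lhs, rhs in fds:
            if row.get(rhs) is None and all(row.get(a) is not None for a in lhs):
                row[rhs] = lookup.get((rhs, tuple(row[a] for a in lhs)))
        filled.append(row)
    return filled
```

Values that no dependency can determine stay None; those are the cells a web-based lookup would then target.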
- In fifth-generation (5G) wireless communication systems, a majority of the traffic demand is contributed by various multimedia applications. To support future 5G multimedia communication systems, the massive multiple-input multiple-output (MIMO) technique is recognized as a key enabler due to its high spectral efficiency. The massive antennas and radio frequency (RF) chains not only increase the implementation cost of 5G wireless communication systems but also result in an intense mutual coupling effect among antennas because of the limited space for deploying them. To reduce the cost, an optimal equivalent precoding matrix with the minimum number of RF chains is proposed for 5G multimedia massive MIMO communication systems considering the mutual coupling effect. Moreover, an upper bound on the effective capacity is derived for 5G multimedia massive MIMO communication systems. Two antenna receive diversity gain models are built and analyzed. The impacts of the antenna spacing, the number of antennas, the quality of service (QoS) statistical exponent, and the number of independent incident directions on the effective capacity of 5G multimedia massive MIMO communication systems are analyzed. Compared with the conventional zero-forcing precoding matrix, simulation results demonstrate that the proposed optimal equivalent precoding matrix achieves a higher achievable rate for 5G multimedia massive MIMO communication systems.
- Nov 10 2016 cs.CV arXiv:1611.02767v1 Neural networks have been shown to be a practical way of building very complex mappings between a pre-specified input space and output space. For example, a convolutional neural network (CNN) mapping an image into one of a thousand object labels is approaching human performance in this particular task. However, the mapping (neural network) does not automatically lend itself to other forms of queries, for example, to detect/reconstruct object instances, to enforce top-down signals on ambiguous inputs, or to recover object instances from occlusion. One way to address these queries is a backward pass through the network that fuses top-down and bottom-up information. In this paper, we show a way of building such a backward pass by defining a generative model of the neural network's activations. Approximate inference of the model naturally takes the form of a backward pass through the CNN layers, and it addresses the aforementioned queries in a unified framework.
- Nov 04 2016 cs.CC arXiv:1611.00975v2 The Holant problem is a general framework for studying the computational complexity of counting problems. We prove a complexity dichotomy theorem for Holant problems over the Boolean domain with non-negative weights. It is the first complete Holant dichotomy where constraint functions are not necessarily symmetric. Holant problems are in fact read-twice $\#$CSPs. Intuitively, some $\#$CSPs that are $\#$P-hard become tractable when restricted to read-twice instances. To capture them, we introduce the Block-rank-one condition. It turns out that this condition leads to a clear separation. If a function set $\mathcal{F}$ satisfies the condition, then $\mathcal{F}$ is of affine type or product type. Otherwise (a) $\mathrm{Holant}(\mathcal{F})$ is $\#$P-hard; or (b) every function in $\mathcal{F}$ is a tensor product of functions of arity at most 2; or (c) $\mathcal{F}$ is transformable to a product type by some real orthogonal matrix. Holographic transformations play an important role in both the hardness proof and the characterization of tractability.
- Hybrid methods that utilize both content and rating information are commonly used in many recommender systems. However, most of them use either handcrafted features or the bag-of-words representation as a surrogate for the content information, which is neither effective nor natural enough. To address this problem, we develop a collaborative recurrent autoencoder (CRAE), a denoising recurrent autoencoder (DRAE) that models the generation of content sequences in the collaborative filtering (CF) setting. The model generalizes recent advances in recurrent deep learning from i.i.d. input to non-i.i.d. (CF-based) input and provides a new denoising scheme along with a novel learnable pooling scheme for the recurrent autoencoder. To do this, we first develop a hierarchical Bayesian model for the DRAE and then generalize it to the CF setting. The synergy between denoising and CF enables CRAE to make accurate recommendations while learning to fill in the blanks in sequences. Experiments on real-world datasets from different domains (CiteULike and Netflix) show that, by jointly modeling the order-aware generation of sequences for the content information and performing CF for the ratings, CRAE significantly outperforms the state of the art on both the recommendation task based on ratings and the sequence generation task based on content information.
- Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately, in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to adopt the Bayesian approach via Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and passes them through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters of the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.
- Nov 01 2016 cs.DB arXiv:1610.09506v1 In the big data era, information integration often requires abundant data extracted from massive data sources. With a large number of data sources, data source selection plays a crucial role in information integration, since it is costly and even impossible to access all data sources. Data source selection should consider both efficiency and effectiveness. For efficiency, the approach should achieve high performance and scale to a large number of data sources. From the effectiveness aspect, data quality and the overlap among sources must be considered, since data quality varies widely across data sources, with significant differences in the accuracy and coverage of the data provided, and overlapping sources can even lower the quality of the integrated data. In this paper, we study the source selection problem in the big data era and propose methods which scale to datasets with up to millions of data sources while guaranteeing the quality of the results. To this end, we propose a new objective function that takes the expected number of true values a source can provide as the criterion for evaluating the contribution of a data source. Based on our proposed index, we present a scalable algorithm and two pruning strategies to improve efficiency without sacrificing precision. Experimental results on both real-world and synthetic datasets show that our methods can efficiently select sources providing a large proportion of true values and can scale to massive data sources.
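The "expected number of true values" criterion suggests a weighted maximum-coverage greedy: each newly selected source contributes only the expected true values for items not already covered. This is an illustrative sketch under the assumption that each source reports a per-item accuracy, not the paper's indexed algorithm:

```python
def select_sources(sources, k):
    """sources: {name: {item: prob_value_is_true}}.
    Greedily pick up to k sources, each time maximizing the expected
    number of true values gained on items not yet covered."""
    chosen, covered = [], set()
    for _ in range(min(k, len(sources))):
        def gain(name):
            return sum(p for item, p in sources[name].items()
                       if item not in covered)
        best = max((s for s in sources if s not in chosen), key=gain)
        chosen.append(best)
        covered.update(sources[best])
    return chosen
```

The greedy choice naturally penalizes overlap: a highly accurate source adds nothing for items an earlier pick already covers.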
- Nov 01 2016 cs.DB arXiv:1610.09500v1 Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored in heterogeneous environments; specifically, the schemas of records may differ from each other. To better leverage such records, most existing work assumes that schema matching and data exchange have been done to convert records under different schemas to a predefined schema. However, we observe that schema matching loses information in some cases, which could be useful or even crucial to ER. To leverage sufficient information from heterogeneous sources, in this paper we address several challenges of ER on heterogeneous records and show that none of the existing similarity metrics or their transformations can be applied to find similar records under heterogeneous settings. Motivated by this, we design a similarity function and propose a novel framework to iteratively find records which refer to the same entity. Regarding efficiency, we build an index to generate candidates and accelerate similarity computation. Evaluations on real-world datasets show the effectiveness and efficiency of our methods.
- Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generation method which first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method ensures that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms state-of-the-art poetry generation methods and that the poem quality is somewhat comparable to that of human poets.
- Oct 27 2016 cs.CL arXiv:1610.08431v3 Progress in text understanding has been driven by large datasets that test particular capabilities, like recent datasets for reading comprehension (Hermann et al., 2015). We focus here on the LAMBADA dataset (Paperno et al., 2016), a word prediction task requiring broader context than the immediate sentence. We view LAMBADA as a reading comprehension problem and apply comprehension models based on neural networks. Though these models are constrained to choose a word from the context, they improve the state of the art on LAMBADA from 7.3% to 49%. We analyze 100 instances, finding that neural network readers perform well in cases that involve selecting a name from the context based on dialogue or discourse cues but struggle when coreference resolution or external knowledge is needed.
- With the recent emergence of the 5G era, heterogeneous cellular networks (HCNs) have attracted widespread research interest. In this paper, we provide a comprehensive analysis of multi-antenna transmissions in a multi-tier downlink HCN. We first propose a reliability-oriented threshold-based mobile association policy, where each user connects to the strongest base station, i.e., the one from which the user obtains the largest truncated long-term received power. Under our mobile association policy, we derive analytical expressions for the exact outage probability of an arbitrary randomly located user, along with computationally convenient lower and upper bounds. Asymptotic analysis of the outage probability shows that introducing a large access threshold into mobile association significantly decreases the outage probability. We further investigate the spectrum efficiency and the energy efficiency of the HCN. Our theoretical analysis and numerical validations show that both the spectrum and energy efficiencies can be improved by properly choosing the access threshold.
- In this paper, we study the benefits of full-duplex (FD) receiver jamming in enhancing the physical-layer security of a two-tier decentralized wireless network, with each tier deployed with a large number of pairs of a single-antenna transmitter and a multi-antenna receiver. In the underlying tier, the transmitter sends unclassified information and the receiver works in the half-duplex (HD) mode receiving the desired signal. In the overlaid tier, the transmitter delivers confidential information in the presence of randomly located eavesdroppers, and the receiver works in the FD mode, radiating jamming signals to confuse eavesdroppers while receiving the desired signal simultaneously. We provide a comprehensive performance analysis and network design under a stochastic geometry framework. Specifically, we consider the scenarios where each FD receiver uses single- or multi-antenna jamming, and analyze the connection probability and the secrecy outage probability of a typical FD receiver by providing accurate expressions and more tractable approximations for the two metrics. We further determine the optimal deployment of the FD-mode tier in order to maximize network-wide secrecy throughput subject to constraints including the given dual probabilities and the network-wide throughput of the HD-mode tier. Numerical results verify our theoretical findings and show that network-wide secrecy throughput is significantly improved by properly deploying the FD-mode tier.
- Epidemic propagation on complex networks has been widely investigated, mostly with invariant parameters. However, the process of epidemic propagation is not always constant. Epidemics can be affected by various perturbations and may bounce back to their original state, which is considered resilient. Here, we study the resilience of epidemics on networks by introducing a different infection rate ${\lambda_{2}}$ during SIS (susceptible-infected-susceptible) epidemic propagation to model perturbations (the control state), whereas the infection rate is ${\lambda_{1}}$ the rest of the time. Through simulations and theoretical analysis, we find that even for ${\lambda_{2}<\lambda_{c}}$, the epidemic can eventually bounce back if the control duration is below a threshold. This critical control time for epidemic resilience, ${cd_{max}}$, can be predicted by the diameter (${d}$) of the underlying network, with the quantitative relation ${cd_{max}\sim d^{\alpha}}$. Our findings can help design better mitigation strategies for epidemics.
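The qualitative effect of a finite control window can be seen even in a mean-field SIS sketch. The model below is my own illustration (made-up parameters, a simple Euler integrator, and a 1/N extinction cutoff standing in for stochastic die-out on a finite network), not the paper's network simulation:

```python
def sis_with_control(lam1, lam2, k, mu, n, t_on, duration,
                     t_end, dt=0.01, i0=0.5):
    """Mean-field SIS: di/dt = lam*k*i*(1-i) - mu*i, with infection rate
    lam2 inside the control window [t_on, t_on + duration) and lam1 outside.
    If the infected fraction falls below 1/n, declare stochastic extinction."""
    i, t = i0, 0.0
    while t < t_end:
        lam = lam2 if t_on <= t < t_on + duration else lam1
        i += (lam * k * i * (1 - i) - mu * i) * dt
        if i < 1.0 / n:
            return 0.0  # epidemic died out: no resilience
        t += dt
    return i
```

With lam1 = 0.5 above the threshold mu/k and lam2 = 0.1 below it, a short control window lets the epidemic bounce back toward its endemic level, while a long enough window drives the infected fraction below the extinction cutoff, mirroring the threshold behavior the abstract describes.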
- Oct 18 2016 cs.NE arXiv:1610.05231v1 Various variants of the well-known Covariance Matrix Adaptation Evolution Strategy (CMA-ES) have been proposed recently, which improve the empirical performance of the original algorithm through structural modifications. However, in practice it is often unclear which variant is best suited to the specific optimization problem at hand. As one approach to tackle this issue, algorithmic mechanisms attached to CMA-ES variants are considered and extracted as functional modules, allowing for combinations of them. This leads to a configuration space over ES structures, which enables the exploration of algorithm structures and paves the way toward novel algorithm generation. Specifically, eleven modules are incorporated in this framework with two or three alternative configurations for each module, resulting in $4\,608$ algorithms. A self-adaptive Genetic Algorithm (GA) is used to efficiently evolve effective ES structures for given classes of optimization problems, outperforming all classical CMA-ES variants from the literature. The proposed approach is evaluated on noiseless functions from the BBOB suite. The observation is further confirmed on different function groups and dimensionalities, indicating the feasibility of ES configuration on real-world problem classes.
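A configuration space of $4\,608 = 2^9 \times 3^2$ structures is consistent with nine binary and two ternary modules; enumerating it is a one-liner with itertools (the module names below are hypothetical, not the paper's):

```python
import itertools

# nine hypothetical binary modules plus two hypothetical ternary ones;
# the 9/2 split is forced by 4608 = 2^9 * 3^2 with eleven modules total
modules = {f"mod{i}": [0, 1] for i in range(9)}
modules["weights"] = [0, 1, 2]
modules["sampler"] = [0, 1, 2]

configs = [dict(zip(modules, choice))
           for choice in itertools.product(*modules.values())]
```

A configurator like the paper's self-adaptive GA then searches this discrete space instead of evaluating all 4608 variants exhaustively.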
- In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high-dimensional data in a spectral conversion (SC) task. Voice conversion has been widely studied due to its potential applications such as personalized speech synthesis and speech enhancement. Exemplar-based NMF (ENMF) emerges as an effective, and probably the simplest, choice among all techniques for SC, as long as a source-target parallel speech corpus is given. ENMF-based SC systems usually need a large number of bases (exemplars) to ensure the quality of the converted speech. However, a small and effective dictionary is desirable but hard to obtain via dictionary update, in particular when high-dimensional features such as STRAIGHT spectra are used. Therefore, we propose a dictionary update framework for NMF by means of an encoder-decoder reformulation. Regarding NMF as an encoder-decoder network makes it possible to exploit the whole parallel corpus more effectively and efficiently when applied to SC. Our experiments demonstrate significant gains of the proposed system with small dictionaries over conventional ENMF-based systems with dictionaries of the same or much larger size.
- We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.
- Oct 11 2016 cs.CV arXiv:1610.02760v1 In this paper, a novel 3D elastic mesh model is presented for image segmentation. The model is inspired by stress and strain in physical elastic objects, while the repulsive force and elastic force in the model are defined slightly differently from the physical forces to suit the segmentation problem well. The self-balancing mechanism in the model guarantees the stability of the method in segmentation. The shape of the elastic mesh at the balanced state is used for region segmentation, in which the sign distribution of the points' z-coordinate values is taken as the basis for segmentation. The effectiveness of the proposed method is demonstrated by analysis and experimental results on both test images and real-world images.
- Wireless networks that take into account social relationships among network nodes are highly appealing for many important data communication services. Ensuring the security of such networks is of great importance for supporting future social-based services with strong security guarantees. This paper explores physical-layer-security-based secure communication in a finite Poisson network with social friendships among nodes, for which a social friendship-based cooperative jamming scheme is proposed. The jamming scheme consists of a Local Friendship Circle (LFC) and a Long-range Friendship Annulus (LFA), where all legitimate nodes in the LFC serve as jammers, while the legitimate nodes in the LFA are selected as jammers through three location-based policies. To understand both the security and reliability performance of the proposed jamming scheme, we first model the sum interference at any location in the network by deriving its Laplace transform under two typical path loss scenarios. With the help of the interference Laplace transform results, we then derive the exact expression for the transmission outage probability (TOP) and determine both upper and lower bounds on the secrecy outage probability (SOP), such that the overall outage performance of the proposed jamming scheme can be depicted. Finally, we present extensive numerical results to validate the theoretical analysis of the TOP and SOP and to illustrate the impact of friendship-based cooperative jamming on network performance.
- In this paper, we study the interactions among interconnected autonomous microgrids, and propose a joint energy trading and scheduling strategy. Each interconnected microgrid not only schedules its local power supply and demand, but also trades energy with other microgrids in a distribution network. Specifically, microgrids with excessive renewable generations can trade with other microgrids in deficit of power supplies for mutual benefits. Since interconnected microgrids operate autonomously, they aim to optimize their own performance and expect to gain benefits through energy trading. We design an incentive mechanism using Nash bargaining theory to encourage proactive energy trading and fair benefit sharing. We solve the bargaining problem by decomposing it into two sequential problems on social cost minimization and trading benefit sharing, respectively. For practical implementation, we propose a decentralized solution method with minimum information exchange overhead. Numerical studies based on realistic data demonstrate that the total cost of the interconnected-microgrids operation can be reduced by up to 13.2% through energy trading, and an individual participating microgrid can achieve up to 29.4% reduction in its cost through energy trading.
- Given $n$ intervals on a line $\ell$, we consider the problem of moving these intervals along $\ell$ so that no two intervals overlap and the maximum moving distance of the intervals is minimized. The difficulty of the problem lies in determining the order of the intervals in an optimal solution. Through several observations, we show that it suffices to consider at most $n$ "candidate" lists of ordered intervals. Further, although explicitly maintaining these lists takes $\Omega(n^2)$ time and space, by further observations and a pruning technique, we present an algorithm that computes an optimal solution in $O(n\log n)$ time and $O(n)$ space. We also prove an $\Omega(n\log n)$ time lower bound for the problem, which implies the optimality of our algorithm.
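The subproblem underlying the candidate lists above can be sketched compactly: once an order of the intervals is fixed, the minimum of the maximum moving distance is computable by binary search on the answer with a greedy feasibility check. This is only a minimal illustrative sketch, not the paper's $O(n\log n)$ algorithm; the `(left, length)` interval representation and the tolerance `eps` are assumptions.

```python
def feasible(intervals, d):
    # intervals: (left endpoint, length) pairs, in the assumed final order.
    prev_end = float("-inf")
    for x, length in intervals:
        y = max(x - d, prev_end)  # leftmost placement allowed by d and the predecessor
        if y > x + d:             # this interval cannot stay within distance d
            return False
        prev_end = y + length
    return True

def min_max_move(intervals, eps=1e-9):
    # Binary search on the max moving distance d; feasibility is monotone in d.
    xs = [x for x, _ in intervals]
    lo, hi = 0.0, (max(xs) - min(xs)) + sum(l for _, l in intervals)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(intervals, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Feasibility is monotone in d (any placement achievable within distance d is also achievable within a larger d), which is what justifies the binary search; the hard part the paper solves is finding the right order.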
- Sep 22 2016 cs.CV arXiv:1609.06441v1 To dynamically detect facial landmarks in video, we propose a novel hybrid framework termed detection-tracking-detection (DTD). First, the face bounding box is obtained from the first frame of the video sequence with a traditional face detection method. Then, a landmark detector based on a cascaded deep convolutional neural network (DCNN) detects the facial landmarks. Next, the face bounding box in the current frame is estimated and validated after the facial landmarks in the previous frame are tracked via median flow. Finally, the facial landmarks in the current frame are detected from the validated face bounding box via the landmark detector. Experimental results indicate that the proposed framework detects facial landmarks in video sequences more effectively, and in less time, than the frame-by-frame approach via the DCNN.
- Multimodal sentiment analysis is drawing an increasing amount of attention these days. It enables mining of opinions in the video reviews and surveys that are now available aplenty on online platforms like YouTube. However, the limited number of high-quality multimodal sentiment data samples may make the learned sentiment dependent on idiosyncratic, individual-specific features in the dataset, so that trained models generalize poorly when classifying data from larger online platforms. In this paper, we first examine the data and verify the existence of this dependence problem. Then we propose a Select-Additive Learning (SAL) procedure that improves the generalizability of trained discriminative neural networks. SAL is a two-phase learning method. In the Selection phase, it identifies the confounding components of the learned representation. In the Addition phase, it forces the classifier to discard these confounded representations by adding Gaussian noise to them. In our experiments, we show how SAL improves the generalizability of state-of-the-art models. We increase prediction accuracy significantly in all three modalities (text, audio, video), as well as in their fusion. We also show how SAL, even when trained on one dataset, achieves good accuracy across test datasets.
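The two-phase idea can be pictured with a toy sketch: score each representation dimension by how strongly it predicts which dataset a sample came from (Selection), then corrupt the high-scoring dimensions with Gaussian noise so a classifier cannot rely on them (Addition). Correlation here is only a cheap stand-in for the paper's learned selection, and the threshold, noise scale, and synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_phase(H, dataset_ids):
    # Score each representation dimension by how strongly it predicts the
    # dataset a sample came from; high-scoring dimensions are treated as
    # confounding. (Correlation is a stand-in for SAL's learned selection.)
    scores = np.abs([np.corrcoef(H[:, j], dataset_ids)[0, 1]
                     for j in range(H.shape[1])])
    return scores > 0.5  # boolean mask of "confounding" dimensions

def addition_phase(H, mask, sigma=1.0):
    # Corrupt the selected dimensions with Gaussian noise so that a
    # downstream classifier cannot rely on them.
    noise = rng.normal(0.0, sigma, size=H.shape) * mask
    return H + noise

# Toy data: dimension 0 leaks the dataset identity, dimension 1 does not.
ids = np.array([0.0] * 50 + [1.0] * 50)
H = np.column_stack([ids + 0.01 * rng.normal(size=100),
                     rng.normal(size=100)])
mask = selection_phase(H, ids)
H_noisy = addition_phase(H, mask)
```

In the actual SAL procedure the representations come from a trained discriminative neural network and the noise is applied during a second training phase; the sketch only shows the select-then-add structure.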
- Preference orderings are orderings of a set of items according to the preferences (of judges). Such orderings arise in a variety of domains, including group decision making, consumer marketing, voting and machine learning. Measuring the mutual information and extracting the common patterns in a set of preference orderings are key to these areas. In this paper we deal with the representation of sets of preference orderings, the quantification of the degree to which judges agree on their ordering of the items (i.e. the concordance), and the efficient, meaningful description of such sets. We propose to represent the orderings in a subsequence-based feature space and present a new algorithm to calculate the size of the set of all common subsequences - the basis of a quantification of concordance, not only for pairs of orderings but also for sets of orderings. The new algorithm is fast and storage efficient with a time complexity of only $O(Nn^2)$ for the orderings of $n$ items by $N$ judges and a space complexity of only $O(\min\{Nn,n^2\})$. Also, we propose to represent the set of all $N$ orderings through a smallest set of covering preferences and present an algorithm to construct this smallest covering set. The source code for the algorithms is available at https://github.com/zhiweiuu/secs
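For a single pair of orderings, the size of the set of all common subsequences can be computed with a classical $O(n^2)$ dynamic program; a sketch follows. This is not the paper's algorithm (which handles $N$ orderings in $O(Nn^2)$ time), only the pairwise base case, and it assumes the items within each ordering are distinct.

```python
def count_common_subsequences(a, b):
    # Number of distinct common subsequences (including the empty one) of two
    # orderings a and b; assumes items within each ordering are distinct.
    n, m = len(a), len(b)
    # dp[i][j]: count for the prefixes a[:i] and b[:j].
    dp = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over dropping the last item of a or of b.
            dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
            if a[i - 1] == b[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]  # extend every common subsequence
    return dp[n][m]
```

For example, the orderings [1, 2, 3] and [1, 3, 2] share the subsequences {}, 1, 2, 3, 12, 13, so the count is 6.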
- Sep 15 2016 cs.DS arXiv:1609.04350v1 Graphs are commonly used to represent objects, such as images and text, for pattern classification. In a dynamic world, an object may continuously evolve over time, and so does the graph extracted from the underlying object. These changes in graph structure with respect to the temporal order present a new representation of the graph, in which an object corresponds to a set of time-variant graphs. In this paper, we formulate a novel time-variant graph classification task and propose a new graph feature, called a graph-shapelet pattern, for learning and classifying time-variant graphs. Graph-shapelet patterns are compact and discriminative graph transformation subsequences. A graph-shapelet pattern can be regarded as a graphical extension of a shapelet -- a class of discriminative features designed for vector-based temporal data classification. To discover graph-shapelet patterns, we propose to convert a time-variant graph sequence into time-series data and use the discovered shapelets to find graph transformation subsequences as graph-shapelet patterns. By converting each graph-shapelet pattern into a unique tokenized graph transformation sequence, we can measure the similarity between two graph-shapelet patterns and therefore classify time-variant graphs. Experiments on both synthetic and real-world data demonstrate the superior performance of the proposed algorithms.
- The read channel of a Flash memory cell degrades after repetitive program and erase (P/E) operations. This degradation is often modeled as a function of the number of P/E cycles. In contrast, this paper models the degradation as a function of the cumulative effect of the charge written to and erased from the cell. Based on this modeling approach, this paper dynamically allocates voltage, using lower write-voltage thresholds at the beginning of the device lifetime and increasing the thresholds as needed to maintain the mutual information of the read channel in the face of degradation. The paper introduces the technique in an idealized setting and then removes the ideal assumptions about channel knowledge and available voltage resolution, concluding with a practical scheme whose performance is close to that of the idealized setting.
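The dynamic allocation idea can be illustrated with a toy model: treat the read channel as binary with Gaussian noise whose variance grows with cumulative wear, and widen the threshold spacing only when the channel's mutual information drops below a target. Every parameter below (the wear model, noise growth, step size, target MI) is an illustrative assumption, not the paper's model.

```python
import math

def mutual_information(spacing, sigma):
    # MI of a two-level cell read as a binary symmetric channel whose
    # crossover probability comes from Gaussian read noise (an assumed
    # channel model, not the paper's).
    p = 0.5 * math.erfc(spacing / (2.0 * sigma * math.sqrt(2.0)))
    if p <= 0.0:
        return 1.0  # noise negligible: channel effectively noiseless
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return 1.0 - h

def allocate_voltage(cycles, target_mi=0.99, spacing=1.0, step=0.1,
                     sigma0=0.1, wear_per_cycle=0.02):
    # Start with a low threshold spacing and widen it only when wear-induced
    # noise growth pushes the channel MI below the target.
    history = []
    for c in range(cycles):
        sigma = sigma0 * (1.0 + wear_per_cycle * c)  # noise grows with wear
        while mutual_information(spacing, sigma) < target_mi:
            spacing += step  # raise the write thresholds
        history.append(spacing)
    return history

# Threshold spacing over 100 P/E cycles under the toy wear model.
hist = allocate_voltage(100)
```

The spacing stays low early in the device lifetime and ratchets up monotonically, which mirrors the deferred-voltage-cost intuition of the abstract.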
- Sep 07 2016 cs.IR arXiv:1609.01331v1 In this work, we propose a joint audio-video fingerprint Automatic Content Recognition (ACR) technology for media retrieval. The problem centers on how to balance query accuracy against fingerprint size, and how to allocate the bits of the fingerprint between video frames and audio frames to achieve the best query accuracy. By constructing a novel concept called Coverage, which is highly correlated with query accuracy, we form a rate-coverage model that translates the original problem into an optimization problem solvable by dynamic programming. To the best of our knowledge, this is the first work to use joint audio-video fingerprint ACR technology for media retrieval with a theoretical problem formulation. Experimental results indicate that, compared to reference algorithms, the proposed method achieves up to a 25% improvement in query accuracy while using only 60% of the overall bit-rate, and a 25% bit-rate reduction while still achieving 85% accuracy, and it significantly outperforms solutions based on a single audio or video fingerprint.
- Sep 02 2016 cs.MM arXiv:1609.00045v1 Crowdsourced Live Streaming (CLS), most notably Twitch.tv, has seen explosive growth in popularity in the past few years. In such systems, any user can broadcast live video content of interest to others, e.g., from a game player to many online viewers. To meet the demands of both massive and heterogeneous broadcasters and viewers, expensive server clusters have been deployed to provide video ingesting and transcoding services. Despite the existence of highly popular channels, a significant portion of the channels are in fact unpopular. Yet, as our measurements show, these broadcasters consume considerable system resources; in particular, 25% (resp. 30%) of bandwidth (resp. computation) resources are used by broadcasters who have no viewers at all. In this paper, we closely examine the challenge of handling unpopular live-broadcasting channels in CLS systems and present a comprehensive solution for service partitioning on a hybrid cloud. A trace-driven evaluation shows that our hybrid cloud-assisted design can smartly assign ingesting and transcoding tasks to elastic cloud virtual machines, providing flexible system deployment cost-effectively.
- Backscatter wireless communication is an emerging technique widely used in low-cost and low-power wireless systems, especially in passive radio frequency identification (RFID) systems. Recently, the requirements of high data rates, data reliability, and security have driven the development of RFID systems, which motivates our investigation of the physical layer security of a multiple-input multiple-output (MIMO) RFID system. In this paper, we propose a noise-injection precoding strategy to safeguard the system security, taking the resource-constrained nature of the backscatter system into consideration. We first consider a multi-antenna RFID tag case and investigate the secrecy rate maximization (SRM) problem by jointly optimizing the energy supply power and the precoding matrix of the injected artificial noise at the RFID reader. We exploit the alternating optimization method and the sequential parametric convex approximation method, respectively, to tackle the non-convex SRM problem, and show the interesting fact that the two methods are actually equivalent for our SRM problem, with guaranteed convergence to a Karush-Kuhn-Tucker (KKT) point. To facilitate practical implementation on resource-constrained RFID devices, we propose a fast algorithm based on projected gradient. We also consider a single-antenna RFID tag case and develop a low-complexity algorithm that yields the globally optimal solution. Simulation results show the superiority of our proposed algorithms in terms of secrecy rate and computational complexity.
- While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This paper proposes a general framework for Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this paper, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.
- This letter proposes a novel dual user selection scheme for uplink transmission with multiple users, where a jamming user and a served user are jointly selected to improve the secrecy performance. Specifically, the jamming user transmits a jamming signal at a certain rate, so that the base station (BS) can decode the jamming signal before detecting the secret information signal. By carefully selecting the jamming user and the served user, the eavesdropper can decode the jamming signal only with low probability, while the BS achieves a high receive signal-to-noise ratio (SNR). Therefore, the uplink transmission achieves a dual secrecy improvement through jamming and scheduling. It is shown that the proposed scheme can significantly improve the security of the uplink transmission.
- Aug 22 2016 cs.CL arXiv:1608.05457v1 We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015), we avoid using article summaries for question formation. Instead, each problem is formed from two independent articles --- an article given as the passage to be read and a separate article on the same events used to form the question. Second, we avoid anonymization --- each choice is a person named entity. Third, the problems have been filtered to remove a fraction that are easily solved by simple baselines, while remaining 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community.
- In this paper, we investigate secure communications in uplink transmissions, where there are a base station (BS) with M receive antennas, K mobile users each with a single antenna, and an eavesdropper with N receive antennas. Closed-form expressions for the achievable ergodic secrecy sum-rate (ESSR) of a random k-user selection scheme in the high and low SNR regimes are presented. It is shown that the scaling behavior of the ESSR with respect to the number of served users k can be quite different under different system configurations, determined by the number of BS antennas and the number of eavesdropper antennas. To achieve a multiuser gain, two low-complexity user selection schemes are proposed under different assumptions on the eavesdropper's channel state information (CSI). Closed-form expressions for the achievable ESSRs and the multiuser secrecy gains of the two schemes are also presented in both the low and high SNR regimes. We observe that, as k increases, the multiuser secrecy gain increases, while the ESSR may decrease. Therefore, when N is much larger than M, serving the one user with the strongest channel (TDMA-like) is a favourable secrecy scheme, whose ESSR scales with $\sqrt{2\log K}$.
- This paper studies the impact of artificial noise (AN) on the secrecy performance of a target cell in multi-cell cellular networks. Although AN is an efficient approach for securing a point-to-point/single-cell confidential transmission, it increases the inter-cell interference in a multi-cell cellular network, which may degrade the network's reliability and secrecy performance. To analyze the average secrecy performance of the target cell, which is of significant interest, we employ a hybrid cellular deployment model, where the target cell is a circle of fixed size and the base stations (BSs) outside the target cell are modeled as a homogeneous Poisson point process (PPP). We investigate the impact of AN on the reliability and security of users in the target cell in the presence of pilot contamination using a stochastic geometry approach. Analytical results for the average connection outage and the secrecy outage of a cellular user (CU) in the target cell are given, which facilitate the evaluation of the average secrecy throughput of a randomly chosen CU in the target cell. We show that, with an optimized power allocation between the desired signals and the AN, the AN scheme is an efficient solution for securing communications in a multi-cell cellular network.