results for au:Zhang_Z in:cs

- The massive multiple-input multiple-output (M-MIMO) technique brings better energy efficiency and coverage than small-scale MIMO, but at higher computational complexity. For linear detectors such as minimum mean square error (MMSE), the prohibitive complexity lies in solving large-scale linear equations. For a better trade-off between bit-error-rate (BER) performance and computational complexity, iterative linear algorithms such as conjugate gradient (CG) have been applied and have shown their feasibility in recent years. In this paper, residual-based detection (RBD) algorithms are proposed for M-MIMO detection, including the minimal residual (MINRES), generalized minimal residual (GMRES), and conjugate residual (CR) algorithms. RBD algorithms focus on minimizing the residual norm per iteration, whereas most existing algorithms focus on approximating the exact signal. Numerical results show that, for $64$-QAM $128\times 8$ MIMO, RBD algorithms are only $0.13$ dB away from the exact matrix-inversion method at BER$=10^{-4}$. The stability of RBD algorithms has also been verified under various correlation conditions. A complexity comparison shows that the CR algorithm requires $87\%$ less complexity than the traditional method for $128\times 60$ MIMO. A unified, flexible hardware architecture is proposed, which guarantees a low-complexity implementation for a family of RBD M-MIMO detectors.
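To make the residual-minimization idea concrete, here is a minimal sketch of the textbook conjugate residual (CR) iteration for a Hermitian system $A\mathbf{x}=\mathbf{b}$, applied to the MMSE normal equations $(H^H H + \sigma^2 I)\,\mathbf{x} = H^H \mathbf{y}$. This is the generic algorithm, not the paper's hardware design; all names and the regularization value are illustrative.

```python
import numpy as np

def conjugate_residual(A, b, iters=20, tol=1e-10):
    """Solve A x = b for Hermitian A, minimizing the residual norm ||b - A x||
    at each iteration (unlike CG, which minimizes the A-norm of the error)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    Ar = A @ r
    Ap = A @ p
    for _ in range(iters):
        rAr = np.vdot(r, Ar)
        alpha = rAr / np.vdot(Ap, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        beta = np.vdot(r, Ar) / rAr
        p = r + beta * p
        Ap = Ar + beta * Ap          # avoids an extra matrix-vector product
    return x

# Illustrative MMSE-style system: A = H^H H + sigma^2 I (Hermitian positive definite).
rng = np.random.default_rng(0)
H = rng.standard_normal((16, 8)) + 1j * rng.standard_normal((16, 8))
A = H.conj().T @ H + 0.1 * np.eye(8)
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)
x = conjugate_residual(A, b, iters=100)
```

For an $8\times 8$ Hermitian system, exact arithmetic converges within 8 iterations; in practice a handful of iterations already gives a residual far below the noise floor, which is why such iterative detectors trade so favorably against exact inversion.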
- Feb 14 2018 cs.AR arXiv:1802.04576v1 With its ever-growing storage density, high speed, and low-cost data access, flash memory has inevitably become popular. Multi-level cell (MLC) NAND flash memory, which balances data density and memory stability well, occupies the largest market share of flash memory. With aggressive memory scaling, however, reliability decays sharply owing to multiple interferences. Therefore, the control system should embed a suitable error correction code (ECC) to guarantee data integrity and accuracy. We propose a pre-check scheme, a multi-strategy polar code scheme that strikes a balance between reasonable frame error rate (FER) and decoding latency. Three decoders, namely binary-input, quantized-soft, and pure-soft decoders, are embedded in this scheme. Since the calculation of soft log-likelihood ratio (LLR) inputs needs multiple sensing operations and optional quantization boundaries, a 2-bit quantized hard-decision decoder is proposed that outperforms the hard-decoded LDPC bit-flipping decoder with fewer sensing operations. We note that polar codes have much lower computational complexity than LDPC codes. A stepwise maximum mutual information (SMMI) scheme is also proposed to obtain overlapped boundaries without exhaustive search. A mapping scheme using Gray code is employed and shown to achieve better raw error performance than other alternatives. Hardware architectures are also given in this paper.
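The appeal of Gray-code mapping for MLC cells is that adjacent voltage levels differ in exactly one bit, so the most likely sensing error (reading a neighboring level) corrupts only one of the two stored bits. A minimal sketch of the standard reflected Gray code, independent of the paper's specific mapping:

```python
def to_gray(b):
    """Map a binary integer to its reflected Gray code."""
    return b ^ (b >> 1)

def from_gray(g):
    """Inverse mapping: recover the binary integer from its Gray code."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# For a 2-bit MLC cell, the four voltage levels map to 00, 01, 11, 10:
levels = [to_gray(i) for i in range(4)]   # [0, 1, 3, 2]
```

Any pair of consecutive levels `to_gray(i)` and `to_gray(i+1)` differ in a single bit, which is exactly the property that keeps the raw bit error rate low when a read misses by one level.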
- Feb 13 2018 cs.CV arXiv:1802.03750v1 We present Fast-Downsampling MobileNet (FD-MobileNet), an efficient and accurate network for very limited computational budgets (e.g., 10-140 MFLOPs). Our key idea is to apply an aggressive downsampling strategy to the MobileNet framework. In FD-MobileNet, we perform 32$\times$ downsampling within 12 layers, only half the number of layers in the original MobileNet. This design brings three advantages: (i) it remarkably reduces the computational cost; (ii) it increases the information capacity and achieves significant performance improvements; (iii) it is engineering-friendly and provides fast actual inference speed. Experiments on the ILSVRC 2012 and PASCAL VOC 2007 datasets demonstrate that FD-MobileNet consistently outperforms MobileNet and achieves results comparable to ShuffleNet under different computational budgets, for instance, surpassing MobileNet by 5.5% on ILSVRC 2012 top-1 accuracy and by 3.6% on VOC 2007 mAP at a complexity of 12 MFLOPs. On an ARM-based device, FD-MobileNet achieves 1.11$\times$ inference speedup over MobileNet and 1.82$\times$ over ShuffleNet under the same complexity.
- Feb 12 2018 cs.DC arXiv:1802.03152v1 Although the public cloud still occupies the largest portion of the total cloud infrastructure, the private cloud is attracting increasing interest because of its better security and privacy control. According to previous research, a high upfront cost is among the most serious challenges associated with private cloud computing. Virtual machine placement (VMP) is a critical operation for cloud computing, as it improves performance and reduces cost. Extensive VMP methods have been researched, but few have been designed to reduce the upfront cost of private clouds. To fill this gap, this paper applies a heterogeneous and multidimensional clairvoyant dynamic bin packing (CDBP) model, in which the scheduler can conduct more efficient scheduling using additional time information, reducing the size of the datacenter and thereby decreasing the upfront cost. An innovative branch-and-bound algorithm with a divide-and-conquer strategy (DCBB) is proposed to reduce the number of servers (#servers) with fast processing speed. In addition, algorithms based on first fit (FF) and the ant colony system (ACS) are modified to apply them to the CDBP model. Experiments are conducted on generated and real-world data to evaluate the performance and efficiency of the algorithms. The results confirm that DCBB makes a good tradeoff between performance and efficiency and achieves a much faster convergence speed than other search-based algorithms. Furthermore, DCBB yields the optimal solution under real-world workloads in much less runtime (by an order of magnitude) than required by the original branch-and-bound (BB) algorithm.
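As a point of reference for the FF baseline mentioned above, here is a minimal single-dimension first-fit sketch: each VM goes to the first open server with enough residual capacity, and a new server opens only when none fits. The paper's model is heterogeneous and multidimensional with time information, so this is only the classic skeleton, not the CDBP scheduler.

```python
def first_fit(vm_demands, server_capacity):
    """Classic first-fit bin packing: returns (placement, #servers).
    placement[i] is the index of the server hosting VM i."""
    servers = []     # residual capacity of each open server
    placement = []
    for d in vm_demands:
        for s, free in enumerate(servers):
            if d <= free:
                servers[s] -= d
                placement.append(s)
                break
        else:
            # no open server fits: open a new one
            servers.append(server_capacity - d)
            placement.append(len(servers) - 1)
    return placement, len(servers)
```

For example, demands `[5, 4, 3, 2, 1]` with capacity 6 open three servers: the 2 slots into the server holding 4, and the 1 into the server holding 5. Minimizing #servers is exactly the objective DCBB attacks with search rather than a greedy pass.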
- Multi-antenna non-orthogonal multiple access (NOMA) is a promising technique to significantly improve spectral efficiency and support massive access, and it has received considerable interest from academia and industry. This article first briefly introduces the basic idea of the conventional multi-antenna NOMA technique, and then discusses its key limitations, namely, the high complexity of successive interference cancellation (SIC) and the lack of fairness between users with strong and weak channel gains. To address these problems, this article proposes a novel spatial modulation (SM) assisted multi-antenna NOMA technique, which avoids the use of SIC and is able to completely cancel intra-cluster interference. Furthermore, simulation results are provided to validate the effectiveness of the proposed technique compared to conventional multi-antenna NOMA. Finally, this article points out the key challenges and sheds light on future research directions for the SM assisted multi-antenna NOMA technique.
- Current flow closeness centrality (CFCC) has a better discriminating ability than ordinary closeness centrality based on shortest paths. In this paper, we extend this notion to a group of vertices in a weighted graph, and then study, both theoretically and experimentally, the problem of finding a subset $S$ of $k$ vertices that maximizes its CFCC $C(S)$. We show that the problem is NP-hard, but propose two greedy algorithms for minimizing the reciprocal of $C(S)$ with provable guarantees, using monotonicity and supermodularity. The first is a deterministic algorithm with an approximation factor $(1-\frac{k}{k-1}\cdot\frac{1}{e})$ and cubic running time, while the second is a randomized algorithm with a $(1-\frac{k}{k-1}\cdot\frac{1}{e}-\epsilon)$-approximation and nearly-linear running time for any $\epsilon > 0$. Extensive experiments on model and real networks demonstrate that our algorithms are effective and efficient, with the second algorithm scaling to massive networks with more than a million vertices.
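The greedy template behind such guarantees is simple: grow $S$ one element at a time, always adding the vertex with the largest marginal gain. A minimal sketch with a generic monotone set function `f` (the paper's objective is the reciprocal of $C(S)$ with fast approximate solvers; the toy coverage function below is only a stand-in to make the template runnable):

```python
def greedy_select(universe, k, f):
    """Generic greedy for a monotone set function f:
    repeatedly add the element with the largest marginal gain f(S+v) - f(S)."""
    S = []
    for _ in range(k):
        best = max((v for v in universe if v not in S),
                   key=lambda v: f(S + [v]) - f(S))
        S.append(best)
    return S

# Toy stand-in objective: coverage of element sets (illustrative only).
sets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5}}
f = lambda S: len(set().union(*[sets[v] for v in S]))
chosen = greedy_select(['a', 'b', 'c'], 2, f)
```

For monotone submodular `f`, this loop achieves the classic $(1-1/e)$-type guarantee; the cubic vs. nearly-linear running times in the abstract come from how cheaply each marginal gain can be evaluated, not from the loop itself.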
- Feb 06 2018 cs.CV arXiv:1802.00853v1 In this paper, we address the incremental classifier learning problem, which suffers from catastrophic forgetting. The main reason for catastrophic forgetting is that past data are not available during learning. Typical approaches keep some exemplars for the past classes and use distillation regularization to retain classification capability on the past classes and to balance the past and new classes. However, there are four main problems with these approaches. First, the loss function is not efficient for classification. Second, there is an imbalance between the past and new classes. Third, the size of the pre-selected exemplar set is usually limited, and the exemplars might not be distinguishable from unseen new classes. Fourth, the exemplars may not be allowed to be kept for a long time due to privacy regulations. To address these problems, we propose (a) a new loss function that combines the cross-entropy loss and distillation loss, (b) a simple way to estimate and remove the imbalance between the old and new classes, and (c) using Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation. We believe that data generated by GANs raise far fewer privacy issues than real images because GANs do not directly copy any real image patches. We evaluate the proposed method on the CIFAR-100, Flower-102, and MS-Celeb-1M-Base datasets, and extensive experiments demonstrate the effectiveness of our method.
- Feb 05 2018 cs.RO arXiv:1802.00463v1 A human-in-the-loop system is proposed to enable collaborative manipulation tasks for persons with physical disabilities. Studies have shown that the subject's cognitive burden is reduced as the autonomy of the assistive system increases. Our framework aims to communicate high-level intent from the subject and generate the desired manipulations. We elaborate a framework that incorporates a tongue-drive system and a 7-DoF robotic arm through a virtual interface. The assistive system processes sensor input to interpret the user's environment, and the subject uses ego-centric visual feedback via the interface to guide the action loop toward achieving the tasks. Extensive experiments are performed on our framework, and we show that our coupled feedback loops are able to effectively simplify complex manipulation tasks.
- We first present a comprehensive review of the various random walk metrics used in the literature and express them in a consistent framework. We then introduce the fundamental tensor -- a generalization of the well-known fundamental matrix -- and show that classical random walk metrics can be derived from it in a unified manner. We provide a collection of relations for random walk metrics that are useful and insightful for network studies. To demonstrate the usefulness and efficacy of the proposed fundamental tensor in network analysis, we present four important applications: 1) unification of network centrality measures, 2) characterization of (generalized) network articulation points, 3) identification of a network's most influential nodes, and 4) fast computation of network reachability after failures.
- Jan 23 2018 cs.CV arXiv:1801.06742v2 Sufficient training data is normally required to train deep models. However, the number of pedestrian images per identity in person re-identification (re-ID) datasets is usually limited, since manual annotation is required across multiple camera views. To produce more training data for deep models, a generative adversarial network (GAN) can be leveraged to generate samples for person re-ID. However, the samples generated by a vanilla GAN usually do not have labels. In this paper, we therefore propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated images. With MpRL, the generated samples are used as a supplement to real training data to train a deep model in a semi-supervised fashion. Considering the data bias between generated and real samples, MpRL utilizes different contributions from the predefined training classes. The contribution-based virtual labels are automatically assigned to generated samples to reduce ambiguous predictions in training. Meanwhile, MpRL relies only on the predefined training classes without using extra classes. Furthermore, to reduce over-fitting, MpRL is applied in a regularized manner to regularize the learning process. To verify the effectiveness of MpRL, two state-of-the-art convolutional neural networks (CNNs) are adopted in our experiments. Experiments demonstrate that by assigning MpRL to generated samples, we can further improve person re-ID performance on three datasets, i.e., Market-1501, DukeMTMC-reID, and CUHK03. The proposed method obtains +6.29%, +6.30%, and +5.58% improvements in rank-1 accuracy over a strong CNN baseline, respectively, and outperforms state-of-the-art methods.
- Jan 23 2018 cs.CV arXiv:1801.06790v1 Incorporating encoding-decoding nets with adversarial nets has been widely adopted in image generation tasks. We observe that state-of-the-art results have been obtained by carefully balancing the reconstruction loss and the adversarial loss, and that this balance shifts with different network structures, datasets, and training strategies. Empirical studies have demonstrated that an inappropriate weighting between the two losses may cause instability, and that it is tricky to search for the optimal setting, especially when lacking prior knowledge of the data and network. This paper makes the first attempt to relax the need for manual balancing by proposing the concept of \textit{decoupled learning}, in which a novel network structure is designed that explicitly disentangles the backpropagation paths of the two losses. Experimental results demonstrate the effectiveness, robustness, and generality of the proposed method. The other contribution of the paper is the design of a new evaluation metric to measure the image quality of generative models. We propose the \textit{normalized relative discriminative score} (NRDS), which introduces the idea of relative comparison, rather than providing absolute estimates like existing metrics.
- Jan 23 2018 cs.CL arXiv:1801.06613v2 Web 2.0 has brought with it a wealth of user-produced data revealing people's thoughts, experiences, and knowledge, which is a great resource for many tasks, such as information extraction and knowledge base construction. However, the colloquial nature of these texts poses new challenges for current natural language processing techniques, which are better adapted to the formal register of the language. Ellipsis is a common linguistic phenomenon in which some words are left out because they are understood from the context, especially in oral utterances; it hinders the improvement of dependency parsing, which is of great importance for tasks that rely on sentence meaning. In order to promote research in this area, we are releasing a Chinese dependency treebank of 319 weibos, containing 572 sentences with omissions restored and contexts preserved.
- Jan 23 2018 cs.CV arXiv:1801.06732v2 Image forgery detection is the task of detecting and localizing forged parts in tampered images. Previous works mostly focus on high-resolution images, using traces of resampling features, demosaicing features, or edge sharpness. However, a good detection method should also be applicable to low-resolution images, because compressed or resized images are common these days. To this end, we propose a Shallow Convolutional Neural Network (SCNN) capable of distinguishing the boundaries of forged regions from original edges in low-resolution images. SCNN is designed to utilize chroma and saturation information. Based on SCNN, two approaches, named Sliding Windows Detection (SWD) and Fast SCNN, are developed to detect and localize forged image regions. In this paper, we substantiate that Fast SCNN can detect drastic changes of chroma and saturation. Our model is evaluated on the CASIA 2.0 dataset in image forgery detection experiments. The results show that Fast SCNN performs well on low-resolution images and achieves significant improvements over the state-of-the-art.
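The sliding-window localization strategy named above has a simple generic skeleton: score every patch of the image with a patch classifier and keep the per-position scores as a localization map. A minimal sketch, with the patch scorer left as a pluggable callable (here a trivial mean, standing in for a trained SCNN):

```python
import numpy as np

def sliding_window_scores(image, window, stride, classify):
    """Slide a square window over `image` and score each patch with `classify`
    (e.g., a small CNN returning a forgery probability). Returns {(y, x): score}."""
    H, W = image.shape[:2]
    scores = {}
    for y in range(0, H - window + 1, stride):
        for x in range(0, W - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            scores[(y, x)] = classify(patch)
    return scores

# Toy example: the bottom half of the image is "suspicious" (mean = 1).
image = np.zeros((8, 8))
image[4:, :] = 1.0
scores = sliding_window_scores(image, window=4, stride=4,
                               classify=lambda p: float(p.mean()))
```

The "Fast" variant in such pipelines typically replaces this per-patch loop with a fully convolutional pass, which computes all window scores in one forward evaluation.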
- Jan 17 2018 cs.CV arXiv:1801.05302v1 "Inferring and Executing Programs for Visual Reasoning" proposes a model for visual reasoning that consists of a program generator and an execution engine, avoiding end-to-end models. To show that the model actually learns which objects to focus on to answer the questions, the authors give a visualization of the norm of the gradient of the sum of the predicted answer scores with respect to the final feature map. However, the authors do not evaluate the quality of these focus maps. This paper proposes a method for evaluating them. We generate several kinds of questions to test different keywords, infer focus maps from the model by asking these questions, and evaluate them by comparison with the segmentation graph. Furthermore, this method can be applied to any model from which focus maps can be inferred. By evaluating the focus maps of different models on the CLEVR dataset, we show that the CLEVR-iep model has learned where to focus better than end-to-end models.
- We propose to use second-order Reed-Muller (RM) sequences for user identification in 5G grant-free access. The benefits of RM sequences are mainly twofold: (i) support of a much larger user space, and hence lower collision probability, and (ii) lower detection complexity. These two features are essential to meet the massive-connectivity ($10^7$ links/km$^2$), ultra-reliable, and low-latency requirements of 5G, e.g., one-shot transmission ($\leq 1$ ms) with a $\leq 10^{-4}$ packet error rate. However, the non-orthogonality introduced during sequence space expansion leads to worse detection performance. In this paper, we propose a noise-resilient detection algorithm along with a layered sequence construction to meet these harsh requirements. Link-level simulations in both narrow-band and OFDM-based scenarios show that RM sequences are suitable for 5G.
- Jan 12 2018 cs.SI physics.soc-ph arXiv:1801.03618v2 Community structure detection is one of the fundamental problems in complex network analysis, towards understanding the topological structure of a network and its functions. Nonnegative matrix factorization (NMF) is a widely used method for community detection, and modularity Q and modularity density D are criteria to evaluate the quality of community structures. In this paper, we establish the connections between Q, D, and NMF for the first time. Q maximization can be approximately reformulated under the framework of NMF with the Frobenius norm, especially when $n$ is large, and D maximization can also be reformulated under the framework of NMF. Q minimization can be reformulated under the framework of NMF with the Kullback-Leibler divergence. We propose new methods for community structure detection based on the above findings, and experimental results on synthetic networks demonstrate their effectiveness.
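For reference, Newman's modularity being connected to NMF here is $Q=\frac{1}{2m}\sum_{ij}\left(A_{ij}-\frac{k_ik_j}{2m}\right)\delta(c_i,c_j)$, with $k_i$ the degree of vertex $i$ and $m$ the number of edges. A minimal dense-matrix sketch (standard definition, not the paper's NMF reformulation):

```python
import numpy as np

def modularity(A, communities):
    """Newman's modularity Q for adjacency matrix A and a community label
    per vertex: Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j]."""
    k = A.sum(axis=1)                 # degree vector
    two_m = k.sum()                   # 2m = sum of degrees
    c = np.asarray(communities)
    same = (c[:, None] == c[None, :]) # indicator of same community
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Two disjoint triangles with the natural 2-community partition give Q = 0.5.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1
q = modularity(A, [0, 0, 0, 1, 1, 1])
```

Writing the indicator `same` as $HH^T$ for a nonnegative community-membership matrix $H$ is precisely the step that links Q optimization to NMF-style factorizations.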
- Jan 10 2018 cs.CV arXiv:1801.02722v1 We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit. This network performs object localization first, which is then used as a cue to guide the training of the segmentation network. We test the proposed method on a segmentation task for small objects on a clinical dataset of ultrasound images. We show that by jointly learning for detection and segmentation, the proposed network is able to improve segmentation accuracy compared to learning for segmentation alone.
- Suppose a database containing $M$ records is replicated across $N$ servers, and a user wants to privately retrieve one record by accessing the servers such that the identity of the retrieved record is kept secret from any up to $T$ servers. A scheme designed for this purpose is called a $T$-private information retrieval ($T$-PIR) scheme. Three indices are of concern for PIR schemes: (1) the rate, indicating the amount of retrieved information per unit of downloaded data, whose highest achievable value is characterized by the capacity; (2) the sub-packetization, reflecting the implementation complexity of linear schemes; and (3) the field size. We consider linear schemes over a finite field. In this paper, a general $T$-PIR scheme simultaneously attaining optimality in almost all three indices is presented. Specifically, we design a linear capacity-achieving $T$-PIR scheme with sub-packetization $dn^{M-1}$ over a finite field $\mathbb{F}_q$, $q\geq N$. The sub-packetization $dn^{M-1}$, where $d={\rm gcd}(N,T)$ and $n=N/d$, has been proved to be optimal in our previous work. The field size of all existing capacity-achieving $T$-PIR schemes must be larger than $Nt^{M-2}$, where $t=T/d$, while our scheme reduces the field size by an exponential factor.
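For concreteness, the $T$-PIR capacity referred to here is the known geometric-series expression $C=\left(1+\frac{T}{N}+\cdots+\left(\frac{T}{N}\right)^{M-1}\right)^{-1}$, and the abstract's sub-packetization is $dn^{M-1}$ with $d=\gcd(N,T)$, $n=N/d$. A tiny exact-arithmetic sketch of both quantities (the capacity formula is the standard result from the $T$-PIR literature, not derived in this abstract):

```python
from fractions import Fraction
from math import gcd

def t_pir_capacity(N, T, M):
    """T-PIR capacity: (1 + T/N + ... + (T/N)^(M-1))^(-1), exactly."""
    r = Fraction(T, N)
    return 1 / sum(r**i for i in range(M))

def subpacketization(N, T, M):
    """Sub-packetization d * n^(M-1), with d = gcd(N, T) and n = N/d."""
    d = gcd(N, T)
    n = N // d
    return d * n ** (M - 1)
```

For instance, $N=2$, $T=1$, $M=2$ gives capacity $2/3$, and $N=4$, $T=2$, $M=3$ gives sub-packetization $2\cdot 2^{2}=8$, versus $N\cdot t^{M-2}$-scale field sizes for earlier schemes.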
- This paper employs deep learning to detect traffic accidents from social media data. First, we thoroughly investigate one year of tweets, over 3 million in total, from two metropolitan areas: Northern Virginia and New York City. Our results show that paired tokens can capture the association rules inherent in accident-related tweets and further increase the accuracy of traffic accident detection. Second, two deep learning methods, the Deep Belief Network (DBN) and Long Short-Term Memory (LSTM), are investigated and implemented on the extracted tokens. Results show that DBN can obtain an overall accuracy of 85% with about 44 individual token features and 17 paired token features. The classification results from DBN outperform those of Support Vector Machines (SVMs) and supervised Latent Dirichlet Allocation (sLDA). Finally, to validate this study, we compare the accident-related tweets with both the traffic accident log on freeways and traffic data on local roads from 15,000 loop detectors. We find that nearly 66% of the accident-related tweets can be located by the accident log and more than 80% of them can be tied to nearby abnormal traffic data. The comparison raises several important issues in using Twitter to detect traffic accidents, including location and time bias, as well as the characteristics of influential users and hashtags.
- The vast majority of real-world networks are scale-free, loopy, and sparse, with a power-law degree distribution and a constant average degree. In this paper, we study first-order consensus dynamics in binary scale-free networks, where vertices are subject to white noise. We focus on the coherence of networks, characterized in terms of the $H_2$-norm, which quantifies how closely agents track the consensus value. We first provide a lower bound on the coherence of a network in terms of its average degree, which is independent of the network order. We then study the coherence of some sparse, scale-free real-world networks, which approaches a constant. We also study numerically the coherence of Barabási-Albert networks and high-dimensional random Apollonian networks, which also converges to a constant as the networks grow. Finally, based on the connection between coherence and the Kirchhoff index, we study analytically the coherence of two deterministically growing sparse networks and obtain exact expressions, which tend to small constants. Our results indicate that the effect of noise on consensus dynamics in power-law networks is negligible. We argue that scale-free topology, together with loopy structure, is responsible for the strong robustness of power-law networks with respect to noisy consensus dynamics.
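Under one common convention, first-order network coherence is computed directly from the nonzero Laplacian eigenvalues as $H=\frac{1}{2n}\sum_{i=2}^{n}\frac{1}{\lambda_i}$, which is also how it links to the Kirchhoff index $\mathcal{K}=n\sum_{i\ge 2}1/\lambda_i$. A minimal numerical sketch (normalization conventions vary across papers; this one is illustrative):

```python
import numpy as np

def first_order_coherence(A):
    """First-order network coherence from Laplacian eigenvalues:
    H = (1 / 2n) * sum of 1/lambda_i over the nonzero eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    lam = np.sort(np.linalg.eigvalsh(L))           # eigenvalues, ascending
    return np.sum(1.0 / lam[1:]) / (2 * len(A))    # skip the zero eigenvalue

# Complete graph K_n: eigenvalues are 0 and n (multiplicity n-1),
# so H = (n-1) / (2 n^2); for n = 4 that is 3/32.
A = np.ones((4, 4)) - np.eye(4)
h = first_order_coherence(A)
```

Small coherence means agents fluctuate little around consensus under noise; the abstract's point is that sparse power-law networks keep this quantity bounded by a constant as they grow.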
- Dec 29 2017 cs.CV arXiv:1712.09531v1 Although many methods perform well in single-camera tracking, multi-camera tracking remains a challenging problem that has received less attention. DukeMTMC is a large-scale, well-annotated multi-camera tracking benchmark that has driven great progress in this field. This report briefly introduces our method on DukeMTMC and shows that simple hierarchical clustering with well-trained person re-identification features can achieve good results on this dataset.
- Dec 29 2017 cs.GT arXiv:1712.09846v2 Crowdsourcing has emerged as a paradigm for leveraging human intelligence and activity to solve a wide range of tasks. However, strategic workers are tempted by self-interest to free-ride and attack in a crowdsourcing contest dilemma game. Hence, incentive mechanisms are of great importance for overcoming the inefficiency of the socially undesirable equilibrium. Existing incentive mechanisms are not effective in providing incentives for cooperation in crowdsourcing competitions due to the following features: heterogeneous workers compete against each other on a crowdsourcing platform with imperfect monitoring. In this paper, we take these features into consideration and develop a novel game-theoretic design of rating protocols, which integrates binary rating labels with differential pricing to maximize the requester's utility, by extorting selfish workers and enforcing cooperation among them. By quantifying necessary and sufficient conditions for a sustainable social norm, we formulate the problem of maximizing the requester's revenue among all sustainable rating protocols, provide design guidelines for optimal rating protocols, and design a low-complexity algorithm to select the optimal design parameters, which are related to differential punishments and pricing schemes. Simulation results demonstrate how intrinsic parameters impact the design parameters, as well as the performance gain of the proposed rating protocol.
- Dec 29 2017 cs.GT arXiv:1712.09848v1 Despite the increasing popularity and successful examples of crowdsourcing, its openness overshadows important episodes in which elaborate sabotage derailed or severely hindered collective efforts. A service exchange dilemma arises from non-cooperation among self-interested users, and zero social welfare is obtained at the myopic equilibrium. Traditional rating protocols are not effective in overcoming the inefficiency of this socially undesirable equilibrium due to specific features of crowdsourcing: a large number of anonymous users with asymmetric service requirements and different service capabilities, dynamically joining and leaving a crowdsourcing platform with imperfect monitoring. In this paper, we develop the first game-theoretic design of a two-sided rating protocol to stimulate cooperation among self-interested users, which consists of a recommended strategy and a rating update rule. The recommended strategy selects a desirable behavior from three predefined plans according to intrinsic parameters, while the rating update rule updates the ratings of both users and applies differential punishments that treat users with different ratings differently. By quantifying necessary and sufficient conditions for a sustainable social norm, we formulate the problem of designing an optimal two-sided rating protocol that maximizes social welfare among all sustainable protocols, provide design guidelines for optimal two-sided rating protocols, and give a low-complexity algorithm to select the optimal design parameters in an alternating manner. Finally, illustrative results show the validity and effectiveness of the proposed protocol designed for the service exchange dilemma in crowdsourcing.
- Dec 27 2017 cs.GT arXiv:1712.08807v1 In this paper, we study incentive mechanism design for real-time data aggregation, which underlies a large spectrum of crowdsensing applications. Despite extensive studies on static incentive mechanisms, none of them are applicable to real-time data aggregation because they cannot maintain PUs' long-term participation. We emphasize that, to maintain PUs' long-term participation, it is of significant importance to protect their privacy as well as to provide them with a desirable cumulative compensation. Thus motivated, in this paper we propose LEPA, an efficient incentive mechanism to stimulate long-term participation in real-time data aggregation. Specifically, we allow PUs to preserve their privacy by reporting noisy data, whose impact on the aggregation accuracy is quantified with proper privacy and accuracy measures. Then, we provide a framework that jointly optimizes the incentive schemes in different time slots to ensure a desirable cumulative compensation for PUs and thereby prevent PUs from leaving the system halfway. Considering PUs' strategic behaviors and the combinatorial nature of the sensing tasks, we propose a computationally efficient online auction with close-to-optimal performance despite the NP-hardness of winner selection. We further show that the proposed online auction satisfies the desirable properties of truthfulness and individual rationality. The performance of LEPA is validated by both theoretical analysis and extensive simulations.
- Dec 27 2017 cs.CV arXiv:1712.09300v1 Zero-Shot Learning (ZSL) is typically achieved by resorting to a class semantic embedding space to transfer knowledge from the seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vectors) is key to the success of ZSL. In this paper, we present a novel approach called Latent Space Encoding (LSE) for ZSL, based on an encoder-decoder framework, which learns a highly effective latent space that reconstructs well both the visual space and the semantic embedding space. For each modality, the encoder-decoder framework jointly maximizes the recoverability of the original space from the latent space and the predictability of the latent space from the original space, thus making the latent space feature-aware. To relate the visual and class semantic modalities, features referring to the same concept are enforced to share the same latent codings. In this way, the semantic relations of different modalities are generalized with the latent representations. We also show that the proposed encoder-decoder framework is easily extended to more modalities. Extensive experimental results on four benchmark datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval (ZSR).
- The hierarchical graphs and Sierpiński graphs are constructed iteratively and have the same number of vertices and edges at any iteration, but they exhibit quite different structural properties: the hierarchical graphs are non-fractal and small-world, while the Sierpiński graphs are fractal and "large-world". Both graphs have found broad applications. In this paper, we study consensus problems in hierarchical graphs and Sierpiński graphs, focusing on three important quantities of consensus problems: convergence speed, delay robustness, and coherence for first-order (and second-order) dynamics, which are, respectively, determined by the algebraic connectivity, the maximum eigenvalue, and the sum of reciprocals (and squared reciprocals) of the nonzero eigenvalues of the Laplacian matrix. For both graphs, based on the explicit recursive relation of eigenvalues at two successive iterations, we evaluate the second smallest eigenvalue as well as the largest eigenvalue, and obtain closed-form solutions for the sum of reciprocals (and squared reciprocals) of all nonzero eigenvalues. We also compare the results obtained for consensus problems on both graphs and show that they differ in all quantities concerned, which is due to the marked difference in their topological structures.
- Dec 13 2017 cs.DC arXiv:1712.04161v1 A distributed software-defined network (SDN), consisting of multiple inter-connected network domains, each managed by one SDN controller, is an emerging networking architecture that offers balanced centralized control and distributed operations. Under this networking paradigm, most existing works focus on designing sophisticated controller-synchronization strategies to improve joint controller decision-making for inter-domain routing. However, there is still a lack of fundamental understanding of how the performance of distributed SDN relates to network attributes, making it impossible to justify the necessity of complicated strategies. In this regard, we analyze and quantify the performance enhancement of distributed SDN architectures as influenced by intra-/inter-domain synchronization levels and network structural properties. Based on a generic weighted network model, we establish analytical methods for performance estimation under four synchronization scenarios with increasing synchronization cost. Moreover, two of these synchronization scenarios correspond to extreme cases, i.e., minimum/maximum synchronization, and are therefore capable of bounding the performance of distributed SDN at any given synchronization level. Our theoretical results reveal how network performance relates to synchronization levels and inter-domain connections, and their accuracy is confirmed by simulations based on both real and synthetic networks. To the best of our knowledge, this is the first work to quantify the performance of distributed SDN analytically, providing fundamental guidance for future SDN protocol design and performance estimation.
- In successive cancellation (SC) polar decoding, an incorrect estimate of any prior unfrozen bit may bring about severe error propagation in the subsequent decoding, so it is desirable to find and correct an error as early as possible. In this paper, we first construct a critical set $S$ of unfrozen bits, which with high probability (typically $>99\%$) includes the bit where the first error happens. Then we develop a progressive multi-level bit-flipping decoding algorithm to correct multiple errors over the multiple-layer critical sets, each of which is constructed using the remaining undecoded subtree associated with the previous layer. The level in fact indicates the number of independent errors that can be corrected. We show that as the level increases, the block error rate (BLER) performance of the proposed progressive bit-flipping decoder competes with that of the corresponding cyclic redundancy check (CRC) aided successive cancellation list (CA-SCL) decoder; e.g., a level-4 progressive bit-flipping decoder is comparable to the CA-SCL decoder with a list size of $L=32$. Furthermore, the average complexity of the proposed algorithm is much lower than that of an SCL decoder (and is similar to that of SC decoding) at medium to high signal-to-noise ratio (SNR).
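A toy sketch of the flip-candidate idea behind such bit-flipping decoders: among the unfrozen positions, the decisions with the smallest |LLR| are the most likely locations of the first error, so they are tried first after a CRC failure. The paper builds its critical set from the code structure and decoding subtrees; this plain reliability ranking, and all names below, are only an illustrative assumption.

```python
import numpy as np

def flip_candidates(llrs, unfrozen, max_candidates=4):
    """Rank unfrozen positions by decision reliability |LLR|, least
    reliable first: these are the positions a bit-flipping decoder
    would try flipping after a failed CRC check."""
    order = sorted(unfrozen, key=lambda i: abs(llrs[i]))
    return order[:max_candidates]

# hypothetical decision LLRs for an 8-bit block; unfrozen positions {3, 5, 6, 7}
llrs = np.array([9.1, 7.4, 8.0, 0.3, 6.2, -0.8, 4.5, -2.1])
cands = flip_candidates(llrs, [3, 5, 6, 7])
```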
- Consider the problem of private information retrieval (PIR) over a distributed storage system where $M$ records are stored across $N$ servers by using an $[N,K]$ MDS code. For simplicity, this problem is usually referred to as the coded-PIR problem. The capacity of coded-PIR with privacy against any individual server was determined by Banawan and Ulukus in 2016, i.e., $\mathcal{C}_{\tiny C-PIR}=(1+\frac{K}{N}+\dots+\frac{K^{M-1}}{N^{M-1}})^{-1}$. They also presented a linear capacity-achieving scheme with sub-packetization $KN^{M}$. In this paper we focus on minimizing the sub-packetization of linear capacity-achieving coded-PIR schemes. We prove that the sub-packetization of any linear capacity-achieving coded-PIR scheme in the nontrivial cases (i.e., $N>K\geq 1$ and $M>1$) must be no less than $Kn^{M-1}$, where $n=N/{\rm gcd}(N,K)$. Moreover, we design a linear capacity-achieving coded-PIR scheme with sub-packetization $Kn^{M-1}$ for all $N>K\geq 1$ and $M>1$. Therefore, $Kn^{M-1}$ is the optimal sub-packetization for linear capacity-achieving coded-PIR schemes.
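The capacity expression and the claimed optimal sub-packetization are easy to evaluate exactly; a small sketch (function names are ours) using exact rational arithmetic:

```python
from fractions import Fraction
from math import gcd

def coded_pir_capacity(N, K, M):
    """Banawan-Ulukus coded-PIR capacity:
    C = (1 + K/N + ... + (K/N)^(M-1))^(-1)."""
    r = Fraction(K, N)
    return 1 / sum(r**i for i in range(M))

def optimal_subpacketization(N, K, M):
    """Minimum sub-packetization K * n^(M-1) with n = N / gcd(N, K),
    as stated in the abstract (valid for N > K >= 1, M > 1)."""
    n = N // gcd(N, K)
    return K * n ** (M - 1)

# example: N = 4 servers, [4, 2] MDS code, M = 2 records
cap = coded_pir_capacity(4, 2, 2)          # 1 / (1 + 1/2) = 2/3
sub = optimal_subpacketization(4, 2, 2)    # n = 2, so 2 * 2^1 = 4, vs K*N^M = 32
```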
- Numerical simulation of ultrasonic wave propagation provides an efficient tool for crack identification in structures, but it requires a high resolution and an expensive computational cost in both time integration and spatial discretization. The wavelet finite element model is a high-order finite element model that gives higher accuracy in spatial discretization; the B-spline wavelet on the interval (BSWI) has been proved to be one of the most commonly used wavelet finite element models, with the advantage of achieving the same accuracy with fewer elements, so that the calculation cost is much lower than that of the traditional finite element method and other high-order element methods. The precise integration method provides a higher resolution in time integration and has been proved to be a stable time integration method with a much lower cut-off error for the same or even smaller time steps. In this paper, a wavelet finite element model combined with the precise integration method is presented for the numerical simulation of ultrasonic wave propagation and crack identification in 1D structures. Firstly, the wavelet finite element based on BSWI is constructed for rod and beam structures. Then the precise integration method is introduced and applied to wave propagation in 1D structures. Finally, numerical examples of ultrasonic wave propagation in rod and beam structures are conducted for verification. Moreover, crack identification in both rod and beam structures is studied based on the new model.
- Dec 04 2017 cs.CV arXiv:1712.00433v1 We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by means of a semantic segmentation branch and a location-agnostic module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a location-agnostic module which learns the relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.6 on the VOC2007 test set and an mmAP of 32.8 on COCO test-dev, with an inference speed of 36.7 milliseconds per image on a Titan X Pascal GPU. With a lower-resolution version, we achieve an mAP of 79.5 on VOC2007 with an inference speed of 14.7 milliseconds per image.
- This two-part paper considers strategic topology switching for the second-order multi-agent system under attack. In Part II, we propose a strategy on switching topologies to reveal zero-dynamics attack. We first study the detectability of zero-dynamics attack for the second-order multi-agent system under switching topology, which has requirements on the switching times and switching topologies. Based on the strategy on switching times proposed in Part I and the strategy on switching topologies proposed in Part II, a decentralized strategic topology-switching algorithm is derived. The primary advantages of the algorithm are: 1) in achieving consensus in the absence of attacks, the control protocol does not need velocity measurements and the algorithm has no constraint on the magnitude of coupling strength; 2) in revealing zero-dynamics attack, the algorithm has no constraint on the size of the misbehaving-agent set; 3) in revealing zero-dynamics attack, if the Xor graph generated by every two consecutive topologies has distinct eigenvalues, only one output is enough for the algorithm. Simulation examples are provided to verify the effectiveness of the strategic topology-switching algorithm.
- This two-part paper considers strategic topology switching for the second-order multi-agent system under attack. In Part I, we propose a strategy on switching times that enables the strategic topology-switching algorithm proposed in Part II to reach the second-order consensus in the absence of attacks. The control protocol introduced to the multi-agent system is governed only by the relative positions of agents. Based on the stability of switched linear systems, the strategy on the dwell time of topology-switching signal is derived. The primary advantages of the strategy in achieving the second-order consensus are: 1) the control protocol relies only on relative position measurements, no velocity measurements are needed; 2) the strategy has no constraint on the magnitude of coupling strength. Simulations are provided to verify the effectiveness of strategic topology switching in achieving the second-order consensus.
- Dec 01 2017 cs.CV arXiv:1711.11575v1 Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective at improving the object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN-based detection. It gives rise to the first fully end-to-end object detector.
- This paper develops a novel methodology for using symbolic knowledge in deep learning. From first principles, we derive a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output. An experimental evaluation shows that our semantic loss function effectively guides the learner to achieve (near-)state-of-the-art results on semi-supervised multi-class classification. Moreover, it significantly increases the ability of the neural network to predict structured objects, such as rankings and paths. These discrete concepts are tremendously difficult to learn, and benefit from a tight integration of deep learning and symbolic reasoning methods.
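For intuition, here is a minimal sketch of a semantic loss for the common "exactly one output is true" constraint: the loss is the negative log-probability that independent Bernoulli outputs produce a satisfying (one-hot) assignment. The function name and the choice of constraint are illustrative; the paper derives the loss for general logical constraints.

```python
import numpy as np

def semantic_loss_exactly_one(p):
    """Semantic loss for the exactly-one constraint: -log of the total
    probability of all satisfying assignments, treating each output as an
    independent Bernoulli with success probability p_i.  For exactly-one,
    the satisfying assignments are the one-hot vectors."""
    p = np.asarray(p, dtype=float)
    # P(one-hot at i) = p_i * prod_{j != i} (1 - p_j)
    sat = sum(p[i] * np.prod(np.delete(1 - p, i)) for i in range(len(p)))
    return -np.log(sat)
```

A network already outputting a one-hot vector, e.g. `semantic_loss_exactly_one([1.0, 0.0, 0.0])`, incurs zero loss; mass spread over non-satisfying assignments increases the loss.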
- Nov 30 2017 cs.CV arXiv:1711.10684v1 Road extraction from aerial images has been a hot research topic in the field of remote sensing image analysis. In this letter, a semantic segmentation neural network which combines the strengths of residual learning and U-Net is proposed for road area extraction. The network is built with residual units and has an architecture similar to that of U-Net. The benefits of this model are two-fold: first, residual units ease the training of deep networks; second, the rich skip connections within the network facilitate information propagation, allowing us to design networks with fewer parameters but better performance. We test our network on a public road dataset and compare it with U-Net and two other state-of-the-art deep learning based road extraction methods. The proposed approach outperforms all the compared methods, which demonstrates its superiority over recently developed state-of-the-art methods.
- Nov 28 2017 cs.SI arXiv:1711.09541v1 Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to approximation in the incremental updates. SVD restart is an effective approach to reset the accumulated error, but when to restart SVD for dynamic networks has not been addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach which optimally sets the restart time in order to reduce error accumulation over time. Specifically, we monitor the margin between the reconstruction loss of the incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound on the SVD minimum loss for dynamic networks and use the bound to replace the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we can trigger SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and that our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms existing methods, reducing the maximum error of dynamic network reconstruction by 27% to 42% when the number of restarts is fixed, and reducing the number of restarts by 25% to 50% when the maximum tolerated error is fixed.
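A sketch of the restart test at the heart of such an approach: compare the reconstruction loss of the current (possibly stale) rank-k factors against the minimum achievable rank-k loss, and restart when the margin is too large. For illustration we use the exact Eckart-Young minimum via a full SVD, whereas TIMERS replaces it with a cheap theoretical lower bound; the names and the tolerance rule below are assumptions.

```python
import numpy as np

def should_restart(A, U, S, Vt, k, tol=0.1):
    """Return True when the margin between the current factors' loss and
    the best achievable rank-k loss exceeds tol * (best loss)."""
    approx = U @ np.diag(S) @ Vt
    loss = np.linalg.norm(A - approx, "fro") ** 2
    s = np.linalg.svd(A, compute_uv=False)
    min_loss = np.sum(s[k:] ** 2)            # Eckart-Young optimum for rank k
    margin = loss - min_loss
    return margin > tol * max(min_loss, 1e-12)

# exact rank-2 factors of the identity: margin is zero, no restart needed;
# after the network "changes" to 3*I, the stale factors trigger a restart
A = np.eye(4)
U, S, Vt = np.eye(4)[:, :2], np.ones(2), np.eye(4)[:2, :]
stale_ok = should_restart(A, U, S, Vt, k=2)
drifted = should_restart(3 * A, U, S, Vt, k=2)
```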
- Nov 28 2017 cs.CY arXiv:1711.09723v1 Border crossing delays between New York State and Southern Ontario cause problems such as enormous economic losses and massive environmental pollution. In this area, there are three border-crossing ports at the Niagara Frontier border: Peace Bridge (PB), Rainbow Bridge (RB) and Lewiston-Queenston Bridge (LQ). The goals of this paper are to determine whether bi-national wait times for commercial and passenger vehicles are evenly distributed among the three ports, and to uncover the hidden significant influential factors that result in possible insufficient utilization. The historical border wait time data from 7:00 to 21:00 between 08/22/2016 and 06/20/2017 are archived, as well as the corresponding temporal and weather data. For each vehicle type in each direction, a Decision Tree is built to identify the various border delay patterns over the three bridges. We find that for passenger vehicles bound for the USA, the convenient connections between the Canadian freeways and USA I-190 via LQ and PB may make these two bridges more congested than RB, especially on Canadian holidays. For passenger vehicles in the other direction, RB is much more congested than LQ and PB in some cases, and visitors to Niagara Falls in the USA in summer may be a reason. For commercial trucks bound for the USA, the various delay patterns show that PB is always more congested than LQ. Hour interval and weekend are the most significant factors appearing in all four Decision Trees. These Decision Trees can help the authorities make specific routing suggestions when the corresponding conditions are satisfied.
- Nov 28 2017 cs.CV arXiv:1711.09280v2 Depth is one of the keys that make neural networks succeed in the task of large-scale image recognition. The state-of-the-art network architectures usually increase the depth by cascading convolutional layers or building blocks. In this paper, we present an alternative method to increase the depth: introducing computation orderings to the channels within convolutional layers or blocks, based on which we gradually compute the outputs in a channel-wise manner. The added orderings not only increase the depths and the learning capacities of the networks without any additional computation cost, but also eliminate the overlap singularities, so that the networks are able to converge faster and perform better. Experiments show that the networks based on our method achieve state-of-the-art performance on the CIFAR and ImageNet datasets.
- Nov 23 2017 cs.CV arXiv:1711.08102v2 Visual and audio modalities are two symbiotic modalities underlying videos, containing both common and complementary information. If they can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality exists while the other is abandoned or missing. Recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, brings substantial gains for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio and visual-to-visual subnetworks, which are organized in a cycle architecture. CMCGAN has several remarkable advantages. Firstly, CMCGAN unifies visual-audio mutual generation into a common framework via a joint corresponding adversarial loss. Secondly, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Thirdly, CMCGAN can be trained end-to-end for better convenience. Benefiting from CMCGAN, we develop a dynamic multimodal classification network to handle the missing-modality problem. Extensive experiments validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.
- Nov 23 2017 cs.GR arXiv:1711.08126v2 There are increasingly many real-time live applications in virtual reality, in which capturing and retargeting 3D human pose plays an important role. However, it is still challenging to estimate accurate 3D pose from consumer imaging devices such as a depth camera. This paper presents a novel cascaded 3D full-body pose regression method to estimate accurate pose from a single depth image at 100 fps. The key idea is to train cascaded regressors based on the Gradient Boosting algorithm from a pre-recorded human motion capture database. By incorporating the hierarchical kinematics model of human pose into the learning procedure, we can directly estimate accurate 3D joint angles instead of joint positions. The biggest advantage of this model is that bone lengths are preserved during the whole 3D pose estimation procedure, which leads to more effective features and higher pose estimation accuracy. Our method can be used as an initialization procedure when combined with tracking methods. We demonstrate the power of our method on a wide range of synthesized human motion data from the CMU mocap database, the Human3.6M dataset, and real human movement data captured in real time. In our comparison against previous 3D pose estimation methods and commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
- Nov 23 2017 cs.CV arXiv:1711.08097v2 Video captioning refers to automatically generating a descriptive sentence for a specific short video clip, and has achieved remarkable success recently. However, most existing methods focus on visual information while ignoring the synchronized audio cues. We propose three multimodal deep fusion strategies to maximize the benefits of visual-audio resonance information. The first explores the impact of cross-modality feature fusion from low to high order. The second establishes the visual-audio short-term dependency by sharing the weights of the corresponding front-end networks. The third extends the temporal dependency to the long term by sharing multimodal memory across the visual and audio modalities. Extensive experiments have validated the effectiveness of our three cross-modality fusion strategies on two benchmark datasets: Microsoft Research Video to Text (MSRVTT) and Microsoft Video Description (MSVD). It is worth mentioning that weight sharing can coordinate visual-audio feature fusion effectively and achieves state-of-the-art performance on both the BLEU and METEOR metrics. Furthermore, we first propose a dynamic multimodal feature fusion framework to deal with the case of partially missing modalities. Experimental results demonstrate that even in the audio-absence mode, we can still obtain comparable results with the aid of the additional audio modality inference module.
- Understanding global optimality in deep learning (DL) has been attracting increasing attention recently. Conventional DL solvers, however, have not been developed intentionally to seek such global optimality. In this paper we propose a novel approximation algorithm, BPGrad, towards optimizing deep models globally via branch and pruning. Our BPGrad algorithm is based on the assumption of Lipschitz continuity in DL, and as a result it can adaptively determine the step size for the current gradient given the history of previous updates, such that theoretically no smaller step can achieve the global optimality. We prove that, by repeating such a branch-and-pruning procedure, we can locate the global optimum within finitely many iterations. Empirically, an efficient solver based on BPGrad for DL is proposed as well, and it outperforms conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam in the tasks of object recognition, detection, and segmentation.
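A one-dimensional sketch of the Lipschitz-based step-size idea (our simplification, not the paper's exact rule): if f is L-Lipschitz and f_hat lower-bounds the global minimum, then no point closer than (f(x) - f_hat)/L to x can attain f_hat, so a gradient step of that length cannot overshoot a global minimizer.

```python
def bpgrad_step(f_x, grad_norm, f_hat, L):
    """Sketch of a BPGrad-style step length: the largest step along the
    (sub)gradient direction that provably cannot skip past any point
    attaining the lower estimate f_hat of the global minimum."""
    return (f_x - f_hat) / (L * max(grad_norm, 1e-12))

# f(x) = |x - 3| is 1-Lipschitz with global minimum value 0 at x = 3
x, f_hat, L = 10.0, 0.0, 1.0
g = 1.0 if x > 3 else -1.0                 # a subgradient of f at x
x_new = x - bpgrad_step(abs(x - 3), abs(g), f_hat, L) * g
```

In this toy case the rule lands exactly on the global minimizer in a single step; the paper combines the step rule with branching and pruning to handle the general nonconvex setting.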
- Nov 21 2017 cs.CV arXiv:1711.07319v1 The topic of multi-person pose estimation has been greatly advanced recently, especially with the development of convolutional neural networks. However, there still exist many challenging cases, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the problem caused by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize "simple" keypoints like eyes and hands but may fail to precisely recognize occluded or invisible keypoints. Our RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted: a detector first generates a set of human bounding boxes, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision of 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, a 19% relative improvement over the 60.5 result from the COCO 2016 keypoint challenge.
- We revisit the classification problem and focus on nonlinear methods for classification on manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold within a higher-dimensional space, our aim is to acquire a classification boundary between the classes with labels. Motivated by the principal flow [Panaretos, Pham and Yao, 2014], a curve that moves along a path of maximum variation of the data, we introduce the principal boundary. From the classification perspective, the principal boundary is defined as an optimal curve that moves in between the principal flows traced out from the two classes of data and, at any point on the boundary, maximizes the margin between the two classes. We estimate the boundary with its direction supervised by the two principal flows. We show that the principal boundary yields the usual decision boundary found by the support vector machine, in the sense that locally the two boundaries coincide. By means of examples, we illustrate how to find, use and interpret the principal boundary.
- By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
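The BCD pattern itself can be illustrated on a simpler multi-convex objective, rank-1 matrix factorization, where fixing one block makes the subproblem in the other block an exact least-squares solve. This is only an analogy for the paper's lifted-ReLU blocks, and all names are ours.

```python
import numpy as np

def bcd_rank1(A, iters=50, seed=0):
    """Generic block coordinate descent on the bi-convex objective
    f(u, v) = ||A - u v^T||_F^2: with v fixed, the problem is convex
    (least squares) in u, and vice versa, so each block update is exact."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    v = rng.standard_normal(n)
    for _ in range(iters):
        u = A @ v / (v @ v)        # exact solution of the convex subproblem in u
        v = A.T @ u / (u @ u)      # exact solution of the convex subproblem in v
    return u, v

A = np.outer([1.0, 2.0], [3.0, 4.0])   # an exactly rank-1 matrix
u, v = bcd_rank1(A)
```

Because each block solve is exact, the objective is monotonically non-increasing, which is the numerically well-behaved property the paper's convergence analysis builds on.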
- Nov 20 2017 cs.CV arXiv:1711.06505v2 In Taobao, the largest e-commerce platform in China, billions of items are provided and typically displayed with their images. For better user experience and business effectiveness, Click Through Rate (CTR) prediction in the online advertising system exploits abundant user historical behaviors to identify whether a user is interested in a candidate ad. Enhancing behavior representations with user behavior images brings in the user's visual preference and can greatly help CTR prediction. So we propose to model user preference jointly with user behavior ID features and behavior images. However, compared with utilizing the candidate ad image in CTR prediction, which only introduces one image per sample, training with user behavior images brings tens to hundreds of images per sample, giving rise to a great challenge in both communication and computation. With the well-known Parameter Server (PS) framework, implementing such a model requires communicating the raw image features, leading to an unacceptable communication load. This indicates that PS is not suitable for this scenario. In this paper, we propose a novel and efficient distributed machine learning paradigm called Advanced Model Server (AMS). In AMS, the forward/backward process can also happen on the server side, and only high-level semantic features with much smaller size need to be sent to the workers. AMS thus dramatically reduces the communication load, which enables the arduous joint training process. Based on AMS, the methods of effectively combining the images and ID features are carefully studied, and we then propose a Deep Image CTR Model. Our approach is shown to achieve significant improvements in both online and offline evaluations, and has been deployed in the Taobao display advertising system serving the main traffic.
- Nov 17 2017 cs.CL arXiv:1711.06061v1 Machine translation is going through a radical revolution, driven by the explosive development of deep learning techniques using Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). In this paper, we consider a special case of machine translation: translating natural language into Structured Query Language (SQL) for data retrieval over relational databases. Although generic CNNs and RNNs learn the grammar structure of SQL when trained with sufficient samples, the accuracy and training efficiency of the model can be dramatically improved when the translation model is deeply integrated with the grammar rules of SQL. We present a new encoder-decoder framework with a suite of new approaches, including new semantic features fed into the encoder as well as new grammar-aware states injected into the memory of the decoder. These techniques help the neural network focus on understanding the semantics of operations in natural language and save the effort of SQL grammar learning. The empirical evaluation on real-world databases and queries shows that our approach outperforms the state-of-the-art solution by a significant margin.
- We study the problem of multiset prediction. The goal of multiset prediction is to train a predictor that maps an input to a multiset consisting of multiple items. Unlike existing problems in supervised learning, such as classification, ranking and sequence generation, there is no known order among items in a target multiset, and each item in the multiset may appear more than once, making this problem extremely challenging. In this paper, we propose a novel multiset loss function by viewing this problem from the perspective of sequential decision making. The proposed multiset loss function is empirically evaluated on two families of datasets, one synthetic and the other real, with varying levels of difficulty, against various baseline loss functions including reinforcement learning, sequence, and aggregated distribution matching loss functions. The experiments reveal the effectiveness of the proposed loss function over the others.
- Nov 16 2017 cs.CL arXiv:1711.05350v1 In this paper, we describe an effective convolutional neural network framework for identifying experts in a question answering community. This approach uses a convolutional neural network and combines user feature representations with question feature representations to compute scores, such that the user who receives the highest score is identified as the expert on the question. Unlike prior work, this method does not identify the expert by measuring the quality of answer content, but requires only the question sentence and user embedding features. Remarkably, our model can be applied to different languages and different domains. The proposed framework is trained on two datasets: the first is Stack Overflow and the second is Zhihu. The Top-1 accuracy results of our experiments show that our framework outperforms the best baseline framework for expert identification.
- Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention, and is used for multi-label image classification on a novel multiset task, demonstrating that it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction due to a reinforcement-learning based training process that allows for arbitrary label permutation and multiple instances per label.
- Nov 15 2017 cs.CV arXiv:1711.04451v1 It is very attractive to formulate vision in terms of pattern theory [Mumford, 2010], where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black-boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether by better understanding deep networks we can extract building blocks which can be used to develop pattern theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles. To analyze this we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern theoretical model, which we call compositional voting. In this model several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.
- Automatic Term Extraction (ATE) deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as it is known that no existing method consistently outperforms the others in all domains. This work adopts a different strategy towards this problem: we propose to 'enhance' existing ATE methods instead of 'replacing' them. We introduce SemRe-Rank, a generic method based on incorporating semantic relatedness - an often overlooked venue - into an existing ATE method to further improve its performance. SemRe-Rank applies a personalized PageRank process to a semantic relatedness graph of words to compute their 'semantic importance' scores, which are then used to revise the scores of term candidates computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art ATE methods on four datasets of diverse nature, it is shown to have achieved widespread improvement over all methods and across all datasets. The best performing variants of SemRe-Rank achieve, on some datasets, an improvement of 0.15 (on a scale of 0 to 1.0) in the precision of the top-ranked K term candidates, and an improvement of 0.28 in overall F1.
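The core re-ranking step can be sketched with a power-iteration personalized PageRank over a word relatedness graph. The additive combination rule and all names below are our assumptions, not SemRe-Rank's exact formulas.

```python
import numpy as np

def personalized_pagerank(W, seeds, alpha=0.85, iters=100):
    """Personalized PageRank by power iteration.  W[i, j] >= 0 is the
    semantic relatedness of words i and j; `seeds` is the personalization
    (restart) distribution concentrating importance near seed words."""
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    s = np.asarray(seeds, dtype=float)
    s = s / s.sum()
    r = s.copy()
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1 - alpha) * s
    return r

def rerank(base_scores, importance, weight=0.5):
    """Illustrative combination rule (an assumption): revise each base ATE
    score with the word's 'semantic importance' PageRank score."""
    return base_scores + weight * importance

# toy 3-word path graph, personalized on word 0
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
r = personalized_pagerank(W, [1, 0, 0])
revised = rerank(np.array([0.4, 0.4, 0.4]), r)
```

The restart distribution biases importance toward the seed word and its neighborhood, which is what lets semantic relatedness reshape the base ATE ranking.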
- Nov 08 2017 cs.DC arXiv:1711.02659v1The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiment's framework and is used to serialize the experiment's data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists will further use RIO to perform their end-stage analysis, reading from intermediate files they generate from experiment data. RIO is thus incredibly flexible: it must serve as a file format for archival (optimized for space) and for working data (optimized for read speed). To date, most of the technical work has focused on improving the former use case. We present work designed to help improve RIO for analysis. We analyze the real-world impact of LZ4 to decrease decompression times (and the corresponding cost in disk space). We introduce new APIs that read RIO data in bulk, removing the per-event overhead of a C++ function call. We compare the performance with the existing RIO APIs for simple structure data and show how this can be complimentary with efforts to improve the parallelism of the RIO stack.
- A central task in the field of quantum computing is to find applications where quantum computer could provide exponential speedup over any classical computer. Machine learning represents an important field with broad applications where quantum computer may offer significant speedup. Several quantum algorithms for discriminative machine learning have been found based on efficient solving of linear algebraic problems, with potential exponential speedup in runtime under the assumption of effective input from a quantum random access memory. In machine learning, generative models represent another large class which is widely used for both supervised and unsupervised learning. Here, we propose an efficient quantum algorithm for machine learning based on a quantum generative model. We prove that our proposed model is exponentially more powerful to represent probability distributions compared with classical generative models and has exponential speedup in training and inference at least for some instances under a reasonable assumption in computational complexity theory. Our result opens a new direction for quantum machine learning and offers a remarkable example in which a quantum algorithm shows exponential improvement over any classical algorithm in an important application field.
- Nov 07 2017 cs.CV arXiv:1711.01991v2Convolutional neural networks have demonstrated their powerful ability on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. I.e., clean images, with imperceptible perturbations added, can easily cause convolutional neural networks to fail. In this paper, we propose to utilize randomization to mitigate adversarial effects. Specifically, we use two randomization operations: random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner. Extensive experiments demonstrate that the proposed randomization method is very effective at defending against both single-step and iterative attacks. Our method also enjoys the following advantages: 1) no additional training or fine-tuning, 2) very few additional computations, 3) compatible with other adversarial defense methods. By combining the proposed randomization method with an adversarially trained model, it achieves a normalized score of 0.924 (ranked No.2 among 107 defense teams) in the NIPS 2017 adversarial examples defense challenge, which is far better than using adversarial training alone with a normalized score of 0.773 (ranked No.56). The code is public available at https://github.com/cihangxie/NIPS2017_adv_challenge_defense.
- Nov 07 2017 cs.NI arXiv:1711.01683v1Mobile Edge Computing (MEC) as an emerging paradigm utilizing cloudlet or fog nodes to extend remote cloud computing to the edge of the network, is foreseen as a key technology towards next generation wireless networks. By offloading computation intensive tasks from resource constrained mobile devices to fog nodes or the remote cloud, the energy of mobile devices can be saved and the computation capability can be enhanced. For fog nodes, they can rent the resource rich remote cloud to help them process incoming tasks from mobile devices. In this architecture, the benefit of short computation and computation delay of mobile devices can be fully exploited. However, existing studies mostly assume fog nodes possess unlimited computing capacity, which is not practical, especially when fog nodes are also energy constrained mobile devices. To provide incentive of fog nodes and reduce the computation cost of mobile devices, we provide a cost effective offloading scheme in mobile edge computing with the cooperation between fog nodes and the remote cloud with task dependency constraint. The mobile devices have limited budget and have to determine which task should be computed locally or sent to the fog. To address this issue, we first formulate the offloading problem as a task finish time inimization problem with given budgets of mobile devices, which is NP-hard. We then devise two more algorithms to study the network performance. Simulation results show that the proposed greedy algorithm can achieve the near optimal performance. On average, the Brute Force method and the greedy algorithm outperform the simulated annealing algorithm by about 28.13% on the application finish time.
- Nov 06 2017 cs.LG arXiv:1711.00939v1The interpolation, prediction, and feature analysis of fine-gained air quality are three important topics in the area of urban air computing. The solutions to these topics can provide extremely useful information to support air pollution control, and consequently generate great societal and technical impacts. Most of the existing work solves the three problems separately by different models. In this paper, we propose a general and effective approach to solve the three problems in one model called the Deep Air Learning (DAL). The main idea of DAL lies in embedding feature selection and semi-supervised learning in different layers of the deep learning network. The proposed approach utilizes the information pertaining to the unlabeled spatio-temporal data to improve the performance of the interpolation and the prediction, and performs feature selection and association analysis to reveal the main relevant features to the variation of the air quality. We evaluate our approach with extensive experiments based on real data sources obtained in Beijing, China. Experiments show that DAL is superior to the peer models from the recent literature when solving the topics of interpolation, prediction and feature analysis of fine-gained air quality.
- Nov 06 2017 cs.DB arXiv:1711.01046v1Elasticity is highly desirable for stream processing systems to guarantee low latency against workload dynamics, such as surges in data arrival rate and fluctuations in data distribution. Existing systems achieve elasticity following a resource-centric approach that uses dynamic key partitioning across the parallel instances, i.e. executors, to balance the workload and scale operators. However, such operator-level key repartitioning needs global synchronization and prohibits rapid elasticity. To address this problem, we propose an executor-centric approach, whose core idea is to avoid operator-level key repartitioning while implementing each executor as the building block of elasticity. Following this new approach, we design the Elasticutor framework with two level of optimizations: i) a novel implementation of executors, i.e., elastic executors, that perform elastic multi-core execution via efficient intra-executor load balancing and executor scaling and ii) a global model-based scheduler that dynamically allocates CPU cores to executors based on the instantaneous workloads. We implemented a prototype of Elasticutor and conducted extensive experiments. Our results show that Elasticutor doubles the throughput and achieves an average processing latency up to 2 orders of magnitude lower than previous methods, for a dynamic workload of real-world applications.
- With the demand of high data rate and low latency in fifth generation (5G), deep neural network decoder (NND) has become a promising candidate due to its capability of one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolution neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. The performance of these deep neural networks are evaluated through extensive simulation. Numerical results show that RNN has the best decoding performance, yet at the price of the highest computational overhead. Moreover, we find there exists a saturation length for each type of neural network, which is caused by their restricted learning abilities.
- Incentive mechanism plays a critical role in privacy-aware crowdsensing. Most previous studies on co-design of incentive mechanism and privacy preservation assume a trustworthy fusion center (FC). Very recent work has taken steps to relax the assumption on trustworthy FC and allows participatory users (PUs) to add well calibrated noise to their raw sensing data before reporting them, whereas the focus is on the equilibrium behavior of data subjects with binary data. Making a paradigm shift, this paper aim to quantify the privacy compensation for continuous data sensing while allowing FC to directly control PUs. There are two conflicting objectives in such scenario: FC desires better quality data in order to achieve higher aggregation accuracy whereas PUs prefer adding larger noise for higher privacy-preserving levels (PPLs). To achieve a good balance therein, we design an efficient incentive mechanism to REconcile FC's Aggregation accuracy and individual PU's data Privacy (REAP). Specifically, we adopt the celebrated notion of differential privacy to measure PUs' PPLs and quantify their impacts on FC's aggregation accuracy. Then, appealing to Contract Theory, we design an incentive mechanism to maximize FC's aggregation accuracy under a given budget. The proposed incentive mechanism offers different contracts to PUs with different privacy preferences, by which FC can directly control PUs. It can further overcome the information asymmetry, i.e., the FC typically does not know each PU's precise privacy preference. We derive closed-form solutions for the optimal contracts in both complete information and incomplete information scenarios. Further, the results are generalized to the continuous case where PUs' privacy preferences take values in a continuous domain. Extensive simulations are provided to validate the feasibility and advantages of our proposed incentive mechanism.
- Nov 02 2017 cs.CV arXiv:1711.00139v1We propose an attention mechanism for 3D medical image segmentation. The method, named segmentation-by-detection, is a cascade of a detection module followed by a segmentation module. The detection module enables a region of interest to come to attention and produces a set of object region candidates which are further used as an attention model. Rather than dealing with the entire volume, the segmentation module distills the information from the potential region. This scheme is an efficient solution for volumetric data as it reduces the influence of the surrounding noise which is especially important for medical data with low signal-to-noise ratio. Experimental results on 3D ultrasound data of the femoral head shows superiority of the proposed method when compared with a standard fully convolutional network like the U-Net.
- Nov 02 2017 cs.AI arXiv:1711.00054v1Border crossing delays cause problems like huge economics loss and heavy environmental pollutions. To understand more about the nature of border crossing delay, this study applies a dictionary-based compression algorithm to process the historical Niagara Frontier border wait times data. It can identify the abnormal spatial-temporal patterns for both passenger vehicles and trucks at three bridges connecting US and Canada. Furthermore, it provides a quantitate anomaly score to rank the wait times patterns across the three bridges for each vehicle type and each direction. By analyzing the top three most abnormal patterns, we find that there are at least two factors contributing the anomaly of the patterns. The weekends and holidays may cause unusual heave congestions at the three bridges at the same time, and the freight transportation demand may be uneven from Canada to the USA at Peace Bridge and Lewiston-Queenston Bridge, which may lead to a high anomaly score. By calculating the frequency of the top 5% abnormal patterns by hour of the day, the results show that for cars from the USA to Canada, the frequency of abnormal waiting time patterns is the highest during noon while for trucks in the same direction, it is the highest during the afternoon peak hours. For Canada to US direction, the frequency of abnormal border wait time patterns for both cars and trucks reaches to the peak during the afternoon. The analysis of abnormal spatial-temporal wait times patterns is promising to improve the border crossing management
- Nov 02 2017 cs.CR arXiv:1711.00232v1Wearable devices enable users to collect health data and share them with healthcare providers for improved health service. Since health data contain privacy-sensitive information, unprotected data release system may result in privacy leakage problem. Most of the existing work use differential privacy for private data release. However, they have limitations in healthcare scenarios because they do not consider the unique features of health data being collected from wearables, such as continuous real-time collection and pattern preservation. In this paper, we propose Re-DPoctor, a real-time health data releasing scheme with $w$-day differential privacy where the privacy of health data collected from any consecutive $w$ days is preserved. We improve utility by using a specially-designed partition algorithm to protect the health data patterns. Meanwhile, we improve privacy preservation by applying newly proposed adaptive sampling technique and budget allocation method. We prove that Re-DPoctor satisfies $w$-day differential privacy. Experiments on real health data demonstrate that our method achieves better utility with strong privacy guarantee than existing state-of-the-art methods.
- Suppose a database containing $M$ records is replicated across $N$ servers, and a user wants to privately retrieve one record by accessing the servers such that identity of the retrieved record is secret against any up to $T$ servers. A scheme designed for this purpose is called a private information retrieval (PIR) scheme. In practice, capacity-achieving and small sub-packetization are both desired for PIR schemes, because the former implies the highest download rate and the latter usually means simple realization. For general values of $N,T,M$, the only known capacity-achieving PIR scheme was designed by Sun and Jafar in 2016 with sub-packetization $N^M$. In this paper, we design a linear capacity-achieving PIR scheme with much smaller sub-packetization $dn^{M-1}$, where $d={\rm gcd}(N,T)$ and $n=N/d$. Furthermore, we prove that for any linear capacity-achieving PIR scheme it must have sub-packetization no less than $dn^{M-1}$, implying our scheme has the optimal sub-packetization. Moreover, comparing with Sun and Jafar's scheme, our scheme reduces the field size by a factor of $\frac{1}{Nd^{M-2}}$.
- Nov 01 2017 cs.NI arXiv:1710.11376v2In-network caching is recognized as an effective solution to offload content servers and the network. A cache service provider (SP) always has incentives to better utilize its cache resources by taking into account diverse roles that content providers (CPs) play, e.g., their business models, traffic characteristics, preferences. In this paper, we study the cache resource allocation problem in a Multi-Cache Multi-CP environment. We propose a cache partitioning approach, where each cache can be partitioned into slices with each slice dedicated to a content provider. We propose a content-oblivious request routing algorithm, to be used by individual caches, that optimizes the routing strategy for each CP. We associate with each content provider a utility that is a function of its content delivery performance, and formulate an optimization problem with the objective to maximize the sum of utilities over all content providers. We establish the biconvexity of the problem, and develop decentralized (online) algorithms based on convexity of the subproblem. The proposed model is further extended to bandwidth-constrained and minimum-delay scenarios, for which we prove fundamental properties, and develop efficient algorithms. Finally, we present numerical results to show the efficacy of our mechanism and the convergence of our algorithms.
- Oct 27 2017 cs.CV arXiv:1710.09505v1While deeper and wider neural networks are actively pushing the performance limits of various computer vision and machine learning tasks, they often require large sets of labeled data for effective training and suffer from extremely high computational complexity. In this paper, we will develop a new framework for training deep neural networks on datasets with limited labeled samples using cross-network knowledge projection which is able to improve the network performance while reducing the overall computational complexity significantly. Specifically, a large pre-trained teacher network is used to observe samples from the training data. A projection matrix is learned to project this teacher-level knowledge and its visual representations from an intermediate layer of the teacher network to an intermediate layer of a thinner and faster student network to guide and regulate its training process. Both the intermediate layers from the teacher network and the injection layers from the student network are adaptively selected during training by evaluating a joint loss function in an iterative manner. This knowledge projection framework allows us to use crucial knowledge learned by large networks to guide the training of thinner student networks, avoiding over-fitting, achieving better network performance, and significantly reducing the complexity. Extensive experimental results on benchmark datasets have demonstrated that our proposed knowledge projection approach outperforms existing methods, improving accuracy by up to 4% while reducing network complexity by 4 to 10 times, which is very attractive for practical applications of deep neural networks.
- Oct 25 2017 cs.LG arXiv:1710.08496v1Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if it is hard to approximate the Hessian well and efficiently. As far as we know, there is no effective way to handle this problem. In this paper, we resort to Nesterov's acceleration technique to improve the convergence performance of a class of second-order methods called approximate Newton. We give a theoretical analysis that Nesterov's acceleration technique can improve the convergence performance for approximate Newton just like for first-order methods. We accordingly propose an accelerated regularized sub-sampled Newton. Our accelerated algorithm performs much better than the original regularized sub-sampled Newton in experiments, which validates our theory empirically. Besides, the accelerated regularized sub-sampled Newton has good performance comparable to or even better than classical algorithms.
- Oct 24 2017 cs.CL arXiv:1710.07770v1In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture the text coherence. The text coherence problem is investigated with a new perspective of learning sentence distributional representation and text coherence modeling simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise in coherence assessment showing a significant improvement over the state-of-the-art by a wide margin.
- Oct 19 2017 cs.CV arXiv:1710.06555v1Person Re-identification (ReID) is to identify the same person across different cameras. It is a challenging task due to the large variations in person pose, occlusion, background clutter, etc How to extract powerful features is a fundamental problem in ReID and is still an open problem today. In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn powerful features over full body and body parts, which can well capture the local context knowledge by stacking multi-scale convolutions in each layer. Moreover, instead of using predefined rigid parts, we propose to learn and localize deformable pedestrian parts using Spatial Transformer Networks (STN) with novel spatial constraints. The learned body parts can release some difficulties, eg pose variations and background clutters, in part-based representation. Finally, we integrate the representation learning processes of full body and body parts into a unified framework for person ReID through multi-class person identification tasks. Extensive evaluations on current challenging large-scale person ReID datasets, including the image-based Market1501, CUHK03 and sequence-based MARS datasets, show that the proposed method achieves the state-of-the-art results.