results for au:Zhou_Y in:cs

- Jun 23 2017 cs.CV arXiv:1706.07346v1 Automatic segmentation of an organ and its cystic region is a prerequisite of computer-aided diagnosis. In this paper, we focus on pancreatic cyst segmentation in abdominal CT scans. This task is important and very useful in clinical practice, yet challenging due to the low contrast of boundaries and the variability in the location, shape, and stage of pancreatic cancer. Inspired by the high relevance between the location of a pancreas and its cystic region, we introduce extra deep supervision into the segmentation network, so that cyst segmentation can be improved with the help of the relatively easier pancreas segmentation. Under a reasonable transformation function, our approach can be factorized into two stages, each of which can be efficiently optimized via gradient back-propagation throughout the deep networks. We collect a new dataset with 131 pathological samples, which, to the best of our knowledge, is the largest set for pancreatic cyst segmentation. Without human assistance, our approach reports a 63.44% average accuracy, measured by the Dice-Sørensen coefficient (DSC), which is higher than the 60.46% obtained without deep supervision.
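The Dice-Sørensen coefficient used to score the segmentations above is straightforward to compute from two binary masks; a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def dice_sorensen(pred, truth):
    """Dice-Sørensen coefficient between two binary masks:
    2 * |A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

pred  = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 0], [0, 1, 1]])
print(round(dice_sorensen(pred, truth), 4))  # 2*2/(3+3) = 0.6667
```

A DSC of 1.0 means perfect overlap; the 63.44% reported above is an average of this quantity over test cases.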
- Jun 20 2017 cs.DL arXiv:1706.06087v1 Precision medicine and health require the characterization and phenotyping of biological systems and patient datasets using a variety of data formats. This scenario mandates the centralization of various tools and resources in a unified platform to render them Findable, Accessible, Interoperable, and Reusable (the FAIR Principles). Leveraging these principles, Aztec provides the scientific community with a new platform that promotes a long-term, sustainable ecosystem of biomedical research software. Aztec is available at https://aztec.bio and its source code is hosted at https://github.com/BD2K-Aztec.
- The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
- Jun 06 2017 cs.LG arXiv:1706.01026v1 We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game. This problem has a wide range of applications, e.g., A/B testing, crowdsourcing, and simulation optimization. Our goal is to develop a PAC algorithm, which, with probability at least $1-\delta$, identifies a set of $K$ arms with aggregate regret at most $\epsilon$. The notion of aggregate regret for multiple-arm identification was first introduced in \cite{Zhou:14}; it is defined as the difference of the averaged expected rewards between the selected set of arms and the best $K$ arms. In contrast to \cite{Zhou:14}, which only provides instance-independent sample complexity, we introduce a new hardness parameter for characterizing the difficulty of any given instance. We further develop two algorithms and establish the corresponding sample complexity in terms of this hardness parameter. The derived sample complexity can be significantly smaller than state-of-the-art results for a large class of instances and matches the instance-independent lower bound up to a $\log(\epsilon^{-1})$ factor in the worst case. We also prove a lower bound result showing that the extra $\log(\epsilon^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.
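The aggregate regret bounded by the PAC guarantee above is simple to state in code; a small sketch with illustrative names (given the true arm means, which the algorithm of course never observes directly):

```python
import numpy as np

def aggregate_regret(means, selected, k):
    """Aggregate regret of a selected arm set: the average mean of the
    best k arms minus the average mean of the k selected arms."""
    best = np.sort(means)[::-1][:k]
    return best.mean() - means[list(selected)].mean()

means = np.array([0.9, 0.8, 0.5, 0.3])   # true expected rewards
print(round(aggregate_regret(means, [0, 2], k=2), 2))  # 0.85 - 0.70 = 0.15
```

A PAC algorithm in the sense above must return a set whose aggregate regret is at most $\epsilon$ with probability at least $1-\delta$.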
- May 26 2017 cs.CL arXiv:1705.08947v1 We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline to Deep Voice 1, but is constructed with higher-performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets. We show that a single neural TTS system can learn hundreds of unique voices from less than half an hour of data per speaker, while achieving high-quality audio synthesis and preserving the speaker identities almost perfectly.
- May 23 2017 cs.AI arXiv:1705.07339v1 The Maximum Balanced Biclique Problem is a well-known graph model with relevant applications in diverse domains. This paper introduces a novel algorithm, which combines an effective constraint-based tabu search procedure and two dedicated graph reduction techniques. We verify the effectiveness of the algorithm on 30 classical random benchmark graphs and 25 very large real-life sparse graphs from the popular Koblenz Network Collection (KONECT). The results show that the algorithm improves the best-known results (new lower bounds) for 10 classical benchmarks and obtains the optimal solutions for 14 KONECT instances.
- May 23 2017 cs.DM arXiv:1705.07338v1 The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP. Firstly, we introduce an Upper Bound Propagation procedure to pre-compute an upper bound involving each vertex. Then we extend an existing branch-and-bound algorithm by integrating the pre-computed upper bounds. We also present a set of new valid inequalities induced from the upper bounds to tighten an existing mathematical formulation for MBBP. Lastly, we investigate another exact algorithm scheme which enumerates a subset of balanced bicliques based on our upper bounds. Experiments show that compared to existing approaches, the proposed algorithms and formulations are more efficient in solving a set of random graphs and large real-life instances.
- May 23 2017 cs.CL arXiv:1705.07371v1 In this paper, we reformulate the spell correction problem as a machine translation task under the encoder-decoder framework. This reformulation enables us to use a single model to solve a problem that is traditionally formulated as learning a language model and an error model. The model employs multi-layer recurrent neural networks as an encoder and a decoder. We demonstrate the effectiveness of this model on an internal dataset, where the training data is automatically obtained from user logs. The model offers competitive performance compared to state-of-the-art methods, but does not require any feature engineering or hand tuning between models.
- May 16 2017 cs.LG arXiv:1705.04925v1 In many modern machine learning applications, the structures of the underlying mathematical models often yield nonconvex optimization problems. Due to the intractability of nonconvexity, there is a rising need to develop efficient methods for solving general nonconvex problems with certain performance guarantees. In this work, we investigate the accelerated proximal gradient method for nonconvex programming (APGnc). The method compares a usual proximal gradient step with a linear extrapolation step, and accepts the one that has the lower function value to achieve a monotonic decrease. Specifically, under a general nonsmooth and nonconvex setting, we provide a rigorous argument to show that the limit points of the sequence generated by APGnc are critical points of the objective function. Then, by exploiting the Kurdyka-Łojasiewicz (KŁ) property for a broad class of functions, we establish the linear and sub-linear convergence rates of the function value sequence generated by APGnc. We further propose a stochastic variance reduced APGnc (SVRG-APGnc), and establish its linear convergence under a special case of the KŁ property. We also extend the analysis to the inexact version of these methods and develop an adaptive momentum strategy that improves the numerical performance.
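The acceptance rule at the heart of APGnc can be sketched in a few lines. Everything below (names, the toy composite objective, the momentum constant) is illustrative, not the paper's implementation; the toy objective is even convex, whereas APGnc targets nonconvex problems:

```python
import numpy as np

def apgnc_step(x, x_prev, grad, prox, f, step, momentum=0.9):
    """One APGnc-style update (illustrative sketch): compare a plain
    proximal gradient step with an extrapolated step, and keep the one
    with the lower objective value, guaranteeing a monotone decrease."""
    y = x + momentum * (x - x_prev)              # linear extrapolation
    cand_plain = prox(x - step * grad(x), step)  # usual proximal step
    cand_extra = prox(y - step * grad(y), step)  # accelerated step
    return cand_extra if f(cand_extra) <= f(cand_plain) else cand_plain

# toy composite objective: smooth part 0.5*(x-2)^2 plus an l1 term |x|
f    = lambda x: 0.5 * (x - 2.0) ** 2 + abs(x)
grad = lambda x: x - 2.0                         # gradient of the smooth part
prox = lambda v, t: np.sign(v) * max(abs(v) - t, 0.0)   # soft threshold
x_prev, x = 0.0, 0.5
for _ in range(50):
    x_prev, x = x, apgnc_step(x, x_prev, grad, prox, f, step=0.5)
print(round(x, 3))  # 1.0, the minimizer of f
```

The comparison step is what distinguishes APGnc from plain accelerated proximal gradient: extrapolation is only kept when it actually lowers the objective.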
- May 12 2017 cs.AI arXiv:1705.04119v1 Critical node problems involve identifying a subset of critical nodes in an undirected graph whose removal optimizes a pre-defined measure over the residual graph. While useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic critical node problem (CNP) and introduce an effective memetic algorithm for solving it. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima) and a rank-based pool updating strategy (to guarantee a healthy population). Specifically, the component-based neighborhood search integrates two key techniques, i.e., a two-phase node exchange strategy and a node weighting scheme. The double backbone-based crossover extends the idea of general backbone-based crossovers. Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 21 new upper bounds and matches 18 previous best-known upper bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained critical node problem. Finally, we investigate the usefulness of each key algorithmic component.
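The measure most commonly optimized in the classic CNP is the pairwise connectivity of the residual graph. A minimal exact sketch of that objective (this is the quantity being minimized, not the paper's memetic algorithm; names are ours):

```python
def pairwise_connectivity(n, edges, removed):
    """CNP objective: number of node pairs that remain connected in the
    residual graph after deleting the nodes in `removed`."""
    removed = set(removed)
    adj = {v: set() for v in range(n) if v not in removed}
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].add(v); adj[v].add(u)
    seen, total = set(), 0
    for s in adj:                        # DFS over each component
        if s in seen:
            continue
        stack, comp = [s], 0
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v); comp += 1
            stack.extend(adj[v] - seen)
        total += comp * (comp - 1) // 2  # connected pairs in this component
    return total

# path 0-1-2-3-4: removing the middle node leaves two 2-node components
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(pairwise_connectivity(5, edges, removed={2}))  # 1 + 1 = 2
```

A critical-node solver searches for the set `removed` (often of bounded size) that minimizes this count.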
- Apr 27 2017 cs.AI arXiv:1704.07950v1 In this extended abstract, we propose Structured Production Systems (SPS), which extend traditional production systems with well-formed syntactic structures. Owing to the richness of these structures, structured production systems significantly enhance the expressive power as well as the flexibility of production systems, for instance, to handle uncertainty. We show that different rule application strategies can be reduced to the basic one by utilizing structures. Also, many fundamental approaches in computer science, including automata, grammars and logic, can be captured by structured production systems.
- Persistent spread measurement counts the number of distinct elements that persist in each network flow over predefined time periods. It has many practical applications, including detecting long-term stealthy network activities hidden in the background of normal-user activities, such as stealthy DDoS attacks, stealthy network scans, or faked network trends, which cannot be detected by traditional flow cardinality measurement. With big network data, one challenge is to measure the persistent spreads of a massive number of flows without incurring too much memory overhead, as such measurement may be performed at line speed by network processors with fast but small on-chip memory. We propose a highly compact Virtual Intersection HyperLogLog (VI-HLL) architecture for this purpose. It achieves far better memory efficiency than the best prior work, V-Bitmap, and at the same time drastically extends the measurement range. Theoretical analysis and extensive experiments demonstrate that VI-HLL provides good measurement accuracy even in very tight memory space of less than 1 bit per flow.
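For intuition, the quantity VI-HLL estimates compactly can be computed exactly, at far higher memory cost, as a set intersection across periods; a reference sketch with illustrative names:

```python
def persistent_spread(period_elements):
    """Exact persistent spread of one flow: the number of distinct
    elements present in *every* measurement period. VI-HLL approximates
    this intersection in sub-bit-per-flow memory; here we compute it
    exactly for reference."""
    sets = [set(p) for p in period_elements]
    return len(set.intersection(*sets))

# one flow observed over three periods; 'a' and 'b' persist throughout
periods = [["a", "b", "c"], ["a", "b", "d"], ["b", "e", "a"]]
print(persistent_spread(periods))  # 2
```

The memory challenge in the abstract comes from maintaining such per-flow state for millions of flows, which is why a sketching structure like VI-HLL is needed.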
- Apr 11 2017 cs.IR arXiv:1704.02552v1 Using only implicit data, many recommender systems fail in general to provide a precise set of recommendations to users with limited interaction history. This issue is known as the "Cold Start" problem and is typically resolved by switching to content-based approaches, where extra, costly information is required. In this paper, we tackle the "Cold Start" problem using only implicit data by applying Word2Vec (W2V), a dimensionality reduction algorithm originally developed for Natural Language Processing problems, within the framework of Collaborative Filtering (CF). This combined method is named Embedded Collaborative Filtering (ECF). An experiment is conducted to determine the performance of ECF on two different implicit datasets. We show that the ECF approach outperforms other popular and state-of-the-art approaches in "Cold Start" scenarios.
- Mar 28 2017 cs.CV arXiv:1703.08603v2 It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, generally exist and cause deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection, which are much more difficult tasks. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.
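The core idea, summing a loss over many targets and perturbing along the gradient sign, can be illustrated on a toy per-pixel classifier. This is a deliberately simplified sketch (shared logistic unit, one gradient-sign step), not the paper's DAG algorithm on deep networks; all names are ours:

```python
import numpy as np

def dense_adversary_step(x, w, b, y, eps=0.25):
    """Toy sketch of the dense-adversary idea: each pixel x[i] is
    classified by a shared logistic unit, the per-pixel cross-entropy
    losses are summed, and one gradient-sign step perturbs every pixel
    at once to increase the summed loss."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # per-pixel prediction
    grad = w * (p - y)                       # d(summed CE)/dx, per pixel
    return x + eps * np.sign(grad)           # ascend the summed loss

x = np.array([0.9, 0.8, -0.7, -0.9])         # a 4-"pixel" image
y = np.array([1.0, 1.0, 0.0, 0.0])           # true per-pixel labels
w, b = 2.0, 0.0
before = ((w * x + b) > 0).astype(float)
x_adv = dense_adversary_step(x, w, b, y, eps=1.0)
after = ((w * x_adv + b) > 0).astype(float)
print((before == y).mean(), (after == y).mean())  # per-pixel accuracy drops
```

DAG proper iterates such steps on the network's loss restricted to still-correctly-classified targets, which is what makes the perturbations both dense and small.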
- In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate $O\left(\frac{1}{\varepsilon^2}\right)$ improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate $O\left(\frac{1}{\varepsilon^4}\right)$.
- Data deduplication is able to effectively identify and eliminate redundant data and only maintain a single copy of files and chunks. Hence, it is widely used in cloud storage systems to save storage space and network bandwidth. However, the occurrence of deduplication can be easily identified by monitoring and analyzing network traffic, which leads to the risk of user privacy leakage. The attacker can carry out a very dangerous side channel attack, i.e., learn-the-remaining-information (LRI) attack, to reveal users' privacy information by exploiting the side channel of network traffic in deduplication. Existing work addresses the LRI attack at the cost of the high bandwidth efficiency of deduplication. In order to address this problem, we propose a simple yet effective scheme, called randomized redundant chunk scheme (RRCS), to significantly mitigate the risk of the LRI attack while maintaining the high bandwidth efficiency of deduplication. The basic idea behind RRCS is to add randomized redundant chunks to mix up the real deduplication states of files used for the LRI attack, which effectively obfuscates the view of the attacker, who attempts to exploit the side channel of network traffic for the LRI attack. Our security analysis shows that RRCS could significantly mitigate the risk of the LRI attack. We implement the RRCS prototype and evaluate it by using three large-scale real-world datasets. Experimental results demonstrate the efficiency and efficacy of RRCS.
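The chunk-selection idea behind RRCS can be sketched as follows. The names, the redundancy parameter, and the uniform random choice are illustrative simplifications, not the paper's exact scheme:

```python
import random

def chunks_to_upload(file_chunks, server_chunks, redundancy=2, seed=None):
    """Sketch of the randomized-redundant-chunk idea: besides the chunks
    the server is actually missing, re-send a few randomly chosen
    already-deduplicated chunks, so that the observed upload traffic no
    longer reveals the file's exact deduplication state to an attacker."""
    rng = random.Random(seed)
    missing = [c for c in file_chunks if c not in server_chunks]
    present = [c for c in file_chunks if c in server_chunks]
    k = min(redundancy, len(present))
    return missing + rng.sample(present, k)   # redundant chunks obfuscate

server = {"c1", "c2", "c3"}                   # chunks the server already has
sent = chunks_to_upload(["c1", "c2", "c4"], server, redundancy=1, seed=0)
print(len(sent))  # 2: the one missing chunk plus one redundant chunk
```

Because the attacker can no longer equate "bytes uploaded" with "chunks the server lacked", the LRI side channel is blunted at a bounded bandwidth cost.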
- Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance is provided automatically in real time during training. The aim of this paper is to develop an efficient and effective feedback generation method for the provision of real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, which prevents their use in real time. In this paper, we propose a neural network based method to generate feedback using the adversarial technique. The proposed method utilizes a bounded adversarial update to minimize an L1-regularized loss via back-propagation. We empirically show that the proposed method can be used to generate simple yet effective feedback. It also shows high effectiveness and efficiency compared to existing methods, making it a promising option for real-time feedback generation in SBT.
- Maximum rank-distance (MRD) codes are extremal codes in the space of $m\times n$ matrices over a finite field, equipped with the rank metric. Up to generalizations, the classical examples of such codes were constructed in the 1970s and are today known as Gabidulin codes. Motivated by several recent approaches to construct MRD codes that are inequivalent to Gabidulin codes, we study the equivalence issue for Gabidulin codes themselves. This shows in particular that the family of Gabidulin codes already contains a huge subset of MRD codes that are pairwise inequivalent, provided that $2\le m\le n-2$.
- Feb 09 2017 cs.CV arXiv:1702.02512v1 This paper presents a robust and efficient semi-dense visual odometry solution for RGB-D cameras. The core of our method is a 2D-3D ICP pipeline which estimates the pose of the sensor by registering the projection of a 3D semi-dense map of the reference frame with the 2D semi-dense region extracted in the current frame. The processing is sped up by efficiently implemented approximate nearest neighbour fields under the Euclidean distance criterion, which permits the use of compact Gauss-Newton updates in the optimization. The registration is formulated as a maximum a posteriori problem to deal with outliers and sensor noise, and the equivalent weighted least squares problem is consequently solved by the iteratively reweighted least squares method. A variety of robust weight functions are tested and the optimum is determined based on the characteristics of the sensor model. Extensive evaluation on publicly available RGB-D datasets shows that the proposed method predominantly outperforms existing state-of-the-art methods.
- Transportation mode detection with personal devices has been investigated for over ten years due to its importance in monitoring one's activities, understanding human mobility, and assisting traffic management. However, two main limitations still prevent it from large-scale deployment: high power consumption, and the lack of high-volume and diverse labeled data. To reduce power consumption, existing approaches sample fewer sensors at lower frequencies, which, however, leads to lower accuracy. A common way to obtain labeled data is to record the ground truth while collecting data, but this method is too inefficient for large-scale deployment. To address these issues, we adopt a new low-frequency sampling scheme with a hierarchical transportation mode identification algorithm and propose an offline data labeling approach with both manual and automatic implementations. Through a real-world large-scale experiment and comparison with related works, our sampling scheme and algorithm are shown to consume much less energy while achieving a competitive accuracy of around 85%. The new offline data labeling approach is also validated to be efficient and effective in providing ground truth for model training and testing.
- We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Considering that communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks which can significantly reduce the number of inter-node communications. We first propose a decentralized primal-dual method which can find an $\epsilon$-solution both in terms of functional optimality gap and feasibility residual in $O(1/\epsilon)$ inter-node communication rounds when the objective functions are convex and the local primal subproblems are solved exactly. Our major contribution is to present a new class of decentralized primal-dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can still find an $\epsilon$-solution in $O(1/\epsilon)$ (resp., $O(1/\sqrt{\epsilon})$) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the $O(1/\epsilon^2)$ (resp., $O(1/\epsilon)$) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart for these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we can reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. The bounds on the subgradient evaluations are actually comparable to those required for centralized nonsmooth and stochastic optimization.
- Jan 13 2017 cs.AI arXiv:1701.03322v2 First-Order Logic (FOL) is widely regarded as one of the most important foundations for knowledge representation. Nevertheless, in this paper, we argue that FOL has several critical issues for this purpose. Instead, we propose an alternative called assertional logic, in which all syntactic objects are categorized as set-theoretic constructs, including individuals, concepts and operators, and all kinds of knowledge are formalized by equality assertions. We first present a primitive form of assertional logic that uses minimal assumed knowledge and constructs. Then, we show how to extend it by definitions, which are special kinds of knowledge, i.e., assertions. We argue that assertional logic, although simpler, is more expressive and extensible than FOL. As a case study, we show how assertional logic can be used to unify logic and probability, along with more building blocks in AI.
- Jan 10 2017 cs.CV arXiv:1701.01924v1 Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine DNN classifier performance under four types of distortions. Second, we propose two approaches to alleviate the effect of image distortion: re-training and fine-tuning with noisy images. Our results suggest that, under certain conditions, fine-tuning with noisy images can alleviate much of the effect of distorted inputs, and is more practical than re-training.
- Jan 10 2017 cs.CV arXiv:1701.01833v1 Deep Convolutional Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability to handle significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and its multiple unmaterialised rotated versions. During back-propagation, an ARF is collectively updated using errors from all its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks. The oriented response produced by ORNs can also be used for image and object orientation estimation tasks. Over multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to a significant reduction in the number of network parameters and an improvement in classification performance. We report the best results on several commonly used benchmarks.
- Dec 28 2016 cs.CV arXiv:1612.08230v4 Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy on some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions which occupy a large fraction of the input volume. In this paper, we formulate this problem into a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset, and outperform the state-of-the-art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.
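The fixed-point iteration at test time can be sketched generically. Here `segment_fn` and `crop_fn` are illustrative stand-ins for the trained network and the mask-based region shrinking; the toy threshold "network" below is only for demonstration:

```python
import numpy as np

def fixed_point_segment(image, segment_fn, crop_fn, iters=5, tol=1e-3):
    """Sketch of the fixed-point scheme: the current mask shrinks the
    input region, the model re-segments the smaller region, and the
    process repeats until the mask stabilizes."""
    mask = segment_fn(image)                  # coarse pass on full input
    for _ in range(iters):
        region = crop_fn(image, mask)         # shrink input via the mask
        new_mask = segment_fn(region)
        if np.abs(new_mask - mask).mean() < tol:
            break                             # reached a fixed point
        mask = new_mask
    return mask

# toy 1-D "image" with a toy threshold "network" and masking as cropping
image = np.array([0.1, 0.9, 0.8, 0.2])
seg = lambda x: (x > 0.5).astype(float)
crop = lambda img, m: img * m                 # zero out predicted background
print(fixed_point_segment(image, seg, crop))  # [0. 1. 1. 0.]
```

In the paper the loop operates on 3-D CT volumes with a trained deep network, and the shrinking step is a bounding-box crop around the predicted mask rather than elementwise masking.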
- Dec 22 2016 cs.CV physics.optics arXiv:1612.07120v1 We have designed a single-pixel camera that images around corners, based on computational ghost imaging. It can obtain the image of an object even when the camera cannot view the object directly. Our imaging system exploits the fact that a bucket detector in a ghost imaging setup has no spatial resolution capability. A series of experiments has been designed to confirm our predictions. This camera has potential applications for imaging around corners or in other environments where the object cannot be observed directly.
- Designing chaotic maps with complex dynamics is a challenging topic. This paper introduces the nonlinear chaotic processing (NCP) model, which contains six basic nonlinear operations. Each operation is a general framework that can use existing chaotic maps as seed maps to generate a huge number of new chaotic maps. The proposed NCP model can be easily extended by introducing new nonlinear operations or by arbitrarily combining existing ones. The properties and chaotic behaviors of the NCP model are investigated. To show its effectiveness and usability, as examples, we provide four new chaotic maps generated by the NCP model and evaluate their chaotic performance using Lyapunov exponent, Shannon entropy, correlation dimension and initial state sensitivity. The experimental results show that these chaotic maps have more complex chaotic behaviors than existing ones.
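In the spirit of the NCP model, a new map can be generated from seed maps by a nonlinear operation. The composition operation below and the finite-difference Lyapunov estimate are our illustrative choices; the paper's six operations and its evaluation protocol are not reproduced here:

```python
import math

logistic = lambda x: 4.0 * x * (1.0 - x)       # seed map 1, on [0, 1]
sine_map = lambda x: math.sin(math.pi * x)     # seed map 2, on [0, 1]

def compose(f, g):
    """One example nonlinear operation: build a new map by cascading
    two seed maps (an NCP-style construction, not the paper's exact one)."""
    return lambda x: f(g(x))

def lyapunov(f, x0=0.3, n=5000, eps=1e-8):
    """Numerical Lyapunov exponent of a 1-D map: average log of the
    finite-difference derivative magnitude along an orbit."""
    x, total = x0, 0.0
    for _ in range(n):
        d = abs(f(x + eps) - f(x)) / eps
        total += math.log(max(d, 1e-300))      # guard against d == 0
        x = min(max(f(x), 0.0), 1.0)           # keep the orbit in [0, 1]
    return total / n

new_map = compose(sine_map, logistic)          # a generated chaotic map
print(lyapunov(logistic) > 0, lyapunov(new_map) > 0)  # positive => chaotic
```

A positive Lyapunov exponent (the logistic map's is ln 2 ≈ 0.693) is the standard numerical signature of chaos used to evaluate such generated maps.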
- Nov 30 2016 cs.CV arXiv:1611.09580v1 Visual surveillance systems have become one of the largest sources of Big Visual Data in the real world. However, existing systems for video analysis still lack the ability to handle the problems of scalability, extensibility and error tolerance, though great advances have been achieved in a number of visual recognition tasks and surveillance applications, e.g., pedestrian/vehicle detection and people/vehicle counting. Moreover, few algorithms explore the specific values/characteristics of large-scale surveillance videos. To address these problems in large-scale video analysis, we develop a scalable video parsing and evaluation platform by combining advanced techniques for Big Data processing, including Spark Streaming, Kafka and the Hadoop Distributed File System (HDFS). A Web User Interface is also designed into the system to collect users' degrees of satisfaction with the recognition tasks, so as to evaluate the performance of the whole system. Furthermore, the highly extensible platform running on long-term surveillance videos makes it possible to develop more intelligent incremental algorithms that enhance the performance of various visual recognition tasks.
- Nov 28 2016 cs.SI arXiv:1611.08351v1 According to the NSDUH (National Survey on Drug Use and Health), 20 million Americans consumed drugs in the past 30 days. Combating illicit drug use is of great interest to public health and law enforcement agencies. Despite its importance, most existing studies on drug use rely on surveys. Surveys on sensitive topics such as drug use may not be answered truthfully by the people taking them, and selecting a representative sample to survey is another major challenge. In this paper, we explore the possibility of using big multimedia data, including both images and text, from social media in order to discover drug use patterns at fine granularity with respect to demographics. Instagram posts are searched and collected by drug-related terms through the hashtags supplied with each post. A large and dynamic dictionary of frequent drug-related slang terms is used to find these posts. User demographics are extracted using robust face image analysis algorithms. These posts are then mined to find common trends with regard to the time and location they are posted, and further in terms of the age and gender of the drug users. Furthermore, by studying the accounts followed by the users of drug-related posts, we extract common interests shared by drug users.
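The collection step, matching post hashtags against a slang dictionary, can be sketched as follows. The tiny dictionary and the post structure are stand-ins for the paper's large, dynamic dictionary and the real Instagram data:

```python
DRUG_SLANG = {"#weed", "#kush", "#mdma"}      # tiny stand-in dictionary

def flag_posts(posts, slang=DRUG_SLANG):
    """Sketch of the collection step: keep posts whose (case-folded)
    hashtags intersect the drug-slang dictionary."""
    return [p for p in posts
            if slang & {t.lower() for t in p["hashtags"]}]

posts = [
    {"id": 1, "hashtags": ["#sunset", "#beach"]},
    {"id": 2, "hashtags": ["#Kush", "#nightout"]},
]
print([p["id"] for p in flag_posts(posts)])   # [2]
```

The flagged posts then feed the downstream steps described above: face-based demographic extraction and spatio-temporal trend mining.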
- Generalized twisted Gabidulin codes are one of the few known families of maximum rank-distance codes over finite fields. As a subset of $m\times n$ matrices, when $m=n$, the automorphism group of any generalized twisted Gabidulin code was recently completely determined by the authors. In this paper, we consider the same problem for $m<n$. Under certain conditions on their parameters, we determine their middle nuclei and right nuclei, which are important invariants with respect to equivalence for rank-metric codes. Furthermore, we also use them to derive necessary conditions on the automorphisms of generalized twisted Gabidulin codes.
- Oct 11 2016 cs.CV arXiv:1610.02579v1 The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a more complex way. It is also shown that message passing is not always helpful but depends on individual samples. Gated functions are therefore needed to control message transmission, whose on-or-offs are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, Pascal VOC2007 and Microsoft COCO. This paper also shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD.
- Oct 11 2016 cs.CL arXiv:1610.02806v1 We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we propose a way to incorporate attention into the tree topology. Specifically, given a pair of sentences, our attentive encoder uses the representation of one sentence, which is generated via an RNN, to guide the structural encoding of the other sentence on its dependency parse tree. We evaluate the proposed attentive encoder on three tasks: semantic similarity, paraphrase identification, and true-false question selection. Experimental results show that our encoder outperforms all baselines and achieves state-of-the-art results on two tasks.
- Mean-squared error (MSE) is one of the most widely used performance metrics for the design and analysis of multiple-input multiple-output (MIMO) communications. Weighted MSE minimization, a more general formulation of MSE minimization, plays an important role in MIMO transceiver optimization. While this topic has a long history and has been extensively studied, existing treatments of the methods for solving weighted MSE optimization are more or less sporadic and non-systematic. In this paper, we first review the two major methodologies, the Lagrange multiplier method and the majorization-theory-based method, and their common procedures for solving weighted MSE minimization. We then discuss several problems and limitations of these methods that have usually been neglected or glossed over in the existing literature. These problems are fundamental and of critical importance for the corresponding MIMO transceiver optimizations. In addition, a new extended matrix-field weighted MSE model is proposed. Its solutions and applications are discussed in detail. Compared with existing models, this new model has wider applications, e.g., nonlinear MIMO transceiver designs and capacity-maximizing transceiver designs for general MIMO networks.
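As a concrete reference point (our notation, not necessarily the paper's), the matrix-weighted MSE that such transceiver designs minimize is commonly written as

```latex
\mathrm{MSE}_{\mathbf{W}}
  = \mathbb{E}\left[ (\hat{\mathbf{x}} - \mathbf{x})^{\mathrm{H}} \mathbf{W} (\hat{\mathbf{x}} - \mathbf{x}) \right]
  = \operatorname{Tr}\left( \mathbf{W}\, \mathbb{E}\left[ (\hat{\mathbf{x}} - \mathbf{x})(\hat{\mathbf{x}} - \mathbf{x})^{\mathrm{H}} \right] \right),
```

where $\mathbf{x}$ is the transmitted signal, $\hat{\mathbf{x}}$ its estimate at the receiver, and $\mathbf{W} \succeq \mathbf{0}$ a positive semidefinite weight matrix; setting $\mathbf{W} = \mathbf{I}$ recovers the unweighted MSE.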
- Trajectory analysis is essential in many applications. In this paper, we address the problem of representing motion trajectories in a highly informative way, and consequently utilize it for analyzing trajectories. Our approach first leverages the complete information from given trajectories to construct a thermal transfer field which provides a context-rich way to describe the global motion pattern in a scene. Then, a 3D tube is derived which depicts an input trajectory by integrating its surrounding motion patterns contained in the thermal transfer field. The 3D tube effectively: 1) maintains the movement information of a trajectory, 2) embeds the complete contextual motion pattern around a trajectory, 3) visualizes information about a trajectory in a clear and unified way. We further introduce a droplet-based process. It derives a droplet vector from a 3D tube, so as to characterize the high-dimensional 3D tube information in a simple but effective way. Finally, we apply our tube-and-droplet representation to trajectory analysis applications including trajectory clustering, trajectory classification & abnormality detection, and 3D action recognition. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our approach.
- Aug 31 2016 cs.NE arXiv:1608.08607v2 The balance between convergence and diversity is a key issue in evolutionary multi-objective optimization. The recently proposed stable matching-based selection provides a new perspective on handling this balance under the framework of decomposition-based multi-objective optimization. In particular, the stable matching between subproblems and solutions, which achieves an equilibrium between their mutual preferences, implicitly strikes a balance between convergence and diversity. Nevertheless, the original stable matching model has a high risk of matching a solution with an unfavorable subproblem, which ultimately leads to an imbalanced selection result. In this paper, we propose an adaptive two-level stable matching-based selection for decomposition-based multi-objective optimization. Specifically, borrowing the idea of stable matching with incomplete lists, we match each solution with one of its favorite subproblems by restricting the length of its preference list during the first-level stable matching. During the second-level stable matching, the remaining subproblems are matched with their favorite solutions according to the classic stable matching model. In particular, we develop an adaptive mechanism that automatically sets the length of the preference list for each solution according to its local competitiveness. The performance of our proposed method is validated and compared with several state-of-the-art evolutionary multi-objective optimization algorithms on 62 benchmark problem instances. Empirical results fully demonstrate the competitive performance of our proposed method on problems with complicated Pareto sets and those with more than three objectives.
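The first-level matching described above can be sketched as a deferred-acceptance procedure over truncated preference lists. This is a minimal illustration, not the paper's implementation; the IDs and preference data are hypothetical.

```python
def first_level_match(solution_prefs, subproblem_prefs, list_length):
    """Deferred acceptance in which each solution proposes only to its
    `list_length` most-preferred subproblems (incomplete preference lists).
    Solutions that exhaust their truncated lists stay unmatched and would
    be handled by the second-level matching."""
    match = {}                              # subproblem -> solution
    free = list(solution_prefs)             # solutions still proposing
    next_choice = {s: 0 for s in solution_prefs}
    while free:
        s = free.pop()
        prefs = solution_prefs[s][:list_length]
        while next_choice[s] < len(prefs):
            p = prefs[next_choice[s]]
            next_choice[s] += 1
            if p not in match:
                match[p] = s                # subproblem was free: accept
                break
            rival = match[p]
            ranking = subproblem_prefs[p]
            if ranking.index(s) < ranking.index(rival):
                match[p] = s                # subproblem prefers s: swap
                free.append(rival)
                break
    return match
```

With full-length lists this reduces to the classic Gale-Shapley procedure; shrinking `list_length` is what keeps a solution from being forced onto a strongly unfavorable subproblem.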
- Aug 30 2016 cs.CV arXiv:1608.07724v1 Based on life-long observations of physical, chemical, and biological phenomena in the natural world, humans can often easily picture in their minds what an object will look like in the future. But, what about computers? In this paper, we learn computational models of object transformations from time-lapse videos. In particular, we explore the use of generative models to create depictions of objects at future times. These models explore several different prediction tasks: generating a future state given a single depiction of an object, generating a future state given two depictions of an object at different times, and generating future states recursively in a recurrent framework. We provide both qualitative and quantitative evaluations of the generated results, and also conduct a human evaluation to compare variations of our models.
- Magnetic skyrmions are promising candidates for next-generation information carriers, owing to their small size, topological stability, and ultralow depinning current density. A wide variety of skyrmionic device concepts and prototypes have been proposed, highlighting their potential applications. Here, we report on a bioinspired skyrmionic device with synaptic plasticity. The synaptic weight of the proposed device can be strengthened/weakened by positive/negative stimuli, mimicking the potentiation/depression process of a biological synapse. Both short-term plasticity (STP) and long-term potentiation (LTP) functionalities have been demonstrated for a spike-timing-dependent plasticity (STDP) scheme. This proposal suggests new possibilities for synaptic devices in spiking neuromorphic computing applications.
- Label ranking aims to learn a mapping from instances to rankings over a finite number of predefined labels. Random forest is one of the most successful general-purpose machine learning algorithms of modern times, yet no prior research appears to have applied it to label ranking. In this paper, we present a random forest label ranking method that uses random decision trees to retrieve nearest neighbors that are similar not only in the feature space but also in the ranking space. We develop a novel two-step rank aggregation strategy to effectively aggregate the neighboring rankings discovered by the random forest into a final predicted ranking. Compared with existing methods, the new random forest method has many advantages, including an intrinsically scalable tree data structure, a highly parallelizable computational architecture, and superior performance. We present extensive experimental results demonstrating that our method achieves the best predictive accuracy compared with state-of-the-art methods on datasets with complete rankings and datasets with only partial ranking information.
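As an illustration of aggregating neighboring rankings into one predicted ranking, here is a simple Borda-count aggregator; the paper's two-step strategy is more elaborate, so treat this as a hedged stand-in with hypothetical inputs.

```python
def borda_aggregate(rankings, n_labels):
    """Aggregate several rankings (each a list of label indices, best first)
    into one consensus ranking via Borda counts."""
    scores = [0.0] * n_labels
    for ranking in rankings:
        for position, label in enumerate(ranking):
            # a label in position 0 of a k-label ranking earns k-1 points
            scores[label] += n_labels - 1 - position
    return sorted(range(n_labels), key=lambda label: -scores[label])
```

For example, aggregating the neighbor rankings `[0, 1, 2]`, `[0, 2, 1]`, and `[1, 0, 2]` yields the consensus `[0, 1, 2]`.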
- This paper investigates the compress-and-forward scheme for an uplink cloud radio access network (C-RAN) model, where multi-antenna base-stations (BSs) are connected to a cloud-computing based central processor (CP) via capacity-limited fronthaul links. The BSs compress the received signals with Wyner-Ziv coding and send the representation bits to the CP; the CP performs the decoding of all the users' messages. Under this setup, this paper makes progress toward the optimal structure of the fronthaul compression and CP decoding strategies for the compress-and-forward scheme in C-RAN. On the CP decoding strategy design, this paper shows that under a sum fronthaul capacity constraint, a generalized successive decoding strategy of the quantization and user message codewords that allows arbitrary interleaved order at the CP achieves the same rate region as optimal joint decoding. Further, it is shown that a practical strategy of successively decoding the quantization codewords first, then the user messages, achieves the same maximum sum rate as joint decoding under individual fronthaul constraints. On the joint optimization of user transmission and BS quantization strategies, this paper shows that if the input distributions are assumed to be Gaussian, then under joint decoding, the optimal quantization scheme for maximizing the achievable rate region is Gaussian. Moreover, Gaussian input and Gaussian quantization with joint decoding achieve within a constant gap of the capacity region of the Gaussian multiple-input multiple-output (MIMO) uplink C-RAN model. Finally, this paper addresses the computational aspect of optimizing the uplink MIMO C-RAN by showing that under a fixed Gaussian input, the sum rate maximization problem over the Gaussian quantization noise covariance matrices can be formulated as a convex optimization problem, thereby facilitating its efficient solution.
- Aug 16 2016 cs.DM arXiv:1608.04217v1 Given a set of $n$ elements separated by a pairwise distance matrix, the minimum differential dispersion problem (Min-Diff DP) aims to identify a subset of $m$ elements ($m < n$) such that the difference between the maximum sum and the minimum sum of inter-element distances over the chosen elements is minimized. We propose an effective iterated local search (denoted ILS_MinDiff) for Min-Diff DP. To ensure effective exploration and exploitation of the search space, ILS_MinDiff iterates through three sequential search phases: a fast descent-based neighborhood search phase to find a local optimum from a given starting solution, a local-optima exploring phase to visit nearby high-quality solutions around a given local optimum, and a local-optima escaping phase to move away from the current search region. Experimental results on six data sets of 190 benchmark instances demonstrate that ILS_MinDiff competes favorably with state-of-the-art algorithms, finding 130 improved best results (new upper bounds).
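For concreteness, the Min-Diff DP objective that such a local search evaluates can be sketched as follows (a minimal illustration; the distance matrix is hypothetical).

```python
def min_diff_objective(dist, subset):
    """For each chosen element, sum its distances to the other chosen
    elements; the objective is the spread between the largest and the
    smallest of these sums (to be minimized)."""
    sums = [sum(dist[i][j] for j in subset if j != i) for i in subset]
    return max(sums) - min(sums)
```

A descent step in the neighborhood search then swaps one chosen element for an unchosen one whenever the swap reduces this spread.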
- Jul 18 2016 cs.CV arXiv:1607.04564v3 Vehicle detection and annotation for streaming video data with complex scenes is an interesting but challenging task for urban traffic surveillance. In this paper, we present a fast framework for Detection and Annotation of Vehicles (DAVE), which effectively combines vehicle detection and attribute annotation. DAVE consists of two convolutional neural networks (CNNs): a fast vehicle proposal network (FVPN) for extracting vehicle-like objects, and an attributes learning network (ALN) that verifies each proposal and infers each vehicle's pose, color, and type simultaneously. These two nets are jointly optimized so that the abundant latent knowledge learned by the ALN can be exploited to guide FVPN training. Once trained, the system achieves efficient vehicle detection and annotation on real-world traffic surveillance data. We evaluate DAVE on a new self-collected UTS dataset and on the public PASCAL VOC2007 car and LISA 2010 datasets, with consistent improvements over existing algorithms.
- Jul 14 2016 cs.CY arXiv:1607.03575v1 Ads are an important revenue source for mobile app development, especially for free apps, whose expenses can be offset by ad revenue. These benefits, however, come with costs: for example, too many ads can interfere with the user experience, leading to less user retention and ultimately reduced earnings. In this paper, we aim to understand ad costs from the users' perspective. We utilize app reviews, which are widely recognized as expressions of user perceptions, to identify the ad costs that concern users. Four types of ad costs, i.e., number of ads, memory/CPU overhead, traffic usage, and battery consumption, have been discovered from user reviews. To verify whether different ad integration schemes generate different ad costs, we first obtain the commonly used ad schemes from 104 popular apps, and then design a framework named IntelliAd to automatically measure the ad costs of each scheme. To determine whether these costs indeed influence users' reactions, we finally examine the correlations between the measured ad costs and user perceptions. We discover that the costs related to memory/CPU overhead and battery consumption concern users more, while traffic usage concerns them less, despite its obvious variations among different schemes in our experiments. Our results provide developers with suggestions for better incorporating ads into apps while ensuring the user experience.
- For each rank metric code $\mathcal{C}\subseteq \mathbb{K}^{m\times n}$, we associate a translation structure, the kernel of which is shown to be invariant with respect to the equivalence on rank metric codes. When $\mathcal{C}$ is $\mathbb{K}$-linear, we also propose and investigate two other invariants, called its middle nucleus and right nucleus. When $\mathbb{K}$ is a finite field $\mathbb{F}_q$ and $\mathcal{C}$ is a maximum rank distance code with minimum distance $d<\min\{m,n\}$ or $\gcd(m,n)=1$, the kernel of the associated translation structure is proved to be $\mathbb{F}_q$. Furthermore, we show that the middle nucleus of a linear maximum rank distance code over $\mathbb{F}_q$ must be a finite field; its right nucleus also has to be a finite field under the condition $\max\{d,m-d+2\} \geqslant \left\lfloor \frac{n}{2} \right\rfloor +1$. Let $\mathcal{D}$ be the DHO-set associated with a bilinear dimensional dual hyperoval over $\mathbb{F}_2$. The set $\mathcal{D}$ gives rise to a linear rank metric code, and we show that its kernel and right nucleus are isomorphic to $\mathbb{F}_2$. Also, its middle nucleus must be a finite field containing $\mathbb{F}_q$. Moreover, we also consider the kernel and the nuclei of $\mathcal{D}^k$, where $k$ is a Knuth operation.
- We present an eigenspectrum partitioning scheme without inversion for the recently described real-space electronic transport code, TRANSEC. The primary advantage of TRANSEC is its highly parallel algorithm, which enables studying conductance in large systems. The present scheme adds a new source of parallelization, significantly enhancing TRANSEC's parallel scalability, especially for systems with many electrons. In principle, partitioning could enable super-linear parallel speedup, as we demonstrate in calculations within TRANSEC. In practical cases, we report better than five-fold improvement in CPU time and similar improvements in wall time, compared to previously published large calculations. Importantly, the suggested scheme is relatively simple to implement. It can be useful for general large Hermitian or weakly non-Hermitian eigenvalue problems, whenever relatively accurate inversion via direct or iterative linear solvers is impractical.
- We study the phase retrieval problem, which solves a quadratic system of equations, i.e., recovers a vector $\boldsymbol{x}\in \mathbb{R}^n$ from its magnitude measurements $y_i=|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|, i=1,..., m$. We develop a gradient-like algorithm (referred to as RWF, for reshaped Wirtinger flow) that minimizes a nonconvex nonsmooth loss function. In comparison with the existing nonconvex Wirtinger flow (WF) algorithm (Candès et al., 2015), although the loss function becomes nonsmooth, it involves only the second power of the variable and hence reduces the complexity. We show that for random Gaussian measurements, RWF enjoys geometric convergence to a global optimum as long as the number $m$ of measurements is on the order of $n$, the dimension of the unknown $\boldsymbol{x}$. This improves the sample complexity of WF and matches the sample complexity of truncated Wirtinger flow (TWF) (Chen and Candès, 2015), but without truncation in the gradient loop. Furthermore, RWF costs less computationally than WF, and runs faster numerically than both WF and TWF. We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal. We also establish a performance guarantee for an existing Kaczmarz method for the phase retrieval problem based on its connection to IRWF. Finally, we empirically demonstrate that IRWF outperforms the existing ITWF algorithm (a stochastic version of TWF) as well as other batch algorithms.
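A minimal sketch of an RWF-style update, under the usual reading of the reshaped loss $\frac{1}{2m}\sum_i (|\boldsymbol{a}_i^\top \boldsymbol{z}| - y_i)^2$; the step size, initialization, and iteration count here are illustrative, not the paper's.

```python
import numpy as np

def rwf_gradient(z, A, y):
    """(Sub)gradient of (1/2m) * sum_i (|a_i^T z| - y_i)^2 at z."""
    p = A @ z
    return A.T @ (p - y * np.sign(p)) / len(y)

def rwf_recover(A, y, z0, step=0.5, iters=300):
    """Plain gradient descent on the reshaped loss from a given start."""
    z = z0.copy()
    for _ in range(iters):
        z = z - step * rwf_gradient(z, A, y)
    return z
```

Starting near the true signal, the iterates converge to $\pm\boldsymbol{x}$ (the global sign is unidentifiable from magnitude measurements). The paper pairs this loop with a spectral-style initialization, omitted here.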
- Apr 26 2016 cs.SI arXiv:1604.07096v1 Drug use is on the rise and is of great interest to public health and law enforcement agencies. As found by the National Survey on Drug Use and Health, 20 million Americans aged 12 years or older consumed illicit drugs in the past 30 days. Given their ubiquity in everyday life, drug abuse related studies have received constant attention. However, most existing studies rely on surveys, which by their nature present a number of problems: surveys on sensitive topics such as illicit drug use may not be answered truthfully, and selecting a representative sample to survey is another major challenge. In this paper, we explore the possibility of using big data from social media to understand illicit drug use behaviors. Instagram posts are collected using drug related terms by analyzing the hashtags supplied with each post. A large and dynamic dictionary of frequent illicit drug related slang is used to find these posts. These posts are studied to find common drug consumption behaviors with regard to time of day and week. Furthermore, by studying the accounts followed by the users of drug related posts, we hope to discover common interests shared by drug users.
- This paper considers the joint fronthaul compression and transmit beamforming design for the uplink cloud radio access network (C-RAN), in which multi-antenna user terminals communicate with a cloud-computing based centralized processor (CP) through multi-antenna base-stations (BSs) serving as relay nodes. A compress-and-forward relaying strategy, named the VMAC scheme, is employed, in which the BSs can either perform single-user compression or Wyner-Ziv coding to quantize the received signals and send the quantization bits to the CP via capacity-limited fronthaul links; the CP performs successive decoding with either a successive interference cancellation (SIC) receiver or a linear minimum-mean-square-error (MMSE) receiver. Under this setup, this paper investigates the joint optimization of the transmit beamformers at the users and the quantization noise covariance matrices at the BSs for maximizing the network utility. A novel weighted minimum-mean-square-error successive convex approximation (WMMSE-SCA) algorithm is first proposed for maximizing the weighted sum rate under the user transmit power and fronthaul capacity constraints with single-user compression. Assuming a heuristic decompression order, the proposed algorithm is then adapted for optimizing the transmit beamforming and fronthaul compression under Wyner-Ziv coding. This paper also proposes a low-complexity separate design consisting of optimizing the transmit beamformers for the Gaussian vector multiple-access channel along with per-antenna quantizers with uniform quantization noise levels across the antennas at each BS. Numerical results show that with optimized beamforming and fronthaul compression, C-RAN can significantly outperform conventional cellular networks. Furthermore, the low-complexity separate design already performs very close to the optimized joint design in the regime of practical interest.
- Apr 07 2016 cs.NI arXiv:1604.01675v1 Unraveling the quality of experience (QoE) of video streaming is very challenging in bandwidth-shared wireless networks. It is unclear how QoE metrics such as starvation probability and buffering time interact with the dynamics of streaming traffic load. In this paper, we collect view records from one of the largest streaming providers in China over two weeks and perform an in-depth measurement study of flow arrival and viewing time that sheds light on the real traffic pattern. Our most important observation is that the viewing time of streaming users fits a hyper-exponential distribution quite well. This implies that all views can be categorized into two classes, short and long views, with separate time scales. We then map the measured traffic pattern to bandwidth-shared cellular networks and propose an analytical framework to compute the closed-form starvation probability on the basis of ordinary differential equations (ODEs). Our framework can be naturally extended to investigate practical issues including progressive downloading and finite video duration. Extensive trace-driven simulations validate the accuracy of our models. Our study reveals that the starvation metrics of short and long views have different sensitivities to the scheduling priority at the base station. Hence, a better QoE tradeoff between short and long views can potentially be achieved by offering them different scheduling weights. Flow differentiation involves tremendous technical and non-technical challenges, because video content is owned by content providers rather than network operators and the viewing time of each session is unknown beforehand. To overcome these difficulties, we propose an online Bayesian approach to infer the viewing time of each incoming flow with the "least" information from content providers.
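A two-phase hyper-exponential viewing-time model of the kind described above can be sketched as follows; the mixture weight and means are illustrative, not fitted values from the paper.

```python
import random

def sample_viewing_time(p_short, mean_short, mean_long, rng=random):
    """Draw a viewing time from a 2-phase hyper-exponential distribution:
    with probability p_short a 'short view', otherwise a 'long view'."""
    if rng.random() < p_short:
        return rng.expovariate(1.0 / mean_short)
    return rng.expovariate(1.0 / mean_long)

def mean_viewing_time(p_short, mean_short, mean_long):
    """Closed-form mean of the mixture."""
    return p_short * mean_short + (1.0 - p_short) * mean_long
```

The two exponential phases capture exactly the "short views vs. long views with separate time scales" structure observed in the trace.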
- Apr 04 2016 cs.AI arXiv:1604.00377v1 Grouping problems aim to partition a set of items into multiple mutually disjoint subsets according to some specific criterion and constraints. Grouping problems cover a large class of important combinatorial optimization problems that are generally computationally difficult. In this paper, we propose a general solution approach for grouping problems, reinforcement learning based local search (RLS), which combines reinforcement learning techniques with descent-based local search. The viability of the proposed approach is verified on a well-known representative grouping problem (graph coloring), to which a very simple descent-based coloring algorithm is applied. Experimental studies on popular DIMACS and COLOR02 benchmark graphs indicate that RLS achieves competitive performance compared to a number of well-known coloring algorithms.
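To make the inner heuristic concrete, here is a hedged sketch of one conflict-reducing descent step such a coloring algorithm might use (graph data and move selection are illustrative; the paper's RLS adds a reinforcement learning layer on top).

```python
def conflicts(adj, colors):
    """Number of edges whose endpoints share a color."""
    return sum(1 for u, v in adj if colors[u] == colors[v])

def descent_step(adj, colors, k):
    """Greedily recolor the vertex/color pair that most reduces conflicts;
    returns the achieved gain (0 means a local optimum)."""
    best = (0, None, None)
    for v in range(len(colors)):
        old = colors[v]
        for c in range(k):
            if c == old:
                continue
            before = sum(1 for u, w in adj if v in (u, w) and colors[u] == colors[w])
            colors[v] = c
            after = sum(1 for u, w in adj if v in (u, w) and colors[u] == colors[w])
            colors[v] = old
            gain = before - after
            if gain > best[0]:
                best = (gain, v, c)
    if best[1] is not None:
        colors[best[1]] = best[2]
    return best[0]
```

Iterating `descent_step` until the gain is 0 yields the simple descent-based coloring; RLS then uses learned move probabilities to decide where to restart or perturb.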
- In this paper, we consider the robot motion (or task) planning problem under given time-bounded high-level specifications. We use metric interval temporal logic (MITL), a member of the temporal logic family, to represent the task specification. We then provide a constructive way to generate a timed automaton, together with methods to search for accepting runs on that automaton, in order to find a feasible motion (or path) sequence for the robot to complete the task.
- Mar 17 2016 cs.SE arXiv:1603.05069v1 Modern cyber-physical systems (CPS) exhibit a close inter-dependence between software and physical components. Automotive embedded systems are typical CPS: chips, sensors, and actuators are the physical components, and the software embedded within them is the cyber component. Current embedded systems are highly complex in architecture design for both software and hardware. It is common in industrial practice that high-level control algorithm development and low-level code implementation on hardware platforms are carried out separately, with limited shared information. However, software code and hardware architecture become closely related as complexity increases, and correlated requirements and dependencies between hardware and software are emerging problems in industrial practice. We demonstrate in this paper a method to link model-based system design with real-time simulation and analysis of the architecture model. This enables hardware-software co-design and thus early selection of the hardware architecture.
- Mar 14 2016 cs.AI arXiv:1603.03511v1 In this paper, we propose a set theoretic approach for knowledge representation. While the syntax of an application domain is captured by set theoretic constructs including individuals, concepts and operators, knowledge is formalized by equality assertions. We first present a primitive form that uses minimal assumed knowledge and constructs. Then, assuming naive set theory, we extend it by definitions, which are special kinds of knowledge. Interestingly, we show that the primitive form is expressive enough to define logic operators, not only propositional connectives but also quantifiers.
- Mar 08 2016 cs.SI arXiv:1603.02175v1 In this paper, we mine and learn to predict how similar a pair of users' interests towards videos are, based on demographic (age, gender, and location) and social (friendship, interaction, and group membership) information about these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and justify why it is used instead of video-based methods or latent topic models such as LDA and collaborative filtering approaches. We then show the effectiveness of the different demographic and social features, and of their combinations and derivatives, in predicting user interest similarity, based on different machine-learning methods for combining multiple features. We propose a hybrid tree-encoded linear model for combining the features, and show that it outperforms other linear and tree-based models. Our methods can be used to predict user interest similarity when the ground truth is not available, e.g., for new users or for inactive users whose interests may have changed from old access data, and they are useful for video recommendation. Our study is based on a rich dataset from Tencent, a popular provider of social networks, video services, and various other services in China.
- While the authors of Batch Normalization (BN) identify and address an important problem in training deep networks, internal covariate shift, the current solution has certain drawbacks. Specifically, BN depends on batch statistics for layerwise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate at validation time due to shifting parameter values (especially during the initial training epochs). Also, BN cannot be used with a batch size of 1 during training. We address these drawbacks by proposing a non-adaptive normalization technique for removing internal covariate shift, which we call Normalization Propagation. Our approach does not depend on batch statistics, but rather uses a data-independent parametric estimate of the mean and standard deviation in every layer, and is thus computationally faster than BN. We exploit the observation that the pre-activations before rectified linear units follow a Gaussian distribution in deep networks, and that once the first- and second-order statistics of any given dataset are normalized, we can forward-propagate this normalization without the need to recalculate the approximate statistics for hidden layers.
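The key closed-form fact behind this propagation, stated in our notation: if a pre-activation $z \sim \mathcal{N}(0,1)$, then $\mathbb{E}[\max(z,0)] = 1/\sqrt{2\pi}$ and $\mathrm{Var}[\max(z,0)] = 1/2 - 1/(2\pi)$, so a ReLU output can be re-standardized with constants instead of batch statistics. A minimal scalar sketch (the actual method applies this per layer to whole activations):

```python
import math
import random

# Closed-form post-ReLU statistics for a standard Gaussian pre-activation.
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)          # E[max(z, 0)]
RELU_STD = math.sqrt(0.5 - 1.0 / (2.0 * math.pi))   # sqrt(Var[max(z, 0)])

def normalize_post_relu(z):
    """Rectify, then re-standardize using data-independent constants."""
    return (max(z, 0.0) - RELU_MEAN) / RELU_STD
```

Because these constants are data-independent, no running averages over mini-batches are needed, which is exactly what removes the batch-size-1 restriction.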
- Mar 01 2016 cond-mat.mes-hall cs.ET arXiv:1602.08799v1 Magnetic skyrmions hold promise as information carriers in next-generation memory and logic devices, owing to their topological stability, small size, and the extremely low current needed to drive them. One of the most promising applications of skyrmions is in racetrack memory (RM), named Sk-RM, in place of domain walls (DWs). However, current designs face some key challenges, e.g., skyrmion manipulation, data representation, and synchronization. To address these challenges, we propose a complementary Sk-RM structure with voltage manipulation. The functionality and performance of the proposed design are investigated with micromagnetic simulations.
- With the rapid growth of Internet technologies, cloud computing and social networks have become ubiquitous. An increasing number of people participate in social networks, and massive online social data are generated. In order to exploit knowledge from the copious amounts of data obtained and to predict users' social behavior, data mining in social networks is urgently needed. Almost all online websites use cloud services to effectively process the large scale of social data, which are gathered from distributed data centers. These data are so large-scale, high-dimensional, and widely distributed that we propose a distributed sparse online algorithm to handle them. Additionally, privacy protection is an important issue in social networks: we should not compromise the privacy of individuals while their social data are being mined. Thus we also consider the privacy problem in this article. Our simulations show that appropriate data sparsity enhances the performance of our algorithm and that the privacy-preserving method does not significantly hurt its performance.
- Feb 12 2016 cs.DC arXiv:1602.03770v1 Load balancing, operator instance collocation, and horizontal scaling are critical issues in parallel stream processing engines for achieving low data processing latency, optimized cluster utilization, and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that they are tightly coupled, in the sense that they all need to determine the allocation of workloads and migrate computational states at runtime; optimizing them independently would result in suboptimal solutions. In this paper, we therefore investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocation has little effect on the communication cost, and model the load balancing problem as a Mixed-Integer Linear Program. We then present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source parallel stream processing engine. Extensive experimental results on both synthetic and real datasets show that our techniques clearly outperform existing approaches.
- Feb 11 2016 cs.CV arXiv:1602.03346v1 Action parsing in videos with complex scenes is an interesting but challenging task in computer vision. In this paper, we propose a generic 3D convolutional neural network trained in a multi-task learning manner for effective Deep Action Parsing (DAP3D-Net) in videos. In the training phase, action localization, classification, and attribute learning are jointly optimized on our appearance-motion data via DAP3D-Net. For an upcoming test video, we can simultaneously describe each individual action in the video as: where the action occurs, what the action is, and how the action is performed. To demonstrate the effectiveness of the proposed DAP3D-Net, we also contribute a new Numerous-category Aligned Synthetic Action dataset, i.e., NASA, which consists of 200,000 action clips of more than 300 categories, with 33 pre-defined action attributes in two hierarchical levels (i.e., low-level attributes of basic body part movements and high-level attributes related to action motion). We learn DAP3D-Net using the NASA dataset and then evaluate it on our collected Human Action Understanding (HAU) dataset. Experimental results show that our approach can accurately localize, categorize, and describe multiple actions in realistic videos.
- Self-propagating malware (e.g., an Internet worm) exploits security loopholes in software to infect servers and then uses them to scan the Internet for more vulnerable servers. While the mechanisms of worm infection and their propagation models are well understood, defense against worms remains an open problem. One branch of defense research investigates behavioral differences between worm-infected hosts and normal hosts in order to set them apart. One particular observation is that a worm-infected host, which scans the Internet with randomly selected addresses, has a much higher connection-failure rate than a normal host. Rate-limit algorithms have been proposed to control the spread of worms by traffic shaping based on connection-failure rate. However, these rate-limit algorithms can work properly only if the failure rates of individual hosts can be measured efficiently and accurately. This paper points out a serious problem in the prior measurement method. To address this problem, we first propose a solution based on a highly efficient double-bitmap data structure, which leaves only a small memory footprint on the routers while providing good measurements of connection-failure rates, whose accuracy can be tuned by system parameters. Furthermore, we propose another solution based on a shared register array data structure, achieving better memory efficiency and a much larger estimation range than our double-bitmap solution.
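The general idea of estimating per-host connection-failure rates from compact bitmaps can be sketched as follows. This is a simplified illustration only, not the paper's actual double-bitmap design: the class name, parameters, and the use of linear counting over two plain bitmaps are all assumptions made for the sketch.

```python
import hashlib
import math

class FailureRateBitmaps:
    """Illustrative bitmap-based connection-failure estimator (not the
    paper's exact double-bitmap structure). One bitmap records connection
    attempts, another records failures; cardinalities are recovered with
    the standard linear-counting estimate."""

    def __init__(self, m=1024):
        self.m = m                    # bits per bitmap
        self.attempts = [0] * m       # bitmap of connection attempts
        self.failures = [0] * m       # bitmap of failed connections

    def _pos(self, host, dest):
        # Hash (host, destination) to a bit position.
        h = hashlib.sha256(f"{host}|{dest}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.m

    def record(self, host, dest, failed):
        p = self._pos(host, dest)
        self.attempts[p] = 1
        if failed:
            self.failures[p] = 1

    def _estimate(self, bitmap):
        # Linear counting: n ~ -m * ln(fraction of zero bits).
        zeros = bitmap.count(0)
        if zeros == 0:
            return float("inf")
        return -self.m * math.log(zeros / self.m)

    def failure_rate(self):
        a = self._estimate(self.attempts)
        return self._estimate(self.failures) / a if a else 0.0
```

Because failed connections are a subset of attempted ones, the estimated rate stays in [0, 1]; accuracy is tuned by the bitmap size `m`, loosely mirroring the tunable-accuracy property described in the abstract.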
- Feb 08 2016 cs.DC arXiv:1602.01959v3 In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques create a large number of long-living data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetimes of the data objects, and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on top of Spark, which transparently decomposes and groups objects with similar lifetimes into byte arrays and releases their space altogether when their lifetimes come to an end. An extensive experimental study using both synthetic and real datasets shows that, compared to Spark, Deca is able to 1) reduce the garbage collection time by up to 99.9%, 2) achieve up to a 22.7x speedup in execution time in cases without data spilling and a 41.6x speedup in cases with data spilling, and 3) consume up to 46.6% less memory.
- Jan 14 2016 cond-mat.mes-hall cs.ET arXiv:1601.03085v1 Magnetic domain walls (DWs) have been widely investigated for future memory and computing systems. However, energy efficiency and stability remain two major challenges of DW-based systems. In this letter, we first propose exploiting skyrmions as on-chip and inter-chip interconnects for DW-based systems, owing to their topological stability, small size and ultra-low depinning current density. In the proposed technique, data are processed in the form of DWs but are transmitted in the form of skyrmions. The reversible conversion between a skyrmion and a DW pair can be physically achieved by connecting a wide and a narrow magnetic nanowire. The proposed technique can realize highly compact, robust and energy-efficient on-chip and inter-chip interconnects for DW-based systems, enabling the system to take advantage of both DWs and skyrmions.
- Association rule mining is an important part of data mining, used to find interesting patterns in transaction databases. The Apriori algorithm is one of the most classical association-rule algorithms, but it suffers from an efficiency bottleneck. In this article, we propose a prefixed-itemset-based data structure for candidate itemset generation; with the help of this structure, we improve the efficiency of the classical Apriori algorithm.
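Prefix-based candidate generation of the kind the abstract refers to can be sketched as follows. This is a minimal illustration assuming frequent itemsets are kept as sorted tuples; it shows only the prefix-join step, not the paper's full data structure or the support-counting and pruning phases.

```python
from collections import defaultdict
from itertools import combinations

def generate_candidates(frequent_itemsets):
    """Join frequent k-itemsets that share a common (k-1)-prefix to form
    (k+1)-candidates. Itemsets are sorted tuples, so grouping by prefix
    avoids the pairwise comparisons of naive Apriori candidate generation."""
    buckets = defaultdict(list)
    for itemset in frequent_itemsets:
        buckets[itemset[:-1]].append(itemset[-1])
    candidates = []
    for prefix, tails in buckets.items():
        # Any two tails under the same prefix yield one candidate.
        for a, b in combinations(sorted(tails), 2):
            candidates.append(prefix + (a, b))
    return candidates
```

For example, the frequent 2-itemsets {A,B}, {A,C}, {B,C} share the prefix (A,) for the first two, producing the single 3-candidate {A,B,C}.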
- Jan 07 2016 cs.CV arXiv:1601.01145v2 We address the vehicle detection and classification problems using Deep Neural Network (DNN) approaches. Here we answer questions that are specific to our application, including how to utilize DNNs for vehicle detection, which features are useful for vehicle classification, and how to extend a model trained on a limited-size dataset to cases of extreme lighting conditions. Answering these questions, we propose an approach that outperforms state-of-the-art methods and achieves promising results on images with extreme lighting conditions.
- Dec 29 2015 cs.SE arXiv:1512.07950v1 Android applications (apps) have grown dramatically in recent years. Apps are typically user interface (UI) centric, and rapid UI responsiveness is a key consideration for app developers. However, we still lack a handy tool for profiling app performance so as to diagnose performance problems. This paper presents PersisDroid, a tool specifically designed for this task. The key notion of PersisDroid is that UI-triggered asynchronous executions also contribute to UI performance, and hence their performance should be properly captured to facilitate performance diagnosis. However, Android allows numerous ways to start asynchronous executions, posing a great challenge to profiling such executions. This paper finds that they can be grouped into six categories, so they can be tracked and profiled according to the specifics of each category with a dynamic instrumentation approach carefully tailored for Android. PersisDroid can then properly profile the asynchronous executions at task granularity, which gives it low overhead and high compatibility. Most importantly, the profiling data can greatly help developers detect and locate performance anomalies. We have implemented PersisDroid and released it as open source. The tool has been applied in diagnosing 20 open-source apps, and we find that 11 of them contain potential performance problems, which shows its effectiveness in performance diagnosis for Android apps.
- Dec 22 2015 cs.AI arXiv:1512.06747v1 Accurate and computationally efficient means of classifying human activities have been the subject of extensive research efforts. Most current research focuses on extracting complex features to achieve high classification accuracy. We propose a template selection approach based on Dynamic Time Warping, such that complex feature extraction and domain knowledge are avoided. We demonstrate the predictive capability of the algorithm on both simulated and real smartphone data.
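The template-matching core of such an approach can be sketched as follows: a standard Dynamic Time Warping distance plus nearest-template classification. This is a generic illustration of DTW-based activity classification, not the paper's template-selection procedure; the function names and toy templates are assumptions.

```python
import math

def dtw(a, b):
    """Classic O(n*m) Dynamic Time Warping distance between two 1-D series,
    with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three warping moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(series, templates):
    """Assign the label of the nearest template under DTW distance."""
    return min(templates, key=lambda label: dtw(series, templates[label]))
```

Because DTW aligns series of different speeds and lengths, no hand-crafted features are needed: a raw sensor window is compared directly against one stored template per activity.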
- Dec 22 2015 cs.CR arXiv:1512.06578v1 In e-healthcare record systems (EHRS), attribute-based encryption (ABE) appears as a natural way to achieve fine-grained access control on health records. Some proposals exploit key-policy ABE (KP-ABE) to protect privacy in such a way that all users are associated with specific access policies and only the ciphertexts matching a user's access policy can be decrypted. An issue with KP-ABE is that it requires an a priori formulation of access policies during key generation, which is not always practicable in EHRS because the policies for accessing health records are sometimes determined after key generation. In this paper, we revisit KP-ABE and propose a dynamic ABE paradigm, referred to as access policy redefinable ABE (APR-ABE). To address the above issue, APR-ABE allows users to redefine their access policies and delegate keys for the redefined policies; hence a priori precise policies are no longer mandatory. We construct an APR-ABE scheme with short ciphertexts and prove its full security in the standard model under several static assumptions.
- Dec 08 2015 cs.CV arXiv:1512.01691v1 In this paper we present a framework for secure identification using deep neural networks, and apply it to the task of template protection for face authentication. We use deep convolutional neural networks (CNNs) to learn a mapping from face images to maximum entropy binary (MEB) codes. The mapping is robust enough to tackle the problem of exact matching, yielding the same code for new samples of a user as the code assigned during training. These codes are then hashed using any hash function that follows the random oracle model (such as SHA-512) to generate protected face templates, similar to text-based password protection. The algorithm makes no unrealistic assumptions and offers high template security, cancelability, and state-of-the-art matching performance. The efficacy of the approach is shown on the CMU-PIE, Extended Yale B, and Multi-PIE face databases. We achieve high (~95%) genuine accept rates (GAR) at zero false accept rate (FAR) with up to 1024 bits of template security.
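The template-protection step described above (hashing an exact-match binary code, as with a password) can be sketched in a few lines. The CNN mapping itself is omitted; the function names and the bit-list representation are assumptions made for illustration, while the use of SHA-512 follows the abstract.

```python
import hashlib

def protect_template(meb_bits):
    """Hash a maximum-entropy binary (MEB) code into a protected template.
    Because the learned mapping yields the *same* code for new samples of a
    user, only this one-way digest needs to be stored, like a hashed
    password; the raw biometric code never persists."""
    return hashlib.sha512(bytes(meb_bits)).hexdigest()

def authenticate(candidate_bits, stored_digest):
    """Exact matching: a candidate code authenticates iff its hash equals
    the enrolled digest."""
    return protect_template(candidate_bits) == stored_digest
```

Cancelability then amounts to re-enrolling the user so the network assigns a fresh code, after which the old digest is useless to an attacker.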
- In this paper, we propose a reachable-set-based collision avoidance algorithm for unmanned aerial vehicles (UAVs). UAVs have been deployed for agricultural research and management, surveillance and sensor coverage for threat detection, and disaster search-and-rescue operations. It is essential for such aircraft to have on-board collision avoidance capability to guarantee safety. Instead of the traditional approach of collision avoidance between trajectories, we propose a collision avoidance scheme based on reachable sets and tubes. We formulate the problem as a convex optimization problem that seeks suitable control constraint sets for the participating aircraft. We apply the approach to a case study of a collision avoidance scenario involving two quadrotors and two fixed-wing aircraft.
- Matrix-parameterized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parameterized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency: communication costs are linear in the parameter matrix's dimensions, rather than quadratic, without affecting computational correctness. We present a theoretical convergence analysis of SFB and empirically corroborate its efficiency on four different matrix-parameterized ML models.
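The rank-1 property at the heart of SFB can be made concrete with a small sketch. This is an illustration of the outer-product reconstruction and the resulting communication saving only; the function names are assumptions, and real SFB additionally handles scheduling, staleness, and convergence concerns not shown here.

```python
def outer(u, v):
    """Reconstruct the rank-1 update matrix u v^T from its two sufficient
    factors, e.g. for multiclass logistic regression u is the prediction
    error vector and v the feature vector of one sample."""
    return [[ui * vj for vj in v] for ui in u]

def sfb_round(u, v, num_workers):
    """One SFB round: broadcast the two factors and let every worker rebuild
    the full d-by-n update locally, instead of shipping the matrix itself."""
    factor_cost = len(u) + len(v)   # linear in the matrix dimensions
    matrix_cost = len(u) * len(v)   # quadratic: full update has d*n entries
    updates = [outer(u, v) for _ in range(num_workers)]
    return updates, factor_cost, matrix_cost
```

Since every worker reconstructs exactly u v^T, the computation is unchanged; only the wire format shrinks from d*n numbers to d+n per update.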
- Low rank matrix approximation is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors and patterns, and provides concise representations of the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous, which leads to the problem of low rank approximation of binary matrices. Here we are given a $d \times n$ binary matrix $A$ and a small integer $k$. The goal is to find two binary matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$ respectively, so that the Frobenius norm of $A - U V$ is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: the $\mathrm{GF}(2)$ model and the Boolean semiring model. Unlike low rank approximation of real matrices, which can be efficiently solved by Singular Value Decomposition, approximation of binary matrices is NP-hard even for $k=1$. In this paper, we consider the problem of Column Subset Selection (CSS), in which one low rank matrix must be formed by $k$ columns of the data matrix. We characterize the approximation ratio of CSS for binary matrices. For the $\mathrm{GF}(2)$ model, we show that the approximation ratio of CSS is bounded by $\frac{k}{2}+1+\frac{k}{2(2^k-1)}$ and that this bound is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of the low rank matrix are generated from Boolean formulas operating bitwise on columns of the data matrix. We show that the approximation ratio of GCSS is bounded by $2^{k-1}+1$, and that the exponential dependency on $k$ is inherent.
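The distinction between the two binary dot-product models can be made concrete with a short sketch (function names are illustrative; this shows only the semiring arithmetic, not the CSS or GCSS procedures):

```python
def dot_gf2(u, v):
    """Dot product over GF(2): XOR (sum mod 2) of bitwise ANDs."""
    return sum(a & b for a, b in zip(u, v)) % 2

def dot_bool(u, v):
    """Dot product over the Boolean semiring: OR of bitwise ANDs."""
    return int(any(a & b for a, b in zip(u, v)))

def matmul(U, V, dot):
    """Product of a d-by-k matrix U and a k-by-n matrix V under the
    chosen binary dot product; entries are 0/1 lists of lists."""
    cols = list(zip(*V))
    return [[dot(row, col) for col in cols] for row in U]
```

The two models already disagree on a single entry: with u = v = (1, 1), the GF(2) dot product is 1 + 1 = 0 while the Boolean one is 1, which is why the two settings admit different approximation guarantees.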