results for au:Chen_Y in:cs

- Mar 28 2017 cs.CV arXiv:1703.09039v1 Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural networks to improve energy efficiency and throughput without sacrificing accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various platforms and architectures that support DNNs, and highlight key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It will also summarize various development resources that can enable researchers and practitioners to quickly get started on DNN design, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
- For robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules. However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity in people's behaviors. Existing works mostly focus on using feature-matching techniques to describe and imitate human paths, but often do not generalize well, since the feature values can vary from person to person, and even from run to run. This work notes that while it is challenging to directly specify the details of what to do (precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.
- Mar 28 2017 cs.CV arXiv:1703.08837v1 The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views under significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learn are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.e., view-specific re-id), an under-studied approach, offers an alternative solution to this problem. In this work, we formulate a novel view-specific person re-identification framework from the feature augmentation point of view, called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically, CRAFT performs cross-view adaptation by automatically measuring camera correlation from the cross-view visual data distribution and adaptively conducting feature augmentation to transform the original features into a new adaptive space. Through our augmentation framework, view-generic learning algorithms can be readily generalized to learn and optimize view-specific sub-models whilst simultaneously modelling view-generic discrimination information. Therefore, our framework not only inherits the strength of view-generic model learning but also provides an effective way to take view-specific characteristics into account. Our CRAFT framework can be extended to jointly learn view-specific feature transformations for person re-id across a large network with more than two cameras, a largely under-investigated but realistic re-id setting. Additionally, we present a domain-generic deep person appearance representation which is designed particularly to be view-invariant, facilitating cross-view adaptation by CRAFT.
- Mar 28 2017 cs.GT arXiv:1703.08636v1 We propose definitions of substitutes and complements for pieces of information ("signals") in the context of a decision or optimization problem, with game-theoretic and algorithmic applications. In a game-theoretic context, substitutes capture diminishing marginal value of information to a rational decision maker. We use the definitions to address the question of how and when information is aggregated in prediction markets. Substitutes characterize "best-possible" equilibria with immediate information aggregation, while complements characterize "worst-possible", delayed aggregation. Game-theoretic applications also include settings such as crowdsourcing contests and Q&A forums. In an algorithmic context, where substitutes capture diminishing marginal improvement of information to an optimization problem, substitutes imply efficient approximation algorithms for a very general class of (adaptive) information acquisition problems. In tandem with these broad applications, we examine the structure and design of informational substitutes and complements. They have equivalent, intuitive definitions from disparate perspectives: submodularity, geometry, and information theory. We also consider the design of scoring rules or optimization problems so as to encourage substitutability or complementarity, with positive and negative results. Taken as a whole, the results give some evidence that, in parallel with substitutable items, informational substitutes play a natural conceptual and formal role in game theory and algorithms.
- Language understanding is a key component in a spoken dialogue system. In this paper, we investigate how the language understanding module influences the dialogue system performance by conducting a series of systematic experiments on a task-oriented neural dialogue system in a reinforcement learning based setting. The empirical study shows that among different types of language understanding errors, slot-level errors can have more impact on the overall performance of a dialogue system compared to intent-level errors. In addition, our experiments demonstrate that the reinforcement learning based dialogue system is able to learn when and what to confirm in order to achieve better performance and greater robustness.
- Mar 21 2017 cs.CC arXiv:1703.06423v1 The dichotomy conjecture for the parameterized embedding problem states that the problem of deciding whether a given graph $G$ from some class $K$ of "pattern graphs" can be embedded into a given graph $H$ (that is, is isomorphic to a subgraph of $H$) is fixed-parameter tractable if $K$ is a class of graphs of bounded tree width and $W[1]$-complete otherwise. Towards this conjecture, we prove that the embedding problem is $W[1]$-complete if $K$ is the class of all grids or the class of all walls.
- Mar 21 2017 cs.CV arXiv:1703.06260v1 Image super-resolution aims to recover detailed information from low-resolution images and reconstruct them into high-resolution images. Due to the limited amount of data and information retrievable from low-resolution images, it is difficult to restore clear, artifact-free images while still preserving enough image structure such as texture. This paper presents a new single image super-resolution method based on adaptive fractional-order gradient interpolation and reconstruction, with a self-optimizing mask. The interpolated image gradient via the optimal fractional-order gradient is first constructed according to image similarity, and afterwards a minimum energy function is employed to reconstruct the final high-resolution image. Fractional-order gradient based interpolation methods provide an additional degree of freedom which helps optimize reconstruction quality, since an extra free parameter, the order $\alpha$, is available. The proposed method is able to produce rich texture detail while still maintaining structural similarity even under large zoom conditions. Experimental results show that the proposed method performs better than current single image super-resolution techniques.
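The abstract does not spell out how an $\alpha$-order gradient is discretized; a standard realization is the Grünwald-Letnikov backward fractional difference, sketched below in 1D (the function names and the truncation window are illustrative, not the paper's):

```python
import numpy as np

def gl_coefficients(alpha, n):
    """Grunwald-Letnikov coefficients (-1)^k * C(alpha, k), computed recursively."""
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (k - 1 - alpha) / k
    return c

def fractional_gradient_1d(signal, alpha, window=4):
    """Approximate the order-alpha backward fractional difference of a 1D signal,
    truncated to `window` terms. alpha=1 recovers the ordinary backward difference."""
    c = gl_coefficients(alpha, window)
    out = np.zeros_like(signal, dtype=float)
    for k in range(window):
        out[k:] += c[k] * signal[: len(signal) - k]
    return out
```

The free parameter $\alpha$ interpolates between smoothing-like (fractional) and sharp (integer-order) difference operators, which is the extra degree of freedom the abstract refers to.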
- Mar 21 2017 cs.GT arXiv:1703.06824v1 In this paper, energy efficient power control for small cells underlaying a macro cellular network is investigated. We formulate the power control problem in self-organizing small cell networks as a non-cooperative game, and propose a distributed energy efficient power control scheme, which allows the small base stations (SBSs) to take individual decisions for attaining the Nash equilibrium (NE) with minimum information exchange. Specifically, in the non-cooperative power control game, a non-convex optimization problem is formulated for each SBS to maximize its energy efficiency (EE). By exploiting the properties of parameter-free fractional programming and the concept of the perspective function, the non-convex optimization problem for each SBS is transformed into an equivalent constrained convex optimization problem. Then, the constrained convex optimization problem is converted into an unconstrained convex optimization problem by exploiting the mixed penalty function method. The inequality constraints are eliminated by introducing logarithmic barrier functions and the equality constraint is eliminated by introducing a quadratic penalty function. We also theoretically show the existence and the uniqueness of the NE in the non-cooperative power control game. Simulation results show remarkable improvements in terms of EE by using the proposed scheme.
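The fractional-programming step mentioned in the abstract (maximizing a ratio like rate/power) is commonly solved by Dinkelbach's parametric iteration. The sketch below applies it to a toy single-link energy-efficiency problem with a grid-searched inner maximization; the rate/cost functions and constants are illustrative, not from the paper:

```python
import numpy as np

def dinkelbach(rate, cost, p_grid, tol=1e-9, max_iter=100):
    """Dinkelbach's method: maximize rate(p)/cost(p) by repeatedly solving the
    parametric problem max_p rate(p) - q*cost(p) and updating q to the new ratio."""
    q = 0.0
    for _ in range(max_iter):
        vals = rate(p_grid) - q * cost(p_grid)   # inner (parametric) problem
        p_star = p_grid[np.argmax(vals)]
        f_star = rate(p_star) - q * cost(p_star)
        q = rate(p_star) / cost(p_star)          # update the ratio parameter
        if f_star < tol:                         # converged: max equals zero
            break
    return p_star, q

# Toy single-link EE: spectral efficiency over transmit + circuit power.
g, p_c = 4.0, 0.5
p_grid = np.linspace(1e-6, 10.0, 100001)
p_opt, ee = dinkelbach(lambda p: np.log2(1 + g * p), lambda p: p + p_c, p_grid)
```

At convergence the parameter `q` equals the maximum achievable energy efficiency, which is the fixed-point property Dinkelbach's method is built on.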
- Mar 20 2017 cs.CV arXiv:1703.05853v1 Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is gaining popularity in several computer vision algorithms, a significant energy consumption difference exists compared to traditional hand-crafted approaches. In this paper, we provide an in-depth analysis of the computation, energy and accuracy trade-offs between learned features such as deep Convolutional Neural Networks (CNN) and hand-crafted features such as Histogram of Oriented Gradients (HOG). This analysis is supported by measurements from two chips that implement these algorithms. Our goal is to understand the source of the energy discrepancy between the two approaches and to provide insight about the potential areas where CNNs can be improved and eventually approach the energy-efficiency of HOG while maintaining their outstanding accuracy.
- We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with a minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with exponential run time in terms of the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address the above issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with a sufficient number of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypotheses distribution are unknown, we propose an algorithm which learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions with low cost.
- Mar 14 2017 cs.LG arXiv:1703.04318v1 Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples: inputs that are maliciously modified can cause the classifier to provide adversary-desired outputs. Moreover, it is known that adversarial examples generated on one classifier are likely to cause another classifier to make the same mistake, even if the classifiers have different architectures or are trained on disjoint datasets. This property, known as transferability, opens up the possibility of attacking black-box systems by generating adversarial examples on a substitute classifier and transferring the examples to the target classifier. Therefore, the key to protecting black-box learning systems against adversarial examples is to block their transferability. To this end, we propose a training method in which, as the input is perturbed more strongly, the classifier smoothly outputs lower confidence on the original label and instead predicts that the input is "invalid". In essence, we augment the output class set with a NULL label and train the classifier to reject adversarial examples by classifying them as NULL. In experiments, we apply a wide range of attacks based on adversarial examples to the black-box systems. We show that a classifier trained with the proposed method effectively resists adversarial examples, while maintaining accuracy on clean data.
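The core idea of shifting confidence from the original label to an appended NULL class as perturbation grows can be sketched as a soft-target construction. The linear schedule below is a hypothetical illustration; the paper's actual schedule and loss may differ:

```python
import numpy as np

def null_label_target(true_label, num_classes, perturb_norm, max_norm):
    """Build a soft target over num_classes + 1 classes: as the perturbation
    magnitude grows, probability mass moves linearly from the true label to an
    appended NULL class (index num_classes). Hypothetical schedule for illustration."""
    p_null = min(perturb_norm / max_norm, 1.0)
    target = np.zeros(num_classes + 1)
    target[true_label] = 1.0 - p_null
    target[-1] = p_null
    return target
```

Training against such targets (e.g., with cross-entropy) teaches the classifier to route strongly perturbed inputs to NULL rather than to an adversary-desired label.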
- Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. No accuracy drop on image classification is observed on these extremely compact networks, compared to well-known models. An emerging question, however, is whether these model compression techniques hurt DNNs' learning ability beyond classifying images on a single dataset. Our preliminary experiment shows that these compression methods can degrade domain adaptation (DA) ability, even though classification performance is preserved. Therefore, we propose a new compact network architecture and unsupervised DA method in this paper. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, which is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN obtains GoogLeNet-level accuracy both on classification and DA, and our DA method slightly outperforms previous competitive ones. Put together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of eighteen DA tasks on the popular Office-31 and Office-Caltech datasets.
- Poisoning attacks are a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as inputs to perform re-training, where the input data can be poisoned. Although poisoning attacks against support vector machines (SVM) have been extensively studied before, there is still very limited knowledge about how such attacks can be implemented against neural networks (NN), especially DNNs. In this work, we first examine the possibility of applying a traditional gradient-based method (referred to as the direct gradient method) to generate poisoned data against NNs by leveraging the gradient of the target model w.r.t. the normal data. We then propose a generative method to accelerate the generation rate of the poisoned data: an auto-encoder (generator) used to generate poisoned data is updated by a reward function of the loss, and the target NN model (discriminator) receives the poisoned data to calculate the loss w.r.t. the normal data. Our experiment results show that the generative method can speed up the poisoned data generation rate by up to 239.38x compared with the direct gradient method, with slightly lower model accuracy degradation. A countermeasure is also designed to detect such poisoning attacks by checking the loss of the target model.
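The "direct gradient method" (ascending the target model's loss with respect to a candidate input) can be illustrated on a toy logistic-regression stand-in; the paper targets DNNs, and the model, step size, and step count here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def direct_gradient_poison(w, x0, y_target, steps=50, lr=0.5):
    """Craft a poisoned point by gradient *ascent* on the logistic loss of a
    fixed linear model w, starting from a clean point x0 with label y_target."""
    x = x0.copy()
    for _ in range(steps):
        p = sigmoid(w @ x)
        # d(cross-entropy)/dx = (p - y) * w; ascend to make the point maximally harmful
        x += lr * (p - y_target) * w
    return x
```

A model re-trained on such points is pulled toward a decision boundary that misfits the clean data, which is the degradation the generative method then produces faster.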
- Mar 07 2017 cs.LO arXiv:1703.01860v1 The parameterized model-checking problem for a class of first-order sentences (queries) asks to decide whether a given sentence from the class holds true in a given relational structure (database); the parameter is the length of the sentence. In 1995 Vardi observed a polynomial time algorithm deciding the model-checking problem for queries with a bounded number of variables. We study its parameterized space complexity. For each bound on the quantifier alternation rank, the problem becomes complete for the corresponding level of what we call the tree hierarchy, a hierarchy of parameterized complexity classes defined via space bounded alternating machines between parameterized logarithmic space and fixed-parameter tractable time. We observe that a parameterized logarithmic space model-checker for existential bounded variable queries would allow us to improve Savitch's classical simulation of nondeterministic logarithmic space in deterministic space $O(\log^2)$. Further, we define a highly space efficient model-checker for queries with a bounded number of variables and bounded quantifier alternation rank. We study its optimality under the assumption that Savitch's theorem is optimal.
- This paper presents an end-to-end learning framework for task-completion neural dialogue systems, which leverages supervised and reinforcement learning with various deep-learning models. The system is able to interface with a structured database, and interact with users for assisting them to access information and complete tasks such as booking movie tickets. Our experiments in a movie-ticket booking domain show the proposed system outperforms a modular-based dialogue system and is more robust to noise produced by other components in the system.
- Mar 03 2017 cs.DS arXiv:1703.00575v1 In a recent paper, Braun, Chung and Graham [1] addressed a single-processor scheduling problem with time restrictions. Given a fixed integer $B \geq 2$, there is a set of jobs to be processed by a single processor subject to the following B-constraint: for any real $x$, no unit time interval $[x, x+1)$ is allowed to intersect more than $B$ jobs. The problem was shown to be NP-hard when $B$ is part of the input, and whether it remains NP-hard for fixed $B$ was left as an open question [1, 5, 7]. This paper answers that question: we prove the problem is NP-hard even when $B=2$. A PTAS is also presented for any constant $B \geq 2$.
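The B-constraint itself is easy to check mechanically: a job $[s, e)$ intersects the unit interval $[x, x+1)$ exactly when $x \in (s-1, e)$, and the intersection count is piecewise constant between the breakpoints $\{s-1\} \cup \{e\}$, so testing one midpoint per piece suffices. A small verifier sketch (not from the paper):

```python
def satisfies_b_constraint(jobs, B):
    """Check that no unit interval [x, x+1) intersects more than B jobs.
    jobs is a list of (start, end) intervals; a job (s, e) intersects
    [x, x+1) iff s - 1 < x < e, so the count only changes at s - 1 and e."""
    points = sorted({s - 1 for s, _ in jobs} | {e for _, e in jobs})
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return all(
        sum(1 for s, e in jobs if s - 1 < x < e) <= B
        for x in candidates
    )
```

The hardness result concerns finding a feasible schedule, not verifying one; verification as above is linear in the number of constancy pieces.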
- In this paper, we consider the mixture of sparse linear regressions model. Let ${\beta}^{(1)},\ldots,{\beta}^{(L)}\in\mathbb{C}^n$ be $ L $ unknown sparse parameter vectors with a total of $ K $ non-zero coefficients. Noisy linear measurements are obtained in the form $y_i={x}_i^H {\beta}^{(\ell_i)} + w_i$, each of which is generated randomly from one of the sparse vectors with the label $ \ell_i $ unknown. The goal is to estimate the parameter vectors efficiently with low sample and computational costs. This problem presents significant challenges as one needs to simultaneously solve the demixing problem of recovering the labels $ \ell_i $ as well as the estimation problem of recovering the sparse vectors $ {\beta}^{(\ell)} $. Our solution to the problem leverages the connection between modern coding theory and statistical inference. We introduce a new algorithm, Mixed-Coloring, which samples the mixture strategically using query vectors $ {x}_i $ constructed based on ideas from sparse graph codes. Our novel code design allows for both efficient demixing and parameter estimation. The algorithm achieves the order-optimal sample and time complexities of $\Theta(K)$ in the noiseless setting, and near-optimal $\Theta(K\text{polylog}(n))$ complexities in the noisy setting. In one of our experiments, recovering a mixture of two regressions with dimension $n=500$ and sparsity $K=50$, our algorithm is more than $300$ times faster than the EM algorithm, with about $ 1/3 $ of its sample cost.
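For context, the EM baseline the paper compares against alternates soft label assignment with weighted least squares per component. A minimal dense (non-sparse) two-component sketch, with equal mixing weights and fixed noise variance assumed for simplicity:

```python
import numpy as np

def em_mixed_regression(X, y, steps=50, seed=0):
    """EM for a mixture of two linear regressions: E-step computes per-sample
    responsibilities from squared residuals; M-step solves weighted least squares
    for each component. Equal mixing weights and unit noise variance assumed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    betas = rng.standard_normal((2, d))
    for _ in range(steps):
        # E-step: responsibilities proportional to exp(-residual^2 / 2)
        resid = y[:, None] - X @ betas.T                  # shape (n, 2)
        logp = -0.5 * resid**2
        logp -= logp.max(axis=1, keepdims=True)           # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for l in range(2):
            w = np.sqrt(r[:, l])
            betas[l] = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return betas
```

Mixed-Coloring avoids this per-iteration $O(nd)$ dense work by designing the query vectors $x_i$ so that demixing and estimation reduce to peeling on a sparse graph code.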
- Process modeling and understanding is fundamental for advanced human-computer interfaces and automation systems. Recent research has focused on activity recognition, but little work has addressed process progress detection from sensor data. We introduce a real-time, sensor-based system for modeling, recognizing and estimating the completeness of a process. We implemented a multimodal CNN-LSTM structure to extract the spatio-temporal features from different sensory datatypes. We used a novel deep regression structure for overall completeness estimation. By combining process completeness estimation with a Gaussian mixture model, our system can predict the process phase using the estimated completeness. We also introduce the rectified hyperbolic tangent (rtanh) activation function and a conditional loss to help the training process. Using the completeness estimation result and performance speed calculations, we also implemented an online estimator of remaining time. We tested this system using data obtained from a medical process (trauma resuscitation) and sport events (swim competitions). Our system outperformed existing implementations for phase prediction during trauma resuscitation, and achieved over 80% process phase detection accuracy, with completeness estimation error below 9% and remaining-time estimation error below 18% of the process duration on both datasets.
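The abstract does not define rtanh; one plausible reading of "rectified hyperbolic tangent" is tanh clipped at zero, which bounds outputs to $[0, 1)$, a natural range for a completeness estimate. This sketch is an assumption, not the paper's definition:

```python
import numpy as np

def rtanh(x):
    """Rectified hyperbolic tangent (one plausible reading): tanh clipped at
    zero, so outputs lie in [0, 1) -- suitable for regressing completeness."""
    return np.maximum(0.0, np.tanh(x))
```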
- Mar 01 2017 cs.CV arXiv:1702.08606v1 While modern imaging technologies such as fMRI have opened exciting new possibilities for studying the brain in vivo, histological sections remain the best way to study the anatomy of the brain at the level of single neurons. Histological atlases have changed little since 1909, and localizing brain regions is still a labor-intensive process performed only by experienced neuroanatomists. Existing digital atlases such as the Allen Brain Atlas are limited to low-resolution images which cannot capture the detailed structure of neurons. We have developed a digital atlas methodology that combines information about the 3D organization of the brain and the detailed texture of neurons in different structures. Using this methodology, we developed an atlas for the mouse brainstem and midbrain, two regions for which there are currently no good atlases. Our atlas is "active" in that it can be used to automatically align a histological stack to the atlas, thus reducing the work of the neuroanatomist.
- Feb 28 2017 cs.SI arXiv:1702.08097v1 Selfies have become increasingly fashionable in the social media era. People are willing to share their selfies on various social media platforms such as Facebook, Instagram and Flickr. The popularity of selfies has caught researchers' attention, especially psychologists'. In the computer vision and machine learning areas, little attention has been paid to this phenomenon as a valuable data source. In this paper, we focus on exploring the deeper personal patterns behind people's different kinds of selfie-posting behaviours. We develop this work based on a dataset from WeChat, one of the most extensively used instant messaging platforms in China. In particular, we first propose an unsupervised approach to classify the images posted by users. Based on the classification result, we construct three types of user-level features that reflect user preference, activity and posting habit. Based on these features, for a series of selfie-related tasks, we build classifiers that can accurately predict two sets of users with opposite selfie-posting behaviours. We have found that people's interest, activity and posting habit have a great influence on their selfie-posting behaviours. For example, the classification accuracy between selfie-posting addicts and non-addicts reaches 89.36%. We also show that using users' image information to predict these behaviours achieves better performance than using text information. More importantly, for each set of users with a specific selfie-posting behaviour, we extract and visualize significant personal patterns about them. In addition, we cluster users and extract their high-level attributes, revealing the correlation between these attributes and users' selfie-posting behaviours. In the end, we demonstrate that users' selfie-posting behaviour, as a good predictor, can accurately predict their different preferences toward these high-level attributes.
- Feb 28 2017 cs.CV arXiv:1702.08318v1 A cloud server spends a lot of time, energy and money training a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to find objects. However, clients do not want the content of their photos leaked. Meanwhile, the cloud server is also reluctant to leak any parameters of the trained object detectors. Ten years ago, Avidan & Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) Representation. The original image is divided into random base images, and only the base images are submitted, in random order, to the cloud server. Thus, the content of the image cannot be leaked. Meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI representation re-enables integral-image computation, yielding a large acceleration. The experimental results reveal that our method retains the detection accuracy of the plain vision algorithm and is significantly faster than traditional Blind Vision, with a theoretically very low probability of information leakage.
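One plausible reading of the RBI idea is an additive random decomposition: the image is split into random "base images" that sum back to the original, so any proper subset looks like noise while linear operations (like the integral image) distribute over the parts. The sketch below is an illustration of that reading, not the paper's exact construction:

```python
import numpy as np

def random_base_images(img, k, rng):
    """Split an image into k additive random base images: the first k-1 are
    uniform noise, and the last is the residual so that the parts sum to img.
    Any k-1 parts alone carry no usable content (illustrative construction)."""
    bases = [rng.uniform(-1.0, 1.0, size=img.shape) for _ in range(k - 1)]
    bases.append(img - sum(bases))
    return bases
```

Because summation and integral-image computation commute, a server can process the parts independently and combine results, which is consistent with the claimed acceleration.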
- Feb 27 2017 cs.CV arXiv:1702.07472v1 Image diffusion plays a fundamental role in the task of image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. However, the TNRD model is a local model whose diffusion behavior is controlled purely by local patch information, so it is prone to creating artifacts in homogeneous regions and over-smoothing highly textured regions, especially at strong noise levels. Meanwhile, it is widely known that the non-local self-similarity (NSS) prior is an effective image prior for denoising, and it has been widely exploited in many non-local methods. In this work, we embed the NSS prior into the TNRD model to tackle its weaknesses. In order to preserve the property that end-to-end training remains available, we exploit the NSS prior via a set of non-local filters, and derive our proposed trainable non-local reaction diffusion (TNLRD) model for image denoising. Together with the local filters and influence functions, the non-local filters are learned by employing loss-specific training. The experimental results show that the trained TNLRD model produces visually plausible recovered images with more texture and fewer artifacts, compared to its local versions. Moreover, the trained TNLRD model achieves strongly competitive performance against recent state-of-the-art image denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
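A reaction-diffusion step in the TNRD spirit applies each filter, passes the response through an influence function, convolves with the adjoint (reversed) filter, and adds a data-fidelity reaction term. The 1D sketch below is schematic; the actual model uses learned 2D filters and learned influence functions:

```python
import numpy as np

def tnrd_step(x, y, kernels, influence, lam=0.1, step=0.1):
    """One schematic reaction-diffusion step (1D): diffusion is
    filter -> influence -> adjoint filter summed over kernels; reaction
    pulls x toward the noisy observation y with weight lam."""
    diffusion = np.zeros_like(x)
    for k in kernels:
        fx = np.convolve(x, k, mode="same")
        diffusion += np.convolve(influence(fx), k[::-1], mode="same")
    return x - step * (diffusion + lam * (x - y))
```

With an identity influence function and a derivative-like kernel, one step behaves like heat diffusion, spreading a spike into its neighbors; the learned influence functions are what let TNRD preserve edges while smoothing noise.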
- Feb 27 2017 cs.CV arXiv:1702.07482v1 Speckle reduction is a prerequisite for many image processing tasks in synthetic aperture radar (SAR) images, as well as all coherent images. In recent years, predominant state-of-the-art approaches for despeckling have usually been based on nonlocal methods, which mainly concentrate on achieving utmost image restoration quality at relatively low computational efficiency. Therefore, in this study we aim to propose an efficient despeckling model with both high computational efficiency and high recovery quality. To this end, we exploit a newly-developed trainable nonlinear reaction diffusion (TNRD) framework which has proven to be a simple and effective model for various image restoration problems. In the original TNRD applications, the diffusion network is usually derived from the direct gradient descent scheme. However, this approach encounters problems for the multiplicative noise reduction task considered in this study. To solve this problem, we employ a new architecture derived from the proximal gradient descent method. Taking the speckle noise statistics into account, the diffusion process for the despeckling task is derived. We then retrain all the model parameters in the presence of speckle noise. Finally, optimized nonlinear diffusion filtering models are obtained, which are specialized for despeckling at various noise levels. Experimental results substantiate that the trained filtering models provide comparable or even better results than state-of-the-art nonlocal approaches. Meanwhile, our proposed model merely involves convolution of linear filters with an image, which offers a high level of parallelism on GPUs. As a consequence, for images of size $512 \times 512$, our GPU implementation takes less than 0.1 seconds to produce state-of-the-art despeckling performance.
- Feb 23 2017 cs.CV arXiv:1702.06767v1 In this paper, we propose a new simple, learning-free deep network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed from moment kernels, binary hashing and block-wise histograms, respectively. Twelve typical moments (including the geometric moment, Zernike moment, Tchebichef moment, etc.) are used to construct MomentsNet, and its recognition performance for binary images is studied. The results reveal that MomentsNet has better recognition performance than its corresponding moments in almost all cases, and ZernikeNet achieves the best recognition performance among the MomentsNet variants constructed from the twelve moments. ZernikeNet also shows better recognition performance on binary image databases than PCANet, which is a learning-based deep network.
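The binary-hashing and block-wise-histogram stages (shared with PCANet-style networks) can be sketched as: binarize each feature map, pack the binary maps into integer codes, then histogram the codes per block. Shapes and the block size below are illustrative:

```python
import numpy as np

def binary_hash_histogram(maps, block=4):
    """PCANet/MomentsNet-style output stage: threshold each feature map at zero,
    pack the resulting bit maps into per-pixel integer codes, then concatenate
    block-wise histograms of the codes into a feature vector."""
    maps = np.asarray(maps)                      # shape (n_maps, H, W)
    bits = (maps > 0).astype(np.int64)
    codes = np.zeros(maps.shape[1:], dtype=np.int64)
    for b in bits:                               # binary hashing: pack bits
        codes = codes * 2 + b
    n_bins = 2 ** len(maps)
    feats = []
    for i in range(0, codes.shape[0], block):
        for j in range(0, codes.shape[1], block):
            patch = codes[i:i + block, j:j + block]
            feats.append(np.bincount(patch.ravel(), minlength=n_bins))
    return np.concatenate(feats)
```

In MomentsNet the feature maps come from moment-kernel convolutions rather than learned filters, which is what makes the whole pipeline learning-free.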
- Feb 22 2017 cs.DS arXiv:1702.06256v1 The maximum duo-preservation string mapping (Max-Duo) problem is the complement of the well-studied minimum common string partition (MCSP) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-Max-Duo is the restricted version of Max-Duo in which every letter of the alphabet occurs at most $k$ times in each of the strings; it readily reduces to the well-known maximum independent set (MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-Max-Duo can then be approximated arbitrarily close to $1.8$ using the state-of-the-art approximation algorithm for the MIS problem. In this paper, we present a vertex-degree reduction technique and, based on it, show that $2$-Max-Duo can be approximated arbitrarily close to $1.4$.
- In this paper we define the overflow problem of a network coding storage system in which the encoding parameter and the storage parameter are mismatched. Through analyses and experiments, we first show the impacts of the overflow problem in a network coding scheme, which not only wastes storage space but also degrades coding efficiency. To avoid the overflow problem, we then develop the network coding based secure storage (NCSS) scheme. By considering both security and storage requirements in the encoding procedures and distributed architectures, NCSS can improve the performance of a cloud storage system in terms of both storage cost and coding processing time. We analyze the maximum allowable amount of stored encoded data under the perfect secrecy criterion, and provide design guidelines for secure cloud storage systems to enhance coding efficiency and achieve minimal storage cost.
- Feb 16 2017 cs.LO arXiv:1702.04478v1 The output of an automated theorem prover is usually presented in a text format, which is often too unwieldy to understand. In the model checking setting, it would be helpful if one could observe the structure of models and the verification procedures. A 3D visualization tool (VMDV) is proposed in this paper to address these problems. The utility of VMDV is illustrated by applying it to proof systems.
- Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally complex algorithms for data detection in the uplink (users transmit to base station) and beamforming in the downlink (base station transmits to users). Most existing algorithms are designed to be executed on centralized computing hardware at the base station (BS), which both results in prohibitive complexity for systems with hundreds or thousands of antennas and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This paper proposes a novel decentralized baseband processing architecture that alleviates these bottlenecks by partitioning the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For this architecture, we develop novel decentralized data detection and beamforming algorithms that only access local channel-state information and require low communication bandwidth among the clusters. We study the associated trade-offs between error-rate performance, computational complexity, and interconnect bandwidth, and we demonstrate the scalability of our solutions for massive MU-MIMO systems with thousands of BS antennas using reference implementations on a graphics processing unit (GPU) cluster.
- This paper presents incremental network quantization (INQ), a novel method that aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods, which struggle with noticeable accuracy loss, INQ has the potential to resolve this issue, benefiting from two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible for forming a low-precision base, and thus are quantized by a variable-length encoding method. The weights in the other group are responsible for compensating for the accuracy loss from quantization, and thus are the ones to be re-trained. On the other hand, these three operations are repeated on the latest re-trained group in an iterative manner until all the weights are converted into low-precision ones, acting as an incremental network quantization and accuracy enhancement procedure. Extensive experiments on the ImageNet classification task using almost all well-known deep CNN architectures, including AlexNet, VGG-16, GoogleNet and ResNets, testify to the efficacy of the proposed method. Specifically, at 5-bit quantization, our models have higher accuracy than their 32-bit floating-point references. Taking ResNet-18 as an example, we further show that our quantized models with 4-bit, 3-bit and 2-bit ternary weights have improved or very similar accuracy compared with the 32-bit floating-point baseline. Besides, impressive results from combining network pruning with INQ are also reported. The code will be made publicly available.
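A minimal sketch of the power-of-two weight constraint that INQ relies on (the partition/re-training loop itself is not reproduced, and the exponent range here is an arbitrary choice of mine):

```python
def quantize_pow2(w, p_min=-4, p_max=0):
    """Map a weight to the closest value in {0} U {±2^p : p_min <= p <= p_max},
    i.e., the low-precision set that INQ constrains weights to."""
    candidates = [0.0] + [s * 2.0 ** p
                          for s in (1.0, -1.0)
                          for p in range(p_min, p_max + 1)]
    return min(candidates, key=lambda c: abs(w - c))

def quantize_layer(weights, p_min=-4, p_max=0):
    """Quantize every weight of a layer independently (illustrative only;
    INQ quantizes group by group, with re-training in between)."""
    return [quantize_pow2(w, p_min, p_max) for w in weights]
```

For example, 0.9 snaps to 1.0 (= 2^0), 0.3 to 0.25 (= 2^-2), and weights below half of the smallest magnitude snap to zero; powers of two are attractive in hardware because multiplication becomes a bit shift.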
- Spark is a promising new platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) to perform parallel data aggregation. Since the execution of parallel aggregation in Spark is inherently non-deterministic, a natural requirement for Spark programs is to give the same result for any execution on the same data set. We present PureSpark, an executable formal Haskell specification for Spark aggregate combinators. Our specification allows us to deduce the precise conditions for deterministic outcomes of Spark aggregation. We report case studies analyzing deterministic outcomes and correctness of Spark programs.
- We study the maximum duo-preservation string mapping (Max-Duo) problem, which is the complement of the well-studied minimum common string partition (MCSP) problem. Both problems have applications in many fields including text compression and bioinformatics. Motivated by an earlier local search algorithm, we present an improved approximation and show that its performance ratio is no greater than ${35}/{12} < 2.917$. This beats the current best $3.25$-approximation for Max-Duo. The performance analysis of our algorithm is done through a complex yet interesting amortization. Two lower bounds on the locality gap of our algorithm are also provided.
- Feb 07 2017 cs.IR arXiv:1702.01516v1 Top-$N$ recommender systems typically utilize side information to address the problem of data sparsity. As side information is nowadays growing towards high dimensionality, the performance of existing methods deteriorates in terms of both effectiveness and efficiency, which imposes a severe technical challenge. In order to take advantage of high-dimensional side information, we propose in this paper an embedded feature selection method to facilitate top-$N$ recommendation. In particular, we propose to learn feature weights of the side information, where zero-valued features are naturally filtered out. We also impose non-negativity and sparsity on the feature weights, to facilitate feature selection and encourage a low-rank structure. Accordingly, two optimization problems are put forward, in which the feature selection is tightly or loosely coupled with the learning procedure, respectively. The Augmented Lagrange Multiplier and Alternating Direction Method are applied to efficiently solve the problems. Experiment results demonstrate the superior recommendation quality of the proposed algorithm over the state-of-the-art alternatives.
- Feb 03 2017 cs.DB arXiv:1702.00567v1 Data fusion plays an important role in data mining because high-quality data is required in many applications. As online data may be out of date and errors in the data may propagate through copying and referencing between sources, it is hard to achieve satisfying results by merely applying existing data fusion methods to fuse Web data. In this paper, we make use of the crowd to achieve high-quality data fusion results. We design a framework that selects a set of tasks to ask the crowd in order to improve the confidence of the data. Since data are correlated and crowds may provide incorrect answers, selecting a proper set of tasks to ask the crowd is a very challenging problem. We prove that the problem is NP-hard, and design an approximation solution to address this challenge. To further improve efficiency, we design a pruning strategy and a preprocessing method, which effectively improve the performance of the proposed approximation solution. Furthermore, we find that under certain scenarios, we are not interested in all the facts, but only a specific set of facts. Thus, for these specific scenarios, we also develop another approximation solution which is much faster than the general one. We verify the solutions with extensive experiments on a real crowdsourcing platform.
- Feb 03 2017 cs.CV arXiv:1702.00503v1 Photo composition is an important factor affecting the aesthetics of photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of rules globally applicable to the wide variety of photographic styles. Inspired by the thinking process of photo taking, we treat the photo composition problem as a view finding process which successively examines pairs of views and determines their aesthetic preference. Without devising complex hand-crafted features, the ranking model is built upon a deep convolutional neural network through joint representation learning from raw pixels. Exploiting the rich professional photographs on the web as a data source, we devise a nearly unsupervised approach to generate unlimited high-quality image pairs for training the network. The resulting ranking model is generic and free of heuristics. The experimental results show that the proposed view finding network achieves state-of-the-art performance with a simple sliding-window search strategy on two image cropping datasets.
- Feb 02 2017 cs.CV arXiv:1702.00098v1 Hyperspectral image (HSI) denoising has been attracting much research attention in the remote sensing area due to its importance in improving HSI quality. Existing HSI denoising methods mainly focus on specific spectral and spatial prior knowledge of HSIs, and share a common underlying assumption that the embedded noise in an HSI is independent and identically distributed (i.i.d.). In real scenarios, however, the noise in a natural HSI always has much more complicated non-i.i.d. statistical structure, and underestimating this noise complexity often evidently degrades the robustness of current methods. To alleviate this issue, this paper makes the first effort to model HSI noise using a non-i.i.d. mixture of Gaussians (NMoG) noise assumption, which closely accords with the noise characteristics possessed by a natural HSI and is thus capable of adapting to the various noise shapes encountered in real applications. We then integrate this noise modeling strategy into the low-rank matrix factorization (LRMF) model and propose an NMoG-LRMF model in the Bayesian framework. A variational Bayes algorithm is designed to infer the posterior of the proposed model. All involved parameters can be recursively updated in closed form. Compared with current techniques, the proposed method performs more robustly than the state of the art, as substantiated by our experiments on synthetic and real noisy HSIs.
- Feb 02 2017 cs.CV arXiv:1702.00158v1 The design, analysis and application of a volumetric convolutional neural network (VCNN) are studied in this work. Although many CNNs have been proposed in the literature, their design is empirical. In the design of the VCNN, we propose a feed-forward K-means clustering algorithm to determine the filter number and size at each convolutional layer systematically. For the analysis of the VCNN, the cause of confusing classes in the output of the VCNN is explained by analyzing the relationship between the filter weights (also known as anchor vectors) from the last fully-connected layer to the output. Furthermore, a hierarchical clustering method followed by a random forest classification method is proposed to boost the classification performance among confusing classes. For the application of the VCNN, we examine the 3D shape classification problem and conduct experiments on the popular ModelNet40 dataset. The proposed VCNN offers state-of-the-art performance among all volume-based CNN methods.
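Since the key design idea above is choosing filter parameters via clustering, here is a plain Lloyd's k-means sketch on toy 2-D data (the paper's feed-forward variant and its filter-size heuristics are not reproduced; this only shows the clustering primitive):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, l in enumerate(labels) if l == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, labels
```

On two well-separated blobs the two recovered clusters match the blobs exactly; in a filter-design setting the cluster count k would play the role of the filter number.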
- Feb 01 2017 cs.CV arXiv:1701.08869v1 A novel solution for the content-based 3D shape retrieval problem using an unsupervised clustering approach, which does not need any label information of 3D shapes, is presented in this work. The proposed shape retrieval system consists of two modules in cascade: the irrelevance filtering (IF) module and the similarity ranking (SR) module. The IF module attempts to cluster gallery shapes that are similar to each other by examining global and local features simultaneously. However, shapes that are close in the local feature space can be distant in the global feature space, and vice versa. To resolve this issue, we propose a joint cost function that strikes a balance between two distances. Irrelevant samples that are close in the local feature space but distant in the global feature space can be removed in this stage. The remaining gallery samples are ranked in the SR module using the local feature. The superior performance of the proposed IF/SR method is demonstrated by extensive experiments conducted on the popular SHREC12 dataset.
- We derive inner and outer bounds on the capacity region for a class of three-user partially connected interference channels. We focus on the impact of topology, interference alignment, and interplay between interference and noise. The representative channels we consider are the ones that have clear interference alignment gain. For these channels, Z-channel type outer bounds are tight to within a constant gap from capacity. We present near-optimal achievable schemes based on rate-splitting and lattice alignment.
- Taking into account both the huge computing power of intruders and untrusted cloud servers, we develop an enhanced secure pseudonym scheme to protect the privacy of mobile cloud data. To face the huge computing power challenge, we develop an unconditionally secure lightweight network coding pseudonym scheme. For the privacy issue of untrusted cloud servers, we further design a two-tier network coding to decouple the stored mobile cloud data from the owner pseudonyms. Therefore, our proposed network coding based pseudonym scheme can simultaneously defend against attackers from both outside and inside. We implement our proposed two-tier lightweight network coding mechanism in a group location based service (LBS) using an untrusted cloud database. Compared to computationally secure hash-based pseudonyms, our proposed scheme is not only unconditionally secure, but also reduces processing time by more than 90 percent and energy consumption by 10 percent.
- Jan 25 2017 cs.CV arXiv:1701.06772v1 Learning rich and diverse feature representations is always desirable for deep convolutional neural networks (CNNs). Besides, when auxiliary annotations are available for specific data, simply ignoring them would be a great waste. In this paper, we incorporate these auxiliary annotations as privileged information and propose a novel CNN model that maximizes the inherent diversity of a CNN so that the model can learn better feature representations with stronger generalization ability. More specifically, we propose a group orthogonal convolutional neural network (GoCNN) that learns features from the foreground and background in an orthogonal way by exploiting privileged information for optimization, which automatically emphasizes feature diversity within a single model. Experiments on two benchmark datasets, ImageNet and PASCAL VOC, demonstrate the effectiveness and high generalization ability of our proposed GoCNN models.
- Jan 24 2017 cs.CE arXiv:1701.06254v1 This paper presents our work on developing parallel computational methods for two-phase flow on modern parallel computers, where techniques for linear solvers and nonlinear methods are studied and the standard and inexact Newton methods are investigated. A multi-stage preconditioner for two-phase flow is applied and advanced matrix processing strategies are studied. A local reordering method is developed to speed up the solution of linear systems. Numerical experiments show that these computational methods are effective and scalable, and are capable of computing large-scale reservoir simulation problems using thousands of CPU cores on parallel computers. The nonlinear techniques, preconditioner and matrix processing strategies can also be applied to three-phase black oil, compositional and thermal models.
- Jan 24 2017 cs.SC arXiv:1701.06248v1 In this paper, we give decision criteria for normal binomial difference polynomial ideals in the univariate difference polynomial ring Fy to have finite difference Groebner bases, and an algorithm to compute the finite difference Groebner bases when these criteria are satisfied. The novelty of these criteria lies in the fact that complicated properties of difference polynomial ideals are reduced to elementary properties of univariate polynomials in Z[x].
- This paper studies the problem of secure communication over a K-transmitter multiple access channel in the presence of an external eavesdropper, subject to a joint secrecy constraint (i.e., the information leakage rate from the collection of K messages to the eavesdropper is made vanishing). As a result, we establish the joint secrecy achievable rate region. To this end, our results build upon two techniques in addition to the standard information-theoretic methods. The first is a generalization of Chia and El Gamal's lemma on the entropy bound for a set of codewords given partial information. The second utilizes a compact representation of a list of sets that, together with properties of mutual information, leads to an efficient Fourier-Motzkin elimination. These two approaches could also be of independent interest in other contexts.
- One key 5G scenario is the coexistence of device-to-device (D2D) communication and massive multiple-input multiple-output (MIMO). However, due to the vast number of cellular and D2D transmissions, interference in the uplink of D2D underlaid massive MIMO cellular networks needs to be coordinated. To this end, this paper introduces a spatially dynamic power control solution for mitigating the cellular-to-D2D and D2D-to-cellular interference. In particular, the proposed D2D power control policy is rather flexible, including the special cases of no D2D links or maximum transmit power. Under the considered power control, an analytical approach is developed to evaluate the spectral efficiency (SE) and energy efficiency (EE) of such networks. Exact expressions for the SE of a cellular user or D2D transmitter are derived, which quantify the impact of key system parameters such as the number of massive MIMO antennas and the D2D density. Moreover, D2D scaling properties are obtained, which provide sufficient conditions for achieving the anticipated SE. Numerical results corroborate our analysis and show that the proposed power control solution can efficiently mitigate interference between the cellular and D2D tiers. The results demonstrate that there exists an optimal D2D density for maximizing the area SE of the D2D tier. In addition, the achievable EE of a cellular user can be comparable to that of a D2D user.
- Jan 09 2017 cs.CV arXiv:1701.01480v1 Automatic photo cropping is an important tool for improving visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper highly depends on the ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms in tackling such a problem. In this work, we conduct an extensive study on traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.
- The sparse code multiple access (SCMA) scheme is considered a promising non-orthogonal multiple access technology for future fifth generation (5G) communications. Due to the sparse nature of SCMA, the message passing algorithm (MPA) has been used as the receiver to achieve close-to-maximum-likelihood (ML) detection performance with much lower complexity. However, the complexity order of MPA is still exponential in the size of the codebook and the degree of signal superposition on a given resource element. In this paper, we propose a novel low-complexity iterative receiver based on the expectation propagation algorithm (EPA), which reduces the complexity order from exponential to linear. Simulation results demonstrate that the proposed EPA receiver achieves nearly the same block error rate (BLER) performance as the conventional MPA receiver with orders-of-magnitude lower complexity.
- The energy internet is one of the most promising future energy infrastructures, as it could enhance energy efficiency and improve system operating flexibility. In analogy to the micro-grid, the micro energy internet puts more emphasis on the distribution level and the consumer side. The concept and design principles of the smart micro energy internet are proposed to accommodate micro-grids, distributed poly-generation systems, energy storage facilities, and related energy distribution infrastructures. The dispatch and control system of the smart micro energy internet should be responsive to external disturbances and able to approach a satisfactory operating point that balances multiple criteria, such as safety, economy, and environmental protection. To realize the vision of the smart micro energy internet, dispatch and control strategies based on engineering game theory are investigated. Based on the proposed architecture and energy management system, a prototype of the first domestic solar-based smart micro energy internet has been established at Qinghai University.
- Dec 23 2016 cs.CV arXiv:1612.07625v2 Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., updating the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we discuss how these challenges can be addressed at various levels of hardware design ranging from architecture and hardware-friendly algorithms to mixed-signal circuits and advanced technologies (including memories and sensors).
- Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework.
- Dec 16 2016 cs.SD arXiv:1612.04919v1 Glottal analysis approaches based upon linear prediction or complex cepstrum methods have proven effective for estimating the glottal source from real speech utterances. We propose a new approach employing both an all-pole odd-order linear prediction to provide a coarse estimate and a phase-decomposition-based causality/anti-causality separation to generate further refinements. The obtained measures show that this method improves performance in terms of reducing source-filter separation in the estimation of glottal flow pulses (GFP). No glottal model fitting is required by this method; thus it adapts widely and flexibly to retain the fidelity of speakers' vocal features with computationally affordable resources. The method is evaluated on real speech utterances to validate it.
- Recently, a novel family of biologically plausible online algorithms for reducing the dimensionality of streaming data has been derived from the similarity matching principle. In these algorithms, the number of output dimensions can be determined adaptively by thresholding the singular values of the input data matrix. However, setting such a threshold requires knowing the magnitude of the desired singular values in advance. Here we propose online algorithms where the threshold is self-calibrating based on the singular values computed from the existing observations. To derive these algorithms from the similarity matching cost function, we propose novel regularizers. As before, these online algorithms can be implemented by Hebbian/anti-Hebbian neural networks in which the learning rule depends on the chosen regularizer. We demonstrate both mathematically and via simulation the effectiveness of these online algorithms in various settings.
- We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as each new state transition is observed. These methods use small storage and have low computational complexity per iteration. The SPD methods find an absolute-$\epsilon$-optimal policy, with high probability, using $\mathcal{O}\left(\frac{|\mathcal{S}|^4 |\mathcal{A}|^2\sigma^2 }{(1-\gamma)^6\epsilon^2} \right)$ iterations/samples for the infinite-horizon discounted-reward MDP and $\mathcal{O}\left(\frac{|\mathcal{S}|^4 |\mathcal{A}|^2H^6\sigma^2 }{\epsilon^2} \right)$ for the finite-horizon MDP.
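As a loose illustration of the primal-dual flavor (my own toy sketch of a saddle-point-style update on sampled transitions, not the authors' SPD algorithm or its guarantees): the tabular value estimate plays the dual role and the tabular policy the primal role, each nudged by the sampled temporal-difference term.

```python
import random

def spd_style_step(v, pi, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.9):
    """One toy primal-dual-style update on a sampled transition: the value
    (dual) moves along the TD term; the policy (primal) weight of the taken
    action is nudged by the same term, then projected back onto the simplex."""
    td = r + gamma * v[s_next] - v[s]
    v[s] += alpha * td
    pi[s][a] += beta * td
    pi[s] = [max(p, 0.0) for p in pi[s]]       # clip to nonnegative
    z = sum(pi[s]) or 1.0
    pi[s] = [p / z for p in pi[s]]             # renormalize to a distribution

# Toy 2-state, 2-action MDP: action 1 always yields reward 1, action 0
# yields 0; transitions are uniformly random (all choices are mine).
rng = random.Random(0)
v = [0.0, 0.0]
pi = [[0.5, 0.5], [0.5, 0.5]]
s = 0
for _ in range(2000):
    a = 1 if rng.random() < pi[s][1] else 0
    r = float(a)
    s_next = rng.randrange(2)
    spd_style_step(v, pi, s, a, r, s_next)
    s = s_next
```

After the loop, each pi[s] is still a probability distribution and the value estimates stay within the discounted-reward bound 1/(1-gamma) = 10; unlike the paper's SPD methods, this sketch carries no optimality guarantee.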
- Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU. To address these issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory by jointly training NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP, which indicates that our joint model is capable of mitigating the effects of noisy NLU outputs, and that the NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
- Dec 02 2016 cs.CV arXiv:1612.00119v2 In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) Predictive feature learning from nearly unlimited unlabeled video data. Different from existing methods that learn features from single-frame parsing, we learn spatiotemporal discriminative features by enforcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations. (2) A prediction-steering parsing architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments over two challenging datasets, Cityscapes and Camvid, have demonstrated the effectiveness of our methods by showing significant improvement over well-established baselines.
- We introduce the problem of transporting vector-valued distributions. In this, a salient feature is that mass may flow between vectorial entries as well as across space (discrete or continuous). The theory relies on a first step taken to define an appropriate notion of optimal transport on a graph. The corresponding distance between distributions is readily computable via convex optimization and provides a suitable generalization of Wasserstein-type metrics. Building on this, we define Wasserstein-type metrics on vector-valued distributions supported on continuous spaces as well as graphs. Motivation for developing vector-valued mass transport is provided by applications such as multi-color image processing, polarimetric radar, as well as network problems where resources may be vectorial.
- Nov 29 2016 cs.GT arXiv:1611.09219v2 Peer prediction mechanisms are often adopted to elicit truthful contributions from crowd workers when no ground-truth verification is available. Recently, mechanisms of this type have been developed to incentivize effort exertion, in addition to truthful elicitation. In this paper, we study a sequential peer prediction problem where a data requester wants to dynamically determine the reward level to optimize the trade-off between the quality of information elicited from workers and the total expected payment. In this problem, workers have homogeneous expertise and heterogeneous costs for exerting effort, both unknown to the requester. We propose a sequential posted-price mechanism to dynamically learn the optimal reward level from workers' contributions and to incentivize effort exertion and truthful reporting. We show that (1) in our mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully is an equilibrium that returns the highest utility for each worker, and (2) the regret of our learning mechanism w.r.t. offering the optimal reward (price) is upper bounded by $\tilde{O}(T^{3/4})$ where $T$ is the learning horizon. We further show the power of our learning approach when the reports of workers do not necessarily follow the game-theoretic equilibrium.
- Sparse code multiple access (SCMA) is a new multiple access technique which supports massive connectivity. Compared with the current Long Term Evolution (LTE) system, it enables the overloading of active users on limited orthogonal resources and thus meets the requirements of fifth generation (5G) wireless networks. However, the computational complexity of existing detection algorithms increases exponentially with $d_f$ (the degree of the resource nodes). Although the codebooks are designed to have low density, detection still takes considerable time, so $d_f$ must be kept very small, which largely limits the choice of codebooks. In this paper, a new detection algorithm is proposed by discretizing the probability distribution functions (PDFs) in the layer nodes (variable nodes). Given $M$ as the size of one codebook, the detection complexity of each resource node (function node) is reduced from $O(d_f M^{d_f})$ to $O(d_f^3 \ln (d_f))$. Its detection accuracy quickly approaches that of previous detection algorithms as the sampling interval in the discretization decreases.
- Nov 24 2016 cs.LG arXiv:1611.07659v2K-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with selected hyper-parameters. It is known to be expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM when training the (h+1)-th SVM to improve the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM to speed up training of the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors of the next SVM and to accurately estimate their associated weights (also called alpha values) by using the previous SVM. Our experimental results show that our algorithms are several times faster than k-fold cross-validation that does not reuse the previously trained SVM, while producing identical results (hence the same accuracy).
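The reuse opportunity comes from a simple structural fact: the training sets of consecutive folds overlap heavily, so most support vectors of the h-th SVM remain candidates for the (h+1)-th SVM. A minimal sketch of that overlap (illustrative only, not the paper's three algorithms):

```python
import numpy as np

def kfold_train_indices(n, k):
    """Training-set indices for each of the k folds (fold h is held out)."""
    folds = np.array_split(np.arange(n), k)
    return [np.concatenate([f for j, f in enumerate(folds) if j != h])
            for h in range(k)]

n, k = 100, 5
train = kfold_train_indices(n, k)
# Consecutive training sets share all but two folds, so a fraction
# (k-2)/(k-1) of the h-th training set carries over to the (h+1)-th.
overlap = len(np.intersect1d(train[0], train[1])) / len(train[0])
print(overlap)  # 0.75 for k=5
```

The paper's contribution is to exploit this overlap to warm-start the next SVM's support vectors and alpha values rather than retraining from scratch.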
- Nov 22 2016 cs.CL arXiv:1611.06722v1Transliterations play an important role in multilingual entity reference resolution, because proper names increasingly travel between languages in news and social media. Previous work associated with machine translation targets transliteration only single between language pairs, focuses on specific classes of entities (such as cities and celebrities) and relies on manual curation, which limits the expression power of transliteration in multilingual environment. By contrast, we present an unsupervised transliteration model covering 69 major languages that can generate good transliterations for arbitrary strings between any language pair. Our model yields top-(1, 20, 100) averages of (32.85%, 60.44%, 83.20%) in matching gold standard transliteration compared to results from a recently-published system of (26.71%, 50.27%, 72.79%). We also show the quality of our model in detecting true and false friends from Wikipedia high frequency lexicons. Our method indicates a strong signal of pronunciation similarity and boosts the probability of finding true friends in 68 out of 69 languages.
- Nov 17 2016 cs.CV arXiv:1611.05128v2Deep convolutional neural networks (CNNs) are indispensable to state-of-the-art computer vision algorithms. However, they are still rarely deployed on battery-powered mobile devices, such as smartphones and wearable gadgets, where vision algorithms can enable many revolutionary real-world applications. The key limiting factor is the high energy consumption of CNN processing due to its high computational complexity. While many previous efforts try to reduce the CNN model size or amount of computation, we find that these reductions do not necessarily lower energy consumption, and therefore do not serve as good metrics for energy cost estimation. To close the gap between CNN design and energy consumption optimization, we propose an energy-aware pruning algorithm for CNNs that directly uses energy consumption estimates of a CNN to guide the pruning process. The energy estimation methodology uses parameters extrapolated from actual hardware measurements that target realistic battery-powered system setups. The proposed layer-by-layer pruning algorithm also prunes more aggressively than previously proposed pruning methods by minimizing the error in the output feature maps instead of the filter weights. For each layer, the weights are first pruned and then locally fine-tuned with a closed-form least-squares solution to quickly restore the accuracy. After all layers are pruned, the entire network is further globally fine-tuned using back-propagation. With the proposed pruning method, the energy consumption of AlexNet and GoogLeNet is reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss. Finally, we show that pruning AlexNet with a reduced number of target classes can greatly decrease the number of weights, but the energy reduction is limited.
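The closed-form local fine-tune can be illustrated on a single linear unit: after the smallest-magnitude weights are zeroed, the surviving weights are refit by least squares against the original output feature map, rather than simply kept at their old values. A toy numpy sketch (our own simplification of the layer-wise step, not the paper's full algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))      # layer inputs (rows = samples)
w = rng.standard_normal(16)             # dense weights of one output unit
y = X @ w                               # original output feature map

# Prune: keep only the 8 largest-magnitude weights.
keep = np.argsort(np.abs(w))[-8:]

# Naive pruning just zeroes the dropped weights; the local fine-tune
# instead refits the kept weights with a closed-form least-squares fit
# so the pruned unit matches the original output as closely as possible.
w_ls, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)

err_naive = np.linalg.norm(X[:, keep] @ w[keep] - y)
err_ls = np.linalg.norm(X[:, keep] @ w_ls - y)
assert err_ls <= err_naive   # least squares can only do better
```

Minimizing output-feature-map error (rather than weight error) is what lets the method prune more aggressively at the same accuracy.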
- We present a learning-to-learn approach for training recurrent neural networks to perform black-box global optimization. In the meta-learning phase we use a large set of smooth target functions to learn a recurrent neural network (RNN) optimizer, which is either a long short-term memory network or a differentiable neural computer. After learning, the RNN can be applied to learn policies in reinforcement learning, as well as other black-box learning tasks, including continuous correlated bandits and experimental design. We compare this approach to Bayesian optimization, with emphasis on the issues of computation speed, horizon length, and exploration-exploitation trade-offs.
- We have fabricated and successfully tested an analog vector-by-matrix multiplier, based on redesigned 10x12 arrays of 55 nm commercial NOR flash memory cells. The modified arrays enable high-precision individual analog tuning of each cell, with sub-1% accuracy, while keeping the highly optimized cells, with their long-term state retention, intact. The array has an area of 0.33 um^2 per cell, and is at least one order of magnitude more dense than the reported prior implementations of nonvolatile analog memories. The demonstrated vector-by-vector multiplier, using gate coupling to additional periphery cells, has ~2% precision, limited by the aggregate effect of cell noise, retention, mismatch, process variations, tuning precision, and capacitive crosstalk. A differential version of the multiplier has allowed us to demonstrate sub-3% temperature drift of the output signal in the range between 25C and 85C.
- The hard-core model has attracted much attention across several disciplines, representing lattice gases in statistical physics and independent sets in discrete mathematics and computer science. On finite graphs, we are given a parameter $\lambda$, and an independent set $I$ arises with probability proportional to $\lambda^{|I|}$. On infinite graphs a Gibbs measure is defined as a suitable limit with the correct conditional probabilities, and we are interested in determining when this limit is unique and when there is phase coexistence, i.e., existence of multiple Gibbs measures. It has long been conjectured that on ${\mathbb Z}^2$ this model has a critical value $\lambda_c \approx 3.796$ with the property that if $\lambda < \lambda_c$ then it exhibits uniqueness of phase, while if $\lambda > \lambda_c$ then there is phase coexistence. Much of the work to date on this problem has focused on the regime of uniqueness, with the state of the art being recent work of Sinclair, Srivastava, Štefankovič and Yin showing that there is a unique Gibbs measure for all $\lambda < 2.538$. Here we give the first non-trivial result in the other direction, showing that there are multiple Gibbs measures for all $\lambda > 5.3506$. There is some potential for lowering this bound, but with the methods we are using we cannot hope to replace $5.3506$ with anything below about $4.8771$. Our proof begins along the lines of the standard Peierls argument, but we add two innovations. First, following ideas of Kotecký and Randall, we construct an event that distinguishes two boundary conditions and always has long contours associated with it, obviating the need to accurately enumerate short contours. Second, we obtain improved bounds on the number of contours by relating them to a new class of self-avoiding walks on an oriented version of ${\mathbb Z}^2$.
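For concreteness, a configuration from the hard-core model on a finite graph can be sampled with single-site Glauber dynamics: pick a vertex, occupy it with probability $\lambda/(1+\lambda)$ if its neighborhood is empty, and vacate it otherwise. A toy sketch on a 4x4 torus (illustrative background only; the paper concerns Gibbs measures on the infinite lattice, not sampling):

```python
import random

def glauber_step(config, graph, lam, rng):
    """One Glauber update for the hard-core model: choose a vertex; if a
    neighbor is occupied it must be empty, otherwise occupy it with
    probability lam / (1 + lam)."""
    v = rng.choice(list(graph))
    if any(config[u] for u in graph[v]):
        config[v] = 0
    else:
        config[v] = 1 if rng.random() < lam / (1 + lam) else 0

# toy 4x4 torus
n = 4
graph = {(i, j): [((i+1) % n, j), ((i-1) % n, j),
                  (i, (j+1) % n), (i, (j-1) % n)]
         for i in range(n) for j in range(n)}
rng = random.Random(0)
config = {v: 0 for v in graph}
for _ in range(5000):
    glauber_step(config, graph, 5.0, rng)
# config is (approximately) a sample at lambda = 5; it is always a
# valid independent set:
assert all(not (config[v] and config[u]) for v in graph for u in graph[v])
```

Phase coexistence on ${\mathbb Z}^2$ manifests in such dynamics as sensitivity to the boundary condition: above $\lambda_c$, odd- and even-sublattice-dominated configurations are both stable.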
- A main focus of machine learning research has been improving the generalization accuracy and efficiency of prediction models. Many models such as SVMs, random forests, and deep neural nets have been proposed and achieved great success. However, what emerges as missing in many applications is actionability, i.e., the ability to turn prediction results into actions. For example, in applications such as customer relationship management, clinical prediction, and advertisement, users need not only accurate prediction, but also actionable instructions which can transfer an input to a desirable goal (e.g., higher profits, lower morbidity rates, higher ad hit rates). Existing efforts to derive such actionable knowledge are few and limited to simple action models that are restricted to changing only one attribute per action. The dilemma is that in many real applications the action models are more complex, making an optimal solution harder to extract. In this paper, we propose a novel approach that achieves actionability by combining learning with planning, two core areas of AI. In particular, we propose a framework to extract actionable knowledge from random forests, one of the most widely used and best off-the-shelf classifiers. We formulate the actionability problem as a sub-optimal action planning (SOAP) problem, which is to find a plan that alters certain features of a given input so that the random forest yields a desirable output, while minimizing the total cost of actions. Technically, the SOAP problem is formulated in the SAS+ planning formalism and solved using a Max-SAT based approach. Our experimental results demonstrate the effectiveness and efficiency of the proposed approach on a personal credit dataset and other benchmarks. Our work represents a new application of automated planning to an emerging and challenging machine learning paradigm.
- Nov 01 2016 cs.AR arXiv:1610.09761v1Compared to conventional general-purpose processors, accelerator-rich architectures (ARAs) can provide orders-of-magnitude performance and energy gains and are emerging as one of the most promising solutions in the age of dark silicon. However, many design issues related to the complex interaction between general-purpose cores, accelerators, customized on-chip interconnects, and memory systems remain unclear and difficult to evaluate. In this paper we design and implement ARAPrototyper to enable rapid design space exploration for ARAs in real silicon and reduce tedious prototyping effort to a manageable level. First, ARAPrototyper provides a reusable baseline prototype with a highly customizable memory system, including interconnect between accelerators and buffers, interconnect between buffers and last-level cache (LLC) or DRAM, coherency choice at LLC or DRAM, and address translation support. Second, ARAPrototyper provides a clean interface to quickly integrate users' own accelerators written in high-level synthesis (HLS) code. The whole design flow is highly automated to generate a prototype of an ARA on an FPGA system-on-chip (SoC). Third, to quickly develop applications that run seamlessly on the ARA prototype, ARAPrototyper provides a system software stack, abstracts the accelerators as software libraries, and provides APIs for software developers. Our experimental results demonstrate that ARAPrototyper enables a wide range of design space explorations for ARAs at manageable prototyping effort, with 4,000X to 10,000X faster evaluation time than full-system simulation. We believe that ARAPrototyper can be an attractive alternative for ARA design and evaluation.
- Studying metabolic networks is vital for areas such as the development of novel drugs and bio-fuels. For biologists, a key challenge is that many reactions are impractical or expensive to discover through experiments. Our task is to recover the missing reactions. Exploiting the problem structure, we model reaction recovery as a hyperlink prediction problem, where each reaction is regarded as a hyperlink connecting its participating vertices (metabolites). Unlike the traditional link prediction problem, where two nodes form a link, a hyperlink can involve an arbitrary number of nodes. Since the cardinality of a hyperlink is variable, existing classifiers based on a fixed number of input features become infeasible. Traditional methods, such as common neighbors and the Katz index, are not applicable either, since they are restricted to pairwise similarities. In this paper, we propose a novel hyperlink prediction algorithm called Matrix Boosting (MATBoost). MATBoost conducts inference jointly in the incidence space and the adjacency space by performing an iterative completion-matching optimization. We carry out extensive experiments to show that MATBoost achieves state-of-the-art performance. For a metabolic network with 1805 metabolites and 2583 reactions, our algorithm can successfully recover nearly 200 of 400 missing reactions.
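The incidence-space view can be made concrete: a hypergraph is summarized by a metabolite-by-reaction incidence matrix whose columns have variable weight, which is precisely why fixed-arity link predictors do not apply. A toy sketch (metabolite and reaction data are hypothetical):

```python
import numpy as np

# Toy metabolic hypergraph: each reaction is a hyperlink over metabolites.
metabolites = ["A", "B", "C", "D"]
reactions = [{"A", "B"}, {"B", "C", "D"}, {"A", "D"}]

# Incidence matrix S: rows = metabolites, columns = reactions (hyperlinks).
idx = {m: i for i, m in enumerate(metabolites)}
S = np.zeros((len(metabolites), len(reactions)), dtype=int)
for j, r in enumerate(reactions):
    for m in r:
        S[idx[m], j] = 1

# Column sums are the hyperlink cardinalities; they vary, unlike ordinary
# links which always have cardinality 2.
print(S.sum(axis=0))  # [2 3 2]
```

Hyperlink prediction then amounts to completing missing columns of S, which MATBoost couples with the pairwise adjacency view in its completion-matching iteration.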
- Oct 25 2016 cs.FL arXiv:1610.07380v2In this paper, we propose a novel algorithm to learn a Büchi automaton from a teacher who knows an $\omega$-regular language. The algorithm is based on learning a formalism named family of DFAs (FDFAs), recently proposed by Angluin and Fisman [10]. The main catch is that we use a classification tree structure instead of the standard observation table structure. The worst-case storage space required by our algorithm is quadratically better than that of the table-based algorithm proposed in [10]. We implement the first publicly available library, ROLL (Regular Omega Language Learning), which consists of all $\omega$-regular learning algorithms available in the literature and the new algorithms proposed in this paper. Experimental results show that our tree-based algorithms perform best in terms of the number of solved learning tasks.
- Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After the preceding context words are processed by hidden layers, an output layer estimates the probability of the next word. Such approaches are time- and memory-intensive because of the large number of parameters in the word embeddings and the output layer. In this paper, we propose to compress neural language models via sparse word representations. In our experiments, the number of parameters in our model grows only marginally with vocabulary size. Moreover, our approach not only reduces the parameter space to a large extent, but also improves performance in terms of perplexity.
- This paper studies an attacker against a cyber-physical system (CPS) whose goal is to move the state of a CPS to a target state while ensuring that his or her probability of being detected does not exceed a given bound. The attacker's probability of being detected is related to the nonnegative bias induced by his or her attack on the CPS' detection statistic. We formulate a linear quadratic cost function that captures the attacker's control goal and establish constraints on the induced bias that reflect the attacker's detection-avoidance objectives. When the attacker is constrained to be detected at the false-alarm rate of the detector, we show that the optimal attack strategy reduces to a linear feedback of the attacker's state estimate. In the case that the attacker's bias is upper bounded by a positive constant, we provide two algorithms -- an optimal algorithm and a sub-optimal, less computationally intensive algorithm -- to find suitable attack sequences. Finally, we illustrate our attack strategies in numerical examples based on a remotely-controlled helicopter under attack.
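In generic finite-horizon LQ notation (symbols ours; the paper's exact cost and bias constraint may differ), the attacker's problem has the shape:

```latex
\min_{u_0,\dots,u_{N-1}}\;
\mathbb{E}\!\left[ \lVert x_N - x^\star \rVert_{Q_N}^2
+ \sum_{k=0}^{N-1} \Big( \lVert x_k - x^\star \rVert_{Q}^2
+ \lVert u_k \rVert_{R}^2 \Big) \right]
\quad\text{s.t.}\quad 0 \le b_k \le \bar b \quad \forall k ,
```

where $x^\star$ is the target state and $b_k$ is the nonnegative bias the attack induces on the detector's statistic. When $\bar b$ is pinned at the detector's false-alarm level, the abstract's result says the optimal $u_k$ reduces to linear feedback of the attacker's state estimate.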
- We will demonstrate a conversational product recommendation agent. This system shows how we combine research in personalized recommendation systems with research in dialogue systems to build a virtual sales agent. Based on new deep learning technologies we developed, the virtual agent is capable of learning how to interact with users, how to answer user questions, what question to ask next, and what to recommend when chatting with a human user. Normally a decent conversational agent for a particular domain requires tens of thousands of hand-labeled conversational examples or hand-written rules. This is a major barrier when launching a conversational agent for a new domain. We will explore and demonstrate the effectiveness of our learning solution even when there are no hand-written rules or hand-labeled training data.