- Learning with Errors is one of the fundamental problems in computational learning theory and has in the last years become the cornerstone of post-quantum cryptography. In this work, we study the quantum sample complexity of Learning with Errors and show that there exists an efficient quantum learning algorithm (with polynomial sample and time complexity) for the Learning with Errors problem where the error distribution is the one used in cryptography. While our quantum learning algorithm does not break the LWE-based encryption schemes proposed in the cryptography literature, it does have some interesting implications for cryptography: first, when building an LWE-based scheme, one needs to be careful about the access to the public-key generation algorithm that is given to the adversary; second, our algorithm shows a possible way for attacking LWE-based encryption by using classical samples to approximate the quantum sample state, since then using our quantum learning algorithm would solve LWE.
- We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.
- Motivated by the idea that criticality and universality of phase transitions might play a crucial role in achieving and sustaining learning and intelligent behaviour in biological and artificial networks, we analyse a theoretical and a pragmatic experimental set up for critical phenomena in deep learning. On the theoretical side, we use results from statistical physics to carry out critical point calculations in feed-forward/fully connected networks, while on the experimental side we set out to find traces of criticality in deep neural networks. This is our first step in a series of upcoming investigations to map out the relationship between criticality and learning in deep networks.
- Feb 28 2017 cs.DS arXiv:1702.07961v1Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provides massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the- art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve a 33.7% performance improvement over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date.
- Feb 28 2017 cs.CV arXiv:1702.08434v1Skin cancer is one of the major types of cancers and its incidence has been increasing over the past decades. Skin lesions can arise from various dermatologic disorders and can be classified to various types according to their texture, structure, color and other morphological features. The accuracy of diagnosis of skin lesions, specifically the discrimination of benign and malignant lesions, is paramount to ensure appropriate patient treatment. Machine learning-based classification approaches are among popular automatic methods for skin lesion classification. While there are many existing methods, convolutional neural networks (CNN) have shown to be superior over other classical machine learning methods for object detection and classification tasks. In this work, a fully automatic computerized method is proposed, which employs well established pre-trained convolutional neural networks and ensembles learning to classify skin lesions. We trained the networks using 2000 skin lesion images available from the ISIC 2017 challenge, which has three main categories and includes 374 melanoma, 254 seborrheic keratosis and 1372 benign nevi images. The trained classifier was then tested on 150 unlabeled images. The results, evaluated by the challenge organizer and based on the area under the receiver operating characteristic curve (AUC), were 84.8% and 93.6% for Melanoma and seborrheic keratosis binary classification problem, respectively. The proposed method achieved competitive results to experienced dermatologist. Further improvement and optimization of the proposed method with a larger training dataset could lead to a more precise, reliable and robust method for skin lesion classification.
- We introduce a novel approach to training generative adversarial networks, where we train a generator to match a target distribution that converges to the data distribution at the limit of a perfect discriminator. This objective can be interpreted as training a generator to produce samples that lie on the decision boundary of a current discriminator in training at each update, and we call a GAN trained using this algorithm a boundary-seeking GAN (BS-GAN). This approach can be used to train a generator with discrete output when the generator outputs a parametric conditional distribution. We demonstrate the effectiveness of the proposed algorithm with discrete image data. In contrary to the proposed algorithm, we observe that the recently proposed Gumbel-Softmax technique for re-parametrizing the discrete variables does not work for training a GAN with discrete data. Finally, we notice that the proposed boundary-seeking algorithm works even with continuous variables, and demonstrate its effectiveness with two widely used image data sets, SVHN and CelebA.
- Feb 28 2017 cs.CE arXiv:1702.08425v1Extreme-scale computational science increasingly demands multiscale and multiphysics formulations. Combining software developed by independent groups is imperative: no single team has resources for all predictive science and decision support capabilities. Scientific libraries provide high-quality, reusable software components for constructing applications with improved robustness and portability. However, without coordination, many libraries cannot be easily composed. Namespace collisions, inconsistent arguments, lack of third-party software versioning, and additional difficulties make composition costly. The Extreme-scale Scientific Software Development Kit (xSDK) defines community policies to improve code quality and compatibility across independently developed packages (hypre, PETSc, SuperLU, Trilinos, and Alquimia) and provides a foundation for addressing broader issues in software interoperability, performance portability, and sustainability. The xSDK provides turnkey installation of member software and seamless combination of aggregate capabilities, and it marks first steps toward extreme-scale scientific software ecosystems from which future applications can be composed rapidly with assured quality and scalability.
- Feb 28 2017 cs.CV arXiv:1702.08423v1"If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5?" The answer is probably a "No." Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples is required. In addition, given an unlabeled image, the generative model can directly produce the image with desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.
- Feb 28 2017 cs.DB arXiv:1702.08409v1We introduce Rabbit, a combinator-based query language. Rabbit is designed to let data analysts and other accidental programmers query complex structured data. We combine the functional data model and the categorical semantics of computations to develop denotational semantics of database queries. In Rabbit, a query is modeled as a Kleisli arrow for a monadic container determined by the query cardinality. In this model, monadic composition can be used to navigate the database, while other query combinators can aggregate, filter, sort and paginate data; construct compound data; connect self-referential data; and reorganize data with grouping and data cube operations. A context-aware query model, with the input context represented as a comonadic container, can express query parameters and window functions. Rabbit semantics enables pipeline notation, encouraging its users to construct database queries as a series of distinct steps, each individually crafted and tested. We believe that Rabbit can serve as a practical tool for data analytics.
- We introduce new families of Integral Probability Metrics (IPM) for training Generative Adversarial Networks (GAN). Our IPMs are based on matching statistics of distributions embedded in a finite dimensional feature space. Mean and covariance feature matching IPMs allow for stable training of GANs, which we will call McGan. McGan minimizes a meaningful loss between distributions.
- Feb 28 2017 cs.DM arXiv:1702.08392v1The runtime performance of modern SAT solvers on random $k$-CNF formulas is deeply connected with the 'phase-transition' phenomenon seen empirically in the satisfiability of random $k$-CNF formulas. Recent universal hashing-based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both $k$-CNF and XOR constraints (known as $k$-CNF-XOR formulas), but the behavior of random $k$-CNF-XOR formulas is unexplored in prior work. In this paper, we present the first study of the satisfiability of random $k$-CNF-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a linear trade-off between $k$-CNF and XOR constraints. Furthermore, we prove that a phase-transition for $k$-CNF-XOR formulas exists for $k = 2$ and (when the number of $k$-CNF constraints is small) for $k > 2$.
- Feb 28 2017 cs.CV arXiv:1702.08379v1Mammography screening for early detection of breast lesions currently suffers from high amounts of false positive findings, which result in unnecessary invasive biopsies. Diffusion-weighted MR images (DWI) can help to reduce many of these false-positive findings prior to biopsy. Current approaches estimate tissue properties by means of quantitative parameters taken from generative, biophysical models fit to the q-space encoded signal under certain assumptions regarding noise and spatial homogeneity. This process is prone to fitting instability and partial information loss due to model simplicity. We reveal previously unexplored potentials of the signal by integrating all data processing components into a convolutional neural network (CNN) architecture that is designed to propagate clinical target information down to the raw input images. This approach enables simultaneous and target-specific optimization of image normalization, signal exploitation, global representation learning and classification. Using a multicentric data set of 222 patients, we demonstrate that our approach significantly improves clinical decision making with respect to the current state of the art.
- Feb 28 2017 cs.LG arXiv:1702.08360v1A critical component to enabling intelligent reasoning in partially observable environments is memory. Despite this importance, Deep Reinforcement Learning (DRL) agents have so far used relatively simple memory architectures, with the main methods to overcome partial observability being either a temporal convolution over the past k frames or an LSTM layer. More recent work (Oh et al., 2016) has went beyond these architectures by using memory networks which can allow more sophisticated addressing schemes over the past k frames. But even these architectures are unsatisfactory due to the reason that they are limited to only remembering information from the last k frames. In this paper, we develop a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with. This architecture, called the Neural Map, uses a spatially structured 2D memory image to learn to store arbitrary information about the environment over long time lags. We demonstrate empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments and show that it is capable of generalizing to environments that were not seen during training.
- We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec [Mikolov, 2013]. These embedding vectors are connected in time through a latent diffusion process. We describe two scalable variational inference algorithms---skip-gram smoothing and skip-gram filtering---that allow us to train the model jointly over all times; thus learning on all data while simultaneously allowing word and context vectors to drift. Experimental results on three different corpora demonstrate that our dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices.
- Feb 28 2017 cs.SY arXiv:1702.08357v1We consider a simple, yet widely studied, set-up in which a Fusion Center (FC) is asked to make a binary decision about a sequence of system states by relying on the possibly corrupted decisions provided by byzantine nodes, i.e. nodes which deliberately alter the result of the local decision to induce an error at the fusion center. When independent states are considered, the optimum fusion rule over a batch of observations has already been derived, however its complexity prevents its use in conjunction with large observation windows. In this paper, we propose a near-optimal algorithm based on message passing that greatly reduces the computational burden of the optimum fusion rule. In addition, the proposed algorithm retains very good performance also in the case of dependent system states. By first focusing on the case of small observation windows, we use numerical simulations to show that the proposed scheme introduces a negligible increase of the decision error probability compared to the optimum fusion rule. We then analyse the performance of the new scheme when the FC make its decision by relying on long observation windows. We do so by considering both the case of independent and Markovian system states and show that the obtained performance are superior to those obtained with prior suboptimal schemes. As an additional result, we confirm the previous finding that, in some cases, it is preferable for the byzantine nodes to minimise the mutual information between the sequence system states and the reports submitted to the FC, rather than always flipping the local decision.
- Data sharing among partners---users, organizations, companies---is crucial for the advancement of data analytics in many domains. Sharing through secure computation and differential privacy allows these partners to perform private computations on their sensitive data in controlled ways. However, in reality, there exist complex relationships among members. Politics, regulations, interest, trust, data demands and needs are one of the many reasons. Thus, there is a need for a mechanism to meet these conflicting relationships on data sharing. This paper presents Curie, an approach to exchange data among members whose membership has complex relationships. The CPL policy language that allows members to define the specifications of data exchange requirements is introduced. Members (partners) assert who and what to exchange through their local policies and negotiate a global sharing agreement. The agreement is implemented in a multi-party computation that guarantees sharing among members will comply with the policy as negotiated. The use of Curie is validated through an example of a health care application built on recently introduced secure multi-party computation and differential privacy frameworks, and policy and performance trade-offs are explored.
- Feb 28 2017 cs.CV arXiv:1702.08319v1Visual relations, such as "person ride bike" and "bike next to car", offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate $\approx$ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive to the Lu's multi-modal model with language priors.
- Feb 28 2017 cs.CV arXiv:1702.08318v1A cloud server spent a lot of time, energy and money to train a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to find objects. However, the client does not want the leakage of the content of his/her photos. In the meanwhile, the cloud server is also reluctant to leak any parameters of the trained object detectors. 10 years ago, Avidan & Butman introduced Blind Vision, which is a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) Representation. The original image is divided into random base images. Only the base images are submitted randomly to the cloud server. Thus, the content of the image can not be leaked. In the meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI makes the integral-image enable again for the great acceleration. The experimental results reveal that our method can retain the detection accuracy of that of the plain vision algorithm and is significantly faster than the traditional blind vision, with only a very low probability of the information leakage theoretically.
- Feb 28 2017 cs.CL arXiv:1702.08303v1Multi-task learning (MTL) in deep neural networks for NLP has recently received increasing interest due to some compelling benefits, including its potential to efficiently regularize models and to reduce the need for labeled data. While it has brought significant improvements in a number of NLP tasks, mixed results have been reported, and little is known about the conditions under which MTL leads to gains in NLP. This paper sheds light on the specific task relations that can lead to gains from MTL models over single-task setups.
- Feb 28 2017 cs.LG arXiv:1702.08283v1A number of studies have proposed to use domain adaptation to reduce the training efforts needed to control an upper-limb prosthesis exploiting pre-trained models from prior subjects. These studies generally reported impressive reductions in the required number of training samples to achieve a certain level of accuracy for intact subjects. We further investigate two popular methods in this field to verify whether this result equally applies to amputees. Our findings show instead that this improvement can largely be attributed to a suboptimal hyperparameter configuration. When hyperparameters are appropriately tuned, the standard approach that does not exploit prior information performs on par with the more complicated transfer learning algorithms. Additionally, earlier studies erroneously assumed that the number of training samples relates proportionally to the efforts required from the subject. However, a repetition of a movement is the atomic unit for subjects and the total number of repetitions should therefore be used as reliable measure for training efforts. Also when correcting for this mistake, we do not find any performance increase due to the use of prior models.
- Feb 28 2017 cs.CV arXiv:1702.08272v1We present a new public dataset with a focus on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes. We train a fast object category detector for instance detection on our data. Using the dataset we show that, although increasingly accurate and fast, the state of the art for object detection is still severely impacted by object scale, occlusion, and viewing direction all of which matter for robotics applications. We next validate the dataset for simulating active vision, and use the dataset to develop and evaluate a deep-network-based system for next best move prediction for object classification using reinforcement learning. Our dataset is available for download at cs.unc.edu/~ammirato/active_vision_dataset_website/.
- Ensembling multiple predictions is a widely used technique to improve the accuracy of various machine learning tasks. In image classification tasks, for example, averaging the predictions for multiple patches extracted from the input image significantly improves accuracy. Using multiple networks trained independently to make predictions improves accuracy further. One obvious drawback of the ensembling technique is its higher execution cost during inference. If we average 100 predictions, the execution cost will be 100 times as high as the cost without the ensemble. This higher cost limits the real-world use of ensembling, even though using it is almost the norm to win image classification competitions. In this paper, we describe a new technique called adaptive ensemble prediction, which achieves the benefits of ensembling with much smaller additional execution costs. Our observation behind this technique is that many easy-to-predict inputs do not require ensembling. Hence we calculate the confidence level of the prediction for each input on the basis of the probability of the predicted label, i.e. the outputs from the softmax, during the ensembling computation. If the prediction for an input reaches a high enough probability on the basis of the confidence level, we stop ensembling for this input to avoid wasting computation power. We evaluated the adaptive ensembling by using various datasets and showed that it reduces the computation time significantly while achieving similar accuracy to the naive ensembling.
- Fluent and safe interactions of humans and robots require both partners to anticipate the others' actions. A common approach to human intention inference is to model specific trajectories towards known goals with supervised classifiers. However, these approaches do not take possible future movements into account nor do they make use of kinematic cues, such as legible and predictable motion. The bottleneck of these methods is the lack of an accurate model of general human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motions. Finally, we investigate how movements and kinematic cues are represented on the learned low dimensional manifold.
- We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we characterize the minimax regret up to log factors by proving an upper bound matching a previously known lower bound. In a partial feedback model motivated by second-price auctions, we prove upper bounds for Lipschitz and semi-Lipschitz losses that improve on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.
- Feb 28 2017 cs.DL arXiv:1702.08210v1This paper describes how semantic indexing can help to generate a contextual overview of topics and visually compare clusters of articles. The method was originally developed for an innovative information exploration tool, called Ariadne, which operates on bibliographic databases with tens of millions of records. In this paper, the method behind Ariadne is further developed and applied to the research question of the special issue "Same data, different results" - the better understanding of topic (re-)construction by different bibliometric approaches. For the case of the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploring tool, LittleAriadne, has been created. This paper contributes to the overall challenge to delineate and define topics in two different ways. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built on semantic indexing of entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. Keeping the principle of semantic indexing on the level of such an extended list of entities of the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms, hence representing a complementary approach to other possible means of comparison.
- We introduce DeepNAT, a 3D Deep convolutional neural network for the automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance images. DeepNAT is an end-to-end learning-based approach to brain segmentation that jointly learns an abstract feature representation and a multi-class classification. We propose a 3D patch-based approach, where we do not only predict the center voxel of the patch but also neighbors, which is formulated as multi-task learning. To address a class imbalance problem, we arrange two networks hierarchically, where the first one separates foreground from background, and the second one identifies 25 brain structures on the foreground. Since patches lack spatial context, we augment them with coordinates. To this end, we introduce a novel intrinsic parameterization of the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As network architecture, we use three convolutional layers with pooling, batch normalization, and non-linearities, followed by fully connected layers with dropout. The final segmentation is inferred from the probabilistic output of the network with a 3D fully connected conditional random field, which ensures label agreement between close voxels. The roughly 2.7 million parameters in the network are learned with stochastic gradient descent. Our results show that DeepNAT compares favorably to state-of-the-art methods. Finally, the purely learning-based method may have a high potential for the adaptation to young, old, or diseased brains by fine-tuning the pre-trained network with a small training sample on the target application, where the availability of larger datasets with manual annotations may boost the overall segmentation accuracy in the future.
- Feb 28 2017 cs.CV arXiv:1702.08160v1We propose a novel approach to address the Simultaneous Detection and Segmentation problem. Using hierarchical structures we use an efficient and accurate procedure that exploits the hierarchy feature information using Locality Sensitive Hashing. We build on recent work that utilizes convolutional neural networks to detect bounding boxes in an image (Faster R-CNN) and then use the top similar hierarchical region that best fits each bounding box after hashing, we call this approach HashBox. We then refine our final segmentation results by automatic hierarchy pruning. HashBox introduces a train-free alternative to Hypercolumns. We conduct extensive experiments on Pascal VOC 2012 segmentation dataset, showing that HashBox gives competitive state-of-the-art object segmentations.
- We solve tensor balancing, rescaling an Nth order nonnegative tensor by multiplying (N - 1)th order N tensors so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton's method and show in numerical experiments that the proposed algorithm is several orders of magnitude faster than existing ones. To theoretically prove the correctness of the algorithm, we model tensors as probability distributions in a statistical manifold and realize tensor balancing as projection onto a submanifold. The key to our algorithm is that the gradient of the manifold, used as a Jacobian matrix in Newton's method, can be analytically obtained using the Möbius inversion formula, the essential of combinatorial mathematics. Our model is not limited to tensor balancing but has a wide applicability as it includes various statistical and machine learning models such as weighted DAGs and Boltzmann machines.
- Social media platforms provide an environment where people can freely engage in discussions. Unfortunately, they also enable several problems, such as online harassment. Recently, Google and Jigsaw started a project called Perspective, which uses machine learning to automatically detect toxic language. A demonstration website has been also launched, which allows anyone to type a phrase in the interface and instantaneously see the toxicity score [1]. In this paper, we propose an attack on the Perspective toxic detection system based on the adversarial examples. We show that an adversary can subtly modify a highly toxic phrase in a way that the system assigns significantly lower toxicity score to it. We apply the attack on the sample phrases provided in the Perspective website and show that we can consistently reduce the toxicity scores to the level of the non-toxic phrases. The existence of such adversarial examples is very harmful for toxic detection systems and seriously undermines their usability.
- Feb 28 2017 cs.CV arXiv:1702.08112v1Thin leaves, fine stems, self-occlusion, non-rigid and slowly changing structures make plants difficult for three-dimensional (3D) scanning and reconstruction -- two critical steps in automated visual phenotyping. Many current solutions such as laser scanning, structured light, and multiview stereo can struggle to acquire usable 3D models because of limitations in scanning resolution and calibration accuracy. In response, we have developed a fast, low-cost, 3D scanning platform to image plants on a rotating stage with two tilting DSLR cameras centred on the plant. This uses new methods of camera calibration and background removal to achieve high-accuracy 3D reconstruction. We assessed the system's accuracy using a 3D visual hull reconstruction algorithm applied on 2 plastic models of dicotyledonous plants, 2 sorghum plants and 2 wheat plants across different sets of tilt angles. Scan times ranged from 3 minutes (to capture 72 images using 2 tilt angles), to 30 minutes (to capture 360 images using 10 tilt angles). The leaf lengths, widths, areas and perimeters of the plastic models were measured manually and compared to measurements from the scanning system: results were within 3-4% of each other. The 3D reconstructions obtained with the scanning system show excellent geometric agreement with all six plant specimens, even plants with thin leaves and fine stems.
- We consider the task of learning control policies for a robotic mechanism striking a puck in an air hockey game. The control signal is a direct command to the robot's motors. We employ a model free deep reinforcement learning framework to learn the motoric skills of striking the puck accurately in order to score. We propose certain improvements to the standard learning scheme which make the deep Q-learning algorithm feasible when it might otherwise fail. Our improvements include integrating prior knowledge into the learning scheme, and accounting for the changing distribution of samples in the experience replay buffer. Finally we present our simulation results for aimed striking which demonstrate the successful learning of this task, and the improvement in algorithm stability due to the proposed modifications.
- Feb 28 2017 cs.HC arXiv:1702.08072v1Theories of knowledge reuse posit two distinct processes: reuse for replication and reuse for innovation. We identify another distinct process, reuse for customization. Reuse for customization is a process in which designers manipulate the parameters of metamodels to produce models that fulfill their personal needs. We test hypotheses about reuse for customization in Thingiverse, a community of designers that shares files for three-dimensional printing. 3D metamodels are reused more often than the 3D models they generate. The reuse of metamodels is amplified when the metamodels are created by designers with greater community experience. Metamodels make the community's design knowledge available for reuse for customization-or further extension of the metamodels, a kind of reuse for innovation.
- The ensemble Kalman filter (EnKF) is a Monte Carlo based implementation of the Kalman filter (KF) for extremely high-dimensional, possibly nonlinear and non-Gaussian state estimation problems. Its ability to handle state dimensions in the order of millions has made the EnKF a popular algorithm in different geoscientific disciplines. Despite a similarly vital need for scalable algorithms in signal processing, e.g., to make sense of the ever increasing amount of sensor data, the EnKF is hardly discussed in our field. This self-contained review paper is aimed at signal processing researchers and provides all the knowledge to get started with the EnKF. The algorithm is derived in a KF framework, without the often encountered geoscientific terminology. Algorithmic challenges and required extensions of the EnKF are provided, as well as relations to sigma-point KF and particle filters. The relevant EnKF literature is summarized in an extensive survey and unique simulation examples, including popular benchmark problems, complement the theory with practical insights. The signal processing perspective highlights new directions of research and facilitates the exchange of potentially beneficial ideas, both for the EnKF and high-dimensional nonlinear and non-Gaussian filtering in general.
- Feb 28 2017 cs.CV arXiv:1702.08014v1Semantic segmentation constitutes an integral part of medical image analyses for which breakthroughs in the field of deep learning were of high relevance. The large number of trainable parameters of deep neural networks however renders them inherently data hungry, a characteristic that heavily challenges the medical imaging community. Though interestingly, with the de facto standard training of fully convolutional networks (FCNs) for semantic segmentation being agnostic towards the `structure' of the predicted label maps, valuable complementary information about the global quality of the segmentation lies idle. In order to tap into this potential, we propose utilizing an adversarial network which discriminates between expert and generated annotations in order to train FCNs for semantic segmentation. Because the adversary constitutes a learned parametrization of what makes a good segmentation at a global level, we hypothesize that the method holds particular advantages for segmentation tasks on complex structured, small datasets. This holds true in our experiments: We learn to segment aggressive prostate cancer utilizing MRI images of 152 patients and show that the proposed scheme is superior over the de facto standard in terms of the detection sensitivity and the dice-score for aggressive prostate cancer. The achieved relative gains are shown to be particularly pronounced in the small dataset limit.
- This paper addresses the task of designing a modular neural network architecture that jointly solves different tasks. As an example we use the tasks of depth estimation and semantic segmentation given a single RGB image. The main focus of this work is to analyze the cross-modality influence between depth and semantic prediction maps on their joint refinement. While most previous works solely focus on measuring improvements in accuracy, we propose a way to quantify the cross-modality influence. We show that there is a relationship between final accuracy and cross-modality influence, although not a simple linear one. Hence a larger cross-modality influence does not necessarily translate into an improved accuracy. We find that a beneficial balance between the cross-modality influences can be achieved by network architecture and conjecture that this relationship can be utilized to understand different network design choices. Towards this end we propose a Convolutional Neural Network (CNN) architecture that fuses the state of the state-of-the-art results for depth estimation and semantic labeling. By balancing the cross-modality influences between depth and semantic prediction, we achieve improved results for both tasks using the NYU-Depth v2 benchmark.
- Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and low-variance objective using the discriminator's output that follows corresponds to the log-likelihood. Compared with the original, the new objective is proved to be consistent in theory and beneficial in practice. The experimental results on various discrete datasets demonstrate the effectiveness of the proposed approach.
- Feb 28 2017 cs.CV arXiv:1702.07971v1Most of computer vision focuses on what is in an image. We propose to train a standalone object-centric context representation to perform the opposite task: seeing what is not there. Given an image, our context model can predict where objects should exist, even when no object instances are present. Combined with object detection results, we can perform a novel vision task: finding where objects are missing in an image. Our model is based on a convolutional neural network structure. With a specially designed training strategy, the model learns to ignore objects and focus on context only. It is fully convolutional thus highly efficient. Experiments show the effectiveness of the proposed approach in one important accessibility task: finding city street regions where curb ramps are missing, which could help millions of people with mobility disabilities.
- Feb 28 2017 cs.CV arXiv:1702.07963v1In this paper, we proposed using a hybrid method that utilises deep convolutional and recurrent neural networks for accurate delineation of skin lesion of images supplied with ISBI 2017 lesion segmentation challenge. The proposed method was trained using 1800 images and tested on 150 images from ISBI 2017 challenge.
- We propose a new active learning approach using Generative Adversarial Networks (GAN). Different from regular active learning, we adaptively synthesize training instances for querying to increase learning speed. Our approach outperforms random generation using GAN alone in active learning experiments. We demonstrate the effectiveness of the proposed algorithm in various datasets when compared to other algorithms. To the best our knowledge, this is the first active learning work using GAN.
- Deep learning is an important component of big-data analytic tools and intelligent applications, such as, self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive, and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNN) named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include the support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems that are accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits using the total number of threads on the Xeon Phi show speedups of up to 103x compared to the execution on one thread of the Xeon Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x compared to the sequential execution on Intel Core i5.
- Feb 28 2017 cs.LG arXiv:1702.07904v1Variational autoencoders (VAE) often use Gaussian or category distribution to model the inference process. This puts a limit on variational learning because this simplified assumption does not match the true posterior distribution, which is usually much more sophisticated. To break this limitation and apply arbitrary parametric distribution during inference, this paper derives a \emphsemi-continuous latent representation, which approximates a continuous density up to a prescribed precision, and is much easier to analyze than its continuous counterpart because it is fundamentally discrete. We showcase the proposition by applying polynomial exponential family distributions as the posterior, which are universal probability density function generators. Our experimental results show consistent improvements over commonly used VAE models.
- This paper presents an approach for semantic place categorization using data obtained from RGB cameras. Previous studies on visual place recognition and classification have shown that, by considering features derived from pre-trained Convolutional Neural Networks (CNNs) in combination with part-based classification models, high recognition accuracy can be achieved, even in presence of occlusions and severe viewpoint changes. Inspired by these works, we propose to exploit local deep representations, representing images as set of regions applying a Naïve Bayes Nearest Neighbor (NBNN) model for image classification. As opposed to previous methods where CNNs are merely used as feature extractors, our approach seamlessly integrates the NBNN model into a fully-convolutional neural network. Experimental results show that the proposed algorithm outperforms previous methods based on pre-trained CNN models and that, when employed in challenging robot place recognition tasks, it is robust to occlusions, environmental and sensor changes.
- We introduce the persistent homotopy type distance dHT to compare real valued functions defined on possibly different homotopy equivalent topological spaces. The underlying idea in the definition of dHT is to measure the minimal shift that is necessary to apply to one of the two functions in order that the sublevel sets of the two functions become homotopically equivalent. This distance is interesting in connection with persistent homology. Indeed, our main result states that dHT still provides an upper bound for the bottleneck distance between the persistence diagrams of the intervening functions. Moreover, because homotopy equivalences are weaker than homeomorphisms, this implies a lifting of the standard stability results provided by the L-infty distance and the natural pseudo-distance dNP. From a different standpoint, we prove that dHT extends the L-infty distance and dNP in two ways. First, we show that, appropriately restricting the category of objects to which dHT applies, it can be made to coincide with the other two distances. Finally, we show that dHT has an interpretation in terms of interleavings that naturally places it in the family of distances used in persistence theory.
- Feb 28 2017 cs.LG arXiv:1702.07870v1We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite. We devise an algorithm that obtains a tight regret bound of $\widetilde{O}(\epsilon T + N + \sqrt{NT})$, where $N$ is the empirical $\epsilon$-covering number of the sequence of loss functions generated by the environment. In addition, we present a hedging procedure that allows us to find the optimal $\epsilon$ in hindsight. Finally, we discuss a few interesting applications of our algorithm. We show how our algorithm is applicable in the approximately low rank experts model of Hazan et al. (2016), and discuss the case of experts with bounded variation, in which there is a surprisingly large gap between the regret bounds obtained in the statistical and online settings.
- The dependency on the correct functioning of embedded systems is rapidly growing, mainly due to their wide range of applications, such as micro-grids, automotive device control, health care, surveillance, mobile devices, and consumer electronics. Their structures are becoming more and more complex and now require multi-core processors with scalable shared memory, in order to meet increasing computational power demands. As a consequence, reliability of embedded (distributed) software becomes a key issue during system development, which must be carefully addressed and assured. The present research discusses challenges, problems, and recent advances to ensure correctness and timeliness regarding embedded systems. Reliability issues, in the development of micro-grids and cyber-physical systems, are then considered, as a prominent verification and synthesis application. In particular, machine learning techniques emerge as one of the main approaches to learn reliable implementations of embedded software for achieving a correct-by-construction design.
- Feb 28 2017 cs.CV arXiv:1702.07841v1Magnetic Resonance Imaging (MRI) is widely used in routine clinical diagnosis and treatment. However, variations in MRI acquisition protocols result in different appearances of normal and diseased tissue in the images. Convolutional neural networks (CNNs), which have shown to be successful in many medical image analysis tasks, are typically sensitive to the variations in imaging protocols. Therefore, in many cases, networks trained on data acquired with one MRI protocol, do not perform satisfactorily on data acquired with different protocols. This limits the use of models trained with large annotated legacy datasets on a new dataset with a different domain which is often a recurring situation in clinical settings. In this study, we aim to answer the following central questions regarding domain adaptation in medical image analysis: Given a fitted legacy model, 1) How much data from the new domain is required for a decent adaptation of the original network?; and, 2) What portion of the pre-trained model parameters should be retrained given a certain number of the new domain training samples? To address these questions, we conducted extensive experiments in white matter hyperintensity segmentation task. We trained a CNN on legacy MR images of brain and evaluated the performance of the domain-adapted network on the same task with images from a different domain. We then compared the performance of the model to the surrogate scenarios where either the same trained network is used or a new network is trained from scratch on the new dataset.The domain-adapted network tuned only by two training examples achieved a Dice score of 0.63 substantially outperforming a similar network trained on the same set of examples from scratch.
- Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training data which is time consuming and costly to obtain. In this work we explore the ability of using synthetically generated composite images for training state of the art object detectors. We superimpose 2D images of textured object models into images of real environments at variety of locations and scales. Our experiments evaluate different superimposition strategies ranging from purely image-based blending all the way to depth and semantics informed positioning of the object models to real scenes. We demonstrate the effectiveness of these object detector training strategies on publicly available datasets of GMU-Kitchens and Washington RGB-D Scenes v2, and show how object detectors can be trained with limited amounts of annotated real scenes with objects present. This charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.
- We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe future research agenda.
- We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For the segmentation model, we propose a novel way of performing phoneme boundary detection with deep neural networks using connectionist temporal classification (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet that requires fewer parameters and trains faster than the original. By using a neural network for each component, our system is simpler and more flexible than traditional text-to-speech systems, where each component requires laborious feature engineering and extensive domain expertise. Finally, we show that inference with our system can be performed faster than real time and describe optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.
- We consider how to connect a set of disjoint networks to optimize the performance of the resulting composite network. We quantify this performance by the coherence of the composite network, which is defined by an H2 norm of the system. Two dynamics are considered: noisy consensus dynamics with and without stubborn agents. For noisy consensus dynamics without stubborn agents, we derive analytical expressions for the coherence of composite networks in terms of the coherence of the individual networks and the structure of their interconnections. We also identify optimal interconnection topologies and give bounds on coherence for general composite graphs. For noisy consensus dynamics with stubborn agents, we show that the coherence of a composite network is a submodular function over the set of potential edges between the disjoint networks. We leverage this submodularity to develop a non-combinatorial algorithm that identifies connecting edges such that the composite network coherence is within a provable bound of optimal.