results for au:Sun_J in:cs

- Nov 23 2017 cs.CV arXiv:1711.08184v1In this paper, we propose a novel method called AlignedReID that extracts a global feature which is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs an alignment/matching by calculating the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we only keep the global feature to compute the similarities between images. Our method achieves rank-1 accuracy of 94.0% on Market1501 and 96.1% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass human-level performance on Market1501 and CUHK03, two widely used Person ReID datasets.
- Nov 22 2017 cs.CV arXiv:1711.07752v1Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases.
- Nov 21 2017 cs.CV arXiv:1711.07319v1 The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural networks. However, there still exist many challenging cases, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which aims to relieve the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge.
- Nov 21 2017 cs.CV arXiv:1711.07264v1 In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform an intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, the speed of these networks is slow due to the heavy-head design in the architecture. Even if we significantly reduce the base model, the computation cost cannot be largely decreased accordingly. We propose a new two-stage detector, Light-Head R-CNN, to address the shortcoming in current two-stage approaches. In our design, we make the head of the network as light as possible, by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully-connected layer). Our ResNet-101 based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD on both speed and accuracy. Code will be made publicly available.
- Nov 21 2017 cs.CV arXiv:1711.07240v2 The improvements in recent CNN-based object detection works, from R-CNN [11] and Fast/Faster R-CNN [10, 29] to recent Mask R-CNN [14] and RetinaNet [22], mainly come from new network, or framework, or loss design. But mini-batch size, a key factor in the training, has not been well studied. In this paper, we propose a Large Mini-Batch Object Detector (MegDet) to enable the training with much larger mini-batch size than before (e.g. from 16 to 256), so that we can effectively utilize multiple GPUs (up to 128 in our experiments) to significantly shorten the training time. Technically, we suggest a learning rate policy and Cross-GPU Batch Normalization, which together allow us to successfully train a large mini-batch detector in much less time (e.g., from 33 hours to 4 hours), and achieve even better accuracy. The MegDet is the backbone of our submission (mmAP 52.5%) to COCO 2017 Challenge, where we won the 1st place of Detection task.
- Nov 03 2017 cs.CV arXiv:1711.00583v1 There is an emerging trend to leverage noisy image datasets in many visual recognition tasks. However, the label noise among the datasets severely degrades the performance of deep learning approaches. Recently, one mainstream approach is to introduce a latent label to handle label noise, which has shown promising improvement in the network designs. Nevertheless, the mismatch between latent labels and noisy labels still affects the predictions in such methods. To address this issue, we propose a quality embedding model, which explicitly introduces a quality variable to represent the trustworthiness of noisy labels. Our key idea is to identify the mismatch between the latent and noisy labels by embedding the quality variables into different subspaces, which effectively minimizes the noise effect. At the same time, the high-quality labels can still be used for training. To instantiate the model, we further propose a Contrastive-Additive Noise network (CAN), which consists of two important layers: (1) the contrastive layer estimates the quality variable in the embedding space to reduce the noise effect; and (2) the additive layer aggregates the prior predictions and noisy labels as the posterior to train the classifier. Moreover, to tackle the optimization difficulty, we deduce an SGD algorithm with the reparameterization tricks, which makes our method scalable to big data. We conduct the experimental evaluation of the proposed method over a range of noisy image datasets. Comprehensive results have demonstrated that CAN outperforms the state-of-the-art deep learning approaches.
- Like most nonparametric estimators of information functionals involving continuous multidimensional random variables, the k-nearest neighbors (knn) estimators involve an estimate of the probability density functions (pdfs) of the variables. The pdfs are estimated using spheres in an appropriate norm to represent local volumes. We introduce a new class of knn estimators that we call geometric knn estimators (g-kNN), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-kNN estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical systems attractors. There is a trade-off between the amount of local data needed to fit a more complicated local volume element and the improvement in the estimate due to the better description of the local geometry. In a series of numerical examples, this g-kNN estimator of mutual information is compared to the Kraskov-Stögbauer-Grassberger (KSG) estimator, where we find that the modelling of the local geometry pays off in terms of better estimates, both when the joint distribution is thinly supported, and when sample sizes are small. In particular, the examples suggest that the g-kNN estimators can be of particular relevance to applications in which the system is large but data size is limited.
- Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.
- Oct 10 2017 cs.DB arXiv:1710.02817v1 Missing and incorrect values often cause serious consequences. To deal with these data quality problems, a commonly employed class of tools is dependency rules, such as Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs) and Edition Rules (ERs). The stronger the expressive power of a dependency, the better the quality of the data that can be obtained. To the best of our knowledge, all previous dependencies treat each attribute value as a non-splittable whole. In many applications, however, part of a value may contain meaningful information, indicating that more powerful dependency rules for handling data quality problems are possible. In this paper, we consider discovering a type of dependency in which the left-hand side is part of a regular-expression-like paradigm, named Paradigm Dependencies (PDs). A PD states that if a string matches the paradigm, the element at the specified position determines the value of a certain other attribute. We propose a framework in which strings with similar coding rules and different lengths are clustered together and aligned vertically, from which PDs can be discovered directly. The alignment problem is the key component of this framework and is proved to be NP-Complete. A greedy algorithm is introduced in which the clustering and aligning tasks can be accomplished simultaneously. Because of the greedy algorithm's high time complexity, several pruning strategies are proposed to reduce the running time. In the experimental study, three real datasets as well as several synthetic datasets are employed to verify the effectiveness and efficiency of our methods.
- Sep 28 2017 cs.CV arXiv:1709.09641v1Multi-atlas segmentation approach is one of the most widely-used image segmentation techniques in biomedical applications. There are two major challenges in this category of methods, i.e., atlas selection and label fusion. In this paper, we propose a novel multi-atlas segmentation method that formulates multi-atlas segmentation in a deep learning framework for better solving these challenges. The proposed method, dubbed deep fusion net (DFN), is a deep architecture that integrates a feature extraction subnet and a non-local patch-based label fusion (NL-PLF) subnet in a single network. The network parameters are learned by end-to-end training strategy for automatically learning deep features that enable optimal performance in a NL-PLF framework. Besides, the learned deep features are further utilized in defining a similarity measure for atlas selection. We evaluate our proposed method on two public cardiac MR databases of SATA-13 and LV-09 for left ventricle segmentation, and our learned DFNs with extracted deep features for atlas selection at testing phase achieve state-of-the-art accuracies, e.g., 0.833 in averaged Dice metric (ADM) on SATA-13 database and 0.95 in ADM for epicardium segmentation on LV-09 database. Besides, our method is robust to the cross-database evaluation, e.g., the DFN learned on LV-09 database achieves 0.815 in ADM on SATA-13 database. We also test our proposed method on Cardiac Atlas Project (CAP) testing set of MICCAI 2013 SATA Segmentation Challenge, and our method achieves 0.815 in Dice metric, ranking as the highest result on this dataset.
- Design, Modeling and Dynamic Compensation PID Control of a Fully-Actuated Aerial Manipulation System. Sep 26 2017 cs.RO arXiv:1709.08054v1 This paper addresses the design, modeling and dynamic-compensation PID (dc-PID) control of a novel type of fully-actuated aerial manipulation (AM) system. First, the design of a novel mechanical structure for the AM system is presented. Second, the kinematics and dynamics of the AM system are modeled using Craig parameters and recursive Newton-Euler equations respectively, which give rise to a more accurate dynamic relationship between the aerial platform and the manipulator. Then, dc-PID control is proposed to solve the problem of fully-actuated control of the AM system. Finally, uniform coupled matrix equations between driving forces/moments and rotor speeds are derived, which can support the design and analysis of parameters and decoupling theoretically. Practical problems including noise and perturbation, parameter uncertainty, and power limitation are taken into account in simulations, and the simulation results show that the presented AM system can be controlled in a fully-actuated manner with advanced control performance, which cannot be achieved theoretically in a traditional AM system. Compared with backstepping control, dc-PID has better control accuracy and disturbance rejection capability in two simulations of aerial operation tasks with joint motion. An experiment with dc-PID demonstrates the feasibility and effectiveness of the proposed method.
- Sep 19 2017 cs.SY arXiv:1709.05457v1We proposed a fusion mechanism for the distributed cooperative map matching (CMM) within the vehicular ad-hoc network. This mechanism makes the information from each node reachable within the network by other nodes without direct communication, thus improving the overall localization accuracy and robustness. Each node runs a Rao-Blackwellized particle filter (RBPF) that processes the Global Navigation Satellite System (GNSS) measurements of its own and its neighbors, followed by a map matching step that reduces or eliminates the GNSS atmospheric error. Then each node fuses its own filtered results with those from its neighbors for a better estimation. In this work, the complicated dynamics and fusion mechanics of these RBPFs are represented by a linear dynamical system. We proposed a distributed optimization framework that explores the model to improve both robustness and accuracy of the distributed CMM. The effectiveness of this distributed optimization framework is illustrated by simulation results on realistic vehicular networks drawn from data, compared with the centralized one and a decentralized one with random fusion weights.
- Sep 19 2017 cs.LG arXiv:1709.05342v2In this paper, we propose and evaluate the application of unsupervised machine learning to anomaly detection for a Cyber-Physical System (CPS). We compare two methods: Deep Neural Networks (DNN) adapted to time series data generated by a CPS, and one-class Support Vector Machines (SVM). These methods are evaluated against data from the Secure Water Treatment (SWaT) testbed, a scaled-down but fully operational raw water purification plant. For both methods, we first train detectors using a log generated by SWaT operating under normal conditions. Then, we evaluate the performance of both methods using a log generated by SWaT operating under 36 different attack scenarios. We find that our DNN generates fewer false positives than our one-class SVM while our SVM detects slightly more anomalies. Overall, our DNN has a slightly better F measure than our SVM. We discuss the characteristics of the DNN and one-class SVM used in this experiment, and compare the advantages and disadvantages of the two methods.
- Sep 08 2017 cs.CV arXiv:1709.01993v1Lighting estimation from face images is an important task and has applications in many areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image. Lacking massive ground truth lighting labels for face images in the wild, we use an existing method to estimate lighting parameters, which are treated as ground truth with unknown noises. To alleviate the effect of such noises, we utilize the idea of Generative Adversarial Networks (GAN) and propose a Label Denoising Adversarial Network (LDAN) to make use of synthetic data with accurate ground truth to help train a deep CNN for lighting regression on real face images. Experiments show that our network outperforms existing methods in producing consistent lighting parameters of different faces under similar lighting conditions. Moreover, our method is 100,000 times faster in execution time than prior optimization-based lighting estimation approaches.
- In an era when big data are becoming the norm, there is less concern with the quantity but more with the quality and completeness of the data. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of data, the missing view problem arises. Classic multiple imputation or matrix completion methods are hardly effective here, because the specific view contains no information on which to base the imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model enables the knowledge integration for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further demonstrates the effectiveness and usability of this approach in life science.
- We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.
- Long Short-Term Memory (LSTM) is the primary recurrent neural network architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method to help neural networks converge more easily and quickly. In this paper, we propose several types of residual LSTM methods for our acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture shows more than 8% relative reduction in Phone Error Rate (PER) on TIMIT tasks. At the same time, our residual fast LSTM approach shows 4% relative reduction in PER on the same task. In addition, we find that all of these architectures can achieve good results on the THCHS-30, Librispeech and Switchboard corpora.
- Jul 27 2017 cs.LG arXiv:1707.08262v1Sleep disorders, such as sleep apnea, parasomnias, and hypersomnia, affect 50-70 million adults in the United States (Hillman et al., 2006). Overnight polysomnography (PSG), including brain monitoring using electroencephalography (EEG), is a central component of the diagnostic evaluation for sleep disorders. While PSG is conventionally performed by trained technologists, the recent rise of powerful neural network learning algorithms combined with large physiological datasets offers the possibility of automation, potentially making expert-level sleep analysis more widely available. We propose SLEEPNET (Sleep EEG neural network), a deployed annotation tool for sleep staging. SLEEPNET uses a deep recurrent neural network trained on the largest sleep physiology database assembled to date, consisting of PSGs from over 10,000 patients from the Massachusetts General Hospital (MGH) Sleep Laboratory. SLEEPNET achieves human-level annotation performance on an independent test set of 1,000 EEGs, with an average accuracy of 85.76% and algorithm-expert inter-rater agreement (IRA) of kappa = 79.46%, comparable to expert-expert IRA.
- Jul 20 2017 cs.CV arXiv:1707.06168v2 In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances the compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5x speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks like ResNet and Xception, suffering only 1.4% and 1.0% accuracy loss under 2x speed-up respectively, which is significant. Code has been made publicly available.
- Handwritten string recognition is still a challenging task, even though powerful deep learning tools have been introduced. In this paper, based on TAO-FCN, we propose an end-to-end system for handwritten string recognition. Compared with conventional methods, no preprocessing or manually designed rules are employed. With enough labelled data, it is easy to apply the proposed method to different applications. Although the performance of the proposed method may not be comparable with the state-of-the-art approaches, its usability and robustness are more meaningful for practical applications.
- Jul 05 2017 cs.CV arXiv:1707.01083v1 We introduce an extremely computation-efficient CNN architecture named ShuffleNet, designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two proposed operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 6.7%) than the recent MobileNet system on ImageNet classification under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over AlexNet while maintaining comparable accuracy.
- Jun 20 2017 cs.SY arXiv:1706.06027v1In this paper, we develop two zonotope-based set-membership estimation algorithms for identification of time-varying parameters in linear models, where both additive and multiplicative uncertainties are treated explicitly. The two recursive algorithms can be differentiated by their ways of processing the data and required computations. The first algorithm, which is referred to as Cone And Zonotope Intersection (CAZI), requires solving linear programming problems at each iteration. The second algorithm, referred to as the Polyhedron And Zonotope Intersection (PAZI), involves linear programming as well as an optimization subject to linear matrix inequalities (LMIs). Both algorithms are capable of providing tight overbounds of the feasible solution set (FSS) in our numerical case studies. Furthermore, PAZI provides an additional opportunity of further analyzing the relation between the estimation results at different iterations. An application to health monitoring of marine engines is considered to demonstrate the utility and effectiveness of the algorithms.
- Jun 09 2017 cs.HC arXiv:1706.02637v2 Gaze-based virtual keyboards provide an effective interface for text entry by eye movements. The efficiency and usability of these keyboards have traditionally been evaluated with conventional text entry performance measures such as words per minute, keystrokes per character, backspace usage, etc. However, in comparison to the traditional text entry approaches, gaze-based typing involves natural eye movements that are highly correlated with human brain cognition. Employing eye gaze as an input could lead to excessive mental demand, and in this work we argue the need to include cognitive load as an eye typing evaluation measure. We evaluate three variations of gaze-based virtual keyboards, which implement variable designs in terms of word suggestion positioning. The conventional text entry metrics indicate no significant difference in the performance of the different keyboard designs. However, STFT (Short-time Fourier Transform) based analysis of EEG signals indicates variances in the mental workload of participants while interacting with these designs. Moreover, the EEG analysis provides insights into the variation in the user's cognition across different typing phases and intervals, which should be considered in order to improve eye typing usability.
- The study of social networks --- where people are located, geographically, and how they might be connected to one another --- is a current hot topic of interest, because of its immediate relevance to important applications, from devising efficient immunization techniques for the arrest of epidemics, to the design of better transportation and city planning paradigms, to the understanding of how rumors and opinions spread and take shape over time. We develop a spatial social complex network (SSCN) model that captures not only essential connectivity features of real-life social networks, including a heavy-tailed degree distribution and high clustering, but also the spatial location of individuals, reproducing Zipf's law for the distribution of city populations as well as other observed hallmarks. We then simulate Milgram's Small-World experiment on our SSCN model, obtaining good qualitative agreement with the known results and shedding light on the role played by various network attributes and the strategies used by the players in the game. This demonstrates the potential of the SSCN model for the simulation and study of the many social processes mentioned above, where both connectivity and geography play a role in the dynamics.
- May 22 2017 cs.CV arXiv:1705.06869v1Compressive sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing MR images from a small number of under-sampled data in k-space, and accelerating the data acquisition in MRI. To improve the current MRI system in reconstruction accuracy and speed, in this paper, we propose two novel deep architectures, dubbed ADMM-Nets in basic and generalized versions. ADMM-Nets are defined over data flow graphs, which are derived from the iterative procedures in Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a general CS-based MRI model. They take the sampled k-space data as inputs and output reconstructed MR images. Moreover, we extend our network to cope with complex-valued MR images. In the training phase, all parameters of the nets, e.g., transforms, shrinkage functions, etc., are discriminatively trained end-to-end. In the testing phase, they have computational overhead similar to ADMM algorithm but use optimized parameters learned from the data for CS-based reconstruction task. We investigate different configurations in network structures and conduct extensive experiments on MR image reconstruction under different sampling rates. Due to the combination of the advantages in model-based approach and deep learning approach, the ADMM-Nets achieve state-of-the-art reconstruction accuracies with fast computational speed.
- May 02 2017 cs.SY arXiv:1705.00568v2 Cooperative map matching (CMM) uses the Global Navigation Satellite System (GNSS) position information of a group of vehicles to improve the standalone localization accuracy. It has been shown, in our previous work, that the GNSS error can be reduced from several meters to sub-meter level by matching the biased GNSS positioning to a digital map with road constraints. While further error reduction is expected by increasing the number of participating vehicles, fundamental questions on how the vehicle membership within CMM affects the performance of the CMM results need to be addressed to provide guidelines for design and optimization of the vehicle network. This work presents a theoretical study that establishes a framework for quantitative evaluation of the impact of the road constraints on the CMM accuracy. More specifically, a closed-form expression of the CMM error in terms of the road constraints and GNSS error is derived based on a simple CMM rule. The asymptotic decay of the CMM error as the number of vehicles increases is established and justified through numerical simulations. Moreover, it is proved that the CMM error can be minimized if the directions of the roads on which the connected vehicles travel obey a uniform distribution. Finally, the localization accuracy of CMM is evaluated based on the Safety Pilot Model Deployment and Pillar dataset of Ann Arbor traffic flow collected over a three-year period. The contributions of this work include establishing a theoretical foundation for CMM as well as providing insight and motivation for applications of CMM.
- Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy.
- We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover $>$70\% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than templates selected by a standard tf-idf baseline.
- Mar 28 2017 cs.SY arXiv:1703.08818v2Cooperative map matching (CMM) uses the Global Navigation Satellite System (GNSS) positioning of a group of vehicles to improve the standalone localization accuracy. It has been shown to reduce GNSS error from several meters to sub-meter level by matching the biased GNSS positioning of four vehicles to a digital map with road constraints in our previous work. While further error reduction is expected by increasing the number of participating vehicles, fundamental questions on how the vehicle membership of the CMM affects the performance of the GNSS-based localization results need to be addressed to provide guidelines for design and optimization of the vehicle network. The quantitative relationship between the estimation error and the road constraints has to be systematically investigated to provide insights. In this work, a theoretical study is presented that aims at developing a framework for quantitatively evaluating effects of the road constraints on the CMM accuracy and for eventual optimization of the CMM network. More specifically, a closed form expression of the CMM error in terms of the road angles and GNSS error is first derived based on a simple CMM rule. Then a Branch and Bound algorithm and a Cross Entropy method are developed to minimize this error by selecting the optimal group of vehicles under two different assumptions about the GNSS error variance.
- Mar 24 2017 cs.SI arXiv:1703.08100v2In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images and show that it is able to outperform modern remote sensing algorithms for this task.
- Mar 23 2017 cs.DB arXiv:1703.07617v1Sliding window join is one of the most important operators for stream applications. To produce high quality join results, a stream processing system must deal with the ubiquitous disorder within input streams which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet different requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes sizes of input-sorting buffers, thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between sizes of input buffers and the produced result quality. Our approach is generic. It supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality.
- Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Based on input real patient records, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we showed that medGAN generates synthetic patient records that achieve comparable performance to real data on many experiments including distribution statistics, predictive modeling tasks and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN.
- In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To handle such data, the state-of-the-art tensor model is the so-called PARAFAC2, which yields interpretable and robust output and can naturally handle sparse data. However, its main limitation up to now has been the lack of efficient algorithms that can handle large-scale datasets. In this work, we fill this gap by developing a scalable method to compute the PARAFAC2 decomposition of large and sparse datasets, called SPARTan. Our method exploits special structure within PARAFAC2, leading to a novel algorithmic reformulation that is both fast (in absolute time) and more memory-efficient than prior work. We evaluate SPARTan on both synthetic and real datasets, showing 22X performance gains over the best previous implementation and also handling larger problem instances for which the baseline fails. Furthermore, we are able to apply SPARTan to the mining of temporally-evolving phenotypes on data taken from real and medically complex pediatric patients. The clinical meaningfulness of the phenotypes identified in this process, as well as their temporal evolution over time for several patients, have been endorsed by clinical experts.
- Mar 09 2017 cs.CV arXiv:1703.02719v1One of the recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.
- Mar 08 2017 cs.SY arXiv:1703.02098v1Cooperative localization with map matching has been shown to reduce Global Navigation Satellite System (GNSS) localization error from several meters to sub-meter level by fusing the GNSS measurements of four vehicles in our previous work. While further error reduction is expected to be achievable by increasing the number of vehicles, the quantitative relationship between the estimation error and the number of connected vehicles has neither been systematically investigated nor analytically proved. In this work, a theoretical study is presented that analytically proves the correlation between the localization error and the number of connected vehicles in two cases of practical interest. More specifically, it is shown that, under the assumption of small non-common error, the expected square error of the GNSS common error correction is inversely proportional to the number of vehicles if the road directions obey a uniform distribution, or inversely proportional to the logarithm of the number of vehicles if the road directions obey a Bernoulli distribution. Numerical simulations are conducted to justify these analytic results. Moreover, the simulation results show that the aforementioned error decrement rates hold even when the assumption of small non-common error is violated.
- Feb 28 2017 cs.CV arXiv:1702.07971v1Most of computer vision focuses on what is in an image. We propose to train a standalone object-centric context representation to perform the opposite task: seeing what is not there. Given an image, our context model can predict where objects should exist, even when no object instances are present. Combined with object detection results, we can perform a novel vision task: finding where objects are missing in an image. Our model is based on a convolutional neural network structure. With a specially designed training strategy, the model learns to ignore objects and focus on context only. It is fully convolutional thus highly efficient. Experiments show the effectiveness of the proposed approach in one important accessibility task: finding city street regions where curb ramps are missing, which could help millions of people with mobility disabilities.
- Feb 28 2017 cs.CV arXiv:1702.07975v1Like other problems in computer vision, offline handwritten Chinese character recognition (HCCR) has achieved impressive results using convolutional neural network (CNN)-based methods. However, larger and deeper networks are needed to deliver state-of-the-art results in this domain. Such networks intuitively appear to incur high computational cost, and require the storage of a large number of parameters, which renders them unfeasible for deployment in portable devices. To solve this problem, we propose a Global Supervised Low-rank Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to solve the problems of speed and storage capacity. We design a nine-layer CNN for HCCR consisting of 3,755 classes, and devise an algorithm that can reduce the network's computational cost by nine times and compress the network to 1/18 of the original size of the baseline model, with only a 0.21% drop in accuracy. In tests, the proposed algorithm surpassed the best single-network performance reported thus far in the literature while requiring only 2.3 MB for storage. Furthermore, when integrated with our effective forward implementation, the recognition of an offline character image took only 9.7 ms on a CPU. Compared with the state-of-the-art CNN model for HCCR, our approach is approximately 30 times faster, yet 10 times more cost efficient.
- Feb 27 2017 cs.SY arXiv:1702.07536v1In this paper, the problem of event-triggered consensus for linear continuous-time multi-agent systems is investigated. A new event-triggered consensus protocol based on a predictor is proposed to achieve consensus without continuous communication among agents. In the proposed consensus protocol, each agent only needs to monitor its states to determine its event-triggered instants. When an event is triggered, the agent will update its consensus protocol and send its state information to its neighbors. In addition, the agent will also update its consensus protocol and the predictor when it receives the state information from its neighbors. A necessary and sufficient condition under which the consensus problem can be solved is derived. Moreover, it is proved that Zeno behavior does not exist. Finally, a numerical example is given to illustrate that the protocol proposed in this paper enables the multi-agent system to achieve consensus with far fewer triggered events.
- Feb 21 2017 cs.SY arXiv:1702.05792v2A crucial function for automated vehicle technologies is accurate localization. Lane-level accuracy is not readily available from low-cost Global Navigation Satellite System (GNSS) receivers because of factors such as multipath error and atmospheric bias. Approaches such as Differential GNSS can improve localization accuracy, but usually require investment in expensive base stations. Connected vehicle technologies provide an alternative approach to improving the localization accuracy. It will be shown in this paper that localization accuracy can be enhanced using crude GNSS measurements from a group of connected vehicles, by matching their locations to a digital map. A Rao-Blackwellized particle filter (RBPF) is used to jointly estimate the common biases of the pseudo-ranges and the vehicle positions. Multipath biases, which introduce receiver-specific (non-common) error, are mitigated by a multi-hypothesis detection-rejection approach. The temporal correlation of the estimations is exploited through the prediction-update process. The proposed approach is compared to existing methods using both simulations and experimental results. It was found that the proposed algorithm can eliminate the common biases and reduce the localization error to below 1 meter under open sky conditions.
- Feb 16 2017 cs.CV arXiv:1702.04517v2Convective storm nowcasting has attracted substantial attention in various fields. Existing methods under a deep learning framework rely primarily on radar data. Although they nowcast storm advection well, it is still challenging to nowcast storm initiation and growth, due to the limitations of the radar observations. This paper describes the first attempt to nowcast storm initiation, growth, and advection simultaneously under a deep learning framework using multi-source meteorological data. To this end, we present a multi-channel 3D-cube successive convolution network (3D-SCN). As real-time re-analysis meteorological data can now provide valuable atmospheric boundary layer thermal dynamic information, which is essential to predict storm initiation and growth, both raw 3D radar and re-analysis data are used directly without any handcrafted feature engineering. These data are formulated as multi-channel 3D cubes, which are fed into our network and convolved by cross-channel 3D convolutions. By stacking successive convolutional layers without pooling, we build an end-to-end trainable model for nowcasting. Experimental results show that deep learning methods achieve better performance than traditional extrapolation methods. The qualitative analyses of 3D-SCN show encouraging results of nowcasting of storm initiation, growth, and advection.
- In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors should occur simultaneously to have an effect on the target variable.
- The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have efficient implementation, by exploiting the statistics of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piecewise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations.
- Dec 28 2016 cs.CV arXiv:1612.08484v1Nowadays, CNNs are widely used in practical applications for image classification. However, designing a CNN model is highly specialized work that is very difficult for ordinary users. Moreover, even for CNN experts, selecting an optimal model for a specific task may still take a lot of time (to train many different models). To solve this problem, we propose an automated CNN recommendation system for image classification. Our system is able to evaluate the complexity of the classification task and the classification ability of the CNN model precisely. Using the evaluation results, the system can recommend the optimal CNN model that matches the task perfectly. The recommendation process is very fast since it does not require any model training. The experimental results demonstrate that the evaluation methods are very accurate and reliable.
- Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users' text descriptions. However, direct inference of 3D structure from images requires learning from large scale annotated data. Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description. We present a generic framework for retrieving images from a textual description of a scene by matching images with these generated abstract object layouts. Images are ranked by matching object detection outputs (bounding boxes) to 2D layout candidates (also represented by bounding boxes) which are obtained by projecting the 3D scenes with sampled camera directions. We validate our approach using public indoor scene datasets and show that our method outperforms baselines built upon object occurrence histograms and learned 2D pairwise relations.
- Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: (1) Data insufficiency: often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results. (2) Interpretation: the representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose a GRaph-based Attention Model, GRAM, that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies. Based on the data volume and the ontology structure, GRAM represents a medical concept as a combination of its ancestors in the ontology via an attention mechanism. We compared predictive performance (i.e. accuracy, data needs, interpretability) of GRAM to various methods including the recurrent neural network (RNN) in two sequential diagnoses prediction tasks and one heart failure prediction task. Compared to the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely observed in the training data and 3% improved area under the ROC curve for predicting heart failure using an order of magnitude less training data. Additionally, unlike other methods, the medical concept representations learned by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits intuitive attention behaviors by adaptively generalizing to higher level concepts when facing data insufficiency at the lower level concepts.
- Nov 10 2016 cs.SI arXiv:1611.02941v1How can we recognise social roles of people, given a completely unlabelled social network? We present a transfer learning approach to network role classification based on feature transformations from each network's local feature distribution to a global feature space. Experiments are carried out on real-world datasets. (See manuscript for the full abstract.)
- We denote by $\mathcal{P}_q$ the vector space of functions from a finite field $\mathbb{F}_q$ to itself, which can be represented as the space $\mathcal{P}_q := \mathbb{F}_q[x]/(x^q-x)$ of polynomial functions. We denote by $\mathcal{O}_n \subset \mathcal{P}_q$ the set of polynomials that are either the zero polynomial, or have at most $n$ distinct roots in $\mathbb{F}_q$. Given two subspaces $Y,Z$ of $\mathcal{P}_q$, we denote by $\langle Y,Z \rangle$ their span. We prove that the following are equivalent. A) Let $k, q$ be integers, with $q$ a prime power and $2 \leq k \leq q$. Suppose that either: 1) $q$ is odd, or 2) $q$ is even and $k \not\in \{3, q-1\}$. Then there do not exist distinct subspaces $Y$ and $Z$ of $\mathcal{P}_q$ such that: 1') $\dim(\langle Y, Z \rangle) = k$, 2') $\dim(Y) = \dim(Z) = k-1$, 3') $\langle Y, Z \rangle \subset \mathcal{O}_{k-1}$, 4') $Y, Z \subset \mathcal{O}_{k-2}$, 5') $Y\cap Z \subset \mathcal{O}_{k-3}$. B) The MDS conjecture is true for the given $(q,k)$.
- Optical flow refers to the visual motion observed between two consecutive images. Since the degree of freedom is typically much larger than the constraints imposed by the image observations, the straightforward formulation of optical flow inference is an ill-posed problem. By setting some type of additional "regularity" constraints, classical approaches formulate a well-posed optical flow inference problem in the form of a parameterized set of variational equations. In this work we build a mathematical connection, focused on optical flow methods, between classical variational optical flow approaches and Bayesian statistical inversion. A classical optical flow solution is in fact identical to a maximum a posteriori estimator under the assumptions of linear model with additive independent Gaussian noise and a Gaussian prior distribution. Unlike classical approaches, the statistical inversion approach to optical flow estimation not only allows for "point" estimates, but also provides a distribution of solutions which can be used for ensemble estimation and in particular uncertainty quantification.
- Scalable and automatic formal verification for concurrent systems is always in demand but has yet to be developed. In this paper, we propose a verification framework to support automated compositional reasoning for concurrent programs with shared variables. Our framework models concurrent programs as succinct automata and supports the verification of multiple important properties. Safety verification and simulations of succinct automata are parallel compositional, and safety properties of succinct automata are preserved under refinements. Formal verification of finite state succinct automata can be automated. Furthermore, we propose the first automated approach to checking rely-guarantee based simulations between infinite state concurrent programs. We have prototyped our algorithm and applied our tool to the verification of multiple refinements.
- Oct 28 2016 cs.SE arXiv:1610.08607v1Debugging is difficult. Recent studies show that automatic bug localization techniques have limited usefulness. One of the reasons is that programmers typically have to understand why the program fails before fixing it. In this work, we aim to help programmers understand a bug by automatically generating likely invariants which are violated in the failed tests. Given a program with an initial assertion and at least one test case failing the assertion, we first generate random test cases, identify potential bug locations through bug localization, and then generate program state mutation based on active learning techniques to identify a predicate "explaining" the cause of the bug. The predicate is a classifier for the passed test cases and failed test cases. Our main contribution is the application of invariant learning for bug explanation, as well as a novel approach to overcome the problem of lack of test cases in practice. We apply our method to real-world bugs and show the generated invariants are often correlated to the actual bug fixes.
- We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense core tensor; the result of our approach is a faster, more space-efficient, and more accurate method. We extensively test our method on a real healthcare dataset, which is collected from 30K patients and results in an 18th order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It can also do so more accurately and in less time than the state-of-the-art: on a 12th order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a previously state-of-the-art method. Even for analyzing low order tensors (e.g., 4-order), our method requires close to an order of magnitude less time and over two orders of magnitude less memory, as compared to traditional tensor factorization methods such as CP and Tucker. Moreover, we observe that Sparse H-Tucker scales nearly linearly in the number of non-zero tensor elements. The resulting model also provides an interpretable disease hierarchy, which is confirmed by a clinical expert.
- Oct 25 2016 cs.LG arXiv:1610.07563v1We investigate a general framework of multiplicative multitask feature learning which decomposes each task's model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods have been proposed as special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effect. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. Empirical studies have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks.
- Oct 21 2016 cs.SE arXiv:1610.06371v2Precisely modeling complex systems like cyber-physical systems is often challenging, which may render model-based system verification techniques like model checking infeasible. To overcome this challenge, we propose a method called LAR to `verify' such complex systems through a combination of learning, abstraction and refinement. Instead of starting with system modeling, our method takes a set of concrete system traces as input. The output is either a counterexample with a bounded probability of being a spurious counterexample, or a probabilistic model based on which the given property is `verified'. The model could be viewed as a proof obligation, i.e., the property is verified if the model is correct. It can also be used for subsequent system analysis activities like runtime monitoring. Our method has been implemented as a self-contained software toolkit. The evaluation on multiple benchmark systems as well as a real-world water purification system shows promising results.
- As the next generation cellular system, the 5G network is required to provide a large variety of services for different kinds of terminals, from traditional voice and data services over mobile phones to small packet transmission over massive machine-type terminals. Although orthogonal-subcarrier based waveforms are widely used in many practical systems today, they can hardly meet the requirements of the coming 5G networks. Therefore, more flexible waveforms have been proposed to address the unprecedented challenges. In this article, we provide a comprehensive analysis and comparison of the typical waveform candidates. To obtain an insightful analysis, we not only introduce the basic principles of the waveforms but also reveal the underlying characteristics of each waveform. Moreover, a comprehensive comparison in terms of different performance metrics is also presented, which provides an overall understanding of the new waveforms.
- Cyber-physical systems (CPS), which integrate algorithmic control with physical processes, often consist of physically distributed components communicating over a network. A malfunctioning or compromised component in such a CPS can lead to costly consequences, especially in the context of public infrastructure. In this short paper, we argue for the importance of constructing invariants (or models) of the physical behaviour exhibited by CPS, motivated by their applications to the control, monitoring, and attestation of components. To achieve this despite the inherent complexity of CPS, we propose a new technique for learning invariants that combines machine learning with ideas from mutation testing. We present a preliminary study on a water treatment system that suggests the efficacy of this approach, propose strategies for establishing confidence in the correctness of invariants, then summarise some research questions and the steps we are taking to investigate them.
- Hybrid systems exhibit both continuous and discrete behavior. Analyzing hybrid systems is known to be hard. Inspired by the idea of concolic testing (of programs), we investigate whether we can combine random sampling and symbolic execution in order to effectively verify hybrid systems. We identify a sufficient condition under which such a combination is more effective than random sampling. Furthermore, we analyze different strategies of combining random sampling and symbolic execution and propose an algorithm which allows us to dynamically switch between them so as to reduce the overall cost. Our method has been implemented as a web-based checker named HyChecker. HyChecker has been evaluated with benchmark hybrid systems and a water treatment system in order to test its effectiveness.
- Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending the EHR data in a reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8 year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models.
- Sparse representation presents an efficient approach to approximately recover a signal by the linear composition of a few bases from a learnt dictionary, based on which various successful applications have been observed. However, in the scenario of data compression, its efficiency and popularity are hindered due to the extra overhead for encoding the sparse coefficients. Therefore, how to establish an accurate rate model in sparse coding and dictionary learning becomes meaningful, which has not been fully exploited in the context of sparse representation. According to the Shannon entropy inequality, the variance of a data source bounds its entropy, which can reflect the actual coding bits. Hence, in this work a Globally Variance-Constrained Sparse Representation (GVCSR) model is proposed, where a variance-constrained rate model is introduced in the optimization process. Specifically, we employ the Alternating Direction Method of Multipliers (ADMM) to solve the non-convex optimization problem for sparse coding and dictionary learning, both of which have shown state-of-the-art performance in image representation. Furthermore, we investigate the potential of GVCSR in practical image set compression, where a common dictionary is trained by several key images to represent the whole image set. Experimental results have demonstrated significant performance improvements against the most popular image codecs including JPEG and JPEG2000.
- Jul 20 2016 cs.CV arXiv:1607.05477v1Large pose variations remain a challenge for real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions are then warped by mapping the detected facial landmarks to their canonical positions to better normalize the face patterns. The second stage, which is an RCNN, then verifies whether the warped candidate regions are valid faces. We conduct end-to-end learning of the cascaded network, including optimizing the canonical positions of the facial landmarks. This supervised learning of the transformations automatically selects the best scale to differentiate face/non-face patterns. By combining feature maps from both stages of the network, we achieve state-of-the-art detection accuracies on several public benchmarks. For real-time performance, we run the cascaded network only on regions of interest produced by a boosting cascade face detector. Our detector runs at 30 FPS on a single CPU core for a VGA-resolution image.
- Jun 21 2016 cs.SI physics.soc-ph arXiv:1606.06159v3The emerging domain of data-enabled science necessitates development of algorithms and tools for knowledge discovery. Human interaction with data through well-constructed graphical representation can take special advantage of our visual ability to identify patterns. We develop a data visualization framework, called BiFold, for exploratory analysis of bipartite datasets that describe binary relationships between groups of objects. Typical data examples would include voting records, organizational memberships, and pairwise associations, or other binary datasets. BiFold provides a low dimensional embedding of data that represents similarity by visual nearness, analogous to Multidimensional Scaling (MDS). The unique and new feature of BiFold is its ability to simultaneously capture both within-group and between-group relationships among objects, enhancing knowledge discovery. We benchmark BiFold using the Southern Women Dataset, where social groups are now visually evident. We construct BiFold plots for two US voting datasets: For the presidential election outcomes since 1976, BiFold illustrates the evolving geopolitical structures that underlie these election results. For Senate congressional voting, BiFold identifies a partisan coordinate, separating senators into two parties while simultaneously visualizing a bipartisan-coalition coordinate which captures the ultimate fate of the bills (pass/fail). Finally, we consider a global cuisine dataset of the association between recipes and food ingredients. BiFold allows us to visually compare and contrast cuisines while also allowing identification of signature ingredients of individual cuisines.
- Jun 17 2016 cs.NE arXiv:1606.05169v1Evolutionary algorithms (EAs) have been well acknowledged as a promising paradigm for solving optimisation problems with multiple conflicting objectives in the sense that they are able to locate a set of diverse approximations of Pareto optimal solutions in a single run. EAs drive the search for approximated solutions through maintaining a diverse population of solutions and by recombining promising solutions selected from the population. Combining machine learning techniques has shown great potential since the intrinsic structure of the Pareto optimal solutions of a multiobjective optimisation problem can be learned and used to guide effective recombination. However, existing multiobjective EAs (MOEAs) based on structure learning spend too many computational resources on learning. To address this problem, we propose to use an online learning scheme. Based on the fact that offspring along evolution are streamy, dependent and non-stationary (which implies that the intrinsic structure, if any, is temporal and scale-variant), an online agglomerative clustering algorithm is applied to adaptively discover the intrinsic structure of the Pareto optimal solution set and to guide effective offspring recombination. Experimental results have shown significant improvement over five state-of-the-art MOEAs on a set of well-known benchmark problems with complicated Pareto sets and complex Pareto fronts.
- An essential function for automated vehicle technologies is accurate localization. It is difficult, however, to achieve lane-level accuracy with low-cost Global Navigation Satellite System (GNSS) receivers due to the biased, noisy pseudo-range measurements. Approaches such as Differential GNSS can improve the accuracy, but usually require an enormous amount of investment in base stations. The emerging connected vehicle technologies provide an alternative approach to improving the localization accuracy. It is shown in this paper that localization accuracy can be enhanced by fusing GNSS information within a group of connected vehicles and matching the configuration of the group to a digital map to eliminate the common bias in localization. A Rao-Blackwellized particle filter (RBPF) was used to jointly estimate the common biases of the pseudo-ranges and the vehicles' positions. Multipath biases, which are non-common to vehicles, were mitigated by a multi-hypothesis detection-rejection approach. The temporal correlation was exploited through the prediction-update process. The proposed approach was compared to the existing static and smoothed static methods in the intersection scenario. Simulation results show that the proposed algorithm reduced the estimation error by fifty percent and reduced the estimation variance by two orders of magnitude.
- Understanding and even defining what constitutes animal interactions remains a challenging problem. Correlational tools may be inappropriate for detecting communication between a set of many agents exhibiting nonlinear behavior. A different approach is to define coordinated motions in terms of an information theoretic channel of direct causal information flow. In this work, we consider time series data obtained by an experimental protocol of optical tracking of the insect species Chironomus riparius. The data constitute reconstructed 3-D spatial trajectories of the insects' flight trajectories and kinematics. We present an application of the optimal causation entropy (oCSE) principle to identify direct causal relationships or information channels among the insects. The collection of channels inferred by oCSE describes a network of information flow within the swarm. We find that information channels with a long spatial range are more common than expected under the assumption that causal information flows should be spatially localized. The tools developed herein are general and applicable to the inference and study of intercommunication networks in a wide variety of natural settings.
- Collaborative Topic Regression (CTR) combines ideas of probabilistic matrix factorization (PMF) and topic modeling (e.g., LDA) for recommender systems, and has gained increasing success in many applications. Despite enjoying many advantages, the existing CTR algorithms have some critical limitations. First of all, they are often designed to work in a batch learning manner, making them unsuitable for dealing with streaming data or big data in real-world recommender systems. Second, the document-specific topic proportions of LDA are fed to the downstream PMF, but not the reverse, which is sub-optimal as the rating information is not exploited in discovering the low-dimensional representation of documents and thus can result in a sub-optimal representation for prediction. In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams. In particular, we jointly optimize the combined objective function of both PMF and LDA in an online learning fashion, in which the PMF and LDA tasks can reinforce each other during the online learning process. Our encouraging experimental results on real-world data validate the effectiveness of the proposed method.
- May 27 2016 cs.SE arXiv:1605.08278v4Many automated system analysis techniques (e.g., model checking, model-based testing) rely on first obtaining a model of the system under analysis. System modeling is often done manually, which is often considered a hindrance to adopting model-based system analysis and development techniques. To overcome this problem, researchers have proposed to automatically "learn" models based on sample system executions and shown that the learned models can be useful sometimes. There are however many questions to be answered. For instance, how much shall we generalize from the observed samples and how fast would learning converge? Or, would the analysis result based on the learned model be more accurate than the estimation we could have obtained by sampling many system executions within the same amount of time? In this work, we investigate existing algorithms for learning probabilistic models for model checking, propose an evolution-based approach for better controlling the degree of generalization and conduct an empirical study in order to answer the questions. One of our findings is that the effectiveness of learning may sometimes be limited.
- May 23 2016 cs.CV arXiv:1605.06409v2We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn
- Apr 19 2016 cs.CV arXiv:1604.05144v1Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly propagates information from scribbles to unmarked pixels and learns network parameters. We present competitive object semantic segmentation results on the PASCAL VOC dataset by using scribbles as annotations. Scribbles are also favored for annotating stuff (e.g., water, sky, grass) that has no well-defined shape, and our method shows excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive scribble annotations. Our scribble annotations on PASCAL VOC are available at http://research.microsoft.com/en-us/um/people/jifdai/downloads/scribble_sup
- Apr 08 2016 cs.CV arXiv:1604.01894v1Candidate text region extraction plays a critical role in convolutional neural network (CNN) based text detection from natural images. In this paper, we propose a CNN based scene text detection algorithm with a new text region extractor. The so-called candidate text region extractor I-MSER is based on Maximally Stable Extremal Region (MSER), and can improve the independence and completeness of the extracted candidate text regions. The design of I-MSER is motivated by the observation that text MSERs have high similarity and are close to each other. The independence of candidate text regions obtained by I-MSER is guaranteed by selecting the most representative regions from an MSER tree which is generated according to the spatial overlapping relationship among the MSERs. A multi-layer CNN model is trained to score the confidence value of the regions extracted by I-MSER for text detection. The new text detection algorithm based on I-MSER is evaluated with the widely used ICDAR 2011 and 2013 datasets and shows improved detection performance compared to the existing algorithms.
- Apr 08 2016 cs.CV arXiv:1604.01891v1Scene text recognition plays an important role in many computer vision applications. The small size of publicly available scene text datasets is the main challenge when training a text recognition CNN model. In this paper, we propose a CNN based Chinese text recognition algorithm. To enlarge the dataset for training the CNN model, we design a synthetic data engine for Chinese scene character generation, which generates representative character images according to the font usage frequency of Chinese texts. As Chinese text is more complex, the English text recognition CNN architecture is modified for Chinese text. To ensure that the small natural character dataset and the large artificial character dataset are comparable in training, the CNN model is trained progressively. The proposed Chinese text recognition algorithm is evaluated with two Chinese text datasets. The algorithm achieves better recognition accuracy compared to the baseline methods.
- Mar 31 2016 cs.CV arXiv:1603.09016v2We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high caption quality with respect to human judgments, out-of-domain data handling, and the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both an in-domain dataset (i.e. MS COCO) and out-of-domain datasets.