results for au:Sun_J in:cs

- Clinical notes are text documents that are created by clinicians for each patient encounter. They are typically accompanied by medical codes, which describe the diagnosis and treatment. Annotating these codes is labor-intensive and error-prone; furthermore, the connection between the codes and the text is not annotated, obscuring the reasons and details behind specific diagnoses and treatments. We present an attentional convolutional network that predicts medical codes from clinical text. Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes. The method is accurate, achieving precision@8 of 0.7 and a Micro-F1 of 0.52, both significantly better than the prior state of the art. Furthermore, through an interpretability evaluation by a physician, we show that the attention mechanism identifies meaningful explanations for each code assignment.
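The per-code attention described above can be sketched in a few lines; this is a minimal illustration, not the paper's implementation, and the toy token features H (conv outputs) and label query vectors U are made-up values:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def per_label_attention(H, U):
    """For each label query vector u in U, attend over the per-token
    conv features H and return one context vector per label."""
    contexts = []
    for u in U:
        scores = [sum(h_k * u_k for h_k, u_k in zip(h, u)) for h in H]
        alpha = softmax(scores)  # attention weights over tokens
        d = len(H[0])
        ctx = [sum(alpha[t] * H[t][k] for t in range(len(H)))
               for k in range(d)]
        contexts.append(ctx)
    return contexts

# toy values: 3 token features (d = 2) and 2 label query vectors
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[10.0, 0.0], [0.0, 10.0]]
ctx = per_label_attention(H, U)
```

Each code gets its own attention distribution, so the tokens it attends to can be inspected as an explanation for that code's prediction.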
- A graph convolutional network (GCN) is a generalization of a convolutional neural network (CNN) that works with arbitrarily structured graphs. A binary adjacency matrix is commonly used in training a GCN. Recently, attention mechanisms have allowed networks to learn a dynamic and adaptive aggregation of the neighborhood. We propose a new GCN model for graphs where edges are characterized in multiple views, or more precisely, in terms of multiple relationships. For instance, in chemical graph theory, compound structures are often represented by the hydrogen-depleted molecular graph, where nodes correspond to atoms and edges correspond to chemical bonds. Multiple attributes can be important for characterizing chemical bonds, such as atom pair (the types of atoms that a bond connects), aromaticity, and whether a bond is in a ring. The different attributes lead to different graph representations for the same molecule. There is growing interest in both the chemistry and machine learning fields in directly learning molecular properties of compounds from the molecular graph, instead of from fingerprints predefined by chemists. The proposed GCN model, which we call edge attention-based multi-relational GCN (EAGCN), jointly learns attention weights and node features in graph convolution. For each bond attribute, a real-valued attention matrix is used to replace the binary adjacency matrix. By designing a dictionary for the edge attention, and forming the attention matrix of each molecule by looking up the dictionary, EAGCN exploits the correspondence between bonds in different molecules. The prediction of compound properties is based on the aggregated node features, which is independent of the varying molecule (graph) size. We demonstrate the efficacy of EAGCN on multiple chemical datasets (Tox21, HIV, FreeSolv, and Lipophilicity) and interpret the resulting attention weights.
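The core idea of replacing the binary adjacency with a real-valued attention matrix can be sketched as a single graph-convolution step; this is a simplified illustration (one attention matrix, no dictionary lookup or multiple bond attributes), with hypothetical toy values:

```python
def edge_attention_conv(X, A_att, W):
    """One graph-convolution step where a real-valued attention matrix
    A_att replaces the binary adjacency: h_i' = sum_j A_att[i][j] * (W x_j)."""
    n, d_in, d_out = len(X), len(X[0]), len(W[0])
    # linear transform of node features: X @ W
    XW = [[sum(X[i][k] * W[k][j] for k in range(d_in)) for j in range(d_out)]
          for i in range(n)]
    # aggregate neighbors with real-valued attention weights instead of 0/1 entries
    return [[sum(A_att[i][j] * XW[j][o] for j in range(n)) for o in range(d_out)]
            for i in range(n)]

# toy 3-atom molecule; identity transform W for readability
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A_att = [[0.5, 0.5, 0.0],
         [0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
out = edge_attention_conv(X, A_att, W)
```

In the full model, one such attention matrix would be built per bond attribute by looking up learned entries in the edge-attention dictionary.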
- This paper presents a new predictive second order sliding controller (PSSC) formulation for setpoint tracking of constrained linear systems. The PSSC scheme is developed by combining the concepts of model predictive control (MPC) and second order discrete sliding mode control. In order to guarantee the feasibility of the PSSC during setpoint changes, a virtual reference variable is added to the PSSC cost function to calculate the closest admissible setpoint. The states of the system are then driven asymptotically to this admissible setpoint by the control action of the PSSC. The performance of the proposed PSSC is evaluated for an advanced automotive engine case study, where a high-fidelity physics-based model of a reactivity controlled compression ignition (RCCI) engine serves as the virtual test-bed for the simulations. Considering the hard physical constraints on the RCCI engine states and control inputs, simultaneous tracking of engine load and optimal combustion phasing is a challenging objective to achieve. The simulation results of testing the proposed PSSC on the high-fidelity RCCI model show that the developed predictive controller is able to track desired engine load and combustion phasing setpoints with minimal steady-state error and no overshoot. Moreover, the simulation results confirm the robust tracking performance of the PSSC during transient operations, in the presence of engine cyclic variability.
- Feb 06 2018 cs.MA physics.soc-ph arXiv:1802.01194v1. Understanding the mechanics behind the coordinated movement of mobile animal groups provides key insights into their biology and ecology, while also yielding algorithms for bio-inspired technologies and autonomous systems. It is becoming increasingly clear that many mobile animal groups are composed of heterogeneous individuals with differential levels and types of influence over group behaviors, often considered "leaders". The ability to infer this differential influence, or leadership, is critical to understanding group functioning in these collective animal systems. Due to the broad interpretation of leadership, many different measures and mathematical tools are used to describe and infer "leadership", e.g., position, causality, influence, and information flow. But a key question remains: which, if any, of these concepts actually describes leadership? We argue that instead of asserting a single definition or notion of leadership, the complex interaction rules and dynamics typical of a group imply that leadership itself is not merely a scalar quantity, but rather a complex combination of many different components. In this manuscript we develop an anatomy of leadership, identify several principal components, and provide a general mathematical framework for discussing leadership. With real and synthetic examples we then illustrate how to use this framework. We believe this multifaceted approach to defining leadership will allow for a broader understanding of the role of leadership and its inference from data in mobile animal groups and beyond.
- Cyber-physical systems (CPS) consist of sensors, actuators, and controllers all communicating over a network; if any subset becomes compromised, an attacker could cause significant damage. With access to data logs and a model of the CPS, the physical effects of an attack could potentially be detected before any damage is done. Manually building a model that is accurate enough in practice, however, is extremely difficult. In this paper, we propose a novel approach for constructing models of CPS automatically, by applying supervised machine learning to data traces obtained after systematically seeding their software components with faults ("mutants"). We demonstrate the efficacy of this approach on the simulator of a real-world water purification plant, presenting a framework that automatically generates mutants, collects data traces, and learns an SVM-based model. Using cross-validation and statistical model checking, we show that the learnt model characterises an invariant physical property of the system. Furthermore, we demonstrate the usefulness of the invariant by subjecting the system to 55 network and code-modification attacks, and showing that it can detect 85% of them from the data logs generated at runtime.
- Automated segmentation of intracranial arteries on magnetic resonance angiography (MRA) allows for quantification of cerebrovascular features, which provides tools for understanding aging and pathophysiological adaptations of the cerebrovascular system. Using a convolutional autoencoder (CAE) for segmentation is promising as it takes advantage of the autoencoder structure in effective noise reduction and feature extraction by representing high dimensional information with low dimensional latent variables. In this report, an optimized CAE model (Y-net) was trained to learn a 3D segmentation model of intracranial arteries from 49 cases of MRA data. The trained model was shown to perform better than the three traditional segmentation methods in both binary classification and visual evaluation.
- Dec 19 2017 cs.SE arXiv:1712.06025v1. Symbolic execution is a well-established method for test input generation. By taking inputs as symbolic values and solving constraints encoding path conditions, it helps achieve better test coverage. Despite having achieved tremendous success over numeric domains, existing symbolic execution techniques for heap-based programs (e.g., linked lists and trees) are limited due to the lack of a succinct and precise description of symbolic values over unbounded heaps. In this work, we present a new symbolic execution method for heap-based programs using separation logic. The essence of our proposal is the use of existential quantifiers to precisely represent symbolic heaps. Furthermore, we propose context-sensitive lazy initialization, a novel approach for efficient test input generation. We show that by reasoning about the heap in an existential manner, the proposed lazy initialization is sound and complete. We have implemented our proposal in a prototype tool, called Java StarFinder, and evaluated it on a set of programs with complex heap inputs. The results show that our approach significantly reduces the number of invalid test inputs and improves test coverage.
- Dec 13 2017 cs.AI arXiv:1712.04155v1. Modeling and verifying real-world cyber-physical systems is challenging, especially for complex systems where manual modeling is infeasible. In this work, we report our experience combining model learning and abstraction refinement to analyze a challenging system, i.e., a real-world Secure Water Treatment (SWaT) system. Given a set of safety requirements, the objective is to either show that the system is safe with a high probability (so that a system shutdown is rarely triggered due to a safety violation) or to show that it is not. As the system is too complicated to be modeled manually, we apply the latest automatic model learning techniques to construct a set of Markov chains through abstraction and refinement, based on two long system execution logs (one for training and the other for testing). For each probabilistic property, we either report that it does not hold with a certain level of probabilistic confidence, or report that it holds by showing the evidence in the form of an abstract Markov chain. The Markov chains can subsequently be implemented as runtime monitors in SWaT. As far as we know, this is the first case study of applying model learning techniques to a real-world system.
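The basic step of learning an abstract Markov chain from an execution log can be sketched as maximum-likelihood counting over abstracted states; this is a toy illustration, not the paper's learning algorithm, and the abstraction function and trace values are invented:

```python
from collections import defaultdict

def learn_markov_chain(trace, abstract):
    """Estimate transition probabilities over abstract states from one
    long execution trace, using maximum-likelihood transition counts."""
    counts = defaultdict(lambda: defaultdict(int))
    states = [abstract(x) for x in trace]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    chain = {}
    for s, row in counts.items():
        total = sum(row.values())
        chain[s] = {t: c / total for t, c in row.items()}
    return chain

# toy log of sensor readings, abstracted into "low"/"high" states
trace = [0.1, 0.2, 0.9, 0.8, 0.1, 0.9, 0.2, 0.1]
chain = learn_markov_chain(trace, lambda v: "high" if v > 0.5 else "low")
```

Refinement would then split abstract states whose outgoing behavior is too coarse and re-estimate the chain from the same log.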
- While convergence of the Alternating Direction Method of Multipliers (ADMM) on convex problems is well studied, convergence on nonconvex problems is only partially understood. In this paper, we consider the Gaussian phase retrieval problem, formulated as a linear constrained optimization problem with a biconvex objective. The particular structure allows for a novel application of the ADMM. It can be shown that the dual variable is zero at the global minimizer. This motivates the analysis of a block coordinate descent algorithm, which is equivalent to the ADMM with the dual variable fixed to be zero. We show that the block coordinate descent algorithm converges to the global minimizer at a linear rate, when starting from a deterministically achievable initialization point.
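A toy 1-D version of the described block coordinate descent illustrates the alternation: with the dual variable fixed at zero, the algorithm alternates between choosing the sign block and solving the resulting least-squares problem. This is an illustrative sketch under simplified assumptions (real scalar unknown, noiseless magnitude measurements), not the paper's analysis:

```python
import random

def bcd_phase_retrieval_1d(a, y, x0, iters=20):
    """Alternate two blocks: with x fixed, set signs s_i = sign(a_i * x);
    with s fixed, solve min_x sum_i (a_i * x - s_i * y_i)^2 in closed form."""
    x = x0
    for _ in range(iters):
        s = [1.0 if a_i * x >= 0 else -1.0 for a_i in a]
        x = (sum(a_i * s_i * y_i for a_i, s_i, y_i in zip(a, s, y))
             / sum(a_i * a_i for a_i in a))
    return x

random.seed(0)
x_true = 2.5
a = [random.gauss(0.0, 1.0) for _ in range(50)]   # Gaussian measurements
y = [abs(a_i * x_true) for a_i in a]              # magnitude-only observations
x_hat = bcd_phase_retrieval_1d(a, y, x0=0.1)      # recovers x up to global sign
```

The global sign is unrecoverable from magnitudes, which mirrors the usual phase ambiguity in phase retrieval.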
- Nov 23 2017 cs.CV arXiv:1711.08184v2. In this paper, we propose a novel method called AlignedReID that extracts a global feature which is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs alignment/matching by calculating the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we keep only the global feature to compute the similarities between images. Our method achieves rank-1 accuracy of 94.4% on Market1501 and 97.8% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass it on Market1501 and CUHK03, two widely used person ReID datasets.
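The shortest-path alignment between two sets of local features can be sketched with dynamic programming over their pairwise distance matrix, moving only down or right; a minimal illustration with made-up 1-D stripe features and an assumed absolute-difference distance:

```python
def shortest_path_alignment(F, G, dist):
    """Align two sequences of local (stripe) features via the shortest
    path from (0, 0) to (m-1, n-1) in their pairwise distance matrix,
    moving only down or right (dynamic programming)."""
    m, n = len(F), len(G)
    D = [[dist(f, g) for g in G] for f in F]
    S = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                S[i][j] = D[i][j]
            elif i == 0:
                S[i][j] = S[i][j - 1] + D[i][j]
            elif j == 0:
                S[i][j] = S[i - 1][j] + D[i][j]
            else:
                S[i][j] = min(S[i - 1][j], S[i][j - 1]) + D[i][j]
    return S[m - 1][n - 1]

# made-up 1-D stripe features with an absolute-difference distance
cost = shortest_path_alignment([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
                               lambda p, q: abs(p - q))
```

The path cost serves as an alignment-aware local distance during training, while only the global feature is kept at test time.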
- Nov 22 2017 cs.CV arXiv:1711.07752v1. Detecting individual pedestrians in a crowd remains a challenging problem, since pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: attraction by the target, and repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects, thus leading to more crowd-robust localization. Our detector trained with repulsion loss outperforms all state-of-the-art methods with a significant improvement in occlusion cases.
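The two motivations behind the loss can be sketched as an attraction term plus a repulsion term; this is a simplified stand-in (1-D intervals in place of boxes, plain IoU as the repulsion measure) rather than the exact loss in the paper:

```python
def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def iou_1d(p, q):
    """Intersection over union of two 1-D intervals (stand-in for box IoU)."""
    inter = max(0.0, min(p[1], q[1]) - max(p[0], q[0]))
    union = (p[1] - p[0]) + (q[1] - q[0]) - inter
    return inter / union if union > 0 else 0.0

def repulsion_loss(pred, target, others, sigma=0.5):
    """Attraction toward the assigned target plus repulsion away from
    other surrounding ground-truth boxes."""
    attract = smooth_l1(pred[0] - target[0]) + smooth_l1(pred[1] - target[1])
    repel = sum(iou_1d(pred, o) for o in others)
    return attract + sigma * repel
```

A proposal sitting exactly on its target and away from other objects incurs zero loss; drifting toward a neighboring ground-truth box raises the repulsion term even when the attraction term is unchanged.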
- Nov 21 2017 cs.CV arXiv:1711.07319v1. Multi-person pose estimation has improved greatly in recent years, especially with the development of convolutional neural networks. However, many challenging cases, such as occluded keypoints, invisible keypoints, and complex backgrounds, are still not well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) that aims to relieve the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network that can successfully localize "simple" keypoints such as eyes and hands but may fail to precisely recognize occluded or invisible keypoints. RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from GlobalNet together with an online hard keypoint mining loss. To address the multi-person pose estimation problem, a top-down pipeline is adopted: a detector first generates a set of human bounding boxes, and our CPN then localizes keypoints within each bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision of 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, a 19% relative improvement over the 60.5 result from the COCO 2016 keypoint challenge.
- Nov 21 2017 cs.CV arXiv:1711.07264v2. In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping: Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. These networks are thus slow due to the heavy-head design of their architectures; even if we significantly reduce the base model, the computation cost cannot be decreased much. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming of current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient. More importantly, simply by replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN reaches 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy. Code will be made publicly available.
- Nov 21 2017 cs.CV arXiv:1711.07240v2. The improvements in recent CNN-based object detection works, from R-CNN [11] and Fast/Faster R-CNN [10, 29] to the recent Mask R-CNN [14] and RetinaNet [22], mainly come from new network, framework, or loss designs. But the mini-batch size, a key factor in training, has not been well studied. In this paper, we propose a Large Mini-Batch Object Detector (MegDet) that enables training with a much larger mini-batch size than before (e.g., from 16 to 256), so that we can effectively utilize multiple GPUs (up to 128 in our experiments) to significantly shorten the training time. Technically, we suggest a learning rate policy and Cross-GPU Batch Normalization, which together allow us to successfully train a large mini-batch detector in much less time (e.g., from 33 hours down to 4 hours) while achieving even better accuracy. MegDet is the backbone of our submission (mmAP 52.5%) to the COCO 2017 Challenge, where we won 1st place in the Detection task.
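Large mini-batch training typically pairs the linear learning-rate scaling rule with a warmup phase; the sketch below shows that generic policy with hypothetical hyperparameters, and is not necessarily the exact schedule used by MegDet:

```python
def warmup_linear_scaling_lr(step, base_lr=0.02, base_batch=16,
                             batch=256, warmup_steps=500):
    """Linear scaling rule: the target learning rate grows linearly with
    the mini-batch size; a linear warmup ramps up to it from near zero."""
    target_lr = base_lr * batch / base_batch
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr
```

Going from batch 16 to 256 scales the rate by 16x here, which is why a warmup is needed to avoid early divergence.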
- Nov 03 2017 cs.CV arXiv:1711.00583v1. There is an emerging trend to leverage noisy image datasets in many visual recognition tasks. However, label noise in these datasets severely degrades the performance of deep learning approaches. Recently, one mainstream approach has been to introduce a latent label to handle label noise, which has shown promising improvement in network designs. Nevertheless, the mismatch between latent labels and noisy labels still affects the predictions of such methods. To address this issue, we propose a quality embedding model that explicitly introduces a quality variable to represent the trustworthiness of noisy labels. Our key idea is to identify the mismatch between the latent and noisy labels by embedding the quality variables into different subspaces, which effectively minimizes the noise effect; at the same time, high-quality labels can still be used for training. To instantiate the model, we further propose a Contrastive-Additive Noise network (CAN), which consists of two important layers: (1) the contrastive layer estimates the quality variable in the embedding space to reduce the noise effect; and (2) the additive layer aggregates the prior predictions and noisy labels as the posterior to train the classifier. Moreover, to tackle the optimization difficulty, we derive an SGD algorithm with reparameterization tricks, which makes our method scalable to big data. We conduct an experimental evaluation of the proposed method over a range of noisy image datasets; comprehensive results demonstrate that CAN outperforms state-of-the-art deep learning approaches.
- Nonparametric estimation of mutual information is used in a wide range of scientific problems to determine the dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical systems attractors. In a series of numerical examples, this g-knn estimator of mutual information outperforms the Kraskov-Stögbauer-Grassberger (KSG) estimator both when the joint distribution is thinly supported, and when sample sizes are small. In particular, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large but data size is limited.
- Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.
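A forward pass through one functional-transfer layer can be sketched directly: each connection evaluates a parameterized function of its input instead of multiplying by a scalar weight. A minimal illustration with an assumed sine-family connection function (the article trains many such families):

```python
import math

def functional_layer(x, a, b, bias, f):
    """One layer in which each connection applies a parameterized
    function f(x_i, a[i][j], b[i][j]) instead of a scalar weight:
    y_j = tanh(sum_i f(x_i, a[i][j], b[i][j]) + bias[j])."""
    out = []
    for j in range(len(bias)):
        s = sum(f(x[i], a[i][j], b[i][j]) for i in range(len(x)))
        out.append(math.tanh(s + bias[j]))
    return out

# an assumed connection-function family: f(x; a, b) = a * sin(b * x)
f = lambda x, a, b: a * math.sin(b * x)
out = functional_layer([math.pi / 2], [[1.0]], [[1.0]], [0.0], f)
```

Back-propagation then updates the per-connection parameters a and b through the derivatives of f, in place of the usual weight gradients.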
- Oct 10 2017 cs.DB arXiv:1710.02817v1. Missing and incorrect values often cause serious consequences. To deal with these data quality problems, a class of commonly employed tools is dependency rules, such as Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs), and Editing Rules (ERs). The stronger a dependency's expressive power, the better the data quality that can be obtained. To the best of our knowledge, all previous dependencies treat each attribute value as a non-splittable whole. In many applications, however, part of a value may contain meaningful information, indicating that more powerful dependency rules for handling data quality problems are possible. In this paper, we consider discovering dependencies of this type, in which the left-hand side is part of a regular-expression-like paradigm; we name them Paradigm Dependencies (PDs). A PD states that if a string matches the paradigm, the element at the specified position determines a certain other attribute's value. We propose a framework in which strings with similar coding rules and different lengths are clustered together and aligned vertically, from which PDs can be discovered directly. The alignment problem is the key component of this framework and is proved to be NP-complete. We introduce a greedy algorithm in which the clustering and alignment tasks are accomplished simultaneously. Because of the greedy algorithm's high time complexity, several pruning strategies are proposed to reduce the running time. In the experimental study, three real datasets as well as several synthetic datasets are employed to verify our method's effectiveness and efficiency.
- Sep 28 2017 cs.CV arXiv:1709.09641v1. The multi-atlas segmentation approach is one of the most widely used image segmentation techniques in biomedical applications. There are two major challenges in this category of methods: atlas selection and label fusion. In this paper, we propose a novel multi-atlas segmentation method that formulates multi-atlas segmentation in a deep learning framework to better address these challenges. The proposed method, dubbed deep fusion net (DFN), is a deep architecture that integrates a feature extraction subnet and a non-local patch-based label fusion (NL-PLF) subnet in a single network. The network parameters are learned by an end-to-end training strategy, automatically learning deep features that enable optimal performance in an NL-PLF framework. The learned deep features are further utilized to define a similarity measure for atlas selection. We evaluate our proposed method on two public cardiac MR databases, SATA-13 and LV-09, for left ventricle segmentation, and our learned DFNs with extracted deep features for atlas selection at the testing phase achieve state-of-the-art accuracies, e.g., 0.833 in averaged Dice metric (ADM) on the SATA-13 database and 0.95 in ADM for epicardium segmentation on the LV-09 database. Our method is also robust in cross-database evaluation, e.g., the DFN learned on the LV-09 database achieves 0.815 in ADM on the SATA-13 database. We also test our proposed method on the Cardiac Atlas Project (CAP) testing set of the MICCAI 2013 SATA Segmentation Challenge, where it achieves 0.815 in Dice metric, the highest result on this dataset.
- Design, Modeling and Dynamic Compensation PID Control of a Fully-Actuated Aerial Manipulation System. Sep 26 2017 cs.RO arXiv:1709.08054v1. This paper addresses the design, modeling, and dynamic-compensation PID (dc-PID) control of a novel type of fully-actuated aerial manipulation (AM) system. First, the design of the AM's novel mechanical structure is presented. Second, the kinematics and dynamics of the AM are modeled using Craig parameters and recursive Newton-Euler equations, respectively, which yield a more accurate dynamic relationship between the aerial platform and the manipulator. The dynamic-compensation PID control is then proposed to solve the fully-actuated control problem of the AM. Finally, uniform coupled matrix equations between driving forces/moments and rotor speeds are derived, which theoretically support parameter design, analysis, and decoupling. The simulations account for practical problems including noise and perturbation, parameter uncertainty, and power limitation; the results show that the presented AM system can be controlled in a fully-actuated manner with advanced control performance, which cannot be achieved theoretically in traditional AMs. Compared with backstepping control, dc-PID achieves better control accuracy and disturbance rejection in two simulations of aerial operation tasks involving joint motion. An experiment with dc-PID confirms the availability and effectiveness of the proposed method.
- Sep 19 2017 cs.SY arXiv:1709.05457v1. We propose a fusion mechanism for distributed cooperative map matching (CMM) within a vehicular ad-hoc network. This mechanism makes the information from each node reachable by other nodes within the network without direct communication, thus improving overall localization accuracy and robustness. Each node runs a Rao-Blackwellized particle filter (RBPF) that processes the Global Navigation Satellite System (GNSS) measurements of its own and its neighbors, followed by a map matching step that reduces or eliminates the GNSS atmospheric error. Each node then fuses its own filtered results with those from its neighbors for a better estimate. In this work, the complicated dynamics and fusion mechanics of these RBPFs are represented by a linear dynamical system. We propose a distributed optimization framework that exploits this model to improve both the robustness and the accuracy of the distributed CMM. The effectiveness of this framework is illustrated by simulation results on realistic vehicular networks drawn from data, compared against a centralized approach and a decentralized one with random fusion weights.
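Setting the RBPF details aside, the idea of making each node's information reachable without direct communication can be illustrated by simple consensus averaging, where repeated local fusion propagates estimates across the network; a toy sketch with uniform weights on a hypothetical 3-node line network:

```python
def fuse_estimates(estimates, neighbors, rounds=10):
    """Each node repeatedly replaces its estimate with the average of
    its own and its direct neighbors' estimates; after enough rounds,
    information propagates to nodes without direct communication links."""
    x = list(estimates)
    for _ in range(rounds):
        x = [sum(x[j] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
             for i in range(len(x))]
    return x

# hypothetical 3-node line network: 0 -- 1 -- 2
fused = fuse_estimates([0.0, 3.0, 6.0], [[1], [0, 2], [1]])
```

Nodes 0 and 2 never exchange messages directly, yet both converge toward a common value through node 1; the paper's framework additionally optimizes the fusion weights instead of using uniform ones.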
- Sep 19 2017 cs.LG arXiv:1709.05342v2. In this paper, we propose and evaluate the application of unsupervised machine learning to anomaly detection for a Cyber-Physical System (CPS). We compare two methods: Deep Neural Networks (DNN) adapted to time series data generated by a CPS, and one-class Support Vector Machines (SVM). These methods are evaluated against data from the Secure Water Treatment (SWaT) testbed, a scaled-down but fully operational raw water purification plant. For both methods, we first train detectors using a log generated by SWaT operating under normal conditions. Then, we evaluate the performance of both methods using a log generated by SWaT operating under 36 different attack scenarios. We find that our DNN generates fewer false positives than our one-class SVM, while our SVM detects slightly more anomalies. Overall, our DNN has a slightly better F-measure than our SVM. We discuss the characteristics of the DNN and one-class SVM used in this experiment, and compare the advantages and disadvantages of the two methods.
- Sep 08 2017 cs.CV arXiv:1709.01993v1. Lighting estimation from face images is an important task and has applications in many areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image. Lacking massive ground truth lighting labels for face images in the wild, we use an existing method to estimate lighting parameters, which are treated as ground truth with unknown noises. To alleviate the effect of such noises, we utilize the idea of Generative Adversarial Networks (GAN) and propose a Label Denoising Adversarial Network (LDAN) to make use of synthetic data with accurate ground truth to help train a deep CNN for lighting regression on real face images. Experiments show that our network outperforms existing methods in producing consistent lighting parameters of different faces under similar lighting conditions. Moreover, our method is 100,000 times faster in execution time than prior optimization-based lighting estimation approaches.
- In an era when big data are becoming the norm, the concern is less with the quantity of data than with their quality and completeness. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis; in particular, when certain samples miss an entire view of data, it creates the missing view problem. Classic multiple imputation or matrix completion methods are hardly effective here, since there is no information in the missing view on which to base the imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. By optimizing the GAN and DAE jointly, our model integrates the knowledge of domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach against the state of the art, and an evaluation of VIGAN in a genetic study of substance use disorders further demonstrates the effectiveness and usability of this approach in the life sciences.
- We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.
- Long Short-Term Memory (LSTM) is the primary recurrent neural network architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method that helps neural networks converge more easily and quickly. In this paper, we propose several types of residual LSTM methods for acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture shows more than an 8% relative reduction in Phone Error Rate (PER) on TIMIT tasks, while our residual fast LSTM approach shows a 4% relative reduction in PER on the same task. We also find that these architectures achieve good results on the THCHS-30, LibriSpeech, and Switchboard corpora.
- Jul 27 2017 cs.LG arXiv:1707.08262v1 Sleep disorders, such as sleep apnea, parasomnias, and hypersomnia, affect 50-70 million adults in the United States (Hillman et al., 2006). Overnight polysomnography (PSG), including brain monitoring using electroencephalography (EEG), is a central component of the diagnostic evaluation for sleep disorders. While PSG is conventionally performed by trained technologists, the recent rise of powerful neural network learning algorithms combined with large physiological datasets offers the possibility of automation, potentially making expert-level sleep analysis more widely available. We propose SLEEPNET (Sleep EEG neural network), a deployed annotation tool for sleep staging. SLEEPNET uses a deep recurrent neural network trained on the largest sleep physiology database assembled to date, consisting of PSGs from over 10,000 patients from the Massachusetts General Hospital (MGH) Sleep Laboratory. SLEEPNET achieves human-level annotation performance on an independent test set of 1,000 EEGs, with an average accuracy of 85.76% and algorithm-expert inter-rater agreement (IRA) of kappa = 79.46%, comparable to expert-expert IRA.
- Jul 20 2017 cs.CV arXiv:1707.06168v2 In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, using LASSO-regression-based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5x speed-up at only a 0.3% increase in error. More importantly, our method can accelerate modern networks such as ResNet and Xception, suffering only 1.4% and 1.0% accuracy loss respectively under a 2x speed-up, which is significant. Code has been made publicly available.
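The two-step idea of this pruning method can be illustrated on a toy linear layer (the tiny ISTA solver and all names here are my own sketch; the paper operates on convolutional feature maps, not this simplified setting): LASSO selects the channels to keep, then ordinary least squares refits the remaining weights to minimize reconstruction error.

```python
import numpy as np

def lasso_ista(A, y, lam, iters=500):
    """Minimal ISTA solver for min_b (1/2)||A b - y||^2 + lam * ||b||_1."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ b - y)               # gradient of the quadratic part
        z = b - g / L                        # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b

def prune_channels(X, y, lam):
    """Toy two-step pruning: X has one column of responses per channel."""
    # Step 1: LASSO channel selection (nonzero coefficients survive).
    beta = lasso_ista(X, y, lam)
    keep = np.nonzero(np.abs(beta) > 1e-6)[0]
    # Step 2: least-squares reconstruction using only the kept channels.
    w, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    return keep, w
```

The refit in step 2 is what keeps accuracy loss small: LASSO's shrinkage bias is removed by solving an unpenalized least-squares problem on the selected channels.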
- Handwritten string recognition remains a challenging task, even though powerful deep learning tools have been introduced. In this paper, we propose an end-to-end system for handwritten string recognition based on TAO-FCN. Compared with conventional methods, it employs no preprocessing or manually designed rules. With enough labelled data, the proposed method is easy to apply to different applications. Although its performance may not be comparable with state-of-the-art approaches, its usability and robustness are more meaningful for practical applications.
- Jul 05 2017 cs.CV arXiv:1707.01083v2 We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over AlexNet while maintaining comparable accuracy.
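The channel shuffle operation is simple enough to sketch directly. A minimal NumPy version (a reshape/transpose/reshape on an assumed NCHW layout; not the official implementation) might look like:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Sketch of a ShuffleNet-style channel shuffle on an NCHW tensor.

    After a grouped convolution, each output channel only sees its own
    group; interleaving channels across groups lets information mix.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    # split channels into (groups, channels-per-group), swap, flatten back
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))
```

For 6 channels in 2 groups, the channel order [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], interleaving the two groups at no arithmetic cost.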
- Jun 20 2017 cs.SY arXiv:1706.06027v1 In this paper, we develop two zonotope-based set-membership estimation algorithms for identification of time-varying parameters in linear models, where both additive and multiplicative uncertainties are treated explicitly. The two recursive algorithms can be differentiated by their ways of processing the data and required computations. The first algorithm, which is referred to as Cone And Zonotope Intersection (CAZI), requires solving linear programming problems at each iteration. The second algorithm, referred to as the Polyhedron And Zonotope Intersection (PAZI), involves linear programming as well as an optimization subject to linear matrix inequalities (LMIs). Both algorithms are capable of providing tight overbounds of the feasible solution set (FSS) in our numerical case studies. Furthermore, PAZI provides an additional opportunity of further analyzing the relation between the estimation results at different iterations. An application to health monitoring of marine engines is considered to demonstrate the utility and effectiveness of the algorithms.
- Jun 09 2017 cs.HC arXiv:1706.02637v2 Gaze-based virtual keyboards provide an effective interface for text entry by eye movements. The efficiency and usability of these keyboards have traditionally been evaluated with conventional text entry performance measures such as words per minute, keystrokes per character, backspace usage, etc. However, in comparison to the traditional text entry approaches, gaze-based typing involves natural eye movements that are highly correlated with human brain cognition. Employing eye gaze as an input could lead to excessive mental demand, and in this work we argue the need to include cognitive load as an eye typing evaluation measure. We evaluate three variations of gaze-based virtual keyboards, which implement variable designs in terms of word suggestion positioning. The conventional text entry metrics indicate no significant difference in the performance of the different keyboard designs. However, STFT (Short-time Fourier Transform) based analysis of EEG signals indicate variances in the mental workload of participants while interacting with these designs. Moreover, the EEG analysis provides insights into the user's cognition variation for different typing phases and intervals, which should be considered in order to improve eye typing usability.
- The study of social networks, where people are located geographically and how they might be connected to one another, is currently a topic of intense interest because of its immediate relevance to important applications, from devising efficient immunization techniques for the arrest of epidemics, to the design of better transportation and city planning paradigms, to the understanding of how rumors and opinions spread and take shape over time. We develop a spatial social complex network (SSCN) model that captures not only essential connectivity features of real-life social networks, including a heavy-tailed degree distribution and high clustering, but also the spatial location of individuals, reproducing Zipf's law for the distribution of city populations as well as other observed hallmarks. We then simulate Milgram's Small-World experiment on our SSCN model, obtaining good qualitative agreement with the known results and shedding light on the role played by various network attributes and the strategies used by the players in the game. This demonstrates the potential of the SSCN model for the simulation and study of the many social processes mentioned above, where both connectivity and geography play a role in the dynamics.
- May 22 2017 cs.CV arXiv:1705.06869v1 Compressive sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing MR images from a small number of under-sampled data in k-space, and accelerating the data acquisition in MRI. To improve the current MRI system in reconstruction accuracy and speed, in this paper, we propose two novel deep architectures, dubbed ADMM-Nets in basic and generalized versions. ADMM-Nets are defined over data flow graphs, which are derived from the iterative procedures in Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a general CS-based MRI model. They take the sampled k-space data as inputs and output reconstructed MR images. Moreover, we extend our network to cope with complex-valued MR images. In the training phase, all parameters of the nets, e.g., transforms, shrinkage functions, etc., are discriminatively trained end-to-end. In the testing phase, they have computational overhead similar to ADMM algorithm but use optimized parameters learned from the data for CS-based reconstruction task. We investigate different configurations in network structures and conduct extensive experiments on MR image reconstruction under different sampling rates. Due to the combination of the advantages in model-based approach and deep learning approach, the ADMM-Nets achieve state-of-the-art reconstruction accuracies with fast computational speed.
- May 02 2017 cs.SY arXiv:1705.00568v2 Cooperative map matching (CMM) uses the Global Navigation Satellite System (GNSS) position information of a group of vehicles to improve the standalone localization accuracy. It has been shown, in our previous work, that the GNSS error can be reduced from several meters to sub-meter level by matching the biased GNSS positioning to a digital map with road constraints. While further error reduction is expected by increasing the number of participating vehicles, fundamental questions on how the vehicle membership within CMM affects the performance of the CMM results need to be addressed to provide guidelines for design and optimization of the vehicle network. This work presents a theoretical study that establishes a framework for quantitative evaluation of the impact of the road constraints on the CMM accuracy. More specifically, a closed-form expression of the CMM error in terms of the road constraints and GNSS error is derived based on a simple CMM rule. The asymptotic decay of the CMM error as the number of vehicles increases is established and justified through numerical simulations. Moreover, it is proved that the CMM error can be minimized if the directions of the roads on which the connected vehicles travel obey a uniform distribution. Finally, the localization accuracy of CMM is evaluated based on the Safety Pilot Model Deployment and Pillar dataset of Ann Arbor traffic flow collected over a three-year period. The contributions of this work include establishing a theoretical foundation for CMM as well as providing insight and motivation for applications of CMM.
- Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy.
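The server/site iteration pattern behind this federated scheme can be illustrated with consensus ADMM on a toy scalar objective (the quadratic site loss and all names are illustrative placeholders, not the paper's tensor factorization updates): each site solves a small local problem on its own data, only summary values reach the server, and the server aggregates.

```python
def consensus_admm(targets, rho=1.0, iters=100):
    """Toy federated objective: min_x sum_i ||x - a_i||^2.

    Each site i holds a_i privately and only ever transmits its local
    iterate x_i + u_i; the server returns the aggregate z.
    """
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per site
    for _ in range(iters):
        # local site updates (closed form for the quadratic site loss)
        x = [(2.0 * a + rho * (z - ui)) / (2.0 + rho)
             for a, ui in zip(targets, u)]
        # server aggregation of summarized information
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # dual updates, kept locally at each site
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z
```

The fixed point is the same solution a centralized solver would find on the combined data, which mirrors the abstract's claim that the federated model resembles the centralized one.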
- We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover $>$70\% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than templates selected by a standard tf-idf baseline.
- Mar 28 2017 cs.SY arXiv:1703.08818v2 Cooperative map matching (CMM) uses the Global Navigation Satellite System (GNSS) positioning of a group of vehicles to improve the standalone localization accuracy. It has been shown to reduce GNSS error from several meters to sub-meter level by matching the biased GNSS positioning of four vehicles to a digital map with road constraints in our previous work. While further error reduction is expected by increasing the number of participating vehicles, fundamental questions on how the vehicle membership of the CMM affects the performance of the GNSS-based localization results need to be addressed to provide guidelines for design and optimization of the vehicle network. The quantitative relationship between the estimation error and the road constraints has to be systematically investigated to provide insights. In this work, a theoretical study is presented that aims at developing a framework for quantitatively evaluating effects of the road constraints on the CMM accuracy and for eventual optimization of the CMM network. More specifically, a closed form expression of the CMM error in terms of the road angles and GNSS error is first derived based on a simple CMM rule. Then a Branch and Bound algorithm and a Cross Entropy method are developed to minimize this error by selecting the optimal group of vehicles under two different assumptions about the GNSS error variance.
- Mar 24 2017 cs.SI arXiv:1703.08100v2 In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images and show that it is able to outperform modern remote sensing algorithms for this task.
- Mar 23 2017 cs.DB arXiv:1703.07617v1 Sliding window join is one of the most important operators for stream applications. To produce high quality join results, a stream processing system must deal with the ubiquitous disorder within input streams which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet different requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes sizes of input-sorting buffers, thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between sizes of input buffers and the produced result quality. Our approach is generic. It supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality.
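The input-sorting buffer at the heart of such approaches can be sketched with a fixed slack K (a deliberately simplified illustration; the paper's contribution is sizing the buffer adaptively from a quality model, which is not reproduced here): tuples are held until their timestamp falls at least K behind the largest timestamp seen, then emitted in timestamp order.

```python
import heapq

class DisorderBuffer:
    """Fixed-slack input-sorting buffer for out-of-order stream tuples."""

    def __init__(self, k):
        self.k = k                       # slack: max tolerated lateness
        self.heap = []                   # min-heap ordered by timestamp
        self.max_ts = float("-inf")      # largest timestamp seen so far

    def insert(self, ts, value):
        """Buffer one tuple; return the tuples that are now safe to emit."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, value))
        out = []
        # a tuple is safe once no later arrival can precede it within slack K
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            out.append(heapq.heappop(self.heap))
        return out
```

A larger K raises result quality (fewer late tuples missed) at the cost of latency, which is exactly the tradeoff the abstract describes.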
- Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Based on input real patient records, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we showed that medGAN generates synthetic patient records that achieve comparable performance to real data on many experiments including distribution statistics, predictive modeling tasks and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN.
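The minibatch averaging trick mentioned above is compact enough to sketch (a NumPy sketch under my own naming; in medGAN the augmented batch feeds the discriminator): every sample is concatenated with the minibatch mean, so a collapsed generator whose fakes are all identical becomes easy for the discriminator to spot.

```python
import numpy as np

def minibatch_average(batch):
    """Append the minibatch mean to every sample in the batch.

    batch: (n, d) array of (real or generated) records.
    returns: (n, 2d) array; each row is [sample, batch_mean].
    """
    mean = batch.mean(axis=0, keepdims=True)
    return np.concatenate(
        [batch, np.repeat(mean, batch.shape[0], axis=0)], axis=1
    )
```

Because the mean of a mode-collapsed fake batch differs sharply from the mean of a real batch, the discriminator gains a cheap statistic to penalize collapse.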
- In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To handle such data, the state-of-the-art tensor model is the so-called PARAFAC2, which yields interpretable and robust output and can naturally handle sparse data. However, its main limitation up to now has been the lack of efficient algorithms that can handle large-scale datasets. In this work, we fill this gap by developing a scalable method to compute the PARAFAC2 decomposition of large and sparse datasets, called SPARTan. Our method exploits special structure within PARAFAC2, leading to a novel algorithmic reformulation that is both fast (in absolute time) and more memory-efficient than prior work. We evaluate SPARTan on both synthetic and real datasets, showing 22X performance gains over the best previous implementation and also handling larger problem instances for which the baseline fails. Furthermore, we are able to apply SPARTan to the mining of temporally-evolving phenotypes on data taken from real and medically complex pediatric patients. The clinical meaningfulness of the phenotypes identified in this process, as well as their temporal evolution over time for several patients, have been endorsed by clinical experts.
- Mar 09 2017 cs.CV arXiv:1703.02719v1 One recent trend [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) throughout the entire network, because stacked small filters are more efficient than a large kernel at the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-the-art performance on two public benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.
- Mar 08 2017 cs.SY arXiv:1703.02098v1 Cooperative localization with map matching has been shown to reduce Global Navigation Satellite System (GNSS) localization error from several meters to sub-meter level by fusing the GNSS measurements of four vehicles in our previous work. While further error reduction is expected to be achievable by increasing the number of vehicles, the quantitative relationship between the estimation error and the number of connected vehicles has neither been systematically investigated nor analytically proved. In this work, a theoretical study is presented that analytically proves the correlation between the localization error and the number of connected vehicles in two cases of practical interest. More specifically, it is shown that, under the assumption of small non-common error, the expected square error of the GNSS common error correction is inversely proportional to the number of vehicles, if the road directions obey a uniform distribution, or inversely proportional to logarithm of the number of vehicles, if the road directions obey a Bernoulli distribution. Numerical simulations are conducted to justify these analytic results. Moreover, the simulation results show that the aforementioned error decrement rates hold even when the assumption of small non-common error is violated.
- Feb 28 2017 cs.CV arXiv:1702.07971v1 Most of computer vision focuses on what is in an image. We propose to train a standalone object-centric context representation to perform the opposite task: seeing what is not there. Given an image, our context model can predict where objects should exist, even when no object instances are present. Combined with object detection results, we can perform a novel vision task: finding where objects are missing in an image. Our model is based on a convolutional neural network structure. With a specially designed training strategy, the model learns to ignore objects and focus on context only. It is fully convolutional thus highly efficient. Experiments show the effectiveness of the proposed approach in one important accessibility task: finding city street regions where curb ramps are missing, which could help millions of people with mobility disabilities.
- Feb 28 2017 cs.CV arXiv:1702.07975v1 Like other problems in computer vision, offline handwritten Chinese character recognition (HCCR) has achieved impressive results using convolutional neural network (CNN)-based methods. However, larger and deeper networks are needed to deliver state-of-the-art results in this domain. Such networks intuitively appear to incur high computational cost, and require the storage of a large number of parameters, which renders them infeasible for deployment in portable devices. To solve this problem, we propose a Global Supervised Low-rank Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to address the problems of speed and storage capacity. We design a nine-layer CNN for HCCR consisting of 3,755 classes, and devise an algorithm that can reduce the network's computational cost ninefold and compress the network to 1/18 of the original size of the baseline model, with only a 0.21% drop in accuracy. In tests, the proposed algorithm surpassed the best single-network performance reported thus far in the literature while requiring only 2.3 MB for storage. Furthermore, when integrated with our effective forward implementation, the recognition of an offline character image took only 9.7 ms on a CPU. Compared with the state-of-the-art CNN model for HCCR, our approach is approximately 30 times faster, yet 10 times more cost efficient.
- Feb 27 2017 cs.SY arXiv:1702.07536v1 In this paper, the problem of event-triggered consensus for linear continuous-time multi-agent systems is investigated. A new event-triggered consensus protocol based on a predictor is proposed to achieve consensus without continuous communication among agents. In the proposed protocol, each agent only needs to monitor its own states to determine its event-triggered instants. When an event is triggered, the agent updates its consensus protocol and sends its state information to its neighbors. In addition, the agent also updates its consensus protocol and the predictor when it receives state information from its neighbors. A necessary and sufficient condition under which the consensus problem can be solved is derived. Moreover, it is proved that Zeno behavior does not occur. Finally, a numerical example illustrates that the proposed protocol enables the multi-agent systems to achieve consensus with far fewer triggered events.
- Feb 21 2017 cs.SY arXiv:1702.05792v2 A crucial function for automated vehicle technologies is accurate localization. Lane-level accuracy is not readily available from low-cost Global Navigation Satellite System (GNSS) receivers because of factors such as multipath error and atmospheric bias. Approaches such as Differential GNSS can improve localization accuracy, but usually require investment in expensive base stations. Connected vehicle technologies provide an alternative approach to improving the localization accuracy. It will be shown in this paper that localization accuracy can be enhanced using crude GNSS measurements from a group of connected vehicles, by matching their locations to a digital map. A Rao-Blackwellized particle filter (RBPF) is used to jointly estimate the common biases of the pseudo-ranges and the vehicle positions. Multipath biases, which introduce receiver-specific (non-common) error, are mitigated by a multi-hypothesis detection-rejection approach. The temporal correlation of the estimations is exploited through the prediction-update process. The proposed approach is compared to existing methods using both simulations and experimental results. It was found that the proposed algorithm can eliminate the common biases and reduce the localization error to below 1 meter under open sky conditions.
- Feb 16 2017 cs.CV arXiv:1702.04517v2 Convective storm nowcasting has attracted substantial attention in various fields. Existing methods under a deep learning framework rely primarily on radar data. Although they nowcast storm advection well, it is still challenging to nowcast storm initiation and growth, due to the limitations of the radar observations. This paper describes the first attempt to nowcast storm initiation, growth, and advection simultaneously under a deep learning framework using multi-source meteorological data. To this end, we present a multi-channel 3D-cube successive convolution network (3D-SCN). As real-time re-analysis meteorological data can now provide valuable atmospheric boundary layer thermal dynamic information, which is essential to predict storm initiation and growth, both raw 3D radar and re-analysis data are used directly without any handcrafted feature engineering. These data are formulated as multi-channel 3D cubes, to be fed into our network, which are convolved by cross-channel 3D convolutions. By stacking successive convolutional layers without pooling, we build an end-to-end trainable model for nowcasting. Experimental results show that deep learning methods achieve better performance than traditional extrapolation methods. The qualitative analyses of 3D-SCN show encouraging results of nowcasting of storm initiation, growth, and advection.
- In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors should occur simultaneously to have an effect on the target variable.
- The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have an efficient implementation, by exploiting the statistics of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piecewise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations.
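The forward/backward mismatch described here is easy to see in a toy straight-through sketch (this illustrates the binary-quantization baseline the paper starts from, not the half-wave Gaussian quantizer itself): the forward pass uses the piecewise constant sign, while the backward pass uses the gradient of hard tanh as a surrogate.

```python
import numpy as np

def binary_forward(x):
    """Forward pass: piecewise constant sign non-linearity."""
    return np.sign(x)

def binary_backward(x, grad_out):
    """Backward pass: gradient of hard tanh (clip to [-1, 1]).

    sign() has zero gradient almost everywhere, so training substitutes
    the hard-tanh derivative: pass the gradient through where |x| <= 1,
    block it outside that range.
    """
    return grad_out * (np.abs(x) <= 1.0)
```

Using different functions in the two passes is exactly the "gradient mismatch" the abstract refers to, which the HWGQ work addresses with better-matched backward approximators.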
- Dec 28 2016 cs.CV arXiv:1612.08484v1 Nowadays CNNs are widely used in practical applications for the image classification task. However, designing a CNN model requires expertise and is very difficult for ordinary users. Moreover, even for CNN experts, selecting an optimal model for a specific task may still require a lot of time (to train many different models). To solve this problem, we propose an automated CNN recommendation system for the image classification task. Our system is able to evaluate precisely both the complexity of the classification task and the classification ability of candidate CNN models. Using these evaluation results, the system recommends the CNN model that best matches the task. The recommendation process is very fast since no model training is needed. The experimental results show that the evaluation methods are accurate and reliable.
- Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users' text descriptions. However, direct inference of 3D structure from images requires learning from large scale annotated data. Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description. We present a generic framework for retrieving images from a textual description of a scene by matching images with these generated abstract object layouts. Images are ranked by matching object detection outputs (bounding boxes) to 2D layout candidates (also represented by bounding boxes) which are obtained by projecting the 3D scenes with sampled camera directions. We validate our approach using public indoor scene datasets and show that our method outperforms baselines built upon object occurrence histograms and learned 2D pairwise relations.
- Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: -Data insufficiency:Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results. -Interpretation:The representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose a GRaph-based Attention Model, GRAM that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies. Based on the data volume and the ontology structure, GRAM represents a medical concept as a combination of its ancestors in the ontology via an attention mechanism. We compared predictive performance (i.e. accuracy, data needs, interpretability) of GRAM to various methods including the recurrent neural network (RNN) in two sequential diagnoses prediction tasks and one heart failure prediction task. Compared to the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely observed in the training data and 3% improved area under the ROC curve for predicting heart failure using an order of magnitude less training data. Additionally, unlike other methods, the medical concept representations learned by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits intuitive attention behaviors by adaptively generalizing to higher level concepts when facing data insufficiency at the lower level concepts.
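The ancestor-attention step that GRAM builds on can be sketched in a few lines (vector names and the dot-product scorer below are placeholders; GRAM learns its compatibility scores with a trained network): a concept's final representation is a softmax-weighted combination of its own embedding and its ontology ancestors' embeddings.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def ancestor_attention(leaf_vec, ancestor_vecs, score_fn):
    """Blend a concept's embedding with its ontology ancestors.

    leaf_vec: embedding of the medical concept itself.
    ancestor_vecs: list of embeddings of its ancestors in the ontology.
    score_fn: compatibility score between the leaf and a candidate.
    """
    candidates = np.vstack([leaf_vec] + list(ancestor_vecs))
    scores = np.array([score_fn(leaf_vec, c) for c in candidates])
    alpha = softmax(scores)          # attention over leaf + ancestors
    return alpha @ candidates        # weighted combination
```

When a rare concept has little training data, the attention can shift weight toward well-trained ancestor embeddings, which is the generalization behavior the abstract describes.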
- Nov 10 2016 cs.SI arXiv:1611.02941v1 How can we recognise social roles of people, given a completely unlabelled social network? We present a transfer learning approach to network role classification based on feature transformations from each network's local feature distribution to a global feature space. Experiments are carried out on real-world datasets. (See manuscript for the full abstract.)
- We denote by $\mathcal{P}_q$ the vector space of functions from a finite field $\mathbb{F}_q$ to itself, which can be represented as the space $\mathcal{P}_q := \mathbb{F}_q[x]/(x^q-x)$ of polynomial functions. We denote by $\mathcal{O}_n \subset \mathcal{P}_q$ the set of polynomials that are either the zero polynomial, or have at most $n$ distinct roots in $\mathbb{F}_q$. Given two subspaces $Y,Z$ of $\mathcal{P}_q$, we denote by $\langle Y,Z \rangle$ their span. We prove that the following are equivalent. A) Let $k, q$ be integers, with $q$ a prime power and $2 \leq k \leq q$. Suppose that either: 1) $q$ is odd, or 2) $q$ is even and $k \not\in \{3, q-1\}$. Then there do not exist distinct subspaces $Y$ and $Z$ of $\mathcal{P}_q$ such that: 1') $\dim(\langle Y, Z \rangle) = k$, 2') $\dim(Y) = \dim(Z) = k-1$, 3') $\langle Y, Z \rangle \subset \mathcal{O}_{k-1}$, 4') $Y, Z \subset \mathcal{O}_{k-2}$, 5') $Y\cap Z \subset \mathcal{O}_{k-3}$. B) The MDS conjecture is true for the given $(q,k)$.
- Optical flow refers to the visual motion observed between two consecutive images. Since the degrees of freedom typically far outnumber the constraints imposed by the image observations, the straightforward formulation of optical flow inference is an ill-posed problem. By imposing some type of additional "regularity" constraint, classical approaches formulate a well-posed optical flow inference problem in the form of a parameterized set of variational equations. In this work we build a mathematical connection, focused on optical flow methods, between classical variational optical flow approaches and Bayesian statistical inversion. A classical optical flow solution is in fact identical to a maximum a posteriori estimator under the assumptions of a linear model with additive independent Gaussian noise and a Gaussian prior distribution. Unlike classical approaches, the statistical inversion approach to optical flow estimation not only allows for "point" estimates, but also provides a distribution of solutions which can be used for ensemble estimation and in particular uncertainty quantification.
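The stated equivalence can be written out explicitly. Under a linearized observation model (the symbols $A$, $L$, $\sigma$, $\lambda$ here are generic stand-ins, not the paper's notation): with $y = Ax + n$, noise $n \sim \mathcal{N}(0, \sigma^2 I)$, and Gaussian prior $p(x) \propto \exp(-\tfrac{\lambda}{2}\|Lx\|^2)$, Bayes' rule gives

```latex
\begin{align*}
p(x \mid y) &\propto \exp\!\Big(-\tfrac{1}{2\sigma^2}\|Ax - y\|^2\Big)
               \exp\!\Big(-\tfrac{\lambda}{2}\|Lx\|^2\Big), \\
\hat{x}_{\mathrm{MAP}} &= \arg\min_x \; \tfrac{1}{2\sigma^2}\|Ax - y\|^2
               + \tfrac{\lambda}{2}\|Lx\|^2,
\end{align*}
```

which is exactly a quadratically regularized variational energy (Horn-Schunck-type when $L$ is a gradient operator), making the MAP estimator coincide with the classical variational solution.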
- Scalable and automatic formal verification for concurrent systems is in constant demand, but has yet to be developed. In this paper, we propose a verification framework to support automated compositional reasoning for concurrent programs with shared variables. Our framework models concurrent programs as succinct automata and supports the verification of multiple important properties. Safety verification and simulations of succinct automata are parallel compositional, and safety properties of succinct automata are preserved under refinements. Formal verification of finite state succinct automata can be automated. Furthermore, we propose the first automated approach to checking rely-guarantee based simulations between infinite state concurrent programs. We have prototyped our algorithm and applied our tool to the verification of multiple refinements.
- Oct 28 2016 cs.SE arXiv:1610.08607v1 Debugging is difficult. Recent studies show that automatic bug localization techniques have limited usefulness. One of the reasons is that programmers typically have to understand why the program fails before fixing it. In this work, we aim to help programmers understand a bug by automatically generating likely invariants which are violated in the failed tests. Given a program with an initial assertion and at least one test case failing the assertion, we first generate random test cases, identify potential bug locations through bug localization, and then generate program state mutations based on active learning techniques to identify a predicate "explaining" the cause of the bug. The predicate is a classifier for the passed test cases and failed test cases. Our main contribution is the application of invariant learning to bug explanation, as well as a novel approach to overcoming the lack of test cases in practice. We apply our method to real-world bugs and show that the generated invariants are often correlated with the actual bug fixes.
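The "predicate as a classifier" idea can be illustrated with a deliberately simplified sketch (not the paper's active-learning procedure): treat recorded program states from passing and failing runs as labeled data, and search for a single-variable threshold predicate that perfectly separates the two classes.

```python
def learn_predicate(passed, failed):
    # passed/failed: lists of program states, each a {variable: value} dict
    # recorded at the suspicious location. Try every shared variable and
    # look for a threshold that cleanly separates the two classes.
    variables = set(passed[0]) & set(failed[0])
    for var in sorted(variables):
        p_vals = [s[var] for s in passed]
        f_vals = [s[var] for s in failed]
        if max(p_vals) < min(f_vals):
            c = (max(p_vals) + min(f_vals)) / 2
            return f"{var} <= {c}"   # holds in passing runs, violated in failing
        if max(f_vals) < min(p_vals):
            c = (max(f_vals) + min(p_vals)) / 2
            return f"{var} > {c}"
    return None  # no single-variable threshold separates the runs
```

A returned predicate such as `"x <= 3.5"` is a candidate invariant: it holds on all passing runs and is violated on all failing ones, which is the kind of "explanation" the abstract describes.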
- We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense core tensor; the result of our approach is a faster, more space-efficient, and more accurate method. We extensively test our method on a real healthcare dataset, which is collected from 30K patients and results in an 18th-order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It can also do so more accurately and in less time than the state of the art: on a 12th-order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a previous state-of-the-art method. Even for analyzing low-order tensors (e.g., 4th-order), our method requires close to an order of magnitude less time and over two orders of magnitude less memory, as compared to traditional tensor factorization methods such as CP and Tucker. Moreover, we observe that Sparse H-Tucker scales nearly linearly in the number of non-zero tensor elements. The resulting model also provides an interpretable disease hierarchy, which is confirmed by a clinical expert.
- Oct 25 2016 cs.LG arXiv:1610.07563v1 We investigate a general framework of multiplicative multitask feature learning which decomposes each task's model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods have been proposed as special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effect. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. Empirical studies have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks.
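The decomposition described in this abstract is simple to state concretely: each task's weight vector is the element-wise product of a shared component and a task-specific component. A minimal sketch (variable names are illustrative, not from the paper):

```python
import numpy as np

def task_weights(c, V):
    # c: (d,) component shared across all tasks
    # V: (T, d) task-specific components, one row per task
    # Returns the (T, d) matrix of effective per-task weights W[t] = c * V[t].
    return c[None, :] * V

# A zero in the shared component switches that feature off for every task,
# which is how joint feature selection emerges from the decomposition.
c = np.array([1.0, 0.0, 2.0])
V = np.array([[0.5, 3.0, 1.0],
              [2.0, -1.0, 0.0]])
W = task_weights(c, V)
```

Regularizing `c` (e.g. with an L1 penalty) then controls which features are shared across tasks, while regularizing each row of `V` shapes the task-specific part; the equivalence results in the abstract relate these penalties to joint regularizers on `W` directly.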
- Oct 21 2016 cs.SE arXiv:1610.06371v2 Precisely modeling complex systems like cyber-physical systems is often challenging, which may render model-based system verification techniques like model checking infeasible. To overcome this challenge, we propose a method called LAR to `verify' such complex systems through a combination of learning, abstraction and refinement. Instead of starting with system modeling, our method takes a set of concrete system traces as input. The output is either a counterexample with a bounded probability of being spurious, or a probabilistic model based on which the given property is `verified'. The model could be viewed as a proof obligation, i.e., the property is verified if the model is correct. It can also be used for subsequent system analysis activities like runtime monitoring. Our method has been implemented as a self-contained software toolkit. The evaluation on multiple benchmark systems as well as a real-world water purification system shows promising results.
- As the next-generation cellular system, the 5G network is required to provide a large variety of services for different kinds of terminals, from traditional voice and data services over mobile phones to small-packet transmission over massive machine-type terminals. Although orthogonal-subcarrier based waveforms are widely used in many practical systems today, they can hardly meet the future requirements of the coming 5G networks. Therefore, more flexible waveforms have been proposed to address the unprecedented challenges. In this article, we provide a comprehensive analysis and comparison of the typical waveform candidates. To obtain insightful analysis, we not only introduce the basic principles of the waveforms but also reveal the underlying characteristics of each waveform. Moreover, a comprehensive comparison in terms of different performance metrics is also presented, which provides an overall understanding of the new waveforms.
- Cyber-physical systems (CPS), which integrate algorithmic control with physical processes, often consist of physically distributed components communicating over a network. A malfunctioning or compromised component in such a CPS can lead to costly consequences, especially in the context of public infrastructure. In this short paper, we argue for the importance of constructing invariants (or models) of the physical behaviour exhibited by CPS, motivated by their applications to the control, monitoring, and attestation of components. To achieve this despite the inherent complexity of CPS, we propose a new technique for learning invariants that combines machine learning with ideas from mutation testing. We present a preliminary study on a water treatment system that suggests the efficacy of this approach, propose strategies for establishing confidence in the correctness of invariants, then summarise some research questions and the steps we are taking to investigate them.
- Hybrid systems exhibit both continuous and discrete behavior. Analyzing hybrid systems is known to be hard. Inspired by the idea of concolic testing (of programs), we investigate whether we can combine random sampling and symbolic execution in order to effectively verify hybrid systems. We identify a sufficient condition under which such a combination is more effective than random sampling. Furthermore, we analyze different strategies of combining random sampling and symbolic execution and propose an algorithm which allows us to dynamically switch between them so as to reduce the overall cost. Our method has been implemented as a web-based checker named HyChecker. HyChecker has been evaluated with benchmark hybrid systems and a water treatment system in order to test its effectiveness.
- Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending to the EHR data in reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8-year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models.
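The two-level attention idea can be sketched in a simplified form: a scalar visit-level weight and a per-variable gate, applied to visit embeddings processed in reverse time order. The linear scoring parameters below are hypothetical stand-ins for the two RNNs RETAIN actually uses to produce them:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def retain_context(visits, w_alpha, W_beta):
    # visits: (T, d) visit embeddings in chronological order
    # w_alpha: (d,) produces scalar visit-level attention scores
    # W_beta: (d, d) produces per-variable gates for each visit
    rev = visits[::-1]                 # attend to most recent visits first
    alpha = softmax(rev @ w_alpha)     # (T,) visit-level attention
    beta = np.tanh(rev @ W_beta)       # (T, d) variable-level gates
    return (alpha[:, None] * beta * rev).sum(axis=0)  # (d,) context vector

rng = np.random.default_rng(0)
visits = rng.normal(size=(4, 5))
ctx = retain_context(visits, rng.normal(size=5), rng.normal(size=(5, 5)))
```

Interpretability follows from the factored form: the contribution of variable j in visit t to the prediction is the product `alpha[t] * beta[t, j] * rev[t, j]`, so influential visits and variables can be read off directly.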
- Sparse representation presents an efficient approach to approximately recovering a signal as a linear combination of a few bases from a learnt dictionary, and has enabled various successful applications. However, in the scenario of data compression, its efficiency and popularity are hindered by the extra overhead of encoding the sparse coefficients. Therefore, establishing an accurate rate model for sparse coding and dictionary learning becomes meaningful, which has not been fully explored in the context of sparse representation. According to the Shannon entropy inequality, the variance of a data source bounds its entropy, which can reflect the actual coding bits. Hence, in this work a Globally Variance-Constrained Sparse Representation (GVCSR) model is proposed, where a variance-constrained rate model is introduced into the optimization process. Specifically, we employ the Alternating Direction Method of Multipliers (ADMM) to solve the non-convex optimization problems for sparse coding and dictionary learning, both of which have shown state-of-the-art performance in image representation. Furthermore, we investigate the potential of GVCSR in practical image set compression, where a common dictionary is trained on several key images to represent the whole image set. Experimental results demonstrate significant performance improvements over the most popular image codecs, including JPEG and JPEG2000.
- Jul 20 2016 cs.CV arXiv:1607.05477v1 Large pose variations remain a challenge confronting real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed the Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions are then warped by mapping the detected facial landmarks to their canonical positions to better normalize the face patterns. The second stage, an RCNN, then verifies whether the warped candidate regions are valid faces. We conduct end-to-end learning of the cascaded network, including optimizing the canonical positions of the facial landmarks. This supervised learning of the transformations automatically selects the best scale to differentiate face/non-face patterns. By combining feature maps from both stages of the network, we achieve state-of-the-art detection accuracies on several public benchmarks. For real-time performance, we run the cascaded network only on regions of interest produced by a boosting cascade face detector. Our detector runs at 30 FPS on a single CPU core for a VGA-resolution image.
- Jun 21 2016 cs.SI physics.soc-ph arXiv:1606.06159v3 The emerging domain of data-enabled science necessitates development of algorithms and tools for knowledge discovery. Human interaction with data through well-constructed graphical representation can take special advantage of our visual ability to identify patterns. We develop a data visualization framework, called BiFold, for exploratory analysis of bipartite datasets that describe binary relationships between groups of objects. Typical examples include voting records, organizational memberships, and pairwise associations, or other binary datasets. BiFold provides a low-dimensional embedding of data that represents similarity by visual nearness, analogous to Multidimensional Scaling (MDS). The unique and new feature of BiFold is its ability to simultaneously capture both within-group and between-group relationships among objects, enhancing knowledge discovery. We benchmark BiFold on the Southern Women dataset, where social groups are now visually evident. We construct BiFold plots for two US voting datasets: for the presidential election outcomes since 1976, BiFold illustrates the evolving geopolitical structures that underlie these election results. For Senate congressional voting, BiFold identifies a partisan coordinate, separating senators into two parties, while simultaneously visualizing a bipartisan-coalition coordinate which captures the ultimate fate of the bills (pass/fail). Finally, we consider a global cuisine dataset of the associations between recipes and food ingredients. BiFold allows us to visually compare and contrast cuisines while also identifying signature ingredients of individual cuisines.
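Since BiFold is described as analogous to MDS but over the pooled set of both object groups, the underlying embedding step can be illustrated with plain classical MDS. This is a sketch of the generic technique, not BiFold's specific between/within-group distance construction:

```python
import numpy as np

def classical_mds(D, k=2):
    # D: (n, n) matrix of squared distances between all objects
    # (for bipartite data, rows and columns of both groups pooled).
    # Recover k-dimensional coordinates via double centering and
    # an eigendecomposition of the resulting Gram matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J                      # Gram matrix
    vals, vecs = np.linalg.eigh(B)            # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]          # keep top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Sanity check: embedding three planar points recovers their geometry
# (up to rotation/reflection), so pairwise distances are preserved.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
X = classical_mds(D, k=2)
```

The key design question BiFold then answers, which plain MDS does not, is how to define distances between objects of different groups so that both within-group and between-group similarity land in one coordinate system.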
- Jun 17 2016 cs.NE arXiv:1606.05169v1 Evolutionary algorithms (EAs) have been well acknowledged as a promising paradigm for solving optimisation problems with multiple conflicting objectives, in the sense that they are able to locate a set of diverse approximations of Pareto optimal solutions in a single run. EAs drive the search for approximated solutions by maintaining a diverse population of solutions and by recombining promising solutions selected from the population. Incorporating machine learning techniques has shown great potential, since the intrinsic structure of the Pareto optimal solutions of a multiobjective optimisation problem can be learned and used to guide effective recombination. However, existing multiobjective EAs (MOEAs) based on structure learning spend too many computational resources on learning. To address this problem, we propose to use an online learning scheme. Based on the fact that offspring generated along the evolution are streaming, dependent and non-stationary (which implies that the intrinsic structure, if any, is temporal and scale-variant), an online agglomerative clustering algorithm is applied to adaptively discover the intrinsic structure of the Pareto optimal solution set and to guide effective offspring recombination. Experimental results show significant improvement over five state-of-the-art MOEAs on a set of well-known benchmark problems with complicated Pareto sets and complex Pareto fronts.