results for au:Chen_Y in:cs

- We revisit the classic problem of proving safety over parameterised concurrent systems, i.e., an infinite family of finite-state concurrent systems that are represented by some finite (symbolic) means. An example of such an infinite family is a dining philosopher protocol with any number n of processes (n being the parameter that defines the infinite family). Regular model checking is a well-known generic framework for modelling parameterised concurrent systems, where an infinite set of configurations (resp. transitions) is represented by a regular set (resp. regular transducer). Although verifying safety properties in the regular model checking framework is undecidable in general, many sophisticated semi-algorithms have been developed in the past fifteen years that can successfully prove safety in many practical instances. In this paper, we propose a simple solution to synthesise regular inductive invariants that makes use of Angluin's classic L* algorithm (and its variants). We provide a termination guarantee when the set of configurations reachable from a given set of initial configurations is regular. We have tested the L* algorithm on standard (as well as new) examples in regular model checking, including the dining philosopher protocol, the dining cryptographer protocol, and several mutual exclusion protocols (e.g. Bakery, Burns, Szymanski, and German). Our experiments show that, despite the simplicity of our solution, it can perform at least as well as existing semi-algorithms.
- The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies Generalized Advantage Estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios where the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely-separated events occur, adversely affecting the policies learned. Additionally, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
- When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss, or even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
- Sep 19 2017 cs.CL arXiv:1709.05475v1 Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning, and has been popularly used in speech recognition. A central idea of CTC is adding a "blank" label during training. With this mechanism, CTC eliminates the need for segment alignment, and hence has been applied to various sequence-to-sequence learning problems. In this work, we applied CTC to abstractive summarization for spoken content. The "blank" in this case implies that the corresponding input data are less important or noisy, and thus can be ignored. This approach was shown to outperform the existing methods in terms of ROUGE scores over the Chinese Gigaword and MATBN corpora. This approach also has the nice property that the ordering of words or characters in the input documents can be better preserved in the generated summaries.
- Sep 19 2017 cs.CV arXiv:1709.06031v1 Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the results of these methods can deteriorate significantly or they may not even produce any result at all. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art.
- Sep 19 2017 cs.LG arXiv:1709.05342v1 In this paper, we propose and evaluate the application of unsupervised machine learning to anomaly detection for a Cyber-Physical System (CPS). We compare two methods: Deep Neural Networks (DNN) adapted to time series data generated by a CPS, and one-class Support Vector Machines (SVM). These methods are evaluated against data from the Secure Water Treatment (SWaT) testbed, a scaled-down but fully operational raw water purification plant. For both methods, we first train detectors using a log generated by SWaT operating under normal conditions. Then, we evaluate the performance of both methods using a log generated by SWaT operating under 36 different attack scenarios. We find that our DNN generates fewer false positives than our one-class SVM while our SVM detects slightly more anomalies. Overall, our DNN has a slightly better F measure than our SVM. We discuss the characteristics of the DNN and one-class SVM used in this experiment, and compare the advantages and disadvantages of the two methods.
- Model compression is significant for wide adoption of Recurrent Neural Networks (RNNs) in both user devices possessing limited resources and in business clusters requiring quick responses to large-scale service requests. In this work, we focus on reducing the sizes of basic structures (including input updates, gates, hidden states, cell states and outputs) within Long Short-term Memory (LSTM) units, so as to learn homogeneously-sparse LSTMs. Independently reducing the sizes of those basic structures can result in unmatched dimensions among them, and consequently, end up with illegal LSTM units. To overcome this, we propose Intrinsic Sparse Structures (ISS) in LSTMs. By reducing one component of ISS, the sizes of basic structures are simultaneously reduced by one such that the consistency of dimensions is maintained. By learning ISS within LSTM units, the eventual LSTMs are still regular LSTMs but have much smaller basic structures. Our method is successfully evaluated on state-of-the-art LSTMs in applications of language modeling (on the Penn TreeBank dataset) and machine Question Answering (on the SQuAD dataset). Our source code is publicly available.
- Sep 18 2017 cs.CV arXiv:1709.05188v1 Occluded face detection is a challenging detection task due to the large appearance variations incurred by various real-world occlusions. This paper introduces an Adversarial Occlusion-aware Face Detector (AOFD) that simultaneously detects occluded faces and segments occluded areas. Specifically, we employ an adversarial training strategy to generate occlusion-like face features that are difficult for a face detector to recognize. An occlusion mask is predicted simultaneously while detecting occluded faces, and the occluded area is utilized as an auxiliary cue instead of being regarded as a hindrance. Moreover, the supervisory signals from the segmentation branch in turn affect the features, aiding in detecting heavily-occluded faces. Consequently, AOFD is able to find faces with few exposed facial landmarks with very high confidence and keeps high detection accuracy even for masked faces. Extensive experiments demonstrate that AOFD not only significantly outperforms state-of-the-art methods on the MAFA occluded face detection dataset, but also achieves competitive detection accuracy on benchmark datasets for general face detection such as FDDB.
- Sep 14 2017 cs.CV arXiv:1709.04121v1 Sketching is an important medium for humans to communicate ideas, and it reflects the superiority of human intelligence. Studies on sketches can be roughly divided into recognition and generation. Existing image recognition models have failed to obtain satisfying performance on sketch classification. For sketch generation, however, a recent study proposed a sequence-to-sequence variational-auto-encoder (VAE) model called sketch-rnn, which was able to generate sketches based on human inputs. The model achieved amazing results when asked to learn one category of object, such as an animal or a vehicle. However, the performance dropped when multiple categories were fed into the model. Here, we propose a model called sketch-pix2seq which can learn and draw multiple categories of sketches. Two modifications were made to improve the sketch-rnn model: one is to replace the bidirectional recurrent neural network (BRNN) encoder with a convolutional neural network (CNN); the other is to remove the Kullback-Leibler divergence from the objective function of the VAE. Experimental results showed that models with CNN encoders outperformed those with RNN encoders in generating human-style sketches. Visualization of the latent space illustrated that the removal of KL-divergence made the encoder learn a posterior of latent space that reflected the features of different categories. Moreover, the combination of CNN encoder and removal of KL-divergence, i.e., the sketch-pix2seq model, had better performance in learning and generating sketches of multiple categories and showed promising results in creativity tasks.
- Sep 14 2017 cs.CV arXiv:1709.04303v1 Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopt Connectionist Temporal Classification (CTC) or Attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of an RNN, we adopt stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, which gives lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements, and is 9 times faster than Bidirectional Long Short-Term Memory (BLSTM). Furthermore, in order to enhance the representation of foreground text and suppress the background noise, we incorporate residual attention modules into a small densely connected network to improve the discriminability of CNN features. We validate the performance of our approach on the standard benchmarks, including the Street View Text, IIIT5K and ICDAR datasets. The resulting state-of-the-art or highly competitive performance and efficiency show the superiority of the proposed approach.
- Sep 04 2017 cs.DS arXiv:1709.00378v1 A lattice is the integer span of some linearly independent vectors. Lattice problems have many significant applications in coding theory and cryptographic systems for their conjectured hardness. The Shortest Vector Problem (SVP), which is to find the shortest non-zero vector in a lattice, is one of the well-known problems that are believed to be hard to solve, even with a quantum computer. In this paper we propose space-efficient classical and quantum algorithms for solving SVP. Currently the best time-efficient algorithm for solving SVP takes $2^{n+o(n)}$ time and $2^{n+o(n)}$ space. Our classical algorithm takes $2^{2.05n+o(n)}$ time to solve SVP with only $2^{0.5n+o(n)}$ space. We then modify our classical algorithm to a quantum version, which can solve SVP in time $2^{1.2553n+o(n)}$ with $2^{0.5n+o(n)}$ classical space and only poly(n) qubits.
- Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand the ability of graph-based systems to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically: graph-based clustering techniques, and a global feature space where realistic attackers without perfect knowledge must be accounted for (by the defenders) in order to be practical. Even though less informed attackers can evade graph clustering with low cost, we show that some practical defenses are possible.
- Aug 30 2017 cs.CR arXiv:1708.08519v1 Domain squatting is a common adversarial practice where attackers register domain names that are purposefully similar to popular domains. In this work, we study a specific type of domain squatting called "combosquatting," in which attackers register domains that combine a popular trademark with one or more phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first large-scale, empirical study of combosquatting by analyzing more than 468 billion DNS records collected from passive and active DNS data sources over almost six years. We find that almost 60% of abusive combosquatting domains live for more than 1,000 days, and even worse, we observe increased activity associated with combosquatting year over year. Moreover, we show that combosquatting is used to perform a spectrum of different types of abuse including phishing, social engineering, affiliate abuse, trademark abuse, and even advanced persistent threats. Our results suggest that combosquatting is a real problem that requires increased scrutiny by the security community.
- This paper presents GRAPHR, the first ReRAM-based graph processing accelerator. GRAPHR follows the principle of near-data processing but explores the opportunity of performing massive parallel operations with low hardware and energy cost. Compared to recent works in applying ReRAM to more regular neural computations, we are faced with several challenges: 1) the graph data are stored in a compressed format, instead of matrix form, making it impossible to perform direct in-situ computations in memory; 2) it is less intuitive to map various graph algorithms to ReRAM with hardware constraints; 3) data movements among ReRAM crossbars and memory must be coordinated to achieve high throughput. GRAPHR is a novel accelerator architecture consisting of two major components: memory ReRAM and graph engine (GE). The core graph computations are performed in sparse matrix format in GEs (ReRAM crossbars), which perform efficient matrix-vector multiplications. The vector/matrix-based graph computation is not new, but ReRAM offers the unique opportunity to realize the massive parallelism with unprecedented energy efficiency and low hardware cost. Due to the same cost/performance tradeoff, with ReRAM, the gain of performing parallel operations overshadows the waste due to sparsity in the matrix view within a small subgraph. Moreover, it naturally enables near-data processing with reduced data movements. Experimental results show that GRAPHR achieves a 16.01x (up to 132.67x) speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline system.
- Sparse Subspace Clustering (SSC) is a state-of-the-art method for clustering high-dimensional data points lying in a union of low-dimensional subspaces. However, while $\ell_1$ optimization-based SSC algorithms suffer from high computational complexity, other variants of SSC, such as Orthogonal Matching Pursuit-based SSC (OMP-SSC), lose clustering accuracy in pursuit of improving time efficiency. In this letter, we propose a novel Active OMP-SSC, which improves clustering accuracy of OMP-SSC by adaptively updating data points and randomly dropping data points in the OMP process, while still enjoying the low computational complexity of greedy pursuit algorithms. We provide heuristic analysis of our approach, and explain how these two active steps achieve a better tradeoff between connectivity and separation. Numerical results on both synthetic data and real-world data validate our analyses and show the advantages of the proposed active algorithm.
- The capacity of a wireless network with fractal and hierarchical social communications is studied in this paper. Specifically, we mathematically formulate the self-similarity of a fractal wireless network by a power-law degree distribution $ P(k) $, and we capture the direct social connection feature between two nodes with degree $ k_{1} $ and $ k_{2} $ by a joint probability distribution $ P(k_{1},k_{2}) $. Firstly, for a fractal wireless network with direct social communications, it is proved that the maximum capacity is $ \Theta\left(\frac{1}{\sqrt{n\log n}}\right) $, where $ n $ denotes the total number of nodes in the network, if the source node communicates with one of its direct contacts randomly, and it can reach up to $ \Theta\left(\frac{1}{\log n}\right) $ if two nodes at distance $ d $ communicate with probability proportional to $ d^{-\beta} $. Secondly, since humans might get in touch with others without a direct connection but through inter-connected users, fractal wireless networks with hierarchical social communications are studied as well, and the related capacity is derived based on the results in the case with direct social communications. Our results show that this capacity is mainly affected by the correlation exponent $\epsilon$ of the fractal networks. The capacity is reduced in proportion to $ \frac{1}{{\log n}} $ if $ 2<\epsilon<3 $, while the reduction coefficient is $ \frac{1}{n} $ if $ \epsilon=3 $.
- Aug 15 2017 cs.CV arXiv:1708.04181v1 This paper studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which extends the known Robust PCA \cite{RPCA} to the tensor case. Our model is based on a new tensor Singular Value Decomposition (t-SVD) \cite{kilmer2011factorization} and its induced tensor tubal rank and tensor nuclear norm. Consider that we have a 3-way tensor $\bm{\mathcal{X}}\in\mathbb{R}^{n_1\times n_2\times n_3}$ such that $\bm{\mathcal{X}}=\bm{\mathcal{L}}_0+\bm{\mathcal{S}}_0$, where $\bm{\mathcal{L}}_0$ has low tubal rank and $\bm{\mathcal{S}}_0$ is sparse. Is it possible to recover both components? In this work, we prove that under certain suitable assumptions, we can recover both the low-rank and the sparse components exactly by simply solving a convex program whose objective is a weighted combination of the tensor nuclear norm and the $\ell_1$-norm, i.e., \begin{align*} \min_{\bm{\mathcal{L}},\bm{\mathcal{E}}} \ \|\bm{\mathcal{L}}\|_*+\lambda\|\bm{\mathcal{E}}\|_1, \ \text{s.t.} \ \bm{\mathcal{X}}=\bm{\mathcal{L}}+\bm{\mathcal{E}}, \end{align*} where $\lambda= {1}/{\sqrt{\max(n_1,n_2)n_3}}$. Interestingly, TRPCA involves RPCA as a special case when $n_3=1$ and thus it is a simple and elegant tensor extension of RPCA. Numerical experiments also verify our theory, and an application to image denoising demonstrates the effectiveness of our method.
- Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, the retail industry, and human resources. By automatically removing incompetent candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline), thus reducing the decision-making overhead. However, users are still required to interpret and compare these superior items manually before making a successful choice. This task is challenging because of two issues. First, people usually have fuzzy, unstable, and inconsistent preferences when presented with multiple candidates. Second, skyline queries do not reveal the reasons for the superiority of certain skyline points in a multi-dimensional space. To address these issues, we propose SkyLens, a visual analytic system aiming at revealing the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. Two scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen attributes. A qualitative study is also conducted to show that users can efficiently accomplish skyline understanding and comparison tasks with SkyLens.
- Aug 09 2017 cs.CV arXiv:1708.02349v1 We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster-RCNN architecture, proposals are placed at equal intervals in a video which span multiple temporal scales. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation which explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and are input to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
- We present Deeply Supervised Object Detector (DSOD), a framework that can learn object detectors from scratch. State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks. Model fine-tuning for the detection task could alleviate this bias to some extent but not fundamentally. Besides, transferring pre-trained models from classification to detection between discrepant domains is even more difficult (e.g. RGB to depth images). A better solution to tackle these two critical problems is to train object detectors from scratch, which motivates our proposed DSOD. Previous efforts in this direction mostly failed due to much more complicated loss functions and limited training data in object detection. In DSOD, we contribute a set of design principles for training object detectors from scratch. One of the key findings is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector. Combining this with several other principles, we develop DSOD following the single-shot detection (SSD) framework. Experiments on PASCAL VOC 2007, 2012 and MS COCO datasets demonstrate that DSOD can achieve better results than the state-of-the-art solutions with much more compact models. For instance, DSOD outperforms SSD on all three benchmarks with real-time detection speed, while requiring only 1/2 of the parameters of SSD and 1/10 of the parameters of Faster RCNN. Our code and models are available at: https://github.com/szq0214/DSOD .
- Aug 04 2017 cs.CV arXiv:1708.01001v1 Low-bit deep neural networks (DNNs) have become critical for embedded applications due to their low storage requirement and computing efficiency. However, they suffer from a non-negligible accuracy drop. This paper proposes the stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs. The motivation is the following observation. Existing training algorithms approximate the real-valued elements/filters with a low-bit representation all together in each iteration. The quantization errors may be small for some elements/filters but remarkable for others, which leads to an inappropriate gradient direction during training and thus brings a notable accuracy drop. Instead, SQ quantizes a portion of elements/filters to low-bit with a stochastic probability inversely proportional to the quantization error, while keeping the other portion unchanged at full precision. The quantized and full-precision portions are updated with their corresponding gradients separately in each iteration. The SQ ratio is gradually increased until the whole network is quantized. This procedure can greatly compensate for the quantization error and thus yield better accuracy for low-bit DNNs. Experiments show that SQ can consistently and significantly improve the accuracy for different low-bit DNNs on various datasets and various network structures.
- Aug 04 2017 cs.SY arXiv:1708.00939v1 The composite load model (CLM) proposed by the Western Electricity Coordinating Council (WECC) is gaining increasing traction in industry, particularly in North America. At the same time, it has been recognized that further improvements in structure, initialization and aggregation methods are needed to enhance model accuracy. However, the lack of an open-source implementation of the WECC CLM has become a roadblock to further improvement for many researchers. To bridge this gap, this paper presents the first open reference implementation of the WECC CLM. Individual load components and the CLM are first developed and tested in Matlab, then translated to GridPACK, a parallel simulation framework based on high performance computing (HPC). The main contributions of the paper include: 1) presenting important yet undocumented details of modeling and initializing the CLM, particularly for a parallel simulation framework like GridPACK; 2) implementation details of the load components such as the single-phase air conditioner motor; 3) implementing the CLM in a modular and extensible manner. The implementation has been tested at both the component and system levels and benchmarked against commercial simulation programs, with satisfactory accuracy.
- This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise binary comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model---the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity---the number of paired comparisons needed to ensure exact top-$K$ identification. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established based on a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative optimization procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan $\sin\Theta$ theorem for symmetric matrices.
- Aug 01 2017 physics.med-ph cs.CV arXiv:1707.09636v2 Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view CT, tomosynthesis, interior tomography, and so on. To perform sparse-data CT, iterative reconstruction commonly uses regularizers in the CS framework. Currently, how to choose the parameters adaptively for regularization is a major open problem. In this paper, inspired by the idea of machine learning, especially deep learning, we unfold a state-of-the-art "fields of experts" based iterative reconstruction scheme up to a number of iterations for data-driven training, construct a Learned Experts' Assessment-based Reconstruction Network ("LEARN") for sparse-data CT, and demonstrate the feasibility and merits of our LEARN network. On the well-known Mayo Clinic Low-Dose Challenge Dataset, our proposed LEARN network achieves competitive performance relative to several state-of-the-art methods in terms of artifact reduction, feature preservation, and computational speed. This is consistent with our insight that because all the regularization terms and parameters used in the iterative reconstruction are now learned from the training data, our LEARN network utilizes application-oriented knowledge more effectively and recovers underlying images more favorably than competing algorithms. Also, the number of layers in the LEARN network is only 12, reducing the computational complexity of typical iterative algorithms by orders of magnitude.
- Scenario generation is an important step in the operation and planning of power systems with high renewable penetrations. In this work, we propose a data-driven approach for scenario generation using generative adversarial networks, which are based on two interconnected deep neural networks. Compared with existing methods based on probabilistic models that are often hard to scale or sample from, our method is data-driven, and captures renewable energy production patterns in both temporal and spatial dimensions for a large number of correlated resources. For validation, we use wind and solar time-series data from NREL integration data sets. We demonstrate that the proposed method is able to generate realistic wind and photovoltaic power profiles with full diversity of behaviors. We also illustrate how to generate scenarios based on different conditions of interest by using labeled data during training. For example, scenarios can be conditioned on weather events (e.g. high wind day) or time of the year (e.g. solar generation for a day in July). Because of the feedforward nature of the neural networks, scenarios can be generated extremely efficiently without sophisticated sampling techniques.
- The 10th Asia-Europe workshop in "Concepts in Information Theory and Communications" (AEW10) was held in Boppard, Germany on June 21-23, 2017. It is based on a longstanding cooperation between Asian and European scientists. The first workshop was held in Eindhoven, the Netherlands in 1989. The idea of the workshop is threefold: 1) to improve the communication between scientists in different parts of the world; 2) to exchange knowledge and ideas; and 3) to pay tribute to a well-respected and special scientist.
- Jul 27 2017 cs.CV arXiv:1707.08289v1 Image matting plays an important role in image and video editing. However, the formulation of image matting is inherently ill-posed. Traditional methods usually rely on user interaction, in the form of trimaps and strokes, to deal with the image matting problem, and cannot run on a mobile phone in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a light fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting adaptive, is further developed to learn the guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which does not need any interaction and achieves real-time matting at 15 fps. The experiments show that the proposed approach achieves results comparable to those of state-of-the-art matting solvers.
- We present MoodSwipe, a soft keyboard that suggests text messages given user-specified emotions, utilizing real dialog data. The aim of MoodSwipe is to create a convenient user interface for enjoying the technology of emotion classification and text suggestion, and at the same time to collect labeled data automatically for developing more advanced technologies. When users select the MoodSwipe keyboard, they can type as usual but also sense the emotion conveyed by their text and receive suggestions for their message as a benefit. In MoodSwipe, the detected emotions serve as the medium for suggested texts, where viewing the latter is the incentive for correcting the former. We conduct several experiments to show the superiority of the emotion classification models trained on the dialog data, and further to verify that good emotion cues are important context for text suggestion.
- Jul 25 2017 cs.CV arXiv:1707.07584v1 Foreground segmentation in video sequences is a classic topic in computer vision. Due to the lack of semantic and prior knowledge, it is difficult for existing methods to deal with sophisticated scenes well. Therefore, in this paper, we propose an end-to-end two-stage deep convolutional neural network (CNN) framework for foreground segmentation in video sequences. In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes. In the second stage, the reconstructed background and current frame are input into a multi-channel fully-convolutional sub-network (MCFCN) for accurate foreground segmentation. In the two-stage CNN, the reconstruction loss and segmentation loss are jointly optimized. The background images and foreground objects are output simultaneously in an end-to-end way. Moreover, by incorporating the prior semantic knowledge of foreground and background in the pre-training process, our method could restrain the background noise and keep the integrity of foreground objects at the same time. Experiments on CDNet 2014 show that our method outperforms the state-of-the-art by 4.9%.
- We propose a minority route choice game to investigate the effect of the network structure on traffic network performance under the assumption of drivers' bounded rationality. We investigate ring-and-hub topologies to capture the nature of traffic networks in cities, and employ a minority game-based inductive learning process to model the characteristic behavior under the route choice scenario. Through numerical experiments, we find that topological changes in traffic networks induce a phase transition from an uncongested phase to a congested phase. Understanding this phase transition is helpful in planning new traffic networks.
- In this paper, the one-sided secrecy of the two-way wiretap channel with feedback is investigated, where the confidential messages of one user, sent through multiple transmissions, are guaranteed to be secure against an external eavesdropper. On the one hand, one-sided secrecy satisfies the security demands of many practical scenarios. On the other hand, because the eavesdropper's observation is correlated with the confidential messages in successive blocks, secrecy is measured over many blocks rather than over a single block as in previous works. First, an achievable secrecy rate region is derived for the general two-way wiretap channel with feedback through multiple transmissions under one-sided secrecy. Second, outer bounds on the secrecy capacity region are obtained. The gap between inner and outer bounds on the secrecy capacity region is explored via binary-input two-way wiretap channels. Most notably, the secrecy capacity regions are established for the XOR channel. Furthermore, the result shows that the achievable rate region with feedback is larger than that without feedback. Therefore, the beneficial role of feedback is precisely characterized for the two-way wiretap channel with feedback under one-sided secrecy.
- Jul 19 2017 cs.CV arXiv:1707.05495v2 In this paper, we propose the joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
- Security-critical tasks require proper isolation from untrusted software. Chip manufacturers design and include trusted execution environments (TEEs) in their processors to secure these tasks. The integrity and security of the software in the trusted environment depend on the verification process of the system. We find a form of attack that can be performed on the current implementations of the widely deployed ARM TrustZone technology. The attack exploits the fact that the trustlet (TA) or TrustZone OS loading verification procedure may use the same verification key and may lack proper rollback prevention across versions. If an exploit works on an out-of-date version, but the vulnerability is patched on the latest version, an attacker can still use the same exploit to compromise the latest system by downgrading the software to an older and exploitable version. We did experiments on popular devices on the market including those from Google, Samsung and Huawei, and found that all of them have the risk of being attacked. Also, we show a real-world example to exploit Qualcomm's QSEE. In addition, in order to find out which device images share the same verification key, pattern matching schemes for different vendors are analyzed and summarized.
- Sensor-based activity recognition seeks profound high-level knowledge about human activity from multitudes of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in the past years. However, most of those approaches rely heavily on heuristic hand-crafted feature extraction methods, which dramatically hinder their generalization performance. Additionally, those methods often produce unsatisfactory results for unsupervised and incremental learning tasks. Meanwhile, recent advances in deep learning make it possible to perform automatic high-level feature extraction and thus achieve promising performance in many areas. Since then, deep learning based methods have been widely adopted for sensor-based activity recognition tasks. In this paper, we survey and highlight recent advances in deep learning approaches for sensor-based activity recognition. Specifically, we summarize the existing literature from three aspects: sensor modality, deep model and application. We also present a detailed discussion and propose grand challenges for future directions.
- Jul 07 2017 cs.CV arXiv:1707.01629v2 In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which presents a new topology of connection paths internally. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits of both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over state-of-the-art methods. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model over various applications.
- Jul 07 2017 cs.GT arXiv:1707.01590v1 Recent literature on computational notions of fairness has been broadly divided into two distinct camps, supporting interventions that address either individual-based or group-based fairness. Rather than privilege a single definition, we seek to resolve both within the particular domain of employment discrimination. To this end, we construct a dual labor market model composed of a Temporary Labor Market, in which firm strategies are constrained to ensure group-level fairness, and a Permanent Labor Market, in which individual worker fairness is guaranteed. We show that such restrictions on hiring practices induce an equilibrium that Pareto-dominates those arising from strategies that employ statistical discrimination or a "group-blind" criterion. Individual worker reputations produce externalities for collective reputation, generating a feedback loop termed a "self-fulfilling prophecy." Our model produces its own feedback loop, raising the collective reputation of an initially disadvantaged group via a fairness intervention that need not be permanent. Moreover, we show that, contrary to popular assumption, the asymmetric equilibria resulting from hiring practices that disregard group-fairness may be immovable without targeted intervention. The enduring nature of such equilibria that are both inequitable and Pareto inefficient suggests that fairness interventions are of critical importance in moving the labor market to be more socially just and efficient.
- Jul 07 2017 cs.CV arXiv:1707.01691v1 We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under a fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects at multiple levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the search space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, so that RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and a low-resolution 384×384 input size, the network gets 81.3% mAP on PASCAL VOC 2007 and 80.7% mAP on PASCAL VOC 2012. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3× faster than the Faster R-CNN counterpart.
- We have witnessed rapid evolution of deep neural network architecture design in the past years. This recent progress greatly facilitates developments in various areas such as computer vision and natural language processing. However, along with their extraordinary performance, these state-of-the-art models also bring expensive computational cost. Directly deploying these models in applications with real-time requirements is still infeasible. Recently, Hinton et al. have shown that the dark knowledge within a powerful teacher model can significantly help the training of a smaller and faster student network. This knowledge is vastly beneficial for improving the generalization ability of the student model. Inspired by their work, we introduce a new type of knowledge, cross sample similarities, for model compression and acceleration. This knowledge can be naturally derived from a deep metric learning model. To transfer it, we bring the learning to rank technique into the deep metric learning formulation. We test our proposed DarkRank on the pedestrian re-identification task. The results are quite encouraging. Our DarkRank can improve over the baseline method by a large margin. Moreover, it is fully compatible with other existing methods. When combined, the performance can be further boosted.
- Jul 04 2017 cs.GT arXiv:1707.00208v1 Selfish routing is a central problem in algorithmic game theory, with one of the principal applications being that of routing in road networks. Inspired by the emergence of routing technologies and autonomous driving, we revisit selfish routing and consider three possible outcomes of it: (i) $\theta$-Positive Nash Equilibrium flow, where every path that has non-zero flow on all of its edges has cost no greater than $\theta$ times the cost of any other path, (ii) $\theta$-Used Nash Equilibrium flow, where every used path that appears in the path flow decomposition has cost no greater than $\theta$ times the cost of any other path, and (iii) $\theta$-Envy Free flow, where every path that appears in the path flow decomposition has cost no greater than $\theta$ times the cost of any other path in the path flow decomposition. We first examine the relations of these outcomes among each other and then measure their possible impact on the network's performance. Afterwards, we examine the computational complexity of finding such flows of minimum social cost and give a range for $\theta$ for which this task is easy and a range for $\theta$ for which this task is NP-hard. Finally, we propose deterministic strategies which, in a worst case approach, can be used by a central planner in order to provide good such flows, and further introduce a natural idea for randomly routing players after giving them specific guarantees about their costs in the randomized routing, as a tool for the central planner to implement a desired flow.
- Jul 04 2017 cs.CV arXiv:1707.00383v1 In this paper, we propose an alternative method to estimate room layouts of cluttered indoor scenes. This method enjoys the benefits of two novel techniques. The first one is semantic transfer (ST), which is: (1) a formulation to integrate the relationship between scene clutter and room layout into convolutional neural networks; (2) an architecture that can be trained end-to-end; (3) a practical strategy to initialize weights for very deep networks under an unbalanced training data distribution. ST allows us to extract highly robust features under various circumstances. To address the computational redundancy hidden in these features, we develop the second technique, a principled and efficient inference scheme named physics inspired optimization (PIO). PIO's basic idea is to formulate some phenomena observed in ST features as mechanics concepts. Evaluations on the public datasets LSUN and Hedau show that the proposed method is more accurate than state-of-the-art methods.
- Jun 30 2017 cs.CL arXiv:1706.09742v1 We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR. Compared to the event last year (AP16-OLR), the new challenge involves more languages and focuses more on short utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two types of baselines are constructed to assist the participants: one is based on the i-vector model and the other is based on various neural networks. We report the baseline results evaluated with various metrics defined by the AP17-OLR evaluation plan and demonstrate that the combined database is a reasonable data resource for multilingual research. All the data is free for participants, and the Kaldi recipes for the baselines have been published online.
- Objective: Amyotrophic lateral sclerosis (ALS) is a rare disease, but is also one of the most common motor neuron diseases, and people of all races and ethnic backgrounds are affected. There is currently no cure. Brain computer interfaces (BCIs) can establish a communication channel directly between the brain and an external device by recognizing brain activities that reflect user intent. Therefore, this technology could help ALS patients in promoting functional independence through BCI-based speller systems and motor assistive devices. Methods: In this paper, two kinds of ERP-based speller systems were tested on 18 ALS patients to: (1) assess performance when they spelled 42 characters online continuously, without a break; and (2) compare performance between a matrix-based speller paradigm (MS-P, mean visual angle 6 degrees) and a new speller paradigm that used a larger visual angle, called the large visual angle speller paradigm (LS-P, mean visual angle 8 degrees). Results: Although results showed that there were no significant differences between the two paradigms in accuracy trend over continuous use (p>0.05), the fatigue during the LS-P condition was significantly lower than that of MS-P (p<0.05). Results also showed that continuous use slightly reduced the performance of this ERP-based BCI. Conclusion: 15 subjects obtained higher than 80% feedback accuracy (online output accuracy) and 9 subjects obtained higher than 90% feedback accuracy in one of the two paradigms, thus validating the BCI approaches in this study. Significance: Most ALS subjects in this study could spell effectively after continuous use of an ERP-based BCI. The new LS-P display may be easier for subjects to use, resulting in lower fatigue.
- We present an efficient algorithm for recent generalizations of optimal mass transport theory to matrix-valued and vector-valued densities. These generalizations lead to several applications including diffusion tensor imaging, color image processing, and multi-modality imaging. The algorithm is based on sequential quadratic programming (SQP). By approximating the Hessian of the cost and solving each iteration in an inexact manner, we are able to solve each iteration at relatively low cost while still maintaining a fast convergence rate. The core of the algorithm is solving a weighted Poisson equation, where different efficient preconditioners may be employed. We utilize incomplete Cholesky factorization, which yields an efficient and straightforward solver for our problem. Several illustrative examples are presented for both the matrix and vector-valued cases.
- This paper proposes a speaker recognition (SRE) task with trivial speech events, such as cough and laugh. These trivial events are ubiquitous in conversations and less subject to intentional change, therefore offering valuable particularities for discovering the genuine speaker from disguised speech. However, trivial events are often short and idiosyncratic in spectral patterns, making SRE extremely difficult. Fortunately, we found a very powerful deep feature learning structure that can extract highly speaker-sensitive features. By employing this tool, we studied the SRE performance on three types of trivial events: cough, laugh and "Wei" (a short Chinese "Hello"). The results show that there is rich speaker information within these trivial events, even for cough, which is intuitively less speaker-distinguishable. With the deep feature approach, the EER can reach 10%-14% with the three trivial events, despite their extremely short durations (0.2-1.0 seconds).
- Jun 22 2017 cs.CV arXiv:1706.06792v1 Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have sought to improve parameter efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a two-level merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GM-Net), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GM-Net architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to a significant reduction in the number of network parameters and obtain performance improvements in image classification tasks. Extensive experiments are conducted to validate the superior performance of the GM-Net over state-of-the-art methods on benchmark datasets, e.g., MNIST, CIFAR-10, CIFAR-100 and SVHN.
- Although recent progress in deep neural networks has led to the development of learnable local feature descriptors, there is no explicit answer for estimating the necessary size of a neural network. Specifically, the local feature is represented in a low dimensional space, so the neural network should have a more compact structure. The small networks required for local feature descriptor learning may be sensitive to initial conditions and learning parameters and more likely to become trapped in local minima. In order to address the above problem, we introduce an adaptive pruning Siamese Architecture based on neuron activation to learn local feature descriptors, making the network more computationally efficient with an improved recognition rate over more complex networks. Our experiments demonstrate that our learned local feature descriptors outperform the state-of-the-art methods in patch matching.
- Jun 13 2017 cs.SY arXiv:1706.03612v1 We propose a framework to engineer synthetic-inertia and droop-control parameters for distributed energy resources (DERs) so that the system frequency in a network composed of DERs and synchronous generators conforms to prescribed transient and steady-state performance specifications. Our approach is grounded in a second-order lumped-parameter model that captures the dynamics of synchronous generators and frequency-responsive DERs endowed with inertial and droop control. A key feature of this reduced-order model is that its parameters can be related to those of the originating higher-order dynamical model. This allows one to systematically design the DER inertial and droop-control coefficients leveraging classical frequency-domain response characteristics of second-order systems. Time-domain simulations validate the accuracy of the model-reduction method and demonstrate how DER controllers can be designed to meet steady-state-regulation and transient-performance specifications.
- Jun 13 2017 cs.NE arXiv:1706.03609v1 We extend the previously proposed activation function, Noisy Softplus, to fit into the training of layered spiking neural networks (SNNs). Thus, any ANN employing Noisy Softplus neurons, even one with a deep architecture, can be trained simply by a traditional algorithm, for example Back Propagation (BP), and the trained weights can be directly used in the spiking version of the same network without any conversion. Furthermore, the training method can be generalised to other activation units, for instance Rectified Linear Units (ReLU), to train deep SNNs off-line. This research is crucial for providing an effective approach to SNN training, for increasing the classification accuracy of SNNs with biological characteristics, and for closing the gap between the performance of SNNs and ANNs.
- We study Markov chain models where the transition mechanism depends nonlinearly on the current state. One specific choice for such a model, where the state represents "belief," was proposed in \cite{jia2015opinion} to model opinion dynamics and is referred to as the DeGroot-Friedkin model. Herein, we consider a general class of such nonlinear Markov chain models and develop a theory for assessing stability. Our approach relies on establishing that the differential of the nonlinear dynamics (under suitable analyticity conditions) is contractive in the $\ell_1$ metric. We apply the theory to two types of nonlinear random walks, i.e., nonlinearly adapting the transition probabilities, where the adaptation is exponential and linear, respectively. The latter includes the DeGroot-Friedkin model and generalizations. We also discuss continuous-time generalizations as well as interacting (particle) models and discuss their relevance with regard to modeling social dynamics over influence networks. Finally, we view the nonlinear adaptation of the transition mechanism as feedback and quantify the effect of external bias on the stationary distribution.
- Jun 09 2017 cs.SY arXiv:1706.02695v1 Stand-alone direct current (DC) microgrids may belong to different owners and adopt various control strategies. This poses a great challenge to their optimal operation due to the difficulty of implementing unified control. This paper addresses the distributed optimal control of DC microgrids, which intends to break the restriction of diversity to some extent. Firstly, we formulate the optimal power flow (OPF) problem of stand-alone DC microgrids as an exact second order cone program (SOCP) and prove the uniqueness of the optimal solution. Then a dynamic solving algorithm based on the primal-dual decomposition method is proposed; its convergence and the optimality of its equilibrium point are proved theoretically. It should be stressed that the algorithm can provide control commands for three types of microgrids: (i) power control, (ii) voltage control and (iii) droop control. This implies that each microgrid does not need to change its original control strategy in practice, so the approach is less affected by the diversity of microgrids. Moreover, the control commands for power controlled and voltage controlled microgrids satisfy generation limits and voltage limits in both transient process and steady state. Finally, a six-microgrid DC system based on the microgrid benchmark is adopted to validate the effectiveness and plug-n-play property of our designs.
- Jun 09 2017 cs.CV arXiv:1706.02425v1 In this paper, we investigated a C-arm tomographic technique as a new three dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone detection over a view angle of less than 180°. Our C-arm tomographic technique provides a series of two dimensional (2D) images with a single scan over a 40° view angle. Experimental studies were performed with a kidney phantom that was formed from a pig kidney with two embedded kidney stones. Different reconstruction methods were developed for the C-arm tomographic technique to generate 3D kidney information, including: point by point back projection (BP), filtered back projection (FBP), simultaneous algebraic reconstruction technique (SART) and maximum likelihood expectation maximization (MLEM). A computer simulation study was also performed with a simulated 3D spherical object to evaluate the reconstruction results. Preliminary results demonstrated the capability of our C-arm tomographic technique to generate 3D kidney information for kidney stone detection with low radiation exposure. The kidney stones are visible on reconstructed planes with identifiable shapes and sizes.
- Jun 08 2017 cs.SD arXiv:1706.02101v1 For practical automatic speaker verification (ASV) systems, replay attacks pose a true risk. ASV systems tend to be easily fooled by the replay of a pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed and three variability factors are investigated using this tool: speaker identity, speech content and playback & recording device. The analysis shows that device is the most influential factor and contributes the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASV-spoof 2017 database.
- Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency to the community, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique for pursuing binary-weight CNNs, targeting more faithful inference and a better trade-off for practical applications. Our basic idea is to exploit binary structure directly in pre-trained filter banks and produce binary-weight models via tensor expansion. The whole process can be treated as a coarse-to-fine model approximation, akin to the pencil drawing steps of outlining and shading. To further speed up the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binary-weight models by large margins on the ImageNet large scale classification task, while the committed memory for network parameters is only slightly larger.
- Speech signals are a complex intermingling of various informative factors, and this information blending makes decoding any of the individual factors extremely difficult. A natural idea is to factorize each speech frame into independent factors, though this turns out to be even more difficult than decoding each individual factor. A major encumbrance is that the speaker trait, a major factor in speech signals, has been suspected to be a long-term distributional pattern and so not identifiable at the frame level. In this paper, we demonstrate that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN). This discovery motivated a cascade deep factorization (CDF) framework that infers speech factors in a sequential way, in which factors previously inferred are used as conditional variables when inferring other factors. Our experiment on an automatic emotion recognition (AER) task demonstrated that this approach can effectively factorize speech signals, and using these factors, the original speech spectrum can be recovered with high accuracy. This factorization and reconstruction approach provides a novel tool for many speech processing tasks.
- Jun 06 2017 cs.DC arXiv:1706.01022v1 Traditionally, a regional dispatch center uses the equivalent method to deal with external grids, which fails to reflect the interactions among regions. This paper proposes a distributed N-1 contingency analysis (DCA) solution, where dispatch centers join a coordinated computation using their private data and computing resources. A distributed screening method is presented to determine the Critical Contingency Set (DCCS) in DCA. Then, the distributed power flow is formulated as a set of boundary equations, which is solved by a Jacobi-Free Newton-GMRES (JFNG) method. While solving the distributed power flow, only boundary conditions are exchanged. Acceleration techniques are also introduced, including reusing preconditioners and optimal resource scheduling during parallel processing of multiple contingencies. The proposed method is implemented on a real EMS platform, where tests using the Southwest Regional Grid of China are carried out to validate its feasibility.
- Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n\rightarrow\kappa$ for some constant $\kappa < 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2\Lambda~\stackrel{\mathrm{d}}{\rightarrow}~\alpha(\kappa)\chi_k^2$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa=0.3$, $\alpha(\kappa)\approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
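To make the practical effect of the rescaling in the logistic-regression entry above concrete, the sketch below compares the classical p-value with the one obtained after dividing the LLR statistic by $\alpha(\kappa)$. It is only an illustration under stated assumptions: it tests a single coefficient ($k = 1$), takes $\alpha \approx 1.5$ for $\kappa = 0.3$ directly from the abstract rather than solving the nonlinear system, and the function names are hypothetical.

```python
import math
from typing import Tuple


def chi2_sf_1df(x: float) -> float:
    """Survival function P(chi^2_1 > x) for one degree of freedom."""
    if x <= 0.0:
        return 1.0
    # For k = 1, chi^2_1 is the square of a standard normal variable,
    # so P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def llr_p_values(two_lambda: float, alpha: float) -> Tuple[float, float]:
    """Return (classical, rescaled) p-values for an observed 2*Lambda.

    Classical: treats 2*Lambda as chi^2_1 (Wilks' approximation).
    Rescaled:  treats 2*Lambda / alpha as chi^2_1, i.e. 2*Lambda ~ alpha * chi^2_1.
    """
    classical = chi2_sf_1df(two_lambda)
    rescaled = chi2_sf_1df(two_lambda / alpha)
    return classical, rescaled


# Assumed inputs: an observed statistic at the usual 5% cutoff, and the
# scaling factor quoted in the abstract for kappa = 0.3.
classical_p, rescaled_p = llr_p_values(two_lambda=3.84, alpha=1.5)
print(f"classical p-value: {classical_p:.3f}")
print(f"rescaled p-value:  {rescaled_p:.3f}")
```

Under these assumptions, a statistic that sits at the classical 5% threshold has a rescaled p-value of roughly twice that, which matches the abstract's point that the chi-square approximation yields p-values that are too small.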
- Jun 05 2017 cs.CY arXiv:1706.00487v1 Objectives: The fee-for-service approach to healthcare leads to the management of a patient's conditions in an independent manner, inducing various negative consequences. It is recognized that a bundled care approach to healthcare, one that manages a collection of health conditions together, may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled program. Study Design: Retrospective inference of clusters of health conditions from an electronic medical record (EMR) system. A survey of healthcare experts to ascertain the plausibility of the clusters for bundled care programs. Methods: We designed a data-driven framework to infer clusters of health conditions via their shared clinical workflows according to EMR utilization by healthcare employees. We evaluated the framework with approximately 16,500 inpatient stays from a large medical center. The plausibility of the clusters for bundled care was assessed through a survey of a panel of healthcare experts using an analysis of variance (ANOVA) under a 95% confidence interval. Results: The framework inferred four condition clusters: 1) fetal abnormalities, 2) late pregnancies, 3) prostate problems, and 4) chronic diseases (with congestive heart failure featuring prominently). Each cluster was deemed plausible by the experts for bundled care. Conclusions: The findings suggest that data from EMRs may provide a basis for discovering new directions in bundled care. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
- Some recent works revealed that deep neural networks (DNNs) are vulnerable to so-called adversarial attacks, where input examples are intentionally perturbed to fool DNNs. In this work, we revisit the DNN training process that includes adversarial examples in the training dataset so as to improve a DNN's resilience to adversarial attacks, namely, adversarial training. Our experiments show that different adversarial strengths, i.e., perturbation levels of adversarial examples, have different working zones to resist the attack. Based on this observation, we propose a multi-strength adversarial training method (MAT) that combines adversarial training examples with different adversarial strengths to defend against adversarial attacks. Two training structures - mixed MAT and parallel MAT - are developed to facilitate the tradeoffs between training time and memory occupation. Our results show that MAT can substantially reduce the accuracy degradation of deep learning systems under adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN.
- May 30 2017 cs.CY arXiv:1705.09713v1 OBJECTIVE: To test the hypothesis that variation in care coordination is related to LOS. DESIGN: We applied a spectral co-clustering methodology to simultaneously infer groups of patients and care coordination patterns, in the form of interaction networks of health care professionals, from electronic medical record (EMR) utilization data. The care coordination pattern for each patient group was represented by standard social network characteristics and its relationship with hospital LOS was assessed via a negative binomial regression with a 95% confidence interval. SETTING AND PATIENTS: This study focuses on 5,588 adult patients hospitalized for trauma at the Vanderbilt University Medical Center. The EMRs were accessed by healthcare professionals from 179 operational areas during 158,467 operational actions. MAIN OUTCOME MEASURES: Hospital LOS for trauma inpatients, as an indicator of care coordination efficiency. RESULTS: Three general types of care coordination patterns were discovered, each of which was affiliated with a specific patient group. The first patient group exhibited the shortest hospital LOS and was managed by a care coordination pattern that involved the smallest number of operational areas (102 areas, as opposed to 125 and 138 for the other patient groups), but exhibited the largest number of collaborations between operational areas (e.g., an average of 27.1 connections per operational area compared to 22.5 and 23.3 for the other two groups). The hospital LOS for the second and third patient groups was 14 hours (P = 0.024) and 10 hours (P = 0.042) longer than the first patient group, respectively.
- The capacity of a fractal wireless network with direct social interactions is studied in this paper. Specifically, we mathematically formulate the self-similarity of a fractal wireless network by a power-law degree distribution $P(k)$, and we capture the connection feature between two nodes with degree $k_{1}$ and $k_{2}$ by a joint probability distribution $P(k_{1},k_{2})$. It is proved that if the source node communicates with one of its direct contacts randomly, the maximum capacity is consistent with the classical result $\Theta\left(\frac{1}{\sqrt{n\log n}}\right)$ achieved by Kumar \cite{Gupta2000The}. On the other hand, if the two nodes with distance $d$ communicate according to the probability $d^{-\beta}$, the maximum capacity can reach up to $\Theta\left(\frac{1}{\log n}\right)$, which exhibits remarkable improvement compared with the well-known result in \cite{Gupta2000The}.
- May 30 2017 cs.CV arXiv:1705.09882v1 This work targets person re-identification (ReID) from depth sensors such as Kinect. Since depth is invariant to illumination and less sensitive than color to day-by-day appearance changes, a natural question is whether depth is an effective modality for Person ReID, especially in scenarios where individuals wear different colored clothes or over a period of several months. We explore the use of recurrent Deep Neural Networks for learning high-level shape information from low-resolution depth images. In order to tackle the small sample size problem, we introduce regularization and a hard temporal attention unit. The whole model can be trained end to end with a hybrid supervised loss. We carry out a thorough experimental evaluation of the proposed method on three person re-identification datasets, which include side views, views from the top and sequences with varying degree of partial occlusion, pose and viewpoint variations. To that end, we introduce a new dataset with RGB-D and skeleton data. In a scenario where subjects are recorded after three months with new clothes, we demonstrate large performance gains attained using Depth ReID compared to a state-of-the-art Color ReID. Finally, we show further improvements using the temporal attention unit in multi-shot setting.
- The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.
- In this paper we consider the cluster estimation problem under the Stochastic Block Model. We show that the semidefinite programming (SDP) formulation for this problem achieves an error rate that decays exponentially in the signal-to-noise ratio. The error bound implies weak recovery in the sparse graph regime with bounded expected degrees, as well as exact recovery in the dense regime. An immediate corollary of our results yields error bounds under the Censored Block Model. Moreover, these error bounds are robust, continuing to hold under heterogeneous edge probabilities and a form of the so-called monotone attack. Significantly, this error rate is achieved by the SDP solution itself without any further pre- or post-processing, and improves upon existing polynomially-decaying error bounds proved using Grothendieck's inequality. Our analysis has two key ingredients: (i) showing that the graph has a well-behaved spectrum, even in the sparse regime, after discounting an exponentially small number of edges, and (ii) an order-statistics argument that governs the final error rate. Both arguments highlight the implicit regularization effect of the SDP formulation.
- May 23 2017 cs.CC arXiv:1705.07312v1 We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs a running time at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left( \frac{|\mathcal{S}| |\mathcal{A}|}{\epsilon} \right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of input.
- High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels $\{-1, 0, 1\}$, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks.
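As a rough illustration of how a gradient can be reduced to three levels in the TernGrad entry above, the sketch below uses stochastic ternarization scaled by the largest gradient magnitude. The abstract does not spell out the quantizer, so the scaling choice, the Bernoulli keep probability, and the function name `ternarize` are assumptions made for illustration; layer-wise ternarizing and gradient clipping are not shown.

```python
import random


def ternarize(grad, rng):
    """Stochastically map a gradient vector to levels in {-1, 0, 1} plus a scale.

    Assumed scheme: each entry keeps its sign with probability |g| / scale,
    where scale = max |g|, and becomes 0 otherwise. The reconstruction
    scale * ternary then equals grad in expectation (an unbiased estimate).
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return [0] * len(grad), 0.0
    ternary = []
    for g in grad:
        keep = rng.random() < abs(g) / scale
        sign = (g > 0) - (g < 0)  # -1, 0, or 1
        ternary.append(sign if keep else 0)
    return ternary, scale


rng = random.Random(0)
grad = [0.5, -1.0, 0.0, 0.25]
levels, scale = ternarize(grad, rng)
print(levels, scale)  # only the levels and one float would need to be sent
```

Only the ternary levels and a single scalar need to be communicated per vector, which is where the reduction in communication would come from under these assumptions.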
- We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning -- a modern machine learning paradigm that is proposed by Google researchers and has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function.
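The aggregation rule named in the Byzantine gradient descent entry above, the geometric median of means, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the batching into equal-sized groups, the use of Weiszfeld's iteration to approximate the geometric median, the iteration count, and all function names are assumptions.

```python
import math


def mean_vector(vectors):
    """Coordinate-wise average of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration (assumed solver)."""
    estimate = mean_vector(points)
    for _ in range(iterations):
        weights = []
        for p in points:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, estimate)))
            weights.append(1.0 / max(dist, eps))  # eps avoids division by zero
        total = sum(weights)
        estimate = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(len(estimate))
        ]
    return estimate


def median_of_means(gradients, num_batches):
    """Average gradients within each batch, then take the geometric median."""
    batch_size = len(gradients) // num_batches
    batch_means = [
        mean_vector(gradients[b * batch_size:(b + 1) * batch_size])
        for b in range(num_batches)
    ]
    return geometric_median(batch_means)


# Eight honest machines report the same gradient; one Byzantine machine lies.
reported = [[1.0, 2.0]] * 8 + [[1000.0, -1000.0]]
print("simple average: ", mean_vector(reported))
print("median of means:", median_of_means(reported, num_batches=3))
```

In this toy case the simple average is dragged far from the honest gradient by a single faulty report, while the median of means stays at it, because only a minority of batches contain a Byzantine machine.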
- Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames.
- Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although this information has been used in conventional phonetic LID systems with great success. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.
- Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.
- In this paper, we address a major challenge confronting the Cloud Service Providers (CSPs) utilizing a tiered storage architecture - how to maximize their overall profit over a variety of storage tiers that offer distinct characteristics, as well as file placement and access request scheduling policies.