results for au:Xu_Y in:cs

- This letter investigates the problem of anti-jamming communications in a dynamic and unknown environment through online learning. Different from existing studies, which need to know (estimate) the jamming patterns and parameters, we use the spectrum waterfall, i.e., the raw spectrum environment, directly. First, to cope with the challenge of the infinite state space of raw spectrum information, a deep anti-jamming Q-network is constructed. Then, a deep anti-jamming reinforcement learning algorithm is proposed to obtain the optimal anti-jamming strategies. Finally, simulation results validate the proposed approach. The proposed approach relies only on locally observed information and does not need to estimate the jamming patterns and parameters, which implies that it can be widely used in various anti-jamming scenarios.
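The paper trains a deep Q-network on raw spectrum waterfalls; the reinforcement-learning core can be illustrated with a much simpler tabular sketch. Everything below (channel count, the fixed jammer, the reward model, and the tabular state) is an assumption for illustration, not the paper's setup.

```python
import numpy as np

# Illustrative tabular Q-learning sketch of the anti-jamming idea
# (the paper uses a deep Q-network over raw spectrum waterfalls;
# the state, channel count, and reward model here are assumptions).
rng = np.random.default_rng(0)
n_channels = 4          # channels the transmitter can hop between
jammed = 2              # a jammer camping on one fixed channel
Q = np.zeros((n_channels, n_channels))  # state = last observed jammed channel

gamma, alpha, eps = 0.9, 0.1, 0.1
state = 0
for step in range(5000):
    # epsilon-greedy action: pick a transmission channel
    if rng.random() < eps:
        action = int(rng.integers(n_channels))
    else:
        action = int(np.argmax(Q[state]))
    reward = 0.0 if action == jammed else 1.0   # collision with jammer => no throughput
    next_state = jammed                          # local observation of the jammed channel
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

best = int(np.argmax(Q[jammed]))
print(best != jammed)   # the learned policy avoids the jammed channel
```

The deep variant replaces the table with a convolutional network whose input is the recent spectrum waterfall, which is what lets the method cope with the infinite raw-spectrum state space.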
- Oct 10 2017 cs.CV arXiv:1710.03011v1 Almost all existing visual saliency models focus on predicting a universal saliency map across all observers. Yet psychology studies suggest that the visual attention of different observers can vary considerably under some specific circumstances, especially when they view scenes with multiple salient objects. However, little work explores this visual attention difference, probably because of the lack of a proper dataset, as well as the complex correlation between visual attention, personal preferences, and image contents. In this paper, we set out to study this heterogeneous visual attention pattern between different observers and build the first dataset for personalized saliency detection. Further, we propose to decompose a personalized saliency map (referred to as PSM) into a universal saliency map (referred to as USM), which can be predicted by any existing saliency detection model, and a discrepancy between them. Personalized saliency detection is then cast as the task of discrepancy estimation between PSM and USM. To tackle this task we propose two solutions: i) The discrepancy estimation for different observers is cast as different but related tasks. Then we feed the image and its USM into a multi-task convolutional neural network framework to estimate the discrepancy between PSM and USM for each observer; ii) As the discrepancy is related to both image contents and the observers' person-specific information, we feed the image and its associated USM into a convolutional neural network with person-specific-information-encoded filters to estimate the discrepancy. Extensive experimental results demonstrate the effectiveness of our models for PSM prediction as well as their generalization capability for unseen observers.
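The decomposition at the heart of this paper is simple to state numerically: the per-observer map is the shared map plus a learned residual. A minimal synthetic sketch (the maps and observer count are made-up assumptions; the paper estimates the residual with CNNs):

```python
import numpy as np

# Minimal numeric sketch of the decomposition PSM = USM + discrepancy.
# The paper learns the discrepancy with CNNs; here we only show that
# modelling a per-observer residual on top of a shared USM exactly
# recovers each observer's map (all data below is synthetic).
rng = np.random.default_rng(1)
H, W, n_observers = 8, 8, 3

usm = rng.random((H, W))                          # universal saliency map
psm = np.stack([np.clip(usm + 0.2 * rng.standard_normal((H, W)), 0, 1)
                for _ in range(n_observers)])     # personalized maps, one per observer

discrepancy = psm - usm                           # per-observer residual (the learning target)
reconstructed = usm + discrepancy                 # PSM recovered from USM + residual
print(np.allclose(reconstructed, psm))            # True
```

The point of the decomposition is that the hard, observer-specific part of the problem is isolated in the (smaller, structured) discrepancy term, while any off-the-shelf saliency model can supply the USM.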
- Oct 06 2017 cs.GT arXiv:1710.01918v1 This paper investigates incentive mechanism design from a novel and practically important perspective in which mobile users, as contributors, do not join simultaneously and a requester desires large efforts from early contributors. A two-stage Tullock contest framework is constructed: at the second stage the potential contributors compete for a splittable reward by exerting efforts, and at the first stage the requester can orchestrate the incentive mechanism to maximize his crowdsensing efficiency given the rewarding budget. A general reward discrimination mechanism is developed for timeliness-sensitive crowdsensing, where an earlier contributor usually has a larger maximum achievable reward and thus allocates more effort. Owing to the lack of joining time information, two practical implementations, namely earliest-n and termination time, are announced to the contributors. For each of them, we formulate a Stackelberg Bayesian game in which the joining time of a contributor is his type and is not available to his opponents. The uniqueness of the Bayesian Nash equilibrium (BNE) is proved under each scheme. To maximize the requester's efficiency, we compute the optimal number of rewarded contributors in the earliest-n scheme and the optimal deadline in the termination time scheme. Our contest framework is applicable not only to closed crowdsensing with a fixed number of contributors, but also to open crowdsensing in which the arrival of contributors is governed by a stochastic process. Extensive simulations manifest that with appropriate reward discrimination, the requester is able to achieve a much higher efficiency with the optimal selection of the number of rewarded contributors and the termination time.
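The second-stage building block is a Tullock contest. In the plain symmetric case without reward discrimination (a simplification of the paper's model; the prize R and player count n below are assumptions), n contributors with payoff u_i = R·e_i/Σe − e_i have the well-known symmetric equilibrium effort e* = R(n−1)/n², which can be verified numerically as a best response:

```python
import numpy as np

# Sketch of a one-stage symmetric Tullock contest (the paper's second
# stage, stripped of reward discrimination): n contributors compete for
# a splittable reward R with payoff u_i = R*e_i/sum(e) - e_i.
# The symmetric equilibrium effort is e* = R*(n-1)/n**2.
R, n = 10.0, 4
e_star = R * (n - 1) / n**2          # 1.875 for these values

others = (n - 1) * e_star            # opponents all play e*
efforts = np.linspace(0.01, R, 2000)
payoff = R * efforts / (efforts + others) - efforts  # payoff of one deviating player

best_dev = efforts[np.argmax(payoff)]
print(abs(best_dev - e_star) < 0.05)  # no profitable deviation from e*
```

The paper's reward discrimination changes each contributor's maximum achievable reward with joining time, which shifts these equilibrium efforts toward early contributors.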
- In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge. The audio clips in this task, which are extracted from YouTube videos, are manually labeled with one or a few audio tags but without timestamps of the audio events; this is called weakly labeled data. Two sub-tasks are defined in this challenge using this weakly labeled data: audio tagging and sound event detection. A convolutional recurrent neural network (CRNN) with a learnable gated linear unit (GLU) non-linearity applied to the log Mel spectrogram is proposed. In addition, a temporal attention method is proposed along the frames to predict the locations of each audio event in a chunk from the weakly labeled data. As a team, we ranked 1st and 2nd in these two sub-tasks of the DCASE 2017 challenge, with an F-value of 55.6\% and an equal error rate of 0.73, respectively.
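The gated linear unit non-linearity can be sketched in a few lines: one linear projection of the features is passed through, a second one gates it via a sigmoid. The feature shapes and weights below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal numpy sketch of a gated linear unit (GLU) non-linearity:
# GLU(x) = (x @ W_a + b_a) * sigmoid(x @ W_b + b_b).
# One branch carries the features, the other learns a soft gate.
def glu(x, W_a, W_b, b_a, b_b):
    gate = 1.0 / (1.0 + np.exp(-(x @ W_b + b_b)))
    return (x @ W_a + b_a) * gate

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))   # e.g. 5 frames of 16 log-Mel features (assumed sizes)
W_a = rng.standard_normal((16, 8))
W_b = rng.standard_normal((16, 8))
y = glu(x, W_a, W_b, np.zeros(8), np.zeros(8))
print(y.shape)   # (5, 8)
```

In the paper the same gating idea is applied convolutionally, letting the network learn which time-frequency regions are informative, which is exactly what weak labels require.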
- Sep 20 2017 cs.GT arXiv:1709.06367v1 We consider the well-studied game-theoretic version of machine scheduling in which jobs correspond to self-interested users and machines correspond to resources. Here each user chooses a machine trying to minimize her own cost, and such selfish behavior typically results in some equilibrium which is not globally optimal: an equilibrium is an allocation where no user can reduce her own cost by moving to another machine, which in general need not minimize the makespan, i.e., the maximum load over the machines. We provide tight bounds on two well-studied notions in algorithmic game theory, namely the price of anarchy and the strong price of anarchy, in a machine scheduling setting which lies in between the related and the unrelated machine cases. Both notions study the social cost (makespan) of the worst equilibrium compared to the optimum, with the strong price of anarchy restricting attention to a stronger form of equilibria. Our results extend a prior study comparing the price of anarchy to the strong price of anarchy for two related machines (Epstein, Acta Informatica 2010), thus providing further insights on the relation between these concepts. Our exact bounds give a qualitative and quantitative comparison between the two models. The bounds also show that our setting is indeed easier than that of two unrelated machines: in the latter, the strong price of anarchy is $2$, while in ours it is strictly smaller.
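The price of anarchy can be computed by brute force on tiny instances: enumerate all assignments, keep the pure Nash equilibria, and compare the worst equilibrium makespan to the optimum. The two-related-machine instance below is an assumption for illustration, not from the paper.

```python
from itertools import product

# Brute-force sketch of the price of anarchy on two related machines:
# enumerate all assignments, keep the pure Nash equilibria, and compare
# the worst equilibrium makespan to the optimum (instance is assumed).
speeds = [1.0, 2.0]          # machine speeds
jobs = [1.0, 1.0, 2.0]       # job sizes

def loads(assign):
    l = [0.0, 0.0]
    for j, m in enumerate(assign):
        l[m] += jobs[j]
    return [l[0] / speeds[0], l[1] / speeds[1]]

def is_nash(assign):
    l = loads(assign)
    for j, m in enumerate(assign):
        other = 1 - m
        new_cost = l[other] + jobs[j] / speeds[other]
        if new_cost < l[m]:      # a strictly profitable unilateral move exists
            return False
    return True

all_assignments = list(product([0, 1], repeat=len(jobs)))
opt = min(max(loads(a)) for a in all_assignments)
worst_ne = max(max(loads(a)) for a in all_assignments if is_nash(a))
poa = worst_ne / opt
print(poa)   # 4/3 on this instance: worst NE makespan 2 vs. optimum 1.5
```

Here the worst equilibrium puts both unit jobs on the fast machine and the large job on the slow one, giving makespan 2 against the optimum of 1.5.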
- Tracking humans that are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and may vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose changes are mostly attributed to the subject's interactions with the surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the causal-effect relations between an object's visibility fluents and its activities, and develop a probabilistic graph model to jointly reason about visibility fluent changes (e.g., from visible to invisible) and track humans in videos. We formulate this joint task as an iterative search over feasible causal graph structures, which enables fast search algorithms, e.g., dynamic programming. We apply the proposed method to challenging video sequences to evaluate its capability of estimating visibility fluent changes of subjects and tracking subjects of interest over time. Results with comparisons demonstrate that our method clearly outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.
- Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing method that takes view-centric proposals from pre-trained computer vision models and produces spatio-temporal parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of view embed rich appearance and geometry correlations, and that knowledge segments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework models such correlations and constraints explicitly and generates semantic parse graphs about the scene. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.
- Sep 14 2017 cs.CV arXiv:1709.04344v1 How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal the important fact that binarizing different layers has a widely varied effect on the compression ratio of the network and the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method by introducing the concept of layer-wise priority, which binarizes parameters in inverse order of their layer depth. In each training step, our method selects a specific network layer, minimizes the discrepancy between the original real-valued weights and their binary approximations, and fine-tunes the whole network accordingly. A notable property of this iterative process is that we can flexibly decide whether to binarize the remaining floating-point layers, and thus explore a trade-off between the loss of performance and the compression ratio of the model. The resulting binary network is applied to efficient pedestrian detection. Extensive experimental results on several benchmarks show that under the same compression ratio, our method achieves a much lower miss rate and faster detection speed than the state-of-the-art neural network binarization method.
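The per-layer approximation step commonly used in binarization replaces a weight tensor W by alpha·sign(W), where alpha = mean(|W|) minimizes the Frobenius error for a fixed sign pattern. A small sketch (layer sizes are assumptions; the paper additionally fine-tunes the network after each binarization):

```python
import numpy as np

# Sketch of the per-layer binary approximation: W ≈ alpha * sign(W),
# with alpha = mean(|W|), which minimizes ||W - alpha*sign(W)||_F
# for the fixed sign pattern sign(W).  Layer sizes are assumptions.
def binarize(W):
    alpha = np.abs(W).mean()
    return alpha * np.sign(W), alpha

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 10))]

# binarize in inverse order of depth (deepest first), echoing the
# layer-wise priority; remaining layers may stay floating-point
for W in reversed(layers):
    B, alpha = binarize(W)
    err = np.linalg.norm(W - B)
    # any other scale does no better for the same sign pattern
    worse = np.linalg.norm(W - (alpha + 0.1) * np.sign(W))
    print(err <= worse)   # True for every layer
```

Stopping this loop early (leaving shallow layers floating-point) is exactly the compression/accuracy trade-off knob the abstract describes.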
- Sep 08 2017 cs.CV arXiv:1709.02054v3 Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework, which learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon "attention drift". To tackle this problem, in this paper we propose the Focusing Attention Network (FAN), which employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN attends properly to the target areas in the images. Furthermore, different from the existing methods, we adopt a ResNet-based network to enrich the deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
- Sep 05 2017 cs.SD arXiv:1709.00551v1 In this technical report, we present a set of methods for task 4 of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge. This task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are YouTube video excerpts focusing on transportation and warnings due to their industry applications. There are two sub-tasks, audio tagging and sound event detection from weakly labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU) based recurrent neural network (RNN) are adopted as our basic framework. We propose a learnable gating activation function for selecting informative local features. An attention-based scheme is used for localizing the specific events in a weakly-supervised mode. A new batch-level balancing strategy is also proposed to tackle the data imbalance problem. Fusion of posteriors from different systems is found to be effective in improving the performance. In summary, we obtain a 61% F-value for the audio tagging subtask and a 0.73 error rate (ER) for the sound event detection subtask on the development set, while the official multilayer perceptron (MLP) based baseline obtained only a 13.1% F-value for audio tagging and a 1.02 ER for sound event detection.
- Aug 22 2017 cs.AI arXiv:1708.05930v1 In this paper, a new type of 3D bin packing problem (BPP) is proposed, in which a number of cuboid-shaped items must be put into a bin one by one orthogonally. The objective is to find a way to place these items that minimizes the surface area of the bin. This problem is based on the fact that there is no fixed-sized bin in many real business scenarios and the cost of a bin is proportional to its surface area. Our research shows that this problem is NP-hard. Based on previous research on 3D BPP, the surface area is determined by the sequence, spatial locations and orientations of the items. Among these factors, the sequence of items plays a key role in minimizing the surface area. Inspired by recent achievements of deep reinforcement learning (DRL) techniques, especially the Pointer Network, on combinatorial optimization problems such as TSP, a DRL-based method is applied to optimize the sequence of items to be packed into the bin. Numerical results show that the method proposed in this paper achieves about a 5% improvement over the heuristic method.
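The objective being minimized can be sketched directly: with no fixed bin, the bin is the axis-aligned bounding box of the placed items and its cost is the surface area 2(LW + LH + WH). The placements below are illustrative assumptions.

```python
# Sketch of the objective in this 3D BPP variant: the bin is the
# axis-aligned bounding box of the placed items, and the cost is its
# surface area 2*(L*W + L*H + W*H).  Placements below are assumed.
def surface_area(placements):
    """placements: list of (x, y, z, dx, dy, dz) placed cuboids."""
    L = max(x + dx for x, y, z, dx, dy, dz in placements)
    W = max(y + dy for x, y, z, dx, dy, dz in placements)
    H = max(z + dz for x, y, z, dx, dy, dz in placements)
    return 2 * (L * W + L * H + W * H)

# two unit cubes placed side by side give a 2x1x1 bin
side_by_side = [(0, 0, 0, 1, 1, 1), (1, 0, 0, 1, 1, 1)]
print(surface_area(side_by_side))   # 10
```

The DRL method searches over item sequences; for each sequence, placement heuristics fix the locations and orientations, and the resulting surface area is the reward signal.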
- Existing block-diagonal representation research mainly focuses on casting block-diagonal regularization on training data, while little attention has been dedicated to concurrently learning block-diagonal representations of both training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semi-supervised framework of low-rank representation. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes such that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build a representation bridge between the training and test samples, such that the coherence of the learned intra-class representation is consistently heightened. Finally, the resulting optimization problem is solved by employing an alternating optimization strategy, and a simple recognition algorithm on the learned representation is used for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image datasets, three character datasets, and a fifteen-category scene dataset. It not only shows superior potential for image recognition but also outperforms state-of-the-art methods.
- This paper investigates the application of simultaneous wireless information and power transfer (SWIPT) to cooperative non-orthogonal multiple access (NOMA). A new cooperative multiple-input single-output (MISO) SWIPT NOMA protocol is proposed, where a user with a strong channel condition acts as an energy-harvesting (EH) relay to help a user with a poor channel condition. The power splitting (PS) scheme is adopted at the EH relay. By jointly optimizing the PS ratio and the beamforming vectors, the design objective is to maximize the data rate of the "strong user" while satisfying the QoS requirement of the "weak user". This boils down to a challenging nonconvex problem. To resolve this issue, the semidefinite relaxation (SDR) technique is applied to relax the quadratic terms related to the beamformers, and the relaxed problem is then solved to global optimality by two-dimensional exhaustive search. We prove rank-one optimality, which establishes the equivalence between the relaxed problem and the original one. To further reduce the high complexity of the exhaustive search, an iterative algorithm based on successive convex approximation (SCA) is proposed, which efficiently attains at least a stationary point. In view of potential application scenarios, e.g., IoT, the single-input single-output (SISO) case of the cooperative SWIPT NOMA system is also studied. The formulated problem is proved to be strictly unimodal with respect to the PS ratio. Hence, a golden section search (GSS) based algorithm with a closed-form solution at each step is proposed to find the unique globally optimal solution. It is worth pointing out that the SCA method also converges to the optimal solution in the SISO case. In numerical simulations, the proposed algorithm is shown to converge within a few iterations, and the SWIPT-aided NOMA protocol outperforms the existing transmission protocols.
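Golden section search exploits exactly the unimodality proved for the SISO case: each iteration discards a constant fraction of the interval while keeping the maximizer inside. A generic sketch (the quadratic stand-in objective is an assumption; the paper's objective is the achievable rate as a function of the PS ratio):

```python
import math

# Sketch of golden-section search for a strictly unimodal objective,
# as used for the power-splitting ratio rho in the SISO case.
# The stand-in objective below is an assumption, not the paper's rate.
def golden_section_max(f, lo, hi, tol=1e-6):
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c                     # maximizer lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                     # maximizer lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# stand-in unimodal objective on rho in (0, 1), maximized at rho = 0.6
rate = lambda rho: -(rho - 0.6) ** 2
rho_star = golden_section_max(rate, 0.0, 1.0)
print(abs(rho_star - 0.6) < 1e-4)   # True
```

Because each step shrinks the interval by the factor 1/phi ≈ 0.618, the method needs only O(log(1/tol)) objective evaluations, which is why it pairs well with a closed-form inner solution at each step.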
- This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting relevancy given a query-document pair. We propose a game-theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results demonstrate significant performance gains of as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.
- May 30 2017 cs.NI arXiv:1705.09999v1 Network Function Virtualization (NFV) sheds new light on the design, deployment, and management of cloud networks. Many network functions such as firewalls, load balancers, and intrusion detection systems can be virtualized on servers. However, network operators often have to sacrifice programmability in order to achieve high throughput, especially at the network's edge, where complex network functions are required. Here, we design, implement, and evaluate the Hybrid Modular Switch (HyMoS). This hybrid hardware/software switch is designed to meet the requirements of modern-day NFV applications by providing high throughput with a high degree of programmability. HyMoS utilizes P4-compatible Network Interface Cards (NICs), the PCI Express interface, and the CPU to act as line cards, switch fabric, and fabric controller, respectively. In our implementation of HyMoS, the PCI Express interface is turned into a non-blocking switch fabric with a throughput of hundreds of Gigabits per second. Compared to existing NFV infrastructure, HyMoS offers modularity in hardware and software as well as a higher degree of programmability by supporting a superset of the P4 language.
- Determining deep holes is an important topic in decoding Reed-Solomon codes. Let $l\ge 1$ be an integer and $a_1,\ldots,a_l$ be arbitrarily given $l$ distinct elements of the finite field ${\bf F}_q$ of $q$ elements with the odd prime number $p$ as its characteristic. Let $D={\bf F}_q\backslash\{a_1,\ldots,a_l\}$ and $k$ be an integer such that $2\le k\le q-l-1$. In this paper, we study the deep holes of generalized projective Reed-Solomon code ${\rm GPRS}_q(D, k)$ of length $q-l+1$ and dimension $k$ over ${\bf F}_q$. For any $f(x)\in {\bf F}_q[x]$, we let $f(D)=(f(y_1),\ldots,f(y_{q-l}))$ if $D=\{y_1, ..., y_{q-l}\}$ and $c_{k-1}(f(x))$ be the coefficient of $x^{k-1}$ of $f(x)$. By using Dür's theorem on the relation between the covering radius and minimum distance of ${\rm GPRS}_q(D, k)$, we show that if $u(x)\in {\bf F}_q[x]$ with $\deg (u(x))=k$, then the received codeword $(u(D), c_{k-1}(u(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\sum\limits_{y\in I}y$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$. We show also that if $j$ is an integer with $1\leq j\leq l$ and $u_j(x):= \lambda_j(x-a_j)^{q-2}+\nu_j x^{k-1}+f_{\leq k-2}^{(j)}(x)$ with $\lambda_j\in {\bf F}_q^*$, $\nu_j\in {\bf F}_q$ and $f_{\leq{k-2}}^{(j)}(x)\in{\bf F}_q[x]$ being a polynomial of degree at most $k-2$, then $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\binom{q-2}{k-1}(-a_j)^{q-1-k}\prod\limits_{y\in I}(a_j-y)+e$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$, where $e$ is the identity of the group ${\bf F}_q^*$. This implies that $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if $p|k$.
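The first criterion is directly checkable by machine for small parameters: for $u(x)$ of degree $k$, the word $(u(D), c_{k-1}(u(x)))$ is a deep hole iff every $k$-subset of $D$ has nonzero sum in ${\bf F}_q$. A brute-force sketch (taking $q = p$ prime so field arithmetic is plain modular arithmetic; $D$ and $k$ are small assumed parameters satisfying $2 \le k \le q-l-1$):

```python
from itertools import combinations

# Computational sketch of the paper's first deep-hole criterion over
# F_q with q = p prime (so field arithmetic is arithmetic mod q):
# the word is a deep hole iff every k-subset I of D has nonzero sum.
# D and k below are small assumed parameters with 2 <= k <= q - l - 1.
def all_k_subset_sums_nonzero(D, k, q):
    return all(sum(I) % q != 0 for I in combinations(D, k))

q = 7                       # F_7, odd characteristic p = 7
D = [1, 2, 3]               # F_7 minus l = 4 excluded points, so k = 2 is allowed
k = 2
print(all_k_subset_sums_nonzero(D, k, q))   # True: sums 3, 4, 5 are all nonzero mod 7

# a failing instance: {2, 5} sums to 0 mod 7, so the criterion fails
print(all_k_subset_sums_nonzero([1, 2, 5], k, q))   # False
```

For non-prime $q$ the same check goes through with field addition in place of integer addition mod $q$; the subset enumeration is exponential in $k$, so this is a verification aid, not an algorithmic contribution of the paper.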
- Recent years have witnessed a surge of asynchronous (async-) parallel computing methods, driven by the extremely large datasets involved in many modern applications and by the advancement of multi-core machines and computer clusters. In optimization, most work on async-parallel methods addresses unconstrained problems or those with block-separable constraints. In this paper, we propose an async-parallel method based on block coordinate update (BCU) for solving convex problems with a nonseparable linear constraint. Running on a single node, the method becomes a novel randomized primal-dual BCU with adaptive stepsize for multi-block affinely constrained problems. For these problems, Gauss-Seidel cyclic primal-dual BCU requires strong convexity for convergence. On the contrary, merely assuming convexity, we show that the objective value sequence generated by the proposed algorithm converges in probability to the optimal value, and the constraint residual converges to zero. In addition, we establish an ergodic $O(1/k)$ convergence result, where $k$ is the number of iterations. Numerical experiments demonstrate the efficiency of the proposed method and its significantly better speed-up over its sync-parallel counterpart.
- The minimum feedback arc set problem asks to delete a minimum number of arcs (directed edges) from a digraph (directed graph) to make it free of any directed cycles. In this work we approach this fundamental cycle-constrained optimization problem by considering a generalized task of dividing the digraph into D layers of equal size. We solve the D-segmentation problem by the replica-symmetric mean field theory and belief-propagation heuristic algorithms. The minimum feedback arc density of a given random digraph ensemble is then obtained by extrapolating the theoretical results to the limit of large D. A divide-and-conquer algorithm (nested-BPR) is devised to solve the minimum feedback arc set problem with very good performance and high efficiency.
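The layer-decomposition view can be made concrete: fix an assignment of vertices to ordered layers, and every arc that does not go from a lower layer to a strictly higher one counts as a feedback arc in this sketch. The digraph and layering below are illustrative assumptions, and the paper's BP machinery is what actually optimizes the assignment.

```python
# Sketch of the D-segmentation view of the feedback arc set problem:
# given an assignment of vertices to ordered layers, arcs that do not
# go from a lower layer to a strictly higher one must be deleted.
# Minimizing their count over layerings approaches the minimum
# feedback arc set.  The digraph and layering below are assumed.
def feedback_arcs(arcs, layer):
    """Arcs violating the layer order (u's layer >= v's layer)."""
    return [(u, v) for (u, v) in arcs if layer[u] >= layer[v]]

# a 3-cycle plus one forward chord
arcs = [(0, 1), (1, 2), (2, 0), (0, 2)]
layer = {0: 0, 1: 1, 2: 2}      # D = 3 layers
print(feedback_arcs(arcs, layer))   # [(2, 0)]
```

Any directed cycle must contain at least one violating arc under every layering, so here one deletion is optimal; the paper extrapolates the minimum density over increasing D to recover the unconstrained minimum feedback arc density.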
- May 16 2017 cs.AR arXiv:1705.04981v1 In this paper, we investigate the challenges of applying Statistical Static Timing Analysis (SSTA) in a hierarchical design flow, where modules supplied by IP vendors are used to hide design details for IP protection and to reduce the complexity of design and verification. For the three basic circuit types, combinational, flip-flop-based and latch-controlled, we propose methods to extract timing models which contain interfacing as well as compressed internal constraints. Using these compact timing models, the runtime of full-chip timing analysis can be reduced, while circuit details from IP vendors are not exposed. We also propose a method to reconstruct the correlation between modules during full-chip timing analysis. This correlation cannot be incorporated into the timing models because it depends on the layout of the corresponding modules in the chip. In addition, we investigate how to apply the extracted timing models with the reconstructed correlation to evaluate the performance of the complete design. Experiments demonstrate that full-chip timing analysis using the extracted timing models and reconstructed correlation can be several times faster than applying the flattened circuit directly, while the accuracy of statistical timing analysis is still well maintained.
- Background: Mining gene modules from genomic data is an important step in detecting gene members of pathways or other relations such as protein-protein interactions. In this work, we explore the plausibility of detecting gene modules by factorizing gene-phenotype associations from a phenotype ontology rather than the conventionally used gene expression data. In particular, the hierarchical structure of an ontology has not been sufficiently utilized in clustering genes, even though functionally related genes are consistently associated with phenotypes on the same path in the phenotype ontology. Results: We propose a hierarchical Nonnegative Matrix Factorization (NMF)-based method, called Consistent Multiple Nonnegative Matrix Factorization (CMNMF), to factorize the genome-phenome association matrix at two levels of the hierarchical structure in the phenotype ontology for mining gene functional modules. CMNMF constrains the gene clusters from the association matrices at two consecutive levels to be consistent, since the genes are annotated with both the child phenotype and the parent phenotype at these levels. CMNMF also restricts the identified phenotype clusters to be densely connected in the phenotype ontology hierarchy. In experiments on mining functionally related genes from the mouse phenotype ontology and the human phenotype ontology, CMNMF effectively improved clustering performance over the baseline methods. Gene ontology enrichment analysis was also conducted to reveal interesting gene modules. Conclusions: Utilizing the information in the hierarchical structure of a phenotype ontology, CMNMF can identify functional gene modules with more biological significance than the conventional methods. CMNMF could also be a better tool for predicting members of gene pathways and protein-protein interactions. Availability: https://github.com/nkiip/CMNMF
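The single-matrix building block of CMNMF is plain NMF; the paper couples two such factorizations across ontology levels with consistency constraints. A minimal single-level sketch with the classical Lee-Seung multiplicative updates (the association matrix and module count below are synthetic assumptions):

```python
import numpy as np

# Minimal single-level NMF sketch (CMNMF couples two such
# factorizations across consecutive ontology levels with consistency
# constraints; here we only factorize one synthetic association matrix
# A ≈ U @ V using Lee-Seung multiplicative updates).
rng = np.random.default_rng(0)
A = rng.random((20, 15))            # synthetic gene-phenotype association matrix
r = 4                               # assumed number of gene modules
U, V = rng.random((20, r)), rng.random((r, 15))

err0 = np.linalg.norm(A - U @ V)
for _ in range(200):
    # multiplicative updates for min ||A - UV||_F^2 with U, V >= 0
    V *= (U.T @ A) / (U.T @ U @ V + 1e-12)
    U *= (A @ V.T) / (U @ V @ V.T + 1e-12)
err = np.linalg.norm(A - U @ V)
print(err < err0)   # reconstruction error is non-increasing
```

Rows of U with a dominant component then group genes into modules; CMNMF adds a penalty tying the gene factors at the child and parent phenotype levels together, which is where the ontology hierarchy enters.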
- May 02 2017 cs.CV arXiv:1705.00609v1 In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of the source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This remains an open problem yet is ubiquitous in domain adaptation; it can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities on the source and target domains; the challenge lies in the fact that the class labels in the target domain are unavailable. Our proposed weighted MMD model is therefore defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested that alternates between assigning pseudo-labels, estimating auxiliary weights and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
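A weighted MMD estimate can be sketched by reweighting the source samples before comparing kernel mean embeddings; with uniform weights it reduces to the plain biased MMD estimate. The data, weights, and RBF bandwidth below are assumptions, not the paper's formulation in detail.

```python
import numpy as np

# Sketch of a sample-weighted MMD^2 estimate with an RBF kernel:
# source samples are reweighted (per class in the paper; per sample
# here) before comparing mean embeddings.  Data, weights, and the
# bandwidth are illustrative assumptions.
def rbf(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def weighted_mmd2(Xs, w, Xt):
    w = w / w.sum()                          # normalized source weights
    v = np.full(len(Xt), 1.0 / len(Xt))      # uniform target weights
    return (w @ rbf(Xs, Xs) @ w
            - 2 * w @ rbf(Xs, Xt) @ v
            + v @ rbf(Xt, Xt) @ v)

rng = np.random.default_rng(0)
Xs = rng.standard_normal((30, 2))
w = np.ones(30)                              # uniform weights => plain biased MMD^2
print(abs(weighted_mmd2(Xs, w, Xs)) < 1e-10) # identical domains => MMD^2 ~ 0
```

The paper's model ties the weights to (unknown) target class priors, hence the classification EM loop that alternates pseudo-labeling, weight estimation, and parameter updates.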
- There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information about the expressions, a centralisation technique is used to improve the reduced partial trees. In addition, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Moreover, the centralisation technique, the symbolic association vectors and the rule application records all reduce the error rate of reasoning. In particular, the above approaches have led to a 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.
- We collect and analyze the darkweb (a.k.a. the "onionweb") hyperlink graph. We find properties highly dissimilar to the well-studied world wide web hyperlink graph; for example, our analysis finds that >87% of darkweb sites never link to another site. We compare our results to prior work on world-wide-web and speculate about reasons for their differences. We conclude that in the term "darkweb", the word "web" is a connectivity misnomer. Instead, it is more accurate to view the darkweb as a set of largely isolated dark silos.
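The headline measurement (>87% of sites never link out) is a simple out-degree statistic on the site-level hyperlink graph. A toy sketch (the edge list is an assumption; the paper computes this on the real onionweb crawl):

```python
# Sketch of the key darkweb measurement: the fraction of sites with no
# outgoing hyperlink to another site (self-links ignored).  The toy
# node/edge lists are assumptions; the paper reports >87% of sites
# without outlinks on the real onionweb graph.
def fraction_without_outlinks(nodes, edges):
    has_out = {u for (u, v) in edges if u != v}   # ignore self-links
    return sum(1 for n in nodes if n not in has_out) / len(nodes)

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "b")]                  # only "a" links out
print(fraction_without_outlinks(nodes, edges))    # 0.75
```

A graph where most nodes have zero out-degree has almost no web-like connectivity, which is the basis of the "dark silos" conclusion.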
- Apr 19 2017 cs.SY arXiv:1704.05411v2 Microgrids are resources that can be used to restore critical loads after a natural disaster, enhancing the resilience of a distribution network. To deal with the stochastic nature of intermittent energy resources, such as wind turbines (WTs) and photovoltaics (PVs), many methods rely on forecast information. However, some microgrids may not be equipped with power forecasting tools. To fill this gap, a risk-limiting strategy based on measurements is proposed. A Gaussian mixture model (GMM) is used to represent the prior joint probability density function (PDF) of the power outputs of WTs and PVs over multiple periods. As time rolls forward, the distribution of WT/PV generation is updated based on the latest measurement data in a recursive manner. The updated distribution is used as an input for the risk-limiting load restoration problem, enabling an equivalent transformation of the original chance-constrained problem into a mixed-integer linear program (MILP). Simulation cases on a distribution system with three microgrids demonstrate the effectiveness of the proposed method. Results also indicate that networked microgrids have better uncertainty management capabilities than stand-alone microgrids.
- For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to obtain an interpretable relationship between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provides a universal strategy for constructing interpretable QSPR models with high predictive power. Instead of using application-specific preset molecular descriptors or fingerprints, the models can be resolved using raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models for AOT evaluation were constructed: a regression model (deepAOT-R), a multi-classification model (deepAOT-C) and a multi-task model (deepAOT-CR). These models substantially outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the R2 and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and II, respectively. The two external prediction accuracies of deepAOT-CR were 95.0% and 94.1%, while its R2 and MAE were 0.861 and 0.204 on test set I, respectively.
- Apr 18 2017 cs.CV arXiv:1704.04613v2 Text in natural images contains rich semantics that are often highly relevant to the objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is combining word representations and deep visual features into a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading system. Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification. In our framework, the attention mechanism is adopted to reveal the relevance between each recognized word and the given image, which further enhances the recognition performance. We have performed experiments on two datasets: the Con-Text dataset and the Drink Bottle dataset, which are proposed for fine-grained classification of business places and drink bottles, respectively. The experimental results consistently demonstrate that the proposed method combining textual and visual cues significantly outperforms classification with only visual representations. Moreover, we have shown that the learned representation improves the retrieval performance on the drink bottle images by a large margin, making it potentially useful in product search.
- Apr 18 2017 cs.DS arXiv:1704.04615v2 Motivation: As a fundamental task in bioinformatics, searching for massive short patterns over a long text is widely accelerated by various compressed full-text indexes. These indexes are able to provide similar searching functionalities to classical indexes, e.g., suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are particularly slow, especially for patterns with many occurrences. Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual quadtree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the quadtree. Compared with existing locating algorithms, our tree-based algorithm avoids a large number of redundant operations and presents better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms, and still memory-efficient.
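A minimal sketch of the backward-search machinery that FM-index locating (and hence FMtree) builds on. The naive rotation-based BWT construction and linear-scan rank function below are for illustration only; real indexes use suffix-array construction and compressed rank structures.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-scale only)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def fm_count(text, pattern):
    """Count occurrences of pattern in text by FM-index backward search."""
    L = bwt(text)
    F = sorted(L)
    C = {c: F.index(c) for c in set(L)}      # chars in text smaller than c
    rank = lambda c, i: L[:i].count(c)       # occurrences of c in L[:i]
    lo, hi = 0, len(L)
    for c in reversed(pattern):              # narrow the suffix interval
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo                           # interval size = occurrence count
```

Locating would additionally map each position in the final interval back to a text offset via sampled suffix-array entries, which is the expensive step FMtree accelerates.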
- Recently, manifold learning algorithms for dimensionality reduction have attracted increasing interest, and various linear and nonlinear, global and local algorithms have been proposed. The key step of a manifold learning algorithm is the selection of the neighboring region. However, to the best of our knowledge, few references propose a generally accepted algorithm for selecting the neighboring region well. So in this paper, we propose an adaptive neighboring selection algorithm, which we successfully apply to the LLE and ISOMAP algorithms in our tests. It is an algorithm that can find the optimal K nearest neighbors of the data points on the manifold, and its theoretical basis is the approximated curvature of the data point on the manifold. Based on Riemannian geometry, the Jacobian matrix is a proper mathematical concept for predicting the approximated curvature. By verifying the proposed algorithm on embedding a Swiss roll from R3 to R2 with the LLE and ISOMAP algorithms, the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, making the residual variance relatively small and the visualization of the results better. By quantitative analysis, the embedding quality measured by residual variance improves by 45.45% after using the proposed algorithm in LLE.
- Traditionally, Internet Access Providers (APs) only charge end-users for Internet access services; however, to recoup infrastructure costs and increase revenues, some APs have recently adopted two-sided pricing schemes under which both end-users and content providers are charged. Meanwhile, with the rapid growth of traffic, network congestion could seriously degrade user experiences and influence providers' utility. To optimize profit and social welfare, APs and regulators need to design appropriate pricing strategies and regulatory policies that take the effects of network congestion into consideration. In this paper, we model two-sided networks under which users' traffic demands are influenced by exogenous pricing and endogenous congestion parameters and derive the system congestion under an equilibrium. We characterize the structures and sensitivities of profit- and welfare-optimal two-sided pricing schemes and reveal that 1) the elasticity of system throughput plays a crucial role in determining the structures of optimal pricing, 2) the changes of optimal pricing under varying AP's capacity and users' congestion sensitivity are largely driven by the type of data traffic, e.g., text or video, and 3) APs and regulators will be incentivized to shift from one-sided to two-sided pricing when APs' capacities and user demand for video traffic grow. Our results can help APs design optimal two-sided pricing and guide regulators to legislate desirable policies.
- In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log (pq)-\log(k!))$. We further propose a greedy algorithm for regression, and evaluate its effectiveness through synthetic as well as real-world experiments.
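The model class above can be made concrete with a small evaluator; the tuple-based node encoding and the particular basis set are our own illustrative assumptions, not the paper's.

```python
import math

# Basis functions: one of q = 4 choices applied to one of p covariates.
BASES = [math.sin, math.cos, lambda t: t, lambda t: t * t]

def evaluate(node, x):
    """Evaluate a labeled binary tree at covariate vector x.
    node is ('+', left, right), ('*', left, right), or ('leaf', b, j),
    where a leaf applies basis function BASES[b] to covariate x[j]."""
    tag = node[0]
    if tag == "leaf":
        _, b, j = node
        return BASES[b](x[j])
    left, right = evaluate(node[1], x), evaluate(node[2], x)
    return left + right if tag == "+" else left * right

# f(x) = sin(x0) + x1^2: a tree with 2k+1 = 3 nodes (k = 1).
tree = ("+", ("leaf", 0, 0), ("leaf", 3, 1))
y = evaluate(tree, [math.pi / 2, 2.0])   # sin(pi/2) + 2^2 = 5
```

A greedy regression procedure in this spirit would grow the tree one node at a time, keeping the labeled expansion that most reduces residual error.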
- Mar 30 2017 cs.LG arXiv:1703.10094v2 The inverse mapping of the generator of GANs (Generative Adversarial Nets) has great potential value. Hence, some works have been developed to construct the inverse function of the generator by direct learning or adversarial learning. While the results are encouraging, the problem is highly challenging, and the existing ways of training inverse models of GANs have many disadvantages, such as being hard to train or having poor performance. For these reasons, we propose a new approach that uses an inverse generator ($IG$) model as the encoder and a pre-trained generator ($G$) as the decoder of an AutoEncoder network to train the $IG$ model. In the proposed model, the difference between the input and output of the AutoEncoder, which are both generated images of the pre-trained GAN's generator, is directly minimized. This optimization method can overcome the difficulty of training an inverse model of a non-one-to-one function. We also applied the inverse models of GANs' generators to image searching and translation. The experimental results show that the proposed approach works better than traditional approaches in image searching.
- Mar 22 2017 cs.SD arXiv:1703.07172v1We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments show that joint LPS and MFCC learning improves the SE performance, and IBM-based post-processing further enhances listening quality of the reconstructed speech.
- Mar 20 2017 cs.SD arXiv:1703.06052v1 Audio tagging aims to perform multi-label classification on audio chunks and is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that it only has a chunk-level label without a frame-level label. This paper presents a weakly supervised method to not only predict the tags but also indicate the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set, with the equal error rate (EER) reduced from 0.13 to 0.11 compared with the convolutional recurrent baseline system.
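The attention scheme described above amounts to a softmax-weighted pooling of frame-level predictions, so informative frames dominate the chunk-level output. The toy scores below stand in for the outputs of the model's attention and classification modules; they are illustrative, not from the paper.

```python
import numpy as np

def attention_pool(frame_probs, attn_scores):
    """frame_probs: (T, n_tags) frame-level tag predictions.
    attn_scores: (T,) unnormalized importances from the attention module."""
    w = np.exp(attn_scores - attn_scores.max())
    w /= w.sum()                         # softmax over the T frames
    return w @ frame_probs               # (n_tags,) chunk-level prediction

frame_probs = np.array([[0.9, 0.1],      # frame containing the event
                        [0.1, 0.1],      # background frame
                        [0.1, 0.1]])     # background frame
attn = np.array([4.0, 0.0, 0.0])         # attention singles out frame 0
chunk = attention_pool(frame_probs, attn)
# chunk[0] stays close to the event frame's score, unlike plain mean pooling
```

The same attention weights double as a temporal localization signal, since high-weight frames indicate where the tagged event occurs.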
- Mar 09 2017 cs.DS arXiv:1703.02693v1 This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority Sample and Hold (PrSH). PrSH combines aspects of the well-known Sample and Hold algorithm with Priority Sampling. In particular, it achieves a reduced computational cost for rate adaptation in a fixed cache by using a single persistent random variable across the lifetime of each key in the cache. The basic approach can be supplemented with a Sample and Hold pre-sampling stage whose sampling rate adaptation is controlled by PrSH. We prove that PrSH provides unbiased estimates of the true aggregates. We analyze the computational complexity of PrSH and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive Sample and Hold; there is also substantial improvement for rank queries.
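A minimal sketch of the core mechanism: each new key draws one persistent uniform variate u, held keys aggregate their weight, and when the fixed cache overflows the key with the lowest priority weight/u is evicted. The class interface is our own simplification; the paper's unbiased estimator and pre-sampling stage are omitted.

```python
import random

class PrioritySampleHold:
    """Toy fixed-size weighted-sampling cache in the PrSH spirit."""
    def __init__(self, m, seed=0):
        self.m = m
        self.rng = random.Random(seed)
        self.cache = {}                      # key -> [aggregated weight, u]

    def update(self, key, w):
        if key in self.cache:
            self.cache[key][0] += w          # sample-and-hold aggregation
            return
        # One persistent random variable per key, drawn at creation.
        self.cache[key] = [w, self.rng.random()]
        if len(self.cache) > self.m:         # evict the lowest priority w/u
            evict = min(self.cache,
                        key=lambda k: self.cache[k][0] / self.cache[k][1])
            del self.cache[evict]

sketch = PrioritySampleHold(m=3)
for key, w in [("a", 1.0), ("b", 2.0), ("a", 4.0), ("c", 1.0), ("d", 1.0)]:
    sketch.update(key, w)
# "a" has been aggregated to weight 5.0 and, with high priority, stays cached.
```

Because u is fixed for a key's lifetime, priorities only grow as weight accumulates, which is what makes rate adaptation cheap compared with re-randomizing on every update.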
- Mar 01 2017 cs.GT arXiv:1702.08794v1 Many recent online platforms propose multiple items for bidding. The state of the art, however, is limited to the analysis of one-item auctions without resubmission. In this paper we study multi-item lowest unique bid auctions (LUBA) with resubmission in discrete bid spaces under budget constraints. We show that the game does not have pure Bayes-Nash equilibria (except in very special cases). However, at least one mixed Bayes-Nash equilibrium exists for an arbitrary number of bidders and items. The equilibrium is explicitly computed for the two-bidder setup with resubmission possibilities. In the general setting we propose a distributed strategic learning algorithm to approximate equilibria. Computer simulations indicate that the error decays quickly within a few steps. When the number of bidders per item follows a Poisson distribution, it is shown that the seller can obtain a non-negligible revenue on several items, hence achieving a partial revelation of the true value of the items. Finally, the attitude of the bidders towards risk is considered. In contrast to risk-neutral agents, who bid very small values, the cumulative distribution and the bidding support of risk-sensitive agents are more spread out.
- Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features can further reduce the EER to 0.10. The performance of the end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we get the state-of-the-art performance with 0.12 EER, while the performance of the best existing system is 0.15 EER.
- Feb 22 2017 cs.DS arXiv:1702.06256v2The maximum duo-preservation string mapping (Max-Duo) problem is the complement of the well studied minimum common string partition (MCSP) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-Max-Duo is the restricted version of Max-Duo, where every letter of the alphabet occurs at most $k$ times in each of the strings, which is readily reduced into the well known maximum independent set (MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-Max-Duo can then be approximated arbitrarily close to $1.8$ using the state-of-the-art approximation algorithm for the MIS problem. $2$-Max-Duo was proved APX-hard and very recently a $(1.6 + \epsilon)$-approximation was claimed, for any $\epsilon > 0$. In this paper, we present a vertex-degree reduction technique, based on which, we show that $2$-Max-Duo can be approximated arbitrarily close to $1.4$.
- Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because each time only one or a few block variables need to be updated among a possibly large number of blocks. They are also easily parallelized and have thus been particularly popular for solving problems involving large-scale datasets and/or variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex programs in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under a weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we show that the algorithm can be modified to achieve a linear rate of convergence. Numerical experiments show that the accelerated method performs stably with a single set of parameters, while the original method needs to tune the parameters for different datasets in order to achieve a comparable level of performance.
- We study the maximum duo-preservation string mapping (Max-Duo) problem, which is the complement of the well-studied minimum common string partition (MCSP) problem. Both problems have applications in many fields including text compression and bioinformatics. Motivated by an earlier local search algorithm, we present an improved approximation and show that its performance ratio is no greater than ${35}/{12} < 2.917$. This beats the current best $3.25$-approximation for Max-Duo. The performance analysis of our algorithm is done through a complex yet interesting amortization. Two lower bounds on the locality gap of our algorithm are also provided.
- Jan 31 2017 cs.AI arXiv:1701.08665v1 Based on an in-depth analysis of the essence and features of vague phenomena, this paper focuses on establishing the axiomatic foundation of membership degree theory for vague phenomena and presents an axiomatic system to govern membership degrees and their interconnections. On this basis, the concept of a vague partition is introduced; furthermore, the concept of a fuzzy set, introduced by Zadeh in 1965, is redefined based on vague partitions from the perspective of axiomatization. The thesis defended in this paper is that the relationships among vague attribute values should be the starting point for recognizing and modeling vague phenomena from a quantitative view.
- Jan 30 2017 cs.CV arXiv:1701.08006v1 Naturalness of warping is gaining extensive attention in image stitching. Recent warps such as SPHP, AANAP and GSP use a global similarity to effectively mitigate projective distortion (which enlarges regions); however, they necessarily introduce perspective distortion (which generates inconsistency). In this paper, we propose a quasi-homography warp, which balances perspective distortion against projective distortion in the non-overlapping region to create natural-looking mosaics. Our approach formulates the warp as the solution of a system of bivariate equations, where perspective distortion and projective distortion are characterized as slope preservation and scale linearization, respectively. Our proposed warp relies only on a global homography and is thus totally parameter-free. A comprehensive experiment shows that quasi-homography outperforms some state-of-the-art warps in urban scenes, including homography, AutoStitch and SPHP. A user study demonstrates that quasi-homography wins most users' favor as well, compared to homography and SPHP.
- Jan 17 2017 cs.NI arXiv:1701.04076v1As Internet applications have become more diverse in recent years, users having heavy demand for online video services are more willing to pay higher prices for better services than light users that mainly use e-mails and instant messages. This encourages the Internet Service Providers (ISPs) to explore service differentiations so as to optimize their profits and allocation of network resources. Much prior work has focused on the viability of network service differentiation by comparing with the case of a single-class service. However, the optimal service differentiation for an ISP subject to resource constraints has remained unsolved. In this work, we establish an optimal control framework to derive the analytical solution to an ISP's optimal service differentiation, i.e. the optimal service qualities and associated prices. By analyzing the structures of the solution, we reveal how an ISP should adjust the service qualities and prices in order to meet varying capacity constraints and users' characteristics. We also obtain the conditions under which ISPs have strong incentives to implement service differentiation and whether regulators should encourage such practices.
- In this paper, we propose three novel models to enhance word embedding by implicitly using morphological information. Experiments on word similarity and syntactic analogy show that the implicit models are superior to traditional explicit ones. Our models outperform all state-of-the-art baselines and significantly improve the performance on both tasks. Moreover, our performance on the smallest corpus is similar to that of CBOW on a corpus five times the size of ours. Parameter analysis indicates that the implicit models can supplement semantic information during the word embedding training process.
- Jan 05 2017 cs.CV arXiv:1701.00794v1 In this paper, we develop a new weakly-supervised learning algorithm to learn to segment cancerous regions in histopathology images. Our work is under a multiple instance learning (MIL) framework with a new formulation, deep weak supervision (DWS); we also propose an effective way to introduce constraints into our neural networks to assist the learning process. The contributions of our algorithm are threefold: (1) We build an end-to-end learning system that segments cancerous regions with fully convolutional networks (FCN) in which image-to-image weakly-supervised learning is performed. (2) We develop a deep weak supervision formulation to exploit multi-scale learning under weak supervision within fully convolutional networks. (3) Constraints on positive instances are introduced in our approach to effectively explore additional weakly-supervised information that is easy to obtain and provides a significant boost to the learning process. The proposed algorithm, abbreviated as DWS-MIL, is easy to implement and can be trained efficiently. Our system demonstrates state-of-the-art results on large-scale histopathology image datasets and can be applied to various applications in medical imaging beyond histopathology images, such as MRI, CT, and ultrasound images.
- Integrated Computational Materials Engineering (ICME) aims to accelerate optimal design of complex material systems by integrating material science and design automation. For tractable ICME, it is required that (1) a structural feature space be identified to allow reconstruction of new designs, and (2) the reconstruction process be property-preserving. The majority of existing structural representation schemes rely on the designer's understanding of specific material systems to identify geometric and statistical features, which could be biased and insufficient for reconstructing physically meaningful microstructures of complex material systems. In this paper, we develop a feature learning mechanism based on a convolutional deep belief network to automate a two-way conversion between microstructures and their lower-dimensional feature representations, and to achieve a 1000-fold dimension reduction from the microstructure space. The proposed model is applied to a wide spectrum of heterogeneous material systems with distinct microstructural features, including Ti-6Al-4V alloy, Pb63-Sn37 alloy, Fontainebleau sandstone, and spherical colloids, to produce material reconstructions that are close to the original samples with respect to 2-point correlation functions and mean critical fracture strength. This capability is not achieved by existing synthesis methods that rely on the Markovian assumption of material microstructures.
- Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for arbitrarily large delays. An explicit stepsize formula that guarantees convergence is given, depending on the delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.
- Dec 07 2016 cs.LG arXiv:1612.01663v1 In this paper, we address learning problems for high dimensional data. Previously, oblivious random projection based approaches that project high dimensional features onto a random subspace have been used in practice for tackling the high-dimensionality challenge in machine learning. Recently, various non-oblivious randomized reduction methods have been developed and deployed for solving many numerical problems such as matrix product approximation, low-rank matrix approximation, etc. However, they are less explored for machine learning tasks, e.g., classification. More seriously, the theoretical analysis of excess risk bounds for risk minimization, an important measure of generalization performance, has not been established for non-oblivious randomized reduction methods. It therefore remains an open problem what benefit they offer over previous oblivious random projection based approaches. To tackle these challenges, we propose an algorithmic framework for employing non-oblivious randomized reduction methods for general empirical risk minimization in machine learning tasks, where the original high-dimensional features are projected onto a random subspace that is derived from the data with a small matrix approximation error. We then derive the first excess risk bound for the proposed non-oblivious randomized reduction approach without requiring strong assumptions on the training data. The established excess risk bound exhibits that the proposed approach provides much better generalization performance, and it also sheds more insights about different randomized reduction approaches. Finally, we conduct extensive experiments on both synthetic and real-world benchmark datasets, whose dimension scales to $O(10^7)$, to demonstrate the efficacy of our proposed approach.
- Dec 02 2016 cs.AI arXiv:1612.00094v1In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.
- Nov 29 2016 cs.CV arXiv:1611.08983v9 Sparse coding has achieved great success in various image processing studies. However, there is no benchmark for measuring the sparsity of an image patch/group, because sparse discriminant conditions cannot remain unchanged. This paper analyzes the sparsity of a group based on the strategy of rank minimization. Firstly, an adaptive dictionary for each group is designed. Then, we prove that group-based sparse coding is equivalent to a rank minimization problem, and thus the sparse coefficients of each group can be measured by estimating the singular values of the group. Based on that measurement, the weighted Schatten $p$-norm minimization (WSNM) is found to be the closest solution to the real singular values of each group. Thus, WSNM can be equivalently transformed into a non-convex $\ell_p$-norm minimization problem in group-based sparse coding. To make the proposed scheme tractable and robust, the alternating direction method of multipliers (ADMM) is used to solve the $\ell_p$-norm minimization problem. Experimental results on two applications, image inpainting and image compressive sensing (CS) recovery, show that the proposed scheme outperforms many state-of-the-art methods.
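For the convex case $p = 1$ (and weights sorted in non-descending order against the descending singular values), the weighted Schatten-norm proximal step reduces to weighted singular value soft-thresholding, which can be sketched directly; the general non-convex $p < 1$ case needs the iterative ADMM scheme described above. The example matrix and weights below are illustrative only.

```python
import numpy as np

def weighted_svt(Y, weights):
    """Weighted singular value soft-thresholding (Schatten p = 1 prox):
    shrink each singular value of Y by its own non-negative weight."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)   # per-value shrinkage
    return (U * s_shrunk) @ Vt                # rescale columns of U

Y = np.diag([3.0, 1.0, 0.2])          # a "group" with singular values 3, 1, 0.2
w = np.array([0.1, 0.3, 0.5])         # heavier shrinkage on smaller values
X = weighted_svt(Y, w)
# singular values of X become 2.9, 0.7, 0: the rank drops from 3 to 2
```

Assigning larger weights to smaller singular values is what lets the scheme suppress noise-dominated components while nearly preserving the dominant structure of each group.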
- Nov 23 2016 cs.CV arXiv:1611.07143v2 This paper proposes a multi-level feature learning framework for human action recognition using a single body-worn inertial sensor. The framework consists of three phases, respectively designed to analyze signal-based (low-level), component (mid-level) and semantic (high-level) information. Low-level features capture time- and frequency-domain properties, while mid-level representations learn the composition of the action. The Max-margin Latent Pattern Learning (MLPL) method is proposed to learn high-level semantic descriptions of latent action patterns as the output of our framework. The proposed method achieves state-of-the-art performance, with weighted F1 scores of 88.7%, 98.8% and 72.6% on the Skoda, WISDM and OPP datasets, respectively.
- Nov 22 2016 cs.CV arXiv:1611.06661v2 We propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This process is challenging since the glands not only need to be segmented from a complex background, they must also be individually identified. We leverage the idea of image-to-image prediction in recent deep learning by designing an algorithm that automatically exploits and fuses complex multichannel information - regional, location and boundary cues - in gland histology images. Our proposed algorithm, a deep multichannel framework, alleviates heavy feature design due to the use of convolutional neural networks and is able to meet multifarious requirements by altering channels. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge and other currently prevalent instance segmentation methods, we observe state-of-the-art results based on the evaluation metrics. Keywords: Instance segmentation, convolutional neural networks, segmentation, multichannel, histology image.
- Nov 21 2016 cs.CV arXiv:1611.06159v1In this paper, we propose an innovative end-to-end subtitle detection and recognition system for videos in East Asian languages. Our end-to-end system consists of multiple stages. Subtitles are firstly detected by a novel image operator based on the sequence information of consecutive video frames. Then, an ensemble of Convolutional Neural Networks (CNNs) trained on synthetic data is adopted for detecting and recognizing East Asian characters. Finally, a dynamic programming approach leveraging language models is applied to constitute results of the entire body of text lines. The proposed system achieves average end-to-end accuracies of 98.2% and 98.3% on 40 videos in Simplified Chinese and 40 videos in Traditional Chinese respectively, which is a significant outperformance of other existing methods. The near-perfect accuracy of our system dramatically narrows the gap between human cognitive ability and state-of-the-art algorithms used for such a task.
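The final dynamic-programming stage described above can be sketched as a Viterbi pass over per-position character candidates scored by CNN confidence plus a bigram language model. The candidate sets, log-probabilities and ASCII alphabet below are invented for illustration; the real system works over East Asian character sets.

```python
import math

def decode(candidates, bigram, lm_weight=1.0):
    """candidates: list of {char: cnn_log_prob} dicts, one per position.
    bigram: dict (prev, cur) -> log probability; unseen pairs are penalized."""
    UNK = math.log(1e-6)                 # penalty for unseen bigrams
    best = dict(candidates[0])           # best score of a path ending in char
    back = [{}]                          # backpointers per position
    for pos in range(1, len(candidates)):
        new_best, new_back = {}, {}
        for c, lp in candidates[pos].items():
            prev = max(best, key=lambda p:
                       best[p] + lm_weight * bigram.get((p, c), UNK))
            new_best[c] = best[prev] + lm_weight * bigram.get((prev, c), UNK) + lp
            new_back[c] = prev
        best = new_best
        back.append(new_back)
    # Trace the highest-scoring path back to the start.
    last = max(best, key=best.get)
    out = [last]
    for tb in reversed(back[1:]):
        out.append(tb[out[-1]])
    return "".join(reversed(out))

cands = [{"c": -0.1, "e": -0.2}, {"a": -0.5, "o": -0.4}, {"t": -0.1}]
bigram = {("c", "a"): math.log(0.5), ("a", "t"): math.log(0.5)}
line = decode(cands, bigram)   # the LM pulls the path toward "cat"
```

Per-position CNN scores alone would prefer "cot" here; the bigram term overrides the second position, which is exactly the role the language model plays in constituting whole text lines.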
- Oct 26 2016 cs.SY arXiv:1610.07687v1Air conditioning systems are responsible for the major percentage of energy consumption in buildings. Shared spaces constitute considerable office space area, in which most office employees perform their meetings and daily tasks, and therefore the ACs in these areas have significant impact on the energy usage of the entire office building. The cost of this energy consumption, however, is not paid by the shared space users, and the AC's temperature set-point is not determined based on the users' preferences. This latter factor is compounded by the fact that different people may have different choices of temperature set-points and sensitivities to change of temperature. Therefore, it is a challenging task to design an office policy to decide on a particular set-point based on such a diverse preference set. As a result, users are not aware of the energy consumption in shared spaces, which may potentially increase the energy wastage and related cost of office buildings. In this context, this paper proposes an energy policy for an office shared space by exploiting an established temperature control mechanism. In particular, we choose meeting rooms in an office building as the test case and design a policy according to which each user of the room can give a preference on the temperature set-point and is paid for felt discomfort if the set-point is not fixed according to the given preference. On the other hand, users who enjoy the thermal comfort compensate the other users of the room. Thus, the policy enables the users to be cognizant and responsible for the payment on the energy consumption of the office space they are sharing, and at the same time ensures that the users are satisfied either via thermal comfort or through incentives. The policy is also shown to be beneficial for building management. Through experiment based case studies, we show the effectiveness of the proposed policy.
- Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After the preceding context words are processed by hidden layers, an output layer estimates the probability of the next word. Such approaches are time- and memory-intensive because of the large numbers of parameters in the word embeddings and the output layer. In this paper, we propose to compress neural language models through sparse word representations. In our experiments, the number of parameters in our model grows only very slowly, almost imperceptibly, with vocabulary size. Moreover, our approach not only reduces the parameter space to a large extent, but also improves performance in terms of the perplexity measure.
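The core idea of covering a large vocabulary with a small set of base-word embeddings can be sketched as follows. This is a hedged illustration only: the plain least-squares-then-truncate solver, the dimensions, and the variable names are all illustrative assumptions, not the paper's actual sparse-coding objective.

```python
import numpy as np

def sparse_code(rare_vec, base_embeddings, k=4):
    """Approximate a rare word's vector as a sparse combination of base
    (common-word) embeddings, keeping only the k largest coefficients.
    Greedy least-squares sketch, not the paper's exact optimization."""
    # Ordinary least-squares fit over all base vectors first.
    coef, *_ = np.linalg.lstsq(base_embeddings.T, rare_vec, rcond=None)
    # Sparsify: keep the k coefficients with the largest magnitude.
    keep = np.argsort(np.abs(coef))[-k:]
    sparse = np.zeros_like(coef)
    sparse[keep] = coef[keep]
    return sparse

rng = np.random.default_rng(0)
base = rng.standard_normal((50, 16))   # 50 common words, 16-dim embeddings
rare = rng.standard_normal(16)         # one rare word's target vector
code = sparse_code(rare, base, k=4)
print(np.count_nonzero(code))          # at most 4 nonzero coefficients
```

Storing only k coefficients per rare word, rather than a full dense vector, is what keeps the parameter count growing slowly with vocabulary size.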
- Oct 07 2016 cs.SD arXiv:1610.01797v1 Audio tagging aims to assign one or several tags to an audio clip. Most datasets are weakly labelled, meaning that only the tags of a clip are known, without the occurrence times of the tags. The labeling of an audio clip is often based on the audio events in the clip, yet no event-level label is provided to the user. Previous works using the bag-of-frames model assume that the tags occur all the time, which is not the case in practice. We propose a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously. The JDC model has the ability to attend to informative sounds and ignore uninformative ones. Then only the informative regions are used for classification. Experimental results on the "CHiME Home" dataset show that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%. More interestingly, the audio event detector is trained successfully without needing event-level labels.
- Neuromorphic chip refers to an unconventional computing architecture that is modelled on biological brains. It is ideally suited for processing sensory data for intelligence computing, decision-making or context cognition. Despite rapid development, conventional artificial synapses exhibit poor connection flexibility and require separate data acquisition circuitry, resulting in limited functionalities and significant hardware redundancy. Here we report a novel light-stimulated artificial synapse based on a graphene-nanotube hybrid phototransistor that can directly convert optical stimuli into a "neural image" for further neuronal analysis. Our optically-driven synapses involve multiple steps of plasticity mechanisms and importantly exhibit flexible tuning of both short- and long-term plasticity. Furthermore, our neuromorphic phototransistor can take multiple pre-synaptic light stimuli via wavelength-division multiplexing and allows advanced optical processing through charge-trap-mediated optical coupling. The capability of complex neuromorphic functionalities in a simple silicon-compatible device paves the way for novel neuromorphic computing architectures involving photonics.
- Understanding the real achievable performance of mobile ad hoc networks (MANETs) under practical network constraints is of great importance for their applications in future highly heterogeneous wireless network environments. This paper explores, for the first time, the performance modeling for MANETs under a general limited buffer constraint, where each network node maintains a limited source buffer of size $B_s$ to store its locally generated packets and also a limited shared relay buffer of size $B_r$ to store relay packets for other nodes. Based on queuing theory and birth-death chain theory, we first develop a general theoretical framework to fully depict the source/relay buffer occupancy process in such a MANET, which applies to any distributed MAC protocol and any mobility model that leads to the uniform distribution of nodes' locations in steady state. With the help of this framework, we then derive the exact expressions of several key network performance metrics, including achievable throughput, throughput capacity, and expected end-to-end delay. We further conduct case studies under two network scenarios and provide the corresponding theoretical/simulation results to demonstrate the application as well as the efficiency of our theoretical framework. Finally, we present extensive numerical results to illustrate the impacts of the buffer constraint on the performance of a buffer-limited MANET.
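The buffer occupancy analysis above rests on birth-death chains. As a hedged illustration, the sketch below computes the stationary occupancy distribution of a single finite buffer with constant arrival and service rates (an M/M/1/B-style toy, not the paper's exact source/relay model; all rates are made-up values).

```python
import numpy as np

def stationary_occupancy(lam, mu, B):
    """Stationary distribution of a finite birth-death chain with
    constant birth rate lam, death rate mu, over states 0..B
    (an M/M/1/B-style buffer sketch, not the paper's exact model)."""
    rho = lam / mu
    # Detailed balance gives pi_i proportional to rho**i.
    unnorm = np.array([rho ** i for i in range(B + 1)])
    return unnorm / unnorm.sum()

pi = stationary_occupancy(lam=0.5, mu=1.0, B=4)
print(round(pi.sum(), 6))   # 1.0 -- a valid probability distribution
print(pi[-1] < pi[0])       # True: under light load the buffer is rarely full
```

Metrics such as packet loss at a full buffer or expected queueing delay then follow from `pi` (e.g. the blocking probability is `pi[-1]`).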
- The application of physical layer security in ad hoc networks has attracted considerable academic attention recently. However, the available studies mainly focus on the single-hop and two-hop network scenarios, and the price in terms of degradation of communication quality of service (QoS) caused by improving security is largely uninvestigated. As a step to address these issues, this paper explores the physical layer security-aware routing and performance tradeoffs in a multi-hop ad hoc network. Specifically, for any given end-to-end path we first derive its connection outage probability (COP) and secrecy outage probability (SOP) in closed-form, which serve as the performance metrics of communication QoS and transmission security, respectively. Based on the closed-form expressions, we then study the security-QoS tradeoffs to minimize COP (resp. SOP) conditioned on that SOP (resp. COP) is guaranteed. With the help of analysis of a given path, we further propose the routing algorithms which can achieve the optimal performance tradeoffs for any pair of source and destination nodes in a distributed manner. Finally, simulation and numerical results are presented to validate the efficiency of our theoretical analysis, as well as to illustrate the security-QoS tradeoffs and the routing performance.
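As a rough illustration of how an end-to-end outage metric composes over a multi-hop path, the sketch below assumes independent hops and hypothetical per-hop outage probabilities; the paper's actual closed-form COP and SOP expressions come from the underlying channel model, not from this simple product form.

```python
def path_outage(per_hop_outage):
    """End-to-end outage of a multi-hop path: the path fails if any hop
    fails, assuming independent hops (a sketch, not the paper's
    closed-form expressions)."""
    ok = 1.0
    for p in per_hop_outage:
        ok *= (1.0 - p)   # probability that this hop succeeds
    return 1.0 - ok

# Hypothetical per-hop connection outage probabilities along a 3-hop path.
cop = path_outage([0.05, 0.10, 0.02])
print(round(cop, 4))   # 0.1621
```

A routing algorithm in this spirit would compare such path-level values across candidate routes, trading a lower COP against a higher SOP or vice versa.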
- This paper investigates the compress-and-forward scheme for an uplink cloud radio access network (C-RAN) model, where multi-antenna base-stations (BSs) are connected to a cloud-computing based central processor (CP) via capacity-limited fronthaul links. The BSs compress the received signals with Wyner-Ziv coding and send the representation bits to the CP; the CP performs the decoding of all the users' messages. Under this setup, this paper makes progress toward the optimal structure of the fronthaul compression and CP decoding strategies for the compress-and-forward scheme in C-RAN. On the CP decoding strategy design, this paper shows that under a sum fronthaul capacity constraint, a generalized successive decoding strategy of the quantization and user message codewords that allows arbitrary interleaved order at the CP achieves the same rate region as the optimal joint decoding. Further, it is shown that a practical strategy of successively decoding the quantization codewords first, then the user messages, achieves the same maximum sum rate as joint decoding under individual fronthaul constraints. On the joint optimization of user transmission and BS quantization strategies, this paper shows that if the input distributions are assumed to be Gaussian, then under joint decoding, the optimal quantization scheme for maximizing the achievable rate region is Gaussian. Moreover, Gaussian input and Gaussian quantization with joint decoding achieve to within a constant gap of the capacity region of the Gaussian multiple-input multiple-output (MIMO) uplink C-RAN model. Finally, this paper addresses the computational aspect of optimizing uplink MIMO C-RAN by showing that under fixed Gaussian input, the sum rate maximization problem over the Gaussian quantization noise covariance matrices can be formulated as convex optimization problems, thereby facilitating its efficient solution.
- We study the problem of estimating the continuous response over time to interventions using observational time series---a retrospective dataset where the policy by which the data are generated is unknown to the learner. We are motivated by applications where response varies by individuals and therefore, estimating responses at the individual-level is valuable for personalizing decision-making. We refer to this as the problem of estimating individualized treatment response (ITR) curves. In statistics, G-computation formula (Robins, 1986) has been commonly used for estimating treatment responses from observational data containing sequential treatment assignments. However, past studies have focused predominantly on obtaining point-in-time estimates at the population level. We leverage the G-computation formula and develop a novel Bayesian nonparametric (BNP) method that can flexibly model functional data and provide posterior inference over the treatment response curves at both the individual and population level. On a challenging dataset containing time series from patients admitted to a hospital, we estimate responses to treatments used in managing kidney function and show that the resulting fits are more accurate than alternative approaches. Accurate methods for obtaining ITRs from observational data can dramatically accelerate the pace at which personalized treatment plans become possible.
- Aug 12 2016 cs.NI arXiv:1608.03380v1 Millimeter wave (mmWave) systems are emerging as an essential technology to enable extremely high data rate wireless communications. The main limiting factors of mmWave systems are blockage (high penetration loss) and deafness (misalignment between the beams of the transmitter and receiver). To alleviate these problems, it is imperative to incorporate efficient association and relaying between terminals and access points. Unfortunately, the existing association techniques are designed for traditional interference-limited networks, and thus are highly suboptimal for mmWave communications due to narrow-beam operations and the resulting non-negligible interference-free behavior. This paper introduces a distributed approach that solves the joint association and relaying problem in mmWave networks considering the load balancing at access points. The problem is posed as a novel stochastic optimization problem, which is solved by distributed auction algorithms where the clients and relays act asynchronously to achieve the optimal client-relay-access point association. It is shown that the algorithms provably converge to a solution that maximizes the aggregate logarithmic utility within a desired bound. Numerical results quantify the performance enhancements introduced by the relays, and the substantial improvements in network throughput and fairness among the clients achieved by the proposed association method compared to standard approaches. It is concluded that mmWave communications with proper association and relaying mechanisms can support extremely high data rates, connection reliability, and fairness among the clients.
- Aug 05 2016 cs.CV arXiv:1608.01536v1 Saliency integration approaches have attracted broad interest in unifying saliency maps from multiple saliency models. In essence, saliency integration is a weighted aggregation of multiple saliency maps, so measuring the weights of the saliency models is essential. In this paper, we propose an unsupervised model for saliency integration, namely the arbitrator model (AM), based on Bayes' probability theory. The proposed AM incorporates saliency models of varying expertise and a prior map, based on the consistency of the evidence from multiple saliency models and a reference saliency map derived from generally accepted knowledge. We also suggest two methods to learn the expertise of the saliency models without ground truth. The experiments cover various combinations of twenty-two state-of-the-art saliency models on five datasets. The evaluation results show that the AM model improves the performance substantially compared to the existing state-of-the-art approaches, regardless of the chosen candidate saliency models.
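Since saliency integration is described as a weighted aggregation of maps, a minimal sketch of such pixel-wise aggregation is given below. The fixed example weights are a stand-in for the expertise that the AM actually learns via Bayes' rule; the maps and shapes are toy values.

```python
import numpy as np

def integrate_saliency(maps, weights):
    """Pixel-wise weighted aggregation of several saliency maps.
    The weights stand in for each model's learned 'expertise'; the
    paper's AM derives them via Bayes' rule, which this sketch does not."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                 # normalize to a convex combination
    stacked = np.stack(maps)        # shape (n_models, H, W)
    return np.tensordot(w, stacked, axes=1)

m1 = np.array([[0.2, 0.8], [0.0, 1.0]])
m2 = np.array([[0.4, 0.6], [0.2, 0.8]])
fused = integrate_saliency([m1, m2], weights=[3, 1])
print(round(fused[0, 0], 6))   # 0.25: pulled toward the higher-weight model
```

The interesting part of the AM is precisely that these weights are estimated without ground truth, from the mutual consistency of the candidate maps.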
- A single-letter lower bound on the sum rate of multiple description coding with tree-structured distortion constraints is established by generalizing Ozarow's celebrated converse argument through the introduction of auxiliary random variables that form a Markov tree. For the quadratic vector Gaussian case, this lower bound is shown to be achievable by an extended version of the El Gamal-Cover scheme, yielding a complete sum-rate characterization.
- Jul 28 2016 cs.SI physics.soc-ph arXiv:1607.08203v1 Information technologies today can inform each of us about the best alternatives for shortest paths from origins to destinations, but they do not provide incentives or alternatives that manage this information efficiently for collective benefit. To obtain such benefits, we need not only good estimates of how the traffic forms but also targeted strategies that feasibly divert enough vehicles from the most heavily used roads. The opportunity is that during large events the traffic inconveniences in large cities are unusually high, yet temporary, and the entire population may be more willing to adopt collective recommendations for social good. In this paper, we integrate for the first time big data resources to quantify the impact of events and propose targeted strategies for collective good at the urban scale. In the context of the Olympic Games in Rio de Janeiro, we first predict the expected increase in traffic. To that end, we integrate data from mobile phones, Airbnb, Waze, and transit information with game schedules and information on venues. Next, we evaluate the impact of the Olympic Games on commuters' travel and propose different route choice scenarios during the peak hours. Moreover, we identify the trips that contribute the most to the global congestion and that could be shifted from vehicles to transit. Interestingly, we show that (i) following new route alternatives during the event with individual shortest paths can save more collective travel time than keeping the routine routes, uncovering the positive value of information technologies during events; (ii) with only a small proportion of people, selected from specific areas, switching from driving to public transport, the collective travel time can be reduced to a great extent. Results are presented on-line for the evaluation of the public and policy makers.
- Jul 19 2016 cs.CV arXiv:1607.04889v2 In this paper, we propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This task, called instance segmentation, has recently become increasingly important. The problem is challenging since not only do the glands need to be segmented from the complex background, they are also required to be individually identified. Here we leverage the idea of image-to-image prediction in recent deep learning by building a framework that automatically exploits and fuses complex multichannel information (regional, location, and boundary patterns) in gland histology images. Our proposed system, the deep multichannel framework, alleviates heavy feature design through the use of convolutional neural networks and is able to meet multifarious requirements by altering channels. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge and other currently prevalent instance segmentation methods, we observe state-of-the-art results based on a number of evaluation metrics.
- Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in an acoustic scene of interest. In this paper we make contributions to audio tagging in two parts: acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For the acoustic modeling, a large set of contextual frames of the chunk are fed into the DNN to perform multi-label classification for the expected tags, considering that only chunk-level (or utterance-level) rather than frame-level labels are available. Dropout and background-noise-aware training are also adopted to improve the generalization capability of the DNNs. For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Bank (MFB) features. The new features, which are smoothed against background noise and more compact with contextual information, can further improve the performance of the DNN baseline. Compared with the standard Gaussian Mixture Model (GMM) baseline of the DCASE 2016 audio tagging challenge, our proposed method obtains a significant equal error rate (EER) reduction from 0.21 to 0.13 on the development set. The proposed aDAE system achieves a relative 6.7% EER reduction over the strong DNN baseline on the development set. Finally, the results also show that our approach obtains state-of-the-art performance with 0.15 EER on the evaluation set of the DCASE 2016 audio tagging task, while the EER of the first prize of this challenge is 0.17.
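EER figures like those above can be computed from raw classifier scores by sweeping a decision threshold until the false-acceptance and false-rejection rates cross. The following is a toy sketch over made-up scores, not the DCASE evaluation code.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Sweep thresholds over the observed scores and return the average
    of FAR and FRR at their closest crossing point (toy sketch, not the
    DCASE evaluation toolkit)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, best_eer = 1.0, None
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)   # negatives accepted
        frr = np.mean(scores[labels == 1] < t)    # positives rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy scores: positives mostly high, one overlapping with the negatives.
eer = equal_error_rate([0.9, 0.8, 0.3, 0.7, 0.2, 0.1],
                       [1,   1,   1,   0,   0,   0])
print(round(eer, 3))   # 0.333
```

In practice one interpolates the ROC curve rather than sweeping raw scores, but the crossing-point idea is the same.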
- In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. Firstly, the parameters of the DNN are initialized by the proposed hierarchical pre-training. Multi-level objective function is then adopted to add more constraint on the cross-entropy based loss function. A series of experiments were conducted on the Task1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. The final DNN-based system achieved a 22.9% relative improvement on average scene classification error as compared with the Gaussian Mixture Model (GMM)-based benchmark system across four standard folds.
- Jul 13 2016 cs.CV arXiv:1607.03222v2 In this paper, we propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This is a task called instance segmentation that has recently become increasingly important. The problem is challenging since not only do the glands need to be segmented from the complex background, they are also required to be individually identified. Here we leverage the idea of image-to-image prediction in recent deep learning by building a framework that automatically exploits and fuses complex multichannel information, regional and boundary patterns, with side supervision (deep supervision on side responses) in gland histology images. Our proposed system, deep multichannel side supervision (DMCS), alleviates heavy feature design due to the use of convolutional neural networks guided by side supervision. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge, we observe state-of-the-art results based on a number of evaluation metrics.
- In this paper, we first propose a novel construction of $16$-quadrature amplitude modulation (QAM) near-complementary sequences with low peak-to-mean envelope power ratio (PMEPR) in orthogonal frequency division multiplexing (OFDM) systems. The proposed $16$-QAM near-complementary sequences can be constructed by utilizing novel nonlinear offsets, where the length of the sequences is $n=2^m$. The family size of the newly constructed $16$-QAM near-complementary sequences is $8\times (\frac{m!}{2})\times 4^{m+1}$, and the PMEPR of these sequences is proven to satisfy ${\textrm{PMEPR}}\leq 2.4$. Thus, the proposed construction can generate a large number of $16$-QAM near-complementary sequences with low PMEPR, improving the code rate in OFDM systems. Furthermore, we also propose a novel construction of $64$-QAM near-complementary sequences with low PMEPR, which is the first proven construction of $64$-QAM near-complementary sequences. The PMEPRs of the two types of proposed $64$-QAM near-complementary sequences are proven to satisfy ${\textrm{PMEPR}}\leq 3.62$ or ${\textrm{PMEPR}}\leq 2.48$, respectively. The family size of the newly constructed $64$-QAM near-complementary sequences is $64\times (\frac{m!}{2})\times 4^{m+1}$.
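PMEPR can be checked numerically for any candidate sequence by oversampling the OFDM envelope. The sketch below uses a simple length-4 binary Golay sequence (whose PMEPR is known to be at most 2) rather than the paper's 16-QAM or 64-QAM constructions, which are not reproduced here.

```python
import numpy as np

def pmepr(seq, oversample=8):
    """Peak-to-mean envelope power ratio of the OFDM symbol carrying
    `seq` on its subcarriers, estimated via an oversampled IDFT
    (a numerical check, not the paper's analytic PMEPR bounds)."""
    seq = np.asarray(seq, dtype=complex)
    n = len(seq)
    # Zero-pad in the frequency domain to oversample the time envelope.
    padded = np.concatenate([seq, np.zeros((oversample - 1) * n)])
    envelope = np.abs(np.fft.ifft(padded)) * len(padded)  # samples of s(t)
    peak_power = np.max(envelope ** 2)
    mean_power = np.sum(np.abs(seq) ** 2)  # time-average of |s(t)|^2
    return peak_power / mean_power

# A length-4 Golay sequence; Golay sequences are known to have PMEPR <= 2.
golay = [1, 1, 1, -1]
print(pmepr(golay) <= 2.0)   # True
```

Oversampling matters: sampling the envelope only at the symbol rate can miss the true peak and understate the PMEPR.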
- Jul 08 2016 cs.NI arXiv:1607.02045v1 Wireless Mesh Network (WMN) technology has been used in recent years for broadband access in both cities and rural areas. A key development is to equip routers with multiple directional antennas so that these routers can transmit to, or receive from, multiple neighbors simultaneously. This Multi-Transmit-Receive (MTR) capability can boost network capacity significantly if a suitable scheduling policy is applied. In this paper, we propose a distributed link scheduler called PCP-TDMA that fully utilizes the MTR capability. In particular, it activates every link at least once within the shortest period of time. We evaluated the performance of PCP-TDMA in various network topologies, and compared it against a centralized algorithm called ALGO-2 and two distributed approaches, JazzyMAC and ROMA. The results show that PCP-TDMA achieves performance similar to the centralized algorithm in all scenarios, and outperforms the distributed approaches significantly. Specifically, in a fully connected network, the resulting superframe length of PCP-TDMA is less than 1/3 and 1/2 of that of JazzyMAC and ROMA, respectively.
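Superframe construction in TDMA link schedulers of this kind amounts to assigning conflict-free time slots to links. The greedy sketch below, run on a made-up conflict graph, illustrates the idea only; it is not PCP-TDMA, which additionally handles MTR constraints and operates in a distributed manner.

```python
def greedy_slot_assignment(links, conflicts):
    """Assign each link the smallest time slot not used by any
    conflicting link (greedy sketch of superframe construction,
    not PCP-TDMA). `conflicts` maps a link to the set of links it
    cannot share a slot with."""
    slot = {}
    for link in links:
        used = {slot[o] for o in conflicts.get(link, ()) if o in slot}
        s = 0
        while s in used:          # first free slot for this link
            s += 1
        slot[link] = s
    return slot

# Hypothetical 4-link chain topology: neighboring links conflict.
conflicts = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
slots = greedy_slot_assignment(["a", "b", "c", "d"], conflicts)
print(max(slots.values()) + 1)   # 2: two slots suffice for this chain
```

The superframe length reported in the paper corresponds to the number of slots such an assignment needs; shorter superframes mean each link is activated more often.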