results for au:Xu_Y in:cs

- Existing research on block-diagonal representation mainly focuses on imposing block-diagonal regularization on training data, while little attention has been paid to concurrently learning block-diagonal representations of both training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, BDLRR is formulated as a joint optimization problem that shrinks the unfavorable representation from off-block-diagonal elements and strengthens the compact block-diagonal representation under the semi-supervised framework of low-rank representation. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes, so that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and to build a representation bridge between the training and test samples, so that the coherence of the learned intra-class representation is consistently heightened. Finally, the resulting optimization problem is solved by an alternating optimization strategy, and a simple recognition algorithm on the learned representation is used for final prediction. Extensive experimental results demonstrate that the proposed method achieves strong recognition results on four face image datasets, three character datasets, and the fifteen-scene multi-category dataset, outperforming state-of-the-art methods.
- This paper investigates the application of simultaneous wireless information and power transfer (SWIPT) to cooperative non-orthogonal multiple access (NOMA). A new cooperative multiple-input single-output (MISO) SWIPT NOMA protocol is proposed, where a user with a strong channel condition acts as an energy-harvesting (EH) relay to help a user with a poor channel condition. The power splitting (PS) scheme is adopted at the EH relay. By jointly optimizing the PS ratio and the beamforming vectors, the design objective is to maximize the data rate of the "strong user" while satisfying the QoS requirement of the "weak user". This leads to a challenging nonconvex problem. To solve it, the semidefinite relaxation (SDR) technique is applied to relax the quadratic terms related to the beamformers, and the relaxed problem is then solved to global optimality by a two-dimensional exhaustive search. We prove the rank-one optimality, which establishes the equivalence between the relaxed problem and the original one. To reduce the high complexity of the exhaustive search, an iterative algorithm based on successive convex approximation (SCA) is proposed, which efficiently attains at least a stationary point. In view of potential application scenarios, e.g., IoT, the single-input single-output (SISO) case of the cooperative SWIPT NOMA system is also studied. The formulated problem is proved to be strictly unimodal with respect to the PS ratio. Hence, a golden section search (GSS) based algorithm with a closed-form solution at each step is proposed to find the unique globally optimal solution. It is worth pointing out that the SCA method also converges to the optimal solution in the SISO case. Numerical simulations show that the proposed algorithm converges within a few iterations and that the SWIPT-aided NOMA protocol outperforms existing transmission protocols.
- This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.
- May 30 2017 cs.NI arXiv:1705.09999v1. Network Function Virtualization (NFV) sheds new light on the design, deployment, and management of cloud networks. Many network functions such as firewalls, load balancers, and intrusion detection systems can be virtualized by servers. However, network operators often have to sacrifice programmability in order to achieve high throughput, especially at the network edge, where complex network functions are required. Here, we design, implement, and evaluate the Hybrid Modular Switch (HyMoS). This hybrid hardware/software switch is designed to meet the requirements of modern NFV applications by providing high throughput with a high degree of programmability. HyMoS uses P4-compatible Network Interface Cards (NICs), the PCI Express interface, and the CPU as line cards, switch fabric, and fabric controller, respectively. In our implementation of HyMoS, the PCI Express interface is turned into a non-blocking switch fabric with a throughput of hundreds of gigabits per second. Compared to existing NFV infrastructure, HyMoS offers modularity in hardware and software as well as a higher degree of programmability by supporting a superset of the P4 language.
- Determining deep holes is an important topic in decoding Reed-Solomon codes. Let $l\ge 1$ be an integer and $a_1,\ldots,a_l$ be arbitrarily given $l$ distinct elements of the finite field ${\bf F}_q$ of $q$ elements with the odd prime number $p$ as its characteristic. Let $D={\bf F}_q\backslash\{a_1,\ldots,a_l\}$ and $k$ be an integer such that $2\le k\le q-l-1$. In this paper, we study the deep holes of generalized projective Reed-Solomon code ${\rm GPRS}_q(D, k)$ of length $q-l+1$ and dimension $k$ over ${\bf F}_q$. For any $f(x)\in {\bf F}_q[x]$, we let $f(D)=(f(y_1),\ldots,f(y_{q-l}))$ if $D=\{y_1, ..., y_{q-l}\}$ and $c_{k-1}(f(x))$ be the coefficient of $x^{k-1}$ of $f(x)$. By using Dür's theorem on the relation between the covering radius and minimum distance of ${\rm GPRS}_q(D, k)$, we show that if $u(x)\in {\bf F}_q[x]$ with $\deg (u(x))=k$, then the received codeword $(u(D), c_{k-1}(u(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\sum\limits_{y\in I}y$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$. We show also that if $j$ is an integer with $1\leq j\leq l$ and $u_j(x):= \lambda_j(x-a_j)^{q-2}+\nu_j x^{k-1}+f_{\leq k-2}^{(j)}(x)$ with $\lambda_j\in {\bf F}_q^*$, $\nu_j\in {\bf F}_q$ and $f_{\leq{k-2}}^{(j)}(x)\in{\bf F}_q[x]$ being a polynomial of degree at most $k-2$, then $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if and only if the sum $\binom{q-2}{k-1}(-a_j)^{q-1-k}\prod\limits_{y\in I}(a_j-y)+e$ is nonzero for any subset $I\subseteq D$ with $\#(I)=k$, where $e$ is the identity of the group ${\bf F}_q^*$. This implies that $(u_j(D), c_{k-1}(u_j(x)))$ is a deep hole of ${\rm GPRS}_q(D, k)$ if $p|k$.
- Recent years have witnessed a surge of asynchronous (async-) parallel computing methods, driven by the extremely large data involved in many modern applications and by the advancement of multi-core machines and computer clusters. In optimization, most works on async-parallel methods address unconstrained problems or those with block separable constraints. In this paper, we propose an async-parallel method based on block coordinate update (BCU) for solving convex problems with nonseparable linear constraints. Running on a single node, the method becomes a novel randomized primal-dual BCU with adaptive stepsize for multi-block affinely constrained problems. For these problems, Gauss-Seidel cyclic primal-dual BCU needs strong convexity to converge. In contrast, merely assuming convexity, we show that the objective value sequence generated by the proposed algorithm converges in probability to the optimal value and that the constraint residual converges to zero. In addition, we establish an ergodic $O(1/k)$ convergence result, where $k$ is the number of iterations. Numerical experiments demonstrate the efficiency of the proposed method and its significantly better speed-up performance compared with its sync-parallel counterpart.
- The minimum feedback arc set problem asks to delete a minimum number of arcs (directed edges) from a digraph (directed graph) to make it free of any directed cycles. In this work we approach this fundamental cycle-constrained optimization problem by considering a generalized task of dividing the digraph into D layers of equal size. We solve the D-segmentation problem by the replica-symmetric mean field theory and belief-propagation heuristic algorithms. The minimum feedback arc density of a given random digraph ensemble is then obtained by extrapolating the theoretical results to the limit of large D. A divide-and-conquer algorithm (nested-BPR) is devised to solve the minimum feedback arc set problem with very good performance and high efficiency.
- May 16 2017 cs.AR arXiv:1705.04981v1. In this paper, we investigate the challenges of applying Statistical Static Timing Analysis (SSTA) in a hierarchical design flow, where modules supplied by IP vendors are used to hide design details for IP protection and to reduce the complexity of design and verification. For the three basic circuit types (combinational, flip-flop-based, and latch-controlled), we propose methods to extract timing models that contain interfacing as well as compressed internal constraints. Using these compact timing models, the runtime of full-chip timing analysis can be reduced, while circuit details from IP vendors are not exposed. We also propose a method to reconstruct the correlation between modules during full-chip timing analysis. This correlation cannot be incorporated into timing models because it depends on the layout of the corresponding modules in the chip. In addition, we investigate how to apply the extracted timing models with the reconstructed correlation to evaluate the performance of the complete design. Experiments demonstrate that, using the extracted timing models and reconstructed correlation, full-chip timing analysis can be several times faster than analyzing the flattened circuit directly, while the accuracy of statistical timing analysis is well maintained.
- Background: Mining gene modules from genomic data is an important step in detecting gene members of pathways or other relations such as protein-protein interactions. In this work, we explore the plausibility of detecting gene modules by factorizing gene-phenotype associations from a phenotype ontology rather than the conventionally used gene expression data. In particular, the hierarchical structure of the ontology has not been sufficiently utilized in clustering genes, even though functionally related genes are consistently associated with phenotypes on the same path in the phenotype ontology. Results: We propose a hierarchical Nonnegative Matrix Factorization (NMF)-based method, called Consistent Multiple Nonnegative Matrix Factorization (CMNMF), to factorize the genome-phenome association matrix at two levels of the hierarchical structure in the phenotype ontology for mining gene functional modules. CMNMF constrains the gene clusters from the association matrices at two consecutive levels to be consistent, since the genes are annotated with both the child phenotype and the parent phenotype in the consecutive levels. CMNMF also restricts the identified phenotype clusters to be densely connected in the phenotype ontology hierarchy. In experiments on mining functionally related genes from the mouse phenotype ontology and the human phenotype ontology, CMNMF effectively improved clustering performance over the baseline methods. Gene ontology enrichment analysis was also conducted to reveal interesting gene modules. Conclusions: Utilizing the information in the hierarchical structure of the phenotype ontology, CMNMF can identify functional gene modules with more biological significance than the conventional methods. CMNMF could also be a better tool for predicting members of gene pathways and protein-protein interactions. Availability: https://github.com/nkiip/CMNMF
- May 02 2017 cs.CV arXiv:1705.00609v1. In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This problem remains open yet is ubiquitous in domain adaptation, and can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities of the source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable. To account for this, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested that alternates between assigning pseudo-labels, estimating auxiliary weights, and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
- There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.
- We collect and analyze the darkweb (a.k.a. the "onionweb") hyperlink graph. We find properties highly dissimilar to the well-studied world wide web hyperlink graph; for example, our analysis finds that >87% of darkweb sites never link to another site. We compare our results to prior work on world-wide-web and speculate about reasons for their differences. We conclude that in the term "darkweb", the word "web" is a connectivity misnomer. Instead, it is more accurate to view the darkweb as a set of largely isolated dark silos.
- Apr 19 2017 cs.SY arXiv:1704.05411v1. Microgrids are resources for restoring critical loads after a natural disaster, enhancing the resiliency of a distribution network. To deal with the stochastic power generated by intermittent energy resources within microgrids, such as wind turbines (WTs) and photovoltaics (PVs), most existing methods require forecast information. However, some microgrids may not be equipped with power forecasting tools. To fill this gap, a risk-limiting strategy based on real-time measurements is proposed. The Gaussian mixture model (GMM) is used to represent a prior joint distribution of the power outputs of WTs and PVs over multiple periods based on historical data. As time rolls forward, the distributions of WT/PV generation are updated recursively with the latest measurement data. The updated distributions are used as inputs for load restoration, enabling an equivalent transformation of the original probabilistic-constrained problem into a mixed integer linear programming (MILP) problem that is computationally tractable. Simulation cases on a distribution system with three microgrids demonstrate the effectiveness of the proposed method. Results also indicate that networked microgrids have better uncertainty management capabilities than stand-alone microgrids.
- For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to obtain an interpretable relationship between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provides a universal strategy for constructing interpretable QSPR models with high predictive power. Instead of using application-specific preset molecular descriptors or fingerprints, the models can be resolved using raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models were constructed for AOT evaluation: a regression model (deepAOT-R), a multi-classification model (deepAOT-C), and a multi-task model (deepAOT-CR). These models substantially outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the $R^2$ and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and II, respectively. The prediction accuracy of deepAOT-CR on the two external sets was 95.0% and 94.1%, while its $R^2$ and MAE were 0.861 and 0.204 on test set I.
- Apr 18 2017 cs.CV arXiv:1704.04613v2. Text in natural images contains rich semantics that are often highly relevant to the objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is to combine word representations and deep visual features in a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading system. Then, we combine the word embeddings of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification. In our framework, an attention mechanism is adopted to reveal the relevance between each recognized word and the given image, which further enhances the recognition performance. We have performed experiments on two datasets, the Con-Text dataset and the Drink Bottle dataset, which were proposed for fine-grained classification of business places and drink bottles, respectively. The experimental results consistently demonstrate that the proposed method, which combines textual and visual cues, significantly outperforms classification with only visual representations. Moreover, we show that the learned representation improves retrieval performance on the drink bottle images by a large margin, making it potentially useful in product search.
- Apr 18 2017 cs.DS arXiv:1704.04615v2. Motivation: As a fundamental task in bioinformatics, searching for massive numbers of short patterns over a long text is widely accelerated by various compressed full-text indexes. These indexes provide searching functionalities similar to those of classical indexes, e.g., suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are particularly slow, especially for patterns with many occurrences. Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual quadtree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the quadtree. Compared with existing locating algorithms, our tree-based algorithm eliminates large numbers of redundant operations and presents better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms while remaining memory-efficient.
- Recently, manifold learning algorithms for dimensionality reduction have attracted increasing interest, and various linear and nonlinear, global and local algorithms have been proposed. The key step of a manifold learning algorithm is the selection of the neighboring region. However, to the best of our knowledge, few works propose a generally accepted algorithm for selecting the neighboring region well. In this paper, we propose an adaptive neighboring selection algorithm, which is successfully applied to the LLE and ISOMAP algorithms in our tests. The algorithm finds the optimal K nearest neighbors of the data points on the manifold. Its theoretical basis is the approximated curvature of the manifold at each data point. Based on Riemannian geometry, the Jacobian matrix is a proper mathematical concept for predicting the approximated curvature. We verify the proposed algorithm by embedding the Swiss roll from $\mathbb{R}^3$ to $\mathbb{R}^2$ with LLE and ISOMAP; the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, yielding a relatively small residual variance and better visualization of the results. In a quantitative analysis, the embedding quality measured by residual variance improves by 45.45% after using the proposed algorithm in LLE.
- Traditionally, Internet Access Providers (APs) only charge end-users for Internet access services; however, to recoup infrastructure costs and increase revenues, some APs have recently adopted two-sided pricing schemes under which both end-users and content providers are charged. Meanwhile, with the rapid growth of traffic, network congestion could seriously degrade user experiences and influence providers' utility. To optimize profit and social welfare, APs and regulators need to design appropriate pricing strategies and regulatory policies that take the effects of network congestion into consideration. In this paper, we model two-sided networks under which users' traffic demands are influenced by exogenous pricing and endogenous congestion parameters and derive the system congestion under an equilibrium. We characterize the structures and sensitivities of profit- and welfare-optimal two-sided pricing schemes and reveal that 1) the elasticity of system throughput plays a crucial role in determining the structures of optimal pricing, 2) the changes of optimal pricing under varying AP's capacity and users' congestion sensitivity are largely driven by the type of data traffic, e.g., text or video, and 3) APs and regulators will be incentivized to shift from one-sided to two-sided pricing when APs' capacities and user demand for video traffic grow. Our results can help APs design optimal two-sided pricing and guide regulators to legislate desirable policies.
- In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log (pq)-\log(k!))$. We further propose a greedy algorithm for regression, and evaluate its effectiveness through synthetic as well as real-world experiments.
- Mar 22 2017 cs.SD arXiv:1703.07172v1. We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE, we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments show that joint LPS and MFCC learning improves the SE performance, and IBM-based post-processing further enhances listening quality of the reconstructed speech.
- Mar 20 2017 cs.SD arXiv:1703.06052v1. Audio tagging aims to perform multi-label classification on audio chunks, and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that only chunk-level labels are available, without frame-level labels. This paper presents a weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set, with the equal error rate (EER) reduced from 0.13 to 0.11 compared with the convolutional recurrent baseline system.
- Mar 09 2017 cs.DS arXiv:1703.02693v1. This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority Sample and Hold (PrSH). PrSH combines aspects of the well-known Sample and Hold algorithm with Priority Sampling. In particular, it achieves a reduced computational cost for rate adaptation in a fixed cache by using a single persistent random variable across the lifetime of each key in the cache. The basic approach can be supplemented with a Sample and Hold pre-sampling stage with a sampling rate adaptation controlled by PrSH. We prove that PrSH provides unbiased estimates of the true aggregates. We analyze the computational complexity of PrSH and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive Sample and Hold; there is also substantial improvement for rank queries.
- Mar 01 2017 cs.GT arXiv:1702.08794v1. Recent online platforms offer multiple items for bidding. The state of the art, however, is limited to the analysis of single-item auctions without resubmission. In this paper we study multi-item lowest unique bid auctions (LUBA) with resubmission in discrete bid spaces under budget constraints. We show that the game does not have pure Bayes-Nash equilibria (except in very special cases). However, at least one mixed Bayes-Nash equilibrium exists for an arbitrary number of bidders and items. The equilibrium is explicitly computed for the two-bidder setup with resubmission possibilities. In the general setting we propose a distributed strategic learning algorithm to approximate equilibria. Computer simulations indicate that the error decays quickly within a small number of steps. When the number of bidders per item follows a Poisson distribution, it is shown that the seller can obtain a non-negligible revenue on several items, thereby partially revealing the true value of the items. Finally, the attitude of the bidders towards risk is considered. In contrast to risk-neutral agents, who bid very small values, the cumulative distribution and the bidding support of risk-sensitive agents are more spread out.
- Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn on the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features can further reduce the EER to 0.10. The performance of the end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we get the state-of-the-art performance with 0.12 EER while the performance of the best existing system is 0.15 EER.
- Feb 22 2017 cs.DS arXiv:1702.06256v1. The maximum duo-preservation string mapping (Max-Duo) problem is the complement of the well studied minimum common string partition (MCSP) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-Max-Duo is the restricted version of Max-Duo in which every letter of the alphabet occurs at most $k$ times in each of the strings; it is readily reduced to the well known maximum independent set (MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-Max-Duo can then be approximated arbitrarily close to $1.8$ using the state-of-the-art approximation algorithm for the MIS problem. In this paper, we present a vertex-degree reduction technique and, based on it, show that $2$-Max-Duo can be approximated arbitrarily close to $1.4$.
- Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because each update touches only one or a few block variables among a possibly large number of blocks. They are also easily parallelized and thus have been particularly popular for solving problems involving large-scale datasets and/or many variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex programs in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under a weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we show that the algorithm can be modified to achieve a linear rate of convergence. The numerical experiments show that the accelerated method performs stably with a single set of parameters, while the original method needs its parameters tuned for different datasets in order to achieve a comparable level of performance.
- We study the maximum duo-preservation string mapping (Max-Duo) problem, which is the complement of the well studied minimum common string partition (MCSP) problem. Both problems have applications in many fields including text compression and bioinformatics. Motivated by an earlier local search algorithm, we present an improved approximation and show that its performance ratio is no greater than ${35}/{12} < 2.917$. This beats the current best $3.25$-approximation for Max-Duo. The performance analysis of our algorithm is done through a complex yet interesting amortization. Two lower bounds on the locality gap of our algorithm are also provided.
- Jan 31 2017 cs.AI arXiv:1701.08665v1: Based on an in-depth analysis of the essence and features of vague phenomena, this paper focuses on establishing the axiomatic foundation of membership degree theory for vague phenomena and presents an axiomatic system to govern membership degrees and their interconnections. On this basis, the concept of vague partition is introduced; further, the concept of fuzzy set introduced by Zadeh in 1965 is redefined based on vague partition from the perspective of axiomatization. The thesis defended in this paper is that the relationship among vague attribute values should be the starting point to recognize and model vague phenomena from a quantitative view.
- Jan 30 2017 cs.CV arXiv:1701.08006v1: Naturalness of warping is gaining extensive attention in image stitching. Recent warps such as SPHP, AANAP and GSP use a global similarity to effectively mitigate projective distortion (which enlarges regions); however, they necessarily bring in perspective distortion (which generates inconsistency). In this paper, we propose a quasi-homography warp, which balances perspective distortion against projective distortion in the non-overlapping region, to create natural-looking mosaics. Our approach formulates the warp as a solution of a system of bivariate equations, where perspective distortion and projective distortion are characterized as slope preservation and scale linearization respectively. Our proposed warp relies only on a global homography and is thus totally parameter-free. A comprehensive experiment shows that quasi-homography outperforms some state-of-the-art warps in urban scenes, including homography, AutoStitch and SPHP. A user study demonstrates that quasi-homography wins most users' favor as well, compared to homography and SPHP.
- Jan 17 2017 cs.NI arXiv:1701.04076v1: As Internet applications have become more diverse in recent years, users having heavy demand for online video services are more willing to pay higher prices for better services than light users that mainly use e-mails and instant messages. This encourages the Internet Service Providers (ISPs) to explore service differentiations so as to optimize their profits and allocation of network resources. Much prior work has focused on the viability of network service differentiation by comparing with the case of a single-class service. However, the optimal service differentiation for an ISP subject to resource constraints has remained unsolved. In this work, we establish an optimal control framework to derive the analytical solution to an ISP's optimal service differentiation, i.e. the optimal service qualities and associated prices. By analyzing the structures of the solution, we reveal how an ISP should adjust the service qualities and prices in order to meet varying capacity constraints and users' characteristics. We also obtain the conditions under which ISPs have strong incentives to implement service differentiation and whether regulators should encourage such practices.
- In this paper, we propose three novel models to enhance word embedding by implicitly using morphological information. Experiments on word similarity and syntactic analogy show that the implicit models are superior to traditional explicit ones. Our models outperform all state-of-the-art baselines and significantly improve the performance on both tasks. Moreover, our performance on the smallest corpus is similar to the performance of CBOW on the corpus which is five times the size of ours. Parameter analysis indicates that the implicit models can supplement semantic information during the word embedding training process.
- Jan 05 2017 cs.CV arXiv:1701.00794v1: In this paper, we develop a new weakly-supervised learning algorithm to learn to segment cancerous regions in histopathology images. Our work is under a multiple instance learning (MIL) framework with a new formulation, deep weak supervision (DWS); we also propose an effective way to introduce constraints to our neural networks to assist the learning process. The contributions of our algorithm are threefold: (1) We build an end-to-end learning system that segments cancerous regions with fully convolutional networks (FCN) in which image-to-image weakly-supervised learning is performed. (2) We develop a deep weak supervision formulation to exploit multi-scale learning under weak supervision within fully convolutional networks. (3) Constraints about positive instances are introduced in our approach to effectively explore additional weakly-supervised information that is easy to obtain and provides a significant boost to the learning process. The proposed algorithm, abbreviated as DWS-MIL, is easy to implement and can be trained efficiently. Our system demonstrates state-of-the-art results on large-scale histopathology image datasets and can be applied to various applications in medical imaging beyond histopathology images such as MRI, CT, and ultrasound images.
- Integrated Computational Materials Engineering (ICME) aims to accelerate optimal design of complex material systems by integrating material science and design automation. For tractable ICME, it is required that (1) a structural feature space be identified to allow reconstruction of new designs, and (2) the reconstruction process be property-preserving. The majority of existing structural presentation schemes rely on the designer's understanding of specific material systems to identify geometric and statistical features, which could be biased and insufficient for reconstructing physically meaningful microstructures of complex material systems. In this paper, we develop a feature learning mechanism based on a convolutional deep belief network to automate a two-way conversion between microstructures and their lower-dimensional feature representations, and to achieve a 1000-fold dimension reduction from the microstructure space. The proposed model is applied to a wide spectrum of heterogeneous material systems with distinct microstructural features, including Ti-6Al-4V alloy, Pb63-Sn37 alloy, Fontainebleau sandstone, and spherical colloids, to produce material reconstructions that are close to the original samples with respect to 2-point correlation functions and mean critical fracture strength. This capability is not achieved by existing synthesis methods that rely on the Markovian assumption of material microstructures.
- Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms, driven by problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents a convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for arbitrarily large delays. An explicit stepsize formula that guarantees convergence is given in terms of the delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.
- Dec 07 2016 cs.LG arXiv:1612.01663v1: In this paper, we address learning problems for high-dimensional data. Previously, oblivious random projection based approaches that project high-dimensional features onto a random subspace have been used in practice for tackling the high-dimensionality challenge in machine learning. Recently, various non-oblivious randomized reduction methods have been developed and deployed for solving many numerical problems such as matrix product approximation, low-rank matrix approximation, etc. However, they are less explored for machine learning tasks, e.g., classification. More seriously, the theoretical analysis of excess risk bounds for risk minimization, an important measure of generalization performance, has not been established for non-oblivious randomized reduction methods. It therefore remains an open problem what the benefit of using them is over previous oblivious random projection based approaches. To tackle these challenges, we propose an algorithmic framework for employing non-oblivious randomized reduction methods for general empirical risk minimization in machine learning tasks, where the original high-dimensional features are projected onto a random subspace that is derived from the data with a small matrix approximation error. We then derive the first excess risk bound for the proposed non-oblivious randomized reduction approach without requiring strong assumptions on the training data. The established excess risk bound shows that the proposed approach provides much better generalization performance, and it also sheds more light on different randomized reduction approaches. Finally, we conduct extensive experiments on both synthetic and real-world benchmark datasets, whose dimension scales to $O(10^7)$, to demonstrate the efficacy of our proposed approach.
- Dec 02 2016 cs.AI arXiv:1612.00094v1: In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.
- Nov 29 2016 cs.CV arXiv:1611.08983v9: Sparse coding has achieved great success in various image processing studies. However, there is no benchmark for measuring the sparsity of an image patch/group because the sparse discriminant conditions do not remain unchanged. This paper analyzes the sparsity of a group based on the strategy of rank minimization. Firstly, an adaptive dictionary for each group is designed. Then, we prove that group-based sparse coding is equivalent to the rank minimization problem, and thus the sparse coefficient of each group is measured by estimating the singular values of each group. Based on that measurement, the weighted Schatten $p$-norm minimization (WSNM) has been found to be the closest solution to the real singular values of each group. Thus, WSNM can be equivalently transformed into a non-convex $\ell_p$-norm minimization problem in group-based sparse coding. To make the proposed scheme tractable and robust, the alternating direction method of multipliers (ADMM) is used to solve the $\ell_p$-norm minimization problem. Experimental results on two applications, image inpainting and image compressive sensing (CS) recovery, show that the proposed scheme outperforms many state-of-the-art methods.
- Nov 23 2016 cs.CV arXiv:1611.07143v1: This paper proposes a multi-level feature learning framework for human action recognition using body-worn inertial sensors. The framework consists of three phases, respectively designed to analyze signal-based (low-level), component (mid-level) and semantic (high-level) information. Low-level features, extracted from raw signals, capture the time and frequency domain properties, while mid-level representations, obtained through the dictionary learning method, learn the composition of the action. The Max-margin Latent Pattern Learning (MLPL) method is proposed and implemented on the concatenation of low- and mid-level features to learn high-level semantic descriptions of latent action patterns as the output of our framework. Various experiments on the Opp, Skoda and WISDM datasets show that the semantic feature learned by this framework possesses higher representation ability than low- and mid-level features. Compared with existing methods, the proposed method achieves state-of-the-art performance.
- Nov 22 2016 cs.CV arXiv:1611.06661v2: We propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This process is challenging since the glands not only need to be segmented from a complex background, they must also be individually identified. We leverage the idea of image-to-image prediction in recent deep learning by designing an algorithm that automatically exploits and fuses complex multichannel information - regional, location and boundary cues - in gland histology images. Our proposed algorithm, a deep multichannel framework, alleviates heavy feature design due to the use of convolutional neural networks and is able to meet multifarious requirements by altering channels. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge and other currently prevalent instance segmentation methods, we observe state-of-the-art results based on the evaluation metrics. Keywords: Instance segmentation, convolutional neural networks, segmentation, multichannel, histology image.
- Nov 21 2016 cs.CV arXiv:1611.06159v1: In this paper, we propose an innovative end-to-end subtitle detection and recognition system for videos in East Asian languages. Our end-to-end system consists of multiple stages. Subtitles are first detected by a novel image operator based on the sequence information of consecutive video frames. Then, an ensemble of Convolutional Neural Networks (CNNs) trained on synthetic data is adopted for detecting and recognizing East Asian characters. Finally, a dynamic programming approach leveraging language models is applied to constitute results of the entire body of text lines. The proposed system achieves average end-to-end accuracies of 98.2% and 98.3% on 40 videos in Simplified Chinese and 40 videos in Traditional Chinese respectively, significantly outperforming other existing methods. The near-perfect accuracy of our system dramatically narrows the gap between human cognitive ability and state-of-the-art algorithms used for such a task.
- Oct 26 2016 cs.SY arXiv:1610.07687v1: Air conditioning systems are responsible for the major percentage of energy consumption in buildings. Shared spaces constitute considerable office space area, in which most office employees perform their meetings and daily tasks, and therefore the ACs in these areas have significant impact on the energy usage of the entire office building. The cost of this energy consumption, however, is not paid by the shared space users, and the AC's temperature set-point is not determined based on the users' preferences. This latter factor is compounded by the fact that different people may have different choices of temperature set-points and sensitivities to change of temperature. Therefore, it is a challenging task to design an office policy to decide on a particular set-point based on such a diverse preference set. As a result, users are not aware of the energy consumption in shared spaces, which may potentially increase the energy wastage and related cost of office buildings. In this context, this paper proposes an energy policy for an office shared space by exploiting an established temperature control mechanism. In particular, we choose meeting rooms in an office building as the test case and design a policy according to which each user of the room can give a preference on the temperature set-point and is paid for felt discomfort if the set-point is not fixed according to the given preference. On the other hand, users who enjoy the thermal comfort compensate the other users of the room. Thus, the policy enables the users to be cognizant of and responsible for the payment on the energy consumption of the office space they are sharing, and at the same time ensures that the users are satisfied either via thermal comfort or through incentives. The policy is also shown to be beneficial for building management. Through experiment-based case studies, we show the effectiveness of the proposed policy.
- Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After information processing of the preceding context words by hidden layers, an output layer estimates the probability of the next word. Such approaches are time- and memory-intensive because of the large numbers of parameters for word embeddings and the output layer. In this paper, we propose to compress neural language models by sparse word representations. In the experiments, the number of parameters in our model increases almost imperceptibly as the vocabulary size grows. Moreover, our approach not only reduces the parameter space to a large extent, but also improves the performance in terms of the perplexity measure.
- Oct 07 2016 cs.SD arXiv:1610.01797v1: Audio tagging aims to assign one or several tags to an audio clip. Most of the datasets are weakly labelled, which means only the tags of the clip are known, without knowing the occurrence time of the tags. The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user. Previous works have used the bag-of-frames model, which assumes that the tags occur all the time; this is not the case in practice. We propose a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously. The JDC model has the ability to attend to informative sounds and ignore uninformative ones. Then only informative regions are used for classification. Experimental results on the "CHiME Home" dataset show that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%. More interestingly, the audio event detector is trained successfully without needing the event-level label.
- A neuromorphic chip is an unconventional computing architecture modelled on biological brains. It is ideally suited for processing sensory data for intelligence computing, decision-making or context cognition. Despite rapid development, conventional artificial synapses exhibit poor connection flexibility and require separate data acquisition circuitry, resulting in limited functionalities and significant hardware redundancy. Here we report a novel light-stimulated artificial synapse based on a graphene-nanotube hybrid phototransistor that can directly convert optical stimuli into a "neural image" for further neuronal analysis. Our optically-driven synapses involve multiple steps of plasticity mechanisms and importantly exhibit flexible tuning of both short- and long-term plasticity. Furthermore, our neuromorphic phototransistor can take multiple pre-synaptic light stimuli via wavelength-division multiplexing and allows advanced optical processing through charge-trap-mediated optical coupling. The capability of complex neuromorphic functionalities in a simple silicon-compatible device paves the way for novel neuromorphic computing architectures involving photonics.
- Understanding the real achievable performance of mobile ad hoc networks (MANETs) under practical network constraints is of great importance for their applications in future highly heterogeneous wireless network environments. This paper explores, for the first time, performance modeling for MANETs under a general limited buffer constraint, where each network node maintains a limited source buffer of size $B_s$ to store its locally generated packets and also a limited shared relay buffer of size $B_r$ to store relay packets for other nodes. Based on queuing theory and birth-death chain theory, we first develop a general theoretical framework to fully depict the source/relay buffer occupancy process in such a MANET, which applies to any distributed MAC protocol and any mobility model that leads to the uniform distribution of nodes' locations in steady state. With the help of this framework, we then derive the exact expressions of several key network performance metrics, including achievable throughput, throughput capacity, and expected end-to-end delay. We further conduct case studies under two network scenarios and provide the corresponding theoretical/simulation results to demonstrate the application as well as the efficiency of our theoretical framework. Finally, we present extensive numerical results to illustrate the impacts of the buffer constraint on the performance of a buffer-limited MANET.
- The application of physical layer security in ad hoc networks has attracted considerable academic attention recently. However, the available studies mainly focus on the single-hop and two-hop network scenarios, and the price in terms of degradation of communication quality of service (QoS) caused by improving security is largely uninvestigated. As a step to address these issues, this paper explores the physical layer security-aware routing and performance tradeoffs in a multi-hop ad hoc network. Specifically, for any given end-to-end path we first derive its connection outage probability (COP) and secrecy outage probability (SOP) in closed form, which serve as the performance metrics of communication QoS and transmission security, respectively. Based on the closed-form expressions, we then study the security-QoS tradeoffs to minimize COP (resp. SOP) subject to a guaranteed SOP (resp. COP). With the help of the analysis of a given path, we further propose routing algorithms that can achieve the optimal performance tradeoffs for any pair of source and destination nodes in a distributed manner. Finally, simulation and numerical results are presented to validate the efficiency of our theoretical analysis, as well as to illustrate the security-QoS tradeoffs and the routing performance.
- This paper investigates the compress-and-forward scheme for an uplink cloud radio access network (C-RAN) model, where multi-antenna base-stations (BSs) are connected to a cloud-computing based central processor (CP) via capacity-limited fronthaul links. The BSs compress the received signals with Wyner-Ziv coding and send the representation bits to the CP; the CP performs the decoding of all the users' messages. Under this setup, this paper makes progress toward the optimal structure of the fronthaul compression and CP decoding strategies for the compress-and-forward scheme in C-RAN. On the CP decoding strategy design, this paper shows that under a sum fronthaul capacity constraint, a generalized successive decoding strategy of the quantization and user message codewords that allows arbitrary interleaved order at the CP achieves the same rate region as the optimal joint decoding. Further, it is shown that a practical strategy of successively decoding the quantization codewords first, then the user messages, achieves the same maximum sum rate as joint decoding under individual fronthaul constraints. On the joint optimization of user transmission and BS quantization strategies, this paper shows that if the input distributions are assumed to be Gaussian, then under joint decoding, the optimal quantization scheme for maximizing the achievable rate region is Gaussian. Moreover, Gaussian input and Gaussian quantization with joint decoding achieve to within a constant gap of the capacity region of the Gaussian multiple-input multiple-output (MIMO) uplink C-RAN model. Finally, this paper addresses the computational aspect of optimizing uplink MIMO C-RAN by showing that under fixed Gaussian input, the sum rate maximization problem over the Gaussian quantization noise covariance matrices can be formulated as convex optimization problems, thereby facilitating its efficient solution.
- We study the problem of estimating the continuous response over time to interventions using observational time series, that is, a retrospective dataset where the policy by which the data are generated is unknown to the learner. We are motivated by applications where response varies by individual, and therefore estimating responses at the individual level is valuable for personalizing decision-making. We refer to this as the problem of estimating individualized treatment response (ITR) curves. In statistics, the G-computation formula (Robins, 1986) has been commonly used for estimating treatment responses from observational data containing sequential treatment assignments. However, past studies have focused predominantly on obtaining point-in-time estimates at the population level. We leverage the G-computation formula and develop a novel Bayesian nonparametric (BNP) method that can flexibly model functional data and provide posterior inference over the treatment response curves at both the individual and population level. On a challenging dataset containing time series from patients admitted to a hospital, we estimate responses to treatments used in managing kidney function and show that the resulting fits are more accurate than alternative approaches. Accurate methods for obtaining ITRs from observational data can dramatically accelerate the pace at which personalized treatment plans become possible.
- Aug 12 2016 cs.NI arXiv:1608.03380v1: Millimeter wave (mmWave) systems are emerging as an essential technology to enable extremely high data rate wireless communications. The main limiting factors of mmWave systems are blockage (high penetration loss) and deafness (misalignment between the beams of the transmitter and receiver). To alleviate these problems, it is imperative to incorporate efficient association and relaying between terminals and access points. Unfortunately, the existing association techniques are designed for the traditional interference-limited networks, and thus are highly suboptimal for mmWave communications due to narrow-beam operations and the resulting non-negligible interference-free behavior. This paper introduces a distributed approach that solves the joint association and relaying problem in mmWave networks considering the load balancing at access points. The problem is posed as a novel stochastic optimization problem, which is solved by distributed auction algorithms where the clients and relays act asynchronously to achieve optimal client-relay-access point association. It is shown that the algorithms provably converge to a solution that maximizes the aggregate logarithmic utility within a desired bound. Numerical results quantify the performance enhancements introduced by the relays, as well as the substantial improvements in network throughput and fairness among the clients achieved by the proposed association method as compared to standard approaches. It is concluded that mmWave communications with proper association and relaying mechanisms can support extremely high data rates, connection reliability, and fairness among the clients.
- Aug 05 2016 cs.CV arXiv:1608.01536v1: Saliency integration approaches have attracted broad interest for unifying saliency maps from multiple saliency models. In fact, saliency integration is a weighted aggregation of multiple saliency maps, such that measuring the weights of saliency models is essential. In this paper, we propose an unsupervised model for saliency integration, namely the arbitrator model (AM), based on Bayes' probability theory. The proposed AM incorporates saliency models of varying expertise and a prior map based on the consistency of the evidence from multiple saliency models and a reference saliency map from generally accepted knowledge. Also, we suggest two methods to learn the expertise of the saliency models without ground truth. The experimental results are from various combinations of twenty-two state-of-the-art saliency models on five datasets. The evaluation results show that the AM improves the performance substantially compared to the existing state-of-the-art approaches, regardless of the chosen candidate saliency models.
- A single-letter lower bound on the sum rate of multiple description coding with tree-structured distortion constraints is established by generalizing Ozarow's celebrated converse argument through the introduction of auxiliary random variables that form a Markov tree. For the quadratic vector Gaussian case, this lower bound is shown to be achievable by an extended version of the El Gamal-Cover scheme, yielding a complete sum-rate characterization.
- Jul 28 2016 cs.SI physics.soc-ph arXiv:1607.08203v1: Information technologies today can inform each of us about the best alternatives for shortest paths from origins to destinations, but they do not contain incentives or alternatives that manage the information efficiently to get collective benefits. To obtain such benefits, we need not only good estimates of how the traffic is formed but also target strategies to remove enough vehicles from the best possible roads in a feasible way. The opportunity is that during large events the traffic inconveniences in large cities are unusually high, yet temporary, and the entire population may be more willing to adopt collective recommendations for social good. In this paper, we integrate for the first time big data resources to quantify the impact of events and propose target strategies for collective good at urban scale. In the context of the Olympic Games in Rio de Janeiro, we first predict the expected increase in traffic. To that end, we integrate data from mobile phones, Airbnb, Waze, and transit information with game schedules and information on venues. Next, we evaluate the impact of the Olympic Games on the travel of commuters, and propose different route choice scenarios during the peak hours. Moreover, we gather information on the trips that contribute the most to the global congestion and that could be redirected from vehicles to transit. Interestingly, we show that (i) following new route alternatives during the event with individual shortest path can save more collective travel time than keeping the routine routes, uncovering the positive value of information technologies during events; (ii) with only a small proportion of people selected from specific areas switching from driving to public transport, the collective travel time can be reduced to a great extent. Results are presented on-line for the evaluation of the public and policy makers.
- Jul 19 2016 cs.CV arXiv:1607.04889v2: In this paper, we propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This is a task called instance segmentation that has recently become increasingly important. The problem is challenging since not only do the glands need to be segmented from the complex background, they are also required to be individually identified. Here we leverage the idea of image-to-image prediction in recent deep learning by building a framework that automatically exploits and fuses complex multichannel information (regional, location and boundary patterns) in gland histology images. Our proposed system, a deep multichannel framework, alleviates heavy feature design due to the use of convolutional neural networks and is able to meet multifarious requirements by altering channels. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge and other currently prevalent methods of instance segmentation, we observe state-of-the-art results based on a number of evaluation metrics.
- Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the acoustic scene of interest. In this paper we make contributions to audio tagging in two parts: acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For the acoustic modeling, a large set of contextual frames of the chunk are fed into the DNN to perform a multi-label classification for the expected tags, considering that only chunk (or utterance) level rather than frame-level labels are available. Dropout and background noise aware training are also adopted to improve the generalization capability of the DNNs. For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Bank (MFB) features. The new features, which are smoothed against background noise and more compact with contextual information, can further improve the performance of the DNN baseline. Compared with the standard Gaussian Mixture Model (GMM) baseline of the DCASE 2016 audio tagging challenge, our proposed method obtains a significant equal error rate (EER) reduction from 0.21 to 0.13 on the development set. The proposed aDAE system achieves a relative 6.7% EER reduction compared with the strong DNN baseline on the development set. Finally, the results also show that our approach obtains state-of-the-art performance with 0.15 EER on the evaluation set of the DCASE 2016 audio tagging task, while the EER of the first-prize system of this challenge is 0.17.
- In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. First, the parameters of the DNN are initialized by the proposed hierarchical pre-training. A multi-level objective function is then adopted to add more constraints to the cross-entropy based loss function. A series of experiments were conducted on Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. The final DNN-based system achieved a 22.9% relative improvement in average scene classification error compared with the Gaussian Mixture Model (GMM)-based benchmark system across four standard folds.
- Jul 13 2016 cs.CV arXiv:1607.03222v2
  In this paper, we propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This is a task called instance segmentation that has recently become increasingly important. The problem is challenging since not only do the glands need to be segmented from the complex background, they are also required to be individually identified. Here we leverage the idea of image-to-image prediction in recent deep learning by building a framework that automatically exploits and fuses complex multichannel information, regional and boundary patterns, with side supervision (deep supervision on side responses) in gland histology images. Our proposed system, deep multichannel side supervision (DMCS), alleviates heavy feature design due to the use of convolutional neural networks guided by side supervision. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge, we observe state-of-the-art results based on a number of evaluation metrics.
- In this paper, we first propose a novel construction of $16$-quadrature amplitude modulation (QAM) near-complementary sequences with low peak-to-mean envelope power ratio (PMEPR) in orthogonal frequency division multiplexing (OFDM) systems. The proposed $16$-QAM near-complementary sequences can be constructed by utilizing novel nonlinear offsets, where the length of the sequences is $n=2^m$. The family size of the newly constructed $16$-QAM near-complementary sequences is $8\times (\frac{m!}{2})\times 4^{m+1}$, and the PMEPR of these sequences is proven to satisfy ${\textrm{PMEPR}}\leq 2.4$. Thus, the proposed construction can generate a number of $16$-QAM near-complementary sequences with low PMEPR, resulting in an improvement of the code rate in OFDM systems. Furthermore, we also propose a novel construction of $64$-QAM near-complementary sequences with low PMEPR, which is the first proven construction of $64$-QAM near-complementary sequences. The PMEPRs of the two types of proposed $64$-QAM near-complementary sequences are proven to satisfy ${\textrm{PMEPR}}\leq 3.62$ and ${\textrm{PMEPR}}\leq 2.48$, respectively. The family size of the newly constructed $64$-QAM near-complementary sequences is $64\times (\frac{m!}{2})\times 4^{m+1}$.
- Jul 08 2016 cs.NI arXiv:1607.02045v1
  Wireless Mesh Network (WMN) technology has been used in recent years for broadband access in both cities and rural areas. A key development is to equip routers with multiple directional antennas so that these routers can transmit to, or receive from, multiple neighbors simultaneously. The Multi-Transmit-Receive (MTR) feature can boost network capacity significantly if a suitable scheduling policy is applied. In this paper, we propose a distributed link scheduler called PCP-TDMA that fully utilizes the MTR capability. In particular, it activates every link at least once within the shortest period of time. We evaluated the performance of PCP-TDMA in various network topologies, and compared it against a centralized algorithm called ALGO-2 and two distributed approaches: JazzyMAC and ROMA. The results show that PCP-TDMA achieves performance similar to that of the centralized algorithm in all scenarios, and outperforms the distributed approaches significantly. Specifically, in a fully connected network, the resulting superframe length of PCP-TDMA is less than 1/3 and 1/2 of that of JazzyMAC and ROMA, respectively.
- In this paper, we propose two accelerated stochastic subgradient methods for stochastic non-strongly convex optimization problems by leveraging a generic local error bound condition. The novelty of the proposed methods lies in leveraging the recent historical solution to tackle the variance in the stochastic subgradient. The key idea of both methods is to iteratively solve the original problem approximately in a local region around a recent historical solution, with the size of the local region gradually decreasing as the solution approaches the optimal set. The difference between the two methods lies in how the local region is constructed. The first method uses an explicit ball constraint and the second method uses an implicit regularization approach. For both methods, we establish an improved iteration complexity with high probability for achieving an $\epsilon$-optimal solution. Besides the improved order of iteration complexity with high probability, the proposed algorithms also enjoy a logarithmic dependence on the distance of the initial solution to the optimal set. We also consider applications in machine learning and demonstrate that the proposed algorithms enjoy faster convergence than the traditional stochastic subgradient method. For example, when applied to $\ell_1$-regularized polyhedral loss minimization (e.g., hinge loss, absolute loss), the proposed stochastic methods have a logarithmic iteration complexity.
- Jul 05 2016 cs.PF arXiv:1607.00714v2
  In recent years, there has been an increasing demand for big memory systems to perform large-scale data analytics. Since DRAM memories are expensive, some researchers suggest using other memory systems such as non-volatile memory (NVM) technology to build large-memory computing systems. However, whether NVM technology can be a viable alternative (both economically and technically) to DRAM remains an open question. To answer this question, it is important to consider how to design a memory system from a "system perspective", that is, incorporating different performance characteristics and price ratios from hybrid memory devices. This paper presents an analytical model of a "hybrid page cache system" to understand the diverse design space and performance impact of a hybrid cache system. We consider (1) various architectural choices, (2) design strategies, and (3) configuration of different memory devices. Using this model, we provide guidelines on how to design a hybrid page cache to reach a good trade-off between high system throughput (in I/O per second, or IOPS) and fast cache reactivity, which is defined by the time to fill the cache. We also show how one can configure the DRAM capacity and NVM capacity under a fixed budget. We pick PCM as an example of NVM and conduct numerical analysis. Our analysis indicates that incorporating PCM in a page cache system significantly improves the system performance, and it also shows that allocating more PCM in the page cache yields a larger benefit in some cases. Besides, for the common setting of the performance-price ratio of PCM, the "flat architecture" is the better choice, but the "layered architecture" outperforms it if PCM write performance can be significantly improved in the future.
- Motivated by big data applications, first-order methods have been extremely popular in recent years. However, naive gradient methods generally converge slowly. Hence, much effort has been made to accelerate various first-order methods. This paper proposes two accelerated methods for solving structured linearly constrained convex programming, for which we assume a composite convex objective. The first method is the accelerated linearized augmented Lagrangian method (LALM). At each update to the primal variable, it allows linearization of the differentiable function and also of the augmented term, and thus it enables easy subproblems. Assuming merely weak convexity, we show that LALM has $O(1/t)$ convergence if the parameters are kept fixed during all the iterations and can be accelerated to $O(1/t^2)$ if the parameters are adapted, where $t$ is the total number of iterations. The second method is the accelerated linearized alternating direction method of multipliers (LADMM). In addition to the composite convexity, it further assumes a two-block structure on the objective. Different from classic ADMM, our method allows linearization of the objective and also of the augmented term to make the update simple. Assuming strong convexity on one block variable, we show that LADMM also enjoys $O(1/t^2)$ convergence with adaptive parameters. This result is a significant improvement over that in [Goldstein et al., SIIMS'14], which requires strong convexity on both block variables and no linearization of the objective or augmented term. Numerical experiments are performed on quadratic programming, image denoising, and support vector machines. The proposed accelerated methods are compared to nonaccelerated ones and also to existing accelerated methods. The results demonstrate the validity of acceleration and the superior performance of the proposed methods over existing ones.
- Acoustic event detection for content analysis in most cases relies on large amounts of labeled data. However, manually annotating data is a time-consuming task, so few annotated resources are available so far. Unlike audio event detection, automatic audio tagging, a multi-label acoustic event classification task, relies only on weakly labeled data. This is highly desirable for some practical applications using audio analysis. In this paper we propose to use a fully deep neural network (DNN) framework to handle the multi-label classification task in a regression way. Considering that only chunk-level rather than frame-level labels are available, all or almost all frames of the chunk were fed into the DNN to perform a multi-label regression for the expected tags. The fully DNN, which is regarded as an encoding function, can map the audio feature sequence to a multi-tag vector well. A deep pyramid structure was also designed to extract more robust high-level features related to the target tags. Further improved methods, such as Dropout and background noise aware training, were adopted to enhance its generalization capability for new audio recordings in mismatched environments. Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method can make good use of the long-term temporal information with the whole chunk as the input. The results show that our approach obtained a 15% relative improvement compared with the official GMM-based method of the DCASE 2016 challenge.
- Jun 16 2016 cs.CV arXiv:1606.04616v1
  Natural scene character recognition is challenging due to the cluttered background, which is hard to separate from text. In this paper, we propose a novel method for robust scene character recognition. Specifically, we first use robust principal component analysis (PCA) to denoise the character image by recovering the missing low-rank component and filtering out the sparse noise term, then use a simple histogram of oriented gradients (HOG) to perform image feature extraction, and finally use a sparse representation based classifier for recognition. In experiments on four public datasets, namely the Char74K dataset, the ICDAR 2003 robust reading dataset, the Street View Text (SVT) dataset, and the IIIT5K-word dataset, our method was demonstrated to be competitive with the state-of-the-art methods.
- Jun 15 2016 cs.DS arXiv:1606.04157v1
  We consider the single machine scheduling problem with job-dependent machine deterioration. In the problem, we are given a single machine with an initial non-negative maintenance level, and a set of jobs each with a non-preemptive processing time and a machine deterioration. Such a machine deterioration quantifies the decrement in the machine maintenance level after processing the job. To avoid machine breakdown, one should guarantee a non-negative maintenance level at any time point; and whenever necessary, a maintenance activity must be allocated for restoring the machine maintenance level. The goal of the problem is to schedule the jobs and the maintenance activities such that the total completion time of jobs is minimized. There are two variants of maintenance activities: in the partial maintenance case each activity can be allocated to increase the machine maintenance level to any level not exceeding the maximum; in the full maintenance case every activity must be allocated to increase the machine maintenance level to the maximum. In a recent work, the problem in the full maintenance case has been proven NP-hard; several special cases of the problem in the partial maintenance case were shown solvable in polynomial time, but the complexity of the general problem is left open. In this paper we first prove that the problem in the partial maintenance case is NP-hard, thus settling the open problem; we then design a $2$-approximation algorithm.
- Jun 13 2016 cs.DS arXiv:1606.03185v2
  We investigate the maximum happy vertices (MHV) problem and its complement, the minimum unhappy vertices (MUHV) problem. We first show that the MHV and MUHV problems are special cases of the supermodular and submodular multi-labeling (Sup-ML and Sub-ML) problems, respectively, by re-writing the objective functions as set functions. The convex relaxation on the Lovász extension, originally presented for the submodular multi-partitioning (Sub-MP) problem, can be extended to the Sub-ML problem, thereby proving that Sub-ML (Sup-ML, respectively) can be approximated within a factor of $2 - \frac{2}{k}$ ($\frac{2}{k}$, respectively). These general results imply that the MHV and the MUHV problems can also be approximated within $\frac{2}{k}$ and $2 - \frac{2}{k}$, respectively, using the same approximation algorithms. For MHV, this $\frac{2}{k}$-approximation algorithm improves the previous best approximation ratio $\max \{\frac{1}{k}, \frac{1}{\Delta + 1}\}$, where $\Delta$ is the maximum vertex degree of the input graph. We also show that an existing LP relaxation is the same as the concave relaxation on the Lovász extension for the Sup-ML problem; we then prove an upper bound of $\frac{2}{k}$ on the integrality gap of the LP relaxation. These suggest that the $\frac{2}{k}$-approximation algorithm is the best possible based on the LP relaxation. For MUHV, we formulate a novel LP relaxation and prove that it is the same as the convex relaxation on the Lovász extension for the Sub-ML problem; we then show a lower bound of $2 - \frac{2}{k}$ on the integrality gap of the LP relaxation. Similarly, these suggest that the $(2 - \frac{2}{k})$-approximation algorithm is the best possible based on the LP relaxation. Lastly, we prove that this $(2 - \frac{2}{k})$-approximation is optimal for the MUHV problem, assuming the Unique Games Conjecture.
- The one-bit compressed sensing framework aims to reconstruct a sparse signal by only using the sign information of its linear measurements. To compensate for the loss of scale information, past studies in the area have proposed recovering the signal by imposing an additional constraint on the L2-norm of the signal. Recently, an alternative strategy that captures scale information by introducing a threshold parameter to the quantization process was advanced. In this paper, we analyze the typical behavior of thresholding 1-bit compressed sensing using the replica method of statistical mechanics, so as to gain insight into how to set the threshold value properly. Our results show that fixing the threshold at a constant value yields better statistical performance than varying it randomly, provided the constant is optimally tuned. Unfortunately, the optimal threshold value depends on the statistical properties of the target signal, which may not be known in advance. To handle this inconvenience, we develop a heuristic that adaptively tunes the threshold parameter based on the frequency of positive (or negative) values in the binary outputs. Numerical experiments show that the heuristic exhibits satisfactory performance while incurring low computational cost.
- Apr 27 2016 cs.NI arXiv:1604.07572v1
  The explosion of mobile video traffic imposes tremendous challenges on present cellular networks. To alleviate the pressure on backhaul links and to enhance the quality of experience (QoE) of video streaming service, small cell base stations (SBSs) with caching ability are introduced to assist the content delivery. In this paper, we present the first study on the optimal caching strategy of scalable video coding (SVC) streaming in small cell networks with the consideration of channel diversity and video scalability. We formulate an integer programming problem to maximize the average SVC QoE under the constraint of cache size at each SBS. By establishing connections between QoE and the caching state of each video, we simplify the proactive caching of SVC to a multiple-choice knapsack problem (MCKP), and propose a low-complexity algorithm using dynamic programming. Our proactive caching strategy reveals the structural properties of cache allocation to each video based on their popularity profiles. Simulation results show that SBSs with caching ability can greatly improve the average QoE of SVC streaming service, and that our proposed caching strategy achieves a significant performance gain compared with other conventional caching policies.
- Apr 07 2016 cs.NI arXiv:1604.01675v1
  Unraveling the quality of experience (QoE) of video streaming is very challenging in bandwidth-shared wireless networks. It is unclear how QoE metrics such as starvation probability and buffering time interact with the dynamics of streaming traffic load. In this paper, we collect view records from one of the largest streaming providers in China over two weeks and perform an in-depth measurement study on flow arrival and viewing time that sheds light on the real traffic pattern. Our most important observation is that the viewing time of streaming users fits a hyper-exponential distribution quite well. This implies that all the views can be categorized into two classes, short and long views, with separated time scales. We then map the measured traffic pattern to bandwidth-shared cellular networks and propose an analytical framework to compute the closed-form starvation probability on the basis of ordinary differential equations (ODEs). Our framework can be naturally extended to investigate practical issues including progressive downloading and finite video duration. Extensive trace-driven simulations validate the accuracy of our models. Our study reveals that the starvation metrics of the short and long views possess different sensitivities to the scheduling priority at the base station. Hence, a better QoE tradeoff between the short and long views can potentially be achieved by offering them different scheduling weights. The flow differentiation involves tremendous technical and non-technical challenges because video content is owned by content providers rather than the network operators, and the viewing time of each session is unknown beforehand. To overcome these difficulties, we propose an online Bayesian approach to infer the viewing time of each incoming flow with the "least" information from content providers.
- Transfer learning aims to make use of valuable knowledge in a source domain to help model performance in a target domain. It is particularly important to neural networks, which are prone to overfitting. In some fields like image processing, many studies have shown the effectiveness of neural network-based transfer learning. For neural NLP, however, existing studies have only casually applied transfer learning, and conclusions are inconsistent. In this paper, we conduct systematic case studies and provide an illuminating picture of the transferability of neural networks in NLP.
- Mar 16 2016 cs.CV arXiv:1603.04838v2
  Hierarchies, such as the tree of shapes, are popular representations for image simplification and segmentation thanks to their multiscale structures. Selecting meaningful level lines (boundaries of shapes) simplifies the image while keeping salient structures intact. Many image simplification and segmentation methods are driven by the optimization of an energy functional, for instance the celebrated Mumford-Shah functional. In this paper, we propose an efficient approach to hierarchical image simplification and segmentation based on the minimization of the piecewise-constant Mumford-Shah functional. This method conforms to the current trend of producing hierarchical results rather than a unique partition. Contrary to classical approaches which compute optimal hierarchical segmentations from an input hierarchy of segmentations, we rely on the tree of shapes, a unique and well-defined representation equivalent to the image. Simply put, we compute for each level line of the image an attribute function that characterizes its persistence under the energy minimization. Then we stack the level lines from meaningless ones to salient ones through a saliency map based on extinction values defined on the tree-based shape space. Qualitative illustrations and quantitative evaluation on the Weizmann segmentation evaluation database demonstrate the state-of-the-art performance of our method.
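
For readers unfamiliar with the energy named in the last abstract above, the following is a minimal sketch of a discrete piecewise-constant Mumford-Shah energy for a given segmentation. It is not the authors' tree-of-shapes algorithm. The function name, the list-of-lists image format, the 4-neighbour boundary count, and the weight `nu` are all assumptions made for illustration. The energy is the squared deviation of each pixel from its region mean, plus `nu` times the number of neighbouring pixel pairs that carry different labels.

```python
def piecewise_constant_ms_energy(image, labels, nu):
    """Discrete piecewise-constant Mumford-Shah energy (illustrative sketch).

    image  : 2D list of grey values
    labels : 2D list of region labels, same shape as image
    nu     : weight on the boundary length (hypothetical parameter name)
    """
    rows, cols = len(image), len(image[0])

    # Group pixel values by region label.
    regions = {}
    for r in range(rows):
        for c in range(cols):
            regions.setdefault(labels[r][c], []).append(float(image[r][c]))

    # Data term: squared deviation from the mean of each region.
    data_term = 0.0
    for values in regions.values():
        mean = sum(values) / len(values)
        data_term += sum((v - mean) ** 2 for v in values)

    # Boundary term: count 4-neighbour pixel pairs with different labels.
    boundary = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                boundary += 1
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                boundary += 1

    return data_term + nu * boundary


# Toy 2x4 image with a dark left half and a bright right half.
image = [[0, 0, 4, 4],
         [0, 0, 4, 4]]
split = [[0, 0, 1, 1],
         [0, 0, 1, 1]]   # two regions
merged = [[0, 0, 0, 0],
          [0, 0, 0, 0]]  # one region

for nu in (1.0, 20.0):
    print(nu,
          piecewise_constant_ms_energy(image, split, nu),
          piecewise_constant_ms_energy(image, merged, nu))
```

On this toy image, a small `nu` favours keeping the boundary (the split labeling has lower energy), while a large `nu` favours merging the two halves. That trade-off is the sense in which a level line can be said to persist, or not, under energy minimization.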