results for au:Li_J in:cs

- In this paper, we unify causal and non-causal feature selection methods under the Bayesian network framework. We first show that causal and non-causal feature selection share the same objective: finding the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then demonstrate that the two families make different assumptions about dependencies among features when searching for the Markov blanket, and that their algorithms realize different levels of approximation to it. Within this framework, we are able to analyze the sample and error bounds of causal and non-causal methods. Extensive experiments confirm the correctness of our theoretical analysis.
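In a known Bayesian network, the Markov blanket that both families of methods target has a simple graphical form: the target's parents, its children, and its children's other parents (spouses). A minimal sketch over a toy parent map (the `markov_blanket` helper and the toy network are ours, purely illustrative; the paper's algorithms must discover this set from data):

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a Bayesian network given a parent map:
    its parents, its children, and its children's other parents (spouses)."""
    children = {v for v, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]} - {target}
    return set(parents.get(target, [])) | children | spouses

# Toy network: A -> T, T -> C, B -> C  (B is a spouse of T via child C)
parents = {'T': ['A'], 'C': ['T', 'B'], 'A': [], 'B': []}
mb = markov_blanket(parents, 'T')  # {'A', 'B', 'C'}
```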
- Feb 16 2018 cs.LG arXiv:1802.05640v1 Gradient boosting using decision trees as base learners, so-called Gradient Boosted Decision Trees (GBDT), is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, various GBDT construction algorithms and implementations have been designed and heavily optimized in popular open-source toolkits such as XGBoost and LightGBM. In this paper, we show that both the accuracy and the efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees) instead of piecewise constant regression trees. We show that PL Trees can accelerate the convergence of GBDT. Moreover, our new algorithm fits modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism, and we propose optimization techniques to speed it up. Experimental results show that GBDT with PL Trees provides very competitive test accuracy with comparable or less training time. Our algorithm also produces much more concise tree ensembles, and can therefore often reduce test-time costs.
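The difference between the two kinds of base learners can be seen at the leaf level: a piecewise-constant leaf predicts the mean of its samples, while a piecewise-linear leaf fits a least-squares line. The sketch below (our own helper names, not the paper's PL Tree construction) shows that on a linear trend the linear leaf fits exactly while the constant leaf cannot:

```python
import numpy as np

def leaf_constant(y):
    """Piecewise-constant leaf: predict the mean of the leaf's targets."""
    return np.full_like(y, y.mean(), dtype=float)

def leaf_linear(x, y):
    """Piecewise-linear leaf: least-squares fit a line to the leaf's samples."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# On a linear trend, the linear leaf has zero residual, the constant leaf does not.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
err_const = np.sum((y - leaf_constant(y)) ** 2)  # 20.0
err_lin = np.sum((y - leaf_linear(x, y)) ** 2)   # ~0.0
```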
- The focus of this paper is on multi-user MIMO transmissions for millimeter wave systems with a hybrid precoding architecture at the base-station. To enable multi-user transmissions, the base-station uses a cell-specific codebook of beamforming vectors over an initial beam alignment phase. Each user uses a user-specific codebook of beamforming vectors to learn the top-P (where P >= 1) beam pairs in terms of the observed SNR in a single-user setting. The top-P beam indices along with their SNRs are fed back from each user, and the base-station leverages this information to generate beam weights for simultaneous transmissions. A typical method to generate the beam weights is to use only the best beam for each user, and to either steer energy along this beam or utilize this information to reduce multi-user interference. The other beams are kept as fallback options to address blockage or mobility. Such an approach completely discards the information learned about the channel condition(s) even though each user feeds it back. Against this background, this work develops an advanced directional precoding structure for simultaneous transmissions at the cost of a marginal additional feedback overhead. The construction relies on three main innovations: 1) additional feedback that allows the base-station to reconstruct a rank-P approximation of the channel matrix between it and each user; 2) a zeroforcing structure that leverages this information to combat multi-user interference while remaining agnostic of the receiver beam knowledge in the precoder design; and 3) a hybrid precoding architecture that allows both amplitude and phase control at low complexity and cost, enabling the implementation of the zeroforcing structure. Numerical studies show that the proposed scheme yields a significant sum rate improvement over naive schemes even with a coarse initial beam alignment codebook.
- We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the sum of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. The algorithm is a slight variant of the ProxSVRG algorithm [Reddi et al., 2016b]. Our main contribution lies in the analysis of ProxSVRG+: it recovers several existing convergence results (in terms of the number of stochastic gradient oracle calls and proximal operations) and improves/generalizes others. In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is more straightforward than SCSG and admits a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem posed in [Reddi et al., 2016b]. Finally, for nonconvex functions satisfying the Polyak-Łojasiewicz condition, we show that ProxSVRG+ achieves a global linear convergence rate without restarts. In this case, ProxSVRG+ is never worse than ProxGD and ProxSVRG/SAGA, and sometimes outperforms them (and generalizes the results of SCSG).
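The general proximal-SVRG template the abstract builds on can be sketched on a toy 1-D lasso-type objective: the stochastic gradient is corrected by a snapshot gradient (variance reduction), and the nonsmooth convex part is handled by its proximal operator, here soft-thresholding. This is a schematic of the generic template under our own toy problem and step sizes, not the paper's ProxSVRG+ parameter choices:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |x| (the nonsmooth convex component)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_svrg(a, lam=0.1, eta=0.1, epochs=20, seed=0):
    """ProxSVRG-style loop for min_x (1/2n) sum_i (x - a_i)^2 + lam * |x|."""
    rng = np.random.default_rng(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        x_snap = x
        full_grad = np.mean(x_snap - a)          # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = (x - a[i]) - (x_snap - a[i]) + full_grad
            x = soft_threshold(x - eta * g, eta * lam)
    return x

a = np.array([1.0, 2.0, 3.0])
x_star = soft_threshold(np.mean(a), 0.1)  # closed-form minimizer: 1.9
x_hat = prox_svrg(a)
```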
- Feb 13 2018 cs.DS arXiv:1802.03671v1 A long series of recent results and breakthroughs has led to faster and better distributed approximation algorithms for single-source shortest paths (SSSP) and related problems in the CONGEST model. The runtime of all these algorithms, however, is $\tilde{\Omega}(\sqrt{n})$, regardless of the network topology, even on nice networks with a (poly)logarithmic network diameter $D$. While this is known to be necessary for some pathological networks, most topologies of interest are arguably not of this type. We give the first distributed approximation algorithms for shortest paths problems that adjust to the topology they are run on, thus achieving significantly faster running times on many topologies of interest. The running time of our algorithms depends on and is close to $Q$, where $Q$ is the quality of the best shortcut that exists for the given topology. While $Q = \tilde{\Theta}(\sqrt{n} + D)$ for pathological worst-case topologies, many topologies of interest have $Q = \tilde{\Theta}(D)$, which results in near instance-optimal running times for our algorithm, given the trivial $\Omega(D)$ lower bound. The problems we consider are as follows: (1) an approximate shortest path tree and SSSP distances, (2) a polylogarithmic-size distance label for every node such that the distance between any two nodes can be (approximately) determined from their labels alone, and (3) an (approximately) optimal flow for the transshipment problem. Our algorithms have a tunable tradeoff between running time and approximation ratio. Our fastest algorithms have an arbitrarily good polynomial approximation guarantee and an essentially optimal $\tilde{O}(Q)$ running time. At the other end of the spectrum, we achieve polylogarithmic approximations in $\tilde{O}(Q \cdot n^{\epsilon})$ rounds for any $\epsilon > 0$. It seems likely that eventually, our non-trivial approximation algorithms for the...
- High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, the large amount of unlabeled data remains an unexploited gold mine for efficient data use. We therefore propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify the land cover categories of hyperspectral cubes, and, to take advantage of the large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GANs. Experimental results on two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy with a small amount of training data.
- Over the past decades, intensive efforts have been devoted to designing loss functions and metric forms for the metric learning problem. These improvements show promising results when the test data are similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the distribution bias between the training and test sets. To address this problem, this paper proposes Adversarial Metric Learning (AML), which automatically generates adversarial pairs to remedy the distribution bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages: confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in this competing process, the AML model can grasp plentiful difficult knowledge not contained in the original training pairs, so the discriminability of AML is significantly improved. The entire model is formulated as an optimization framework whose global convergence is theoretically proved. Experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning methods.
- Maintaining reliable millimeter wave (mmWave) connections to many fast-moving mobiles is a key challenge in the theory and practice of 5G systems. In this paper, we develop a new algorithm that can jointly track the beam direction and channel coefficient of mmWave propagation paths using phased antenna arrays. Despite the significant difficulty of this problem, our algorithm simultaneously achieves fast tracking speed, high tracking accuracy, and low pilot overhead. In static scenarios, the algorithm converges to the minimum Cramér-Rao lower bound on beam direction with high probability. Simulations reveal that it greatly outperforms several existing algorithms. Even at SNRs as low as 5 dB, our algorithm is capable of tracking a mobile moving at an angular velocity of 5.45 degrees per second and achieving over 95\% of channel capacity with a 32-antenna phased array, while inserting only 10 pilots per second.
- In this paper, given the beamforming vector of confidential messages, the artificial noise (AN) projection matrix, and a total power constraint, a power allocation (PA) strategy that maximizes the secrecy rate (Max-SR) is proposed for secure directional modulation (DM) networks. Using the method of Lagrange multipliers, an analytic expression for the proposed PA strategy is derived. To confirm the benefit of the Max-SR-based PA strategy, we take the null-space projection (NSP) beamforming scheme as an example and derive its closed-form optimal PA strategy. From the simulation results, we find the following: in the medium and high signal-to-noise-ratio (SNR) regions, compared with three typical PA factors such as $\beta=0.1, 0.5$, and $0.9$, the optimal PA shows a substantial SR performance gain, with a maximum gain of more than $60\%$. Additionally, as the PA factor increases from 0 to 1, the achievable SR increases accordingly in the low SNR region, whereas it first increases and then decreases in the medium and high SNR regions, where the SR can be approximately viewed as a concave function of the PA factor. Finally, as the number of antennas increases, the optimal PA factor grows and tends to one in the medium and high SNR regions. In other words, the contribution of AN to the SR becomes trivial in such a situation.
- Multi-person articulated pose tracking in complex, unconstrained videos is an important and challenging problem. In this paper, following the road of top-down approaches, we propose an effective and efficient pose tracker based on pose flows. First, we design an online optimization framework to associate cross-frame poses and form pose flows. Second, a novel pose flow non-maximum suppression (NMS) scheme is designed to robustly reduce redundant pose flows and re-link temporally disjoint ones. Extensive experiments show that our method significantly outperforms the best reported results on two standard pose tracking datasets (the PoseTrack dataset and the PoseTrack Challenge dataset), by 13 mAP and 25 MOTA, and by 6 mAP and 3 MOTA, respectively. Moreover, when working on detected poses in individual frames, the extra computation of the proposed pose tracker is very minor, requiring only 0.01 seconds per frame.
- Jan 31 2018 cs.NI arXiv:1801.09812v1 The new generation of LED-based illumination infrastructure has enabled a "dual paradigm" in which LEDs are used for both illumination and communication. The ubiquity of lighting makes visible light communication (VLC) well suited for communication with mobile devices and sensor nodes in indoor environments. Existing research on VLC has primarily focused on advancing the performance of one-way communication. In this paper, we present Retro-VLC, a low-power duplex VLC system that enables a mobile device to perform bi-directional communication with the illuminating LEDs over the same light carrier. The design features a retro-reflector fabric that backscatters light, an LCD shutter that modulates information bits onto the backscattered light carrier, and several low-power optimization techniques. We have prototyped the Reader system and built a few battery-free tag devices. Experimental results show that the tag can achieve a 10 kbps downlink speed and a 0.5 kbps uplink speed over a distance of 2.4 m. We also outline several potential applications of the proposed Retro-VLC system.
- As a representative sequential pattern mining problem, counting the frequency of serial episodes in a streaming sequence has drawn continuous attention in academia due to its wide application in practice, e.g., telecommunication alarms, stock markets, transaction logs, bioinformatics, etc. Although a number of serial episode mining algorithms have been developed recently, most of them are neither stream-oriented, as they require multiple passes over the dataset, nor time-aware, as they fail to take into account the time constraints of serial episodes. In this paper, we propose two novel one-pass algorithms, ONCE and ONCE+, which compute two popular frequencies of given episodes satisfying a predefined time constraint as the signals in a stream arrive one after another. ONCE is used for the non-overlapped frequency, where the occurrences of a serial episode in the sequence do not intersect; ONCE+ is designed for the distinct frequency, where the occurrences of a serial episode do not share any event. A theoretical study proves that our algorithms correctly compute the frequency of time-constrained serial episodes in a given stream. An experimental study over both real-world and synthetic datasets demonstrates that the proposed algorithms can work, with little time and space, on signal-intensive streams where millions of signals arrive within a single second. Moreover, the algorithms have been deployed in a real stream processing system, where the efficacy and efficiency of this work are tested in practical applications.
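A one-pass counter for the non-overlapped frequency can be sketched greedily: scan the stream once, advance through the episode symbol by symbol, and reset after each complete occurrence. This is a simplification for illustration only; the `non_overlapped_count` helper and its time-constraint handling are ours, not the ONCE/ONCE+ algorithms:

```python
def non_overlapped_count(stream, episode, max_span=None):
    """One-pass count of non-overlapped occurrences of a serial episode.

    stream: iterable of (timestamp, symbol) pairs in time order.
    episode: tuple of symbols that must occur in order.
    max_span: optional time constraint on the first-to-last event of an occurrence.
    """
    count, pos, start = 0, 0, None
    for t, sym in stream:
        if sym == episode[pos]:
            if pos == 0:
                start = t
            pos += 1
            if pos == len(episode):
                if max_span is None or t - start <= max_span:
                    count += 1
                pos, start = 0, None   # reset: occurrences may not intersect
    return count

s = [(1, 'A'), (2, 'B'), (3, 'A'), (4, 'C'), (9, 'B')]
# ('A','B') occurs twice non-overlapped; only one occurrence fits within span 3.
```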
- Jan 30 2018 cs.PF arXiv:1801.09212v1 Over the past decades, FLOPS (floating-point operations per second), an important computation-centric performance metric, has guided computer architecture evolution, bridged hardware and software co-design, and provided quantitative performance numbers for system optimization. However, for emerging datacenter (DC) computing workloads, such as internet services or big data analytics, previous work on modern CPU architectures reports that floating-point instructions make up only 1% of instructions on average and the average FLOPS efficiency is only 0.1%, while the average CPU utilization is as high as 63%. These contradictory performance numbers imply that FLOPS is inappropriate for evaluating DC computer systems. To address this issue, we propose a new computation-centric metric, BOPS (basic operations per second). In our definition, basic operations include all arithmetic, logical, comparison and array-addressing operations on integers and floating-point numbers, and BOPS is the average number of BOPs (basic operations) completed per second. We present a dwarf-based measuring tool to evaluate DC computer systems in terms of the new metric, and on the basis of BOPS we also propose a new roofline performance model for DC computing. Through experiments, we demonstrate that our new metric, measuring tool, and performance model indeed facilitate DC computer system design and optimization.
- Jan 26 2018 cs.AI arXiv:1801.08295v1 In this paper, we study the problem of discovering the Markov blanket (MB) of a target variable from multiple interventional datasets. Datasets obtained from interventional experiments contain richer causal information than passively observed (observational) data for MB discovery. However, almost all existing MB discovery methods are designed for finding MBs from a single observational dataset. Identifying MBs from multiple interventional datasets poses two challenges: (1) unknown intervention variables, and (2) nonidentical data distributions. To tackle these challenges, we theoretically analyze (a) under what conditions we can find the correct MB of a target variable, and (b) under what conditions we can identify the causes of the target variable via discovering its MB. Based on this analysis, we propose a new algorithm for discovering MBs from multiple interventional datasets, and present the conditions/assumptions that assure its correctness. To our knowledge, this work is the first to present theoretical analyses of the conditions for MB discovery in multiple interventional datasets and an algorithm that finds MBs in relation to these conditions. Experiments using benchmark Bayesian networks and real-world datasets validate the effectiveness and efficiency of the proposed algorithm.
- Jan 26 2018 cs.CV arXiv:1801.08360v1 Owing to its impressive learning power, deep learning has achieved remarkable performance in supervised hash function learning. In this paper, we propose a novel asymmetric supervised deep hashing method that preserves the semantic structure among different categories and generates the binary codes simultaneously. Specifically, two asymmetric deep networks are constructed to reveal the similarity between each pair of images according to their semantic labels. The deep hash functions are then learned through the two networks by minimizing the gap between the learned features and the discrete codes. Furthermore, since the binary codes in Hamming space should also preserve the semantic affinity of the original space, another asymmetric pairwise loss is introduced to capture the similarity between the binary codes and the real-valued features. This asymmetric loss not only improves the retrieval performance, but also contributes to quick convergence in the training phase. By taking advantage of the two-stream deep structure and the two types of asymmetric pairwise functions, an alternating algorithm is designed to efficiently optimize the deep features and high-quality binary codes. Experimental results on three real-world datasets substantiate the effectiveness and superiority of our approach compared with state-of-the-art methods.
- Jan 22 2018 cs.DS arXiv:1801.06237v1 Distributed network optimization algorithms, such as minimum spanning tree, minimum cut, and shortest path, are an active research area in distributed computing. This paper presents a fast distributed algorithm for such problems in the CONGEST model, on networks that exclude a fixed minor. On general graphs, many optimization problems, including those mentioned above, require $\tilde\Omega(\sqrt n)$ rounds of communication in the CONGEST model, even if the network graph has a much smaller diameter. Naturally, the next step in algorithm design is to devise efficient algorithms that bypass this lower bound on a restricted class of graphs. Currently, the only known method of doing so uses the low-congestion shortcut framework of Ghaffari and Haeupler [SODA'16]. Building on their work, this paper proves that excluded-minor graphs admit high-quality shortcuts, leading to an $\tilde O(D^2)$-round algorithm for the aforementioned problems, where $D$ is the diameter of the network graph. To work with excluded-minor graph families, we utilize the Graph Structure Theorem of Robertson and Seymour. To the best of our knowledge, this is the first time the Graph Structure Theorem has been used for an algorithmic result in the distributed setting. Even though the proof is involved, merely showing the existence of good shortcuts is sufficient to obtain simple, efficient distributed algorithms: the shortcut framework can efficiently construct near-optimal shortcuts and then use them to solve the optimization problems. This, combined with the very general family of excluded-minor graphs, which includes most other important graph classes, makes this result of significant interest.
- Let $\epsilon\in[0, 1/2]$ be the noise parameter and $p>1$. We study the isoperimetric problem of which Boolean function $f:\{0, 1\}^n\to \{0, 1\}$ with fixed mean $\E f$ maximizes the $p$-th moment $\E(T_\epsilon f)^p$ of the noise operator $T_{\epsilon}$ acting on Boolean functions. Our findings are: in the low noise scenario, i.e., when $\epsilon$ is small, the maximum is achieved by the lexicographic function; in the high noise scenario, i.e., when $\epsilon$ is close to 1/2, the maximum is achieved by Boolean functions with maximal degree-1 Fourier weight; and when $p$ is a large integer, the maximum is achieved by some monotone function, which in particular implies that, among balanced Boolean functions, the maximum is achieved by any function that is 0 on all strings with fewer than $n/2$ 1's. Our results recover Mossel and O'Donnell's results on the problem of non-interactive correlation distillation, and confirm Courtade and Kumar's conjecture on the most informative Boolean function in the low-noise and high-noise regimes. We also observe that Courtade and Kumar's conjecture is equivalent to the statement that the dictator function maximizes $\E(T_\epsilon f)^p$ for $p$ close to 1.
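For small $n$, the quantity being maximized can be computed by brute force, which makes the statement concrete: $(T_\epsilon f)(x)$ averages $f$ over inputs obtained by flipping each bit of $x$ independently with probability $\epsilon$, and the moment averages its $p$-th power over uniform $x$. A small sketch evaluating this for the dictator function $f(x) = x_1$ (all names and the toy setup are ours, purely illustrative):

```python
from itertools import product

def T_eps(f_vals, n, eps):
    """Noise operator: (T_eps f)(x) = E[f(y)], where y flips each bit of x
    independently with probability eps."""
    out = {}
    for x in product((0, 1), repeat=n):
        s = 0.0
        for y in product((0, 1), repeat=n):
            d = sum(a != b for a, b in zip(x, y))         # Hamming distance
            s += (eps ** d) * ((1 - eps) ** (n - d)) * f_vals[y]
        out[x] = s
    return out

def moment(f_vals, n, eps, p):
    """E_x (T_eps f(x))^p under uniform x."""
    tf = T_eps(f_vals, n, eps)
    return sum(v ** p for v in tf.values()) / 2 ** n

n = 2
dictator = {x: x[0] for x in product((0, 1), repeat=n)}   # f(x) = x_1
# (T_eps f)(x) is 1-eps when x_1 = 1 and eps when x_1 = 0,
# so for eps = 0.1, p = 2 the moment is (0.9^2 + 0.1^2)/2 = 0.41.
```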
- There has been a growing interest in the commercialization of millimeter wave (mmW) technology as a part of the Fifth-Generation New Radio (5G-NR) wireless standardization efforts. In this direction, many sets of independent measurement campaigns show that wireless propagation at mmW carrier frequencies is only marginally worse than propagation at sub-6 GHz carrier frequencies for small-cell coverage --- one of the most important use cases for 5G-NR. On the other hand, the biggest determinants of the viability of mmW systems in practice are penetration and blockage of mmW signals through different materials in the scattering environment. With this background, the focus of this paper is on understanding the impact of blockage of mmW signals and the reduced spatial coverage due to penetration through the human hand, body, vehicles, etc. Leveraging measurements with a 28 GHz mmW experimental prototype and electromagnetic simulation studies, we first propose statistical blockage models to capture the impact of the hand, human body and vehicles. We then study the time-scales at which mmW signals are disrupted by blockage (hand and human body). Our results show that these events can be attributed to physical movements, and the time-scales corresponding to blockage are hence on the order of a few hundred milliseconds or more. Building on this fundamental understanding, we finally consider the broader question of the robustness of mmW beamforming to blockage. Network densification, subarray switching in a user equipment (UE) designed with multiple subarrays, and fallback mechanisms such as codebook enhancements and switching to legacy carriers in non-standalone deployments can address blockage before it leads to a deleterious impact on the mmW link margin.
- Cluster analysis and outlier detection are strongly coupled tasks in data mining. Cluster structure can easily be destroyed by a few outliers; conversely, outliers are defined with respect to clusters, as the points belonging to none of them. However, most existing studies handle the two tasks separately. In light of this, we consider the joint cluster analysis and outlier detection problem and propose the Clustering with Outlier Removal (COR) algorithm. Generally speaking, the original space is transformed into a binary space via generating basic partitions in order to define clusters. An objective function based on Holoentropy is then designed to enhance the compactness of each cluster with a few outliers removed. Further analysis of the objective function shows that only part of the problem can be handled by K-means optimization. To provide an integrated solution, an auxiliary binary matrix is non-trivially introduced so that COR completely and efficiently solves the challenging problem via a unified K-means-- with theoretical support. Extensive experimental results on numerous datasets in various domains demonstrate that COR significantly outperforms its rivals, including K-means-- and other state-of-the-art outlier detection methods, in terms of cluster validity and outlier detection. Some key factors in COR are further analyzed for practical use. Finally, an application to flight trajectories demonstrates the effectiveness of COR in a real-world scenario.
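The K-means-- baseline mentioned above admits a compact sketch: alternate cluster assignment with the removal of the $l$ points farthest from their nearest center, excluding them from the center update. This illustrative version (deterministic initialization, our own helper names) is a sketch of K-means--, not the COR algorithm itself:

```python
import numpy as np

def kmeans_minus_minus(X, k, l, iters=10):
    """Sketch of K-means--: joint clustering and outlier removal. In each
    iteration the l points farthest from their nearest center are flagged
    as outliers and excluded from the center update."""
    centers = X[:k].astype(float)          # simple deterministic init
    outliers = np.array([], dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        nearest = d.min(axis=1)
        outliers = np.argsort(nearest)[-l:]        # l farthest points
        labels = d.argmin(axis=1)
        keep = np.ones(len(X), dtype=bool)
        keep[outliers] = False
        for j in range(k):
            pts = X[keep & (labels == j)]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, outliers

# Two tight clusters plus one far-away point that should be flagged.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [50.0, 50.0]])
centers, outliers = kmeans_minus_minus(X, k=2, l=1)
```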
- We consider the problem of optimally compressing and caching data across a communication network. Given the data generated at edge nodes and a routing path, our goal is to determine the optimal data compression ratios and caching decisions across the network that minimize average latency, which can be shown to be equivalent to maximizing the compression and caching gain under an energy consumption constraint. We show that this problem is NP-hard in general, and that the hardness is caused by the caching subproblem, while the compression subproblem is polynomial-time solvable. We then propose an approximation algorithm that achieves a $(1-1/e)$-approximation of the optimum in strongly polynomial time, and show that it achieves near-optimal performance in synthetic evaluations. In this paper we consider a tree-structured network as an illustrative example, but our results extend easily to general network topologies at the expense of more complicated notation.
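The $(1-1/e)$ guarantee is the classic bound for greedy maximization of a monotone submodular objective. A generic sketch under a simple cardinality budget (the paper's actual constraint is on energy consumption; `greedy_select` and the coverage-style gain below are our illustrative stand-ins, not the paper's algorithm):

```python
def greedy_select(items, gain, budget):
    """Greedy maximization of a monotone submodular gain function under a
    cardinality budget; achieves a (1 - 1/e) approximation."""
    S = set()
    while len(S) < budget:
        best, best_inc = None, 0.0
        for it in items:
            if it in S:
                continue
            inc = gain(S | {it}) - gain(S)   # marginal gain of adding `it`
            if inc > best_inc:
                best, best_inc = it, inc
        if best is None:                     # no item adds positive gain
            break
        S.add(best)
    return S

# Coverage-style caching gain: each cached item serves a set of requests.
cover = {'a': {1, 2}, 'b': {2, 3}, 'c': {4}}
def gain(S):
    return len(set().union(set(), *(cover[i] for i in S)))

chosen = greedy_select(cover, gain, budget=2)  # {'a', 'b'}, covering 3 requests
```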
- Jan 09 2018 cs.AI arXiv:1801.02334v2 Concept-cognitive learning (CCL) has been a hot topic in recent years, attracting much attention from the communities of formal concept analysis, granular computing and cognitive computing. However, the relationships among cognitive computing (CC), concept-cognitive computing (CCC) and CCL have not been clearly described. To this end, we clarify the relationships among CC, CCC and CCL, and then propose a generalized concept-cognitive learning (GCCL) framework from the point of view of machine learning. Finally, experiments on several datasets are conducted to evaluate the concept formation and concept-cognitive processes of the proposed GCCL.
- The decoupled fractional Laplacian wave equation can describe seismic wave propagation in attenuating media. Fourier pseudospectral implementations, which solve the equation in the spatial frequency domain, are the only existing methods for solving it. For earth media with curved boundaries, however, pseudospectral methods are less attractive because they cannot easily handle irregular computational domains. In this paper, we propose a radial basis function collocation method that can easily tackle irregular-domain problems. Unlike the pseudospectral methods, the proposed method solves the equation in the physical domain. Among the various definitions of the fractional Laplacian, we choose the directional fractional Laplacian; in particular, the vector Grünwald-Letnikov formula is employed to approximate the fractional directional derivative of the radial basis functions. The convergence and stability of the method are investigated numerically using a synthetic solution and long-time simulations, respectively. The method's flexibility is studied on homogeneous and multi-layer media with regular and irregular geometric boundaries.
- Jan 04 2018 cs.DB arXiv:1801.01012v1 Finding a list of k teams of experts, referred to as top-k team formation, with the required skills and high collaboration compatibility has been extensively studied. However, existing methods have not considered the specific collaboration relationships among different team members, i.e., structural constraints, which are typically needed in practice. In this study, we first propose a novel graph pattern matching approach to top-k team formation that incorporates both structural constraints and capacity bounds. Second, we formulate and study the dynamic top-k team formation problem, motivated by the growing need to handle dynamic environments. Third, we develop a unified incremental approach, together with an optimization technique, to handle continuous pattern and data updates, separately and simultaneously, which has not been explored before. Finally, using real-life and synthetic data, we conduct an extensive experimental study to show the effectiveness and efficiency of our graph pattern matching approach for (dynamic) top-k team formation.
- Benefiting from the multi-user gain brought by multi-antenna techniques, space division multiple access (SDMA) can significantly enhance spatial throughput (ST) in wireless networks. Nevertheless, we show in this letter that, even when SDMA is applied, ST diminishes to zero in ultra-dense networks (UDN), where small-cell base stations (BSs) are fully densified. More importantly, we compare the performance of SDMA, single-user beamforming (SU-BF), in which one user is served in each cell, and full SDMA, in which the number of served users equals the number of equipped antennas. Surprisingly, SU-BF achieves the highest ST and the highest critical density, beyond which ST starts to degrade, in UDN. These results shed light on the fundamental limitations of SDMA in UDN.
- In this paper, we introduce the concept of Eventness for audio event detection, which can, in part, be thought of as an analogue of Objectness in computer vision. The key observation behind the eventness concept is that audio events reveal themselves as two-dimensional time-frequency patterns with specific textures and geometric structures in spectrograms. These time-frequency patterns can then be viewed analogously to objects occurring in natural images (with the exception that scaling and rotation invariance do not apply). With this observation in mind, we pose the detection of monophonic or polyphonic audio events as an equivalent visual object detection problem under partial occlusion and clutter in spectrograms. We adapt a state-of-the-art visual object detection model to the audio event detection task on publicly available datasets. The proposed network achieves results comparable to a state-of-the-art baseline and is more robust on minority events. Given large-scale datasets, we hope that our proposed conceptual model of eventness will benefit the audio signal processing community by improving the performance of audio event detection.
- The lack of strong labels has severely limited the scalability of state-of-the-art fully supervised audio tagging systems to larger datasets. Meanwhile, audio-visual learning models based on unlabeled videos have been successfully applied to audio tagging, but they are inevitably resource hungry and require a long time to train. In this work, we propose a light-weight, multimodal framework for environmental audio tagging. The audio branch of the framework is a convolutional and recurrent neural network (CRNN) based on multiple instance learning (MIL). It is trained with the audio tracks of a large collection of weakly labeled YouTube video excerpts; the video branch uses pretrained state-of-the-art image recognition networks and word embeddings to extract information from the video track and to map visual objects to sound events. Experiments on the audio tagging task of the DCASE 2017 challenge show that the incorporation of video information improves a strong baseline audio tagging system by 5.3\% absolute in terms of $F_1$ score. The entire system can be trained within 6~hours on a single GPU, and can be easily carried over to other audio tasks such as speech sentiment analysis.
- State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets where fine resolution annotations are too expensive to obtain. In this paper, we propose a multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed MIL framework uses audio embeddings extracted from a pre-trained convolutional neural network as input features. We show that by using audio embeddings the MIL framework can be implemented using a simple DNN with performance comparable to recurrent neural networks. We evaluate our approach by training an audio tagging system using a subset of AudioSet, which is a large collection of weakly labeled YouTube video excerpts. Combined with a late-fusion approach, we improve the F1 score of a baseline audio tagging system by 17\%. We show that audio embeddings extracted by the convolutional neural networks significantly boost the performance of all MIL models. This framework reduces the model complexity of the AED system and is suitable for applications where computational resources are limited.
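The bag-level pooling at the heart of such an MIL setup can be sketched in a few lines. This is an illustrative toy with random weights and max pooling as the MIL aggregator, not the paper's trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mil_bag_scores(instance_embeddings, W, b):
    """Instance-level scores from a linear layer, pooled into one bag-level
    score per class via max pooling (the classic MIL assumption: a bag is
    positive if at least one of its instances is)."""
    logits = instance_embeddings @ W + b      # (n_instances, n_classes)
    instance_probs = sigmoid(logits)
    return instance_probs.max(axis=0)         # (n_classes,)

# Toy bag: 5 audio-segment embeddings of dimension 4, 3 event classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3))
b = np.zeros(3)
bag = mil_bag_scores(X, W, b)
```

With weak labels, the loss is applied only to `bag`, so no per-segment annotation is needed at training time.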
- Computing shortest paths is one of the central problems in the theory of distributed computing. For the last few years, substantial progress has been made on the approximate single source shortest paths problem, culminating in an algorithm of Becker et al. [DISC'17] which deterministically computes $(1+o(1))$-approximate shortest paths in $\tilde O(D+\sqrt n)$ time, where $D$ is the hop-diameter of the graph. Up to logarithmic factors, this time complexity is optimal, matching the lower bound of Das Sarma et al. [STOC'11]. The question of exact shortest paths, however, saw no algorithmic progress for decades, until the recent breakthrough of Elkin [STOC'17], which established a sublinear-time algorithm for exact single source shortest paths on undirected graphs. Shortly after, Huang et al. [FOCS'17] provided improved algorithms for the exact all-pairs shortest paths problem on directed graphs. In this paper, we present a new single-source shortest path algorithm with complexity $\tilde O(n^{3/4}D^{1/4})$. For polylogarithmic $D$, this improves on Elkin's $\tilde{O}(n^{5/6})$ bound and gets closer to the $\tilde{\Omega}(n^{1/2})$ lower bound of Peleg and Rubinovich [FOCS'99]. For larger values of $D$, we present an improved variant of our algorithm which achieves complexity $\tilde{O}\left( n^{3/4+o(1)}+ \min\{ n^{3/4}D^{1/6},n^{6/7}\}+D\right)$, and thus compares favorably with Elkin's bound of $\tilde{O}(n^{5/6} + n^{2/3}D^{1/3} + D ) $ in essentially the entire range of parameters. This algorithm also provides a qualitative improvement, because it works for the more challenging case of directed graphs (i.e., graphs where the two directions of an edge can have different weights), constituting the first sublinear-time algorithm for directed graphs. Our algorithm also extends to the case of exact $\kappa$-source shortest paths...
- Dec 25 2017 cs.IR arXiv:1712.08550v1 Nowadays, events usually burst and are propagated online through multiple modern media like social networks and search engines. Much existing research discusses event dissemination trends on individual media, while few studies focus on event popularity analysis from a cross-platform perspective. Challenges come from the vast diversity of events and media, limited access to aligned datasets across different media and a great deal of noise in the datasets. In this paper, we design DancingLines, an innovative scheme that captures and quantitatively analyzes event popularity between pairwise text media. It contains two models: TF-SW, a semantic-aware popularity quantification model, based on an integrated weight coefficient leveraging Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series alignment model matching different event phases adapted from Dynamic Time Warping. We also propose three metrics to interpret event popularity trends between pairwise social platforms. Experimental results on eighteen real-world event datasets from an influential social network and a popular search engine validate the effectiveness and applicability of our scheme. DancingLines is demonstrated to possess broad application potential for discovering knowledge of various aspects related to events and different media.
- In this paper, we introduce the concept of infinitely split Nash equilibrium in repeated games in which the profile sets are chain-complete posets. Then, by using a fixed point theorem on posets from [8], we prove an existence theorem. As an application, we study the repeated extended Bertrand duopoly model of price competition.
- The design of caching algorithms to maximize hit probability has been extensively studied. However, the value of high hit probabilities can vary across contents due to differential service requirements. In this paper, we associate each content with a utility, which is a function of the corresponding content hit rate or hit probability. We formulate a cache optimization problem to maximize the sum of utilities over all contents under a stationary and ergodic request process, which is non-convex in general. We find that the problem can be reformulated as a convex optimization problem if the inter-request distribution has a non-increasing hazard rate function. We provide explicit optimal solutions for some inter-request distributions, and compare the solutions to the hit-rate based (HRB) and hit-probability based (HPB) problems. We also propose distributed algorithms that not only can adapt to changes in the system with limited information but also provide solutions in a decentralized way. We find that distributed algorithms that solve HRB are more robust than distributed HPB algorithms. Informed by these results, we further propose a lightweight Poisson approximate online algorithm, which is accurate and efficient in achieving exact hit rates and hit probabilities, and also improves the aggregate utilities.
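As a toy illustration of utility-driven cache allocation (not the paper's exact formulation), consider per-content weighted log utilities of the hit probabilities under a total hit-probability budget; the Lagrangian optimality condition then gives a proportional closed form:

```python
import numpy as np

def log_utility_hit_probs(weights, budget):
    """Maximize sum_i w_i * log(h_i) subject to sum_i h_i = budget.
    Setting the gradient of the Lagrangian to zero gives w_i / h_i = lambda
    for every i, i.e. h_i proportional to w_i. (Clipping to h_i <= 1 is
    omitted for clarity; weights and budget here are illustrative.)"""
    weights = np.asarray(weights, dtype=float)
    return budget * weights / weights.sum()

# Three contents with utility weights 3:2:1 and a budget of 1.5.
h = log_utility_hit_probs([3.0, 2.0, 1.0], budget=1.5)
```

The proportional-fairness flavor of the log utility is what makes the allocation spread hits across contents instead of serving only the most popular one.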
- Dec 21 2017 cs.CV arXiv:1712.07286v1 This paper establishes a large-scale Long sequence Video database for person re-IDentification (LVreID). Different from existing datasets, LVreID presents many important new features. (1) Long sequences: the average sequence length is 200 frames, conveying more abundant cues like pose and viewpoint changes that can be explored for feature learning. (2) Complex lighting, scene, and background variations: it is captured by 15 cameras located in both indoor and outdoor scenes in 12 time slots. (3) Currently the largest size: it contains 3,772 identities and about 3 million bounding boxes. These unique features of LVreID define a more challenging and realistic person ReID task. A Spatial Aligned Temporal Pyramid Pooling (SATPP) network is proposed as a baseline algorithm to leverage the rich visual-temporal cues in LVreID for feature learning. SATPP jointly handles the misalignment issues in detected bounding boxes and efficiently aggregates the discriminative cues embedded in sequential video frames. Extensive experiments show that features extracted by SATPP outperform several widely used video features. Our experiments also show that ReID accuracy increases substantially with sequence length, which demonstrates the advantage and necessity of using longer video sequences for person ReID.
- Dec 21 2017 cs.CV arXiv:1712.07576v1 We address the problem of affordance reasoning in diverse scenes that appear in the real world. Affordances relate the agent's actions to their effects when taken on the surrounding objects. In our work, we take the egocentric view of the scene, and aim to reason about action-object affordances that respect both the physical world as well as the social norms imposed by society. We also aim to teach artificial agents why some actions should not be taken in certain situations, and what would likely happen if they were taken. We collect a new dataset that builds upon ADE20k, referred to as ADE-Affordance, which contains annotations enabling such rich visual reasoning. We propose a model that exploits Graph Neural Networks to propagate contextual information from the scene in order to perform detailed affordance reasoning about each object. Our model is showcased through various ablation studies, pointing to successes and challenges in this complex task.
- We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.
- This paper considers the scheduling of parallel real-time tasks with arbitrary-deadlines. Each job of a parallel task is described as a directed acyclic graph (DAG). In contrast to prior work in this area, where decomposition-based scheduling algorithms are proposed based on the DAG-structure and inter-task interference is analyzed as self-suspending behavior, this paper generalizes the federated scheduling approach. We propose a reservation-based algorithm, called reservation-based federated scheduling, that dominates federated scheduling. We provide general constraints for the design of such systems and prove that reservation-based federated scheduling has a constant speedup factor with respect to any optimal DAG task scheduler. Furthermore, the presented algorithm can be used in conjunction with any scheduler and scheduling analysis suitable for ordinary arbitrary-deadline sporadic task sets, i.e., without parallelism.
- Dec 14 2017 cs.NI arXiv:1712.04818v1 Modal crosstalk is the main bottleneck in MMF-enabled optical datacenter networks with direct detection. A novel time-slicing-based crosstalk-mitigated MDM scheme is first proposed, then theoretically analyzed and experimentally demonstrated.
- Dec 14 2017 cs.CL arXiv:1712.04762v3 We present our approach for computer-aided social media text authorship attribution based on recent advances in short text authorship verification. We use various natural language techniques to create word-level and character-level models that act as hidden layers to simulate a simple neural network. The choice of word-level and character-level models in each layer was informed through validation performance. The output layer of our system uses an unweighted majority vote vector to arrive at a conclusion. We also considered writing bias in social media posts while collecting our training dataset to increase system robustness. Our system achieved a precision, recall, and F-measure of 0.82, 0.926 and 0.869 respectively.
- Prediction of popularity has a profound impact on social media, since it offers opportunities to reveal individual preference and public attention from evolutionary social systems. Previous research, although achieving promising results, neglects one distinctive characteristic of social data, i.e., sequentiality. For example, the popularity of online content is generated over time with sequential post streams of social media. To investigate the sequential prediction of popularity, we propose a novel prediction framework called Deep Temporal Context Networks (DTCN) that takes both temporal context and temporal attention into account. Our DTCN contains three main components: embedding, learning, and predicting. With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space. Then, based on the embedded data sequence over time, temporal context learning attempts to recurrently learn two adaptive temporal contexts for sequential popularity. Finally, a novel temporal attention is designed to predict new popularity (the popularity of a new user-post pair) with temporal coherence across multiple time-scales. Experiments on our released image dataset with about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average of 21.51% relative performance improvement in popularity prediction (Spearman Ranking Correlation).
- Many applications must ingest and analyze data that are continuously generated over time from geographically distributed sources such as users, sensors and devices. This results in the need for efficient data analytics in geo-distributed systems. Energy efficiency is a fundamental requirement in these geo-distributed data communication systems, and its importance is reflected in much recent work on performance analysis of system energy consumption. However, most works have only focused on communication and computation costs, and do not account for caching costs. Given the increasing interest in cache networks, this is a serious deficiency. In this paper, we consider the energy consumption tradeoff among communication, computation, and caching (C3) for data analytics under a Quality of Information (QoI) guarantee in a geo-distributed system. To attain this goal, we formulate an optimization problem to capture the C3 costs, which turns out to be a non-convex Mixed Integer Non-Linear Programming (MINLP) problem. We then propose a variant of the spatial branch and bound algorithm (V-SBB) that can achieve an ε-global optimal solution to the original MINLP. We show numerically that V-SBB is more stable and robust than other candidate MINLP solvers under different network scenarios. More importantly, we observe that the energy efficiency under our C3 optimization framework improves by as much as 88% compared to any C2 optimization between communication and computation or caching.
- Dec 11 2017 cs.CV arXiv:1712.03037v1 In this paper, we present a frequency domain neural network for image super-resolution. The network employs the convolution theorem so as to cast convolutions in the spatial domain as products in the frequency domain. Moreover, the non-linearity in deep nets, often achieved by a rectifier unit, is here cast as a convolution in the frequency domain. This not only yields a network which is very computationally efficient at testing but also one whose parameters can all be learnt accordingly. The network can be trained using back propagation and is devoid of complex numbers due to the use of the Hartley transform as an alternative to the Fourier transform. Moreover, the network is potentially applicable to other problems elsewhere in computer vision and image processing which are often cast in the frequency domain. We show results on super-resolution and compare against alternatives elsewhere in the literature. In our experiments, our network is one to two orders of magnitude faster than the alternatives with an imperceptible loss of performance.
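The convolution theorem the network relies on is easy to verify numerically. The sketch below uses NumPy's complex FFT for brevity rather than the Hartley transform the paper adopts to stay real-valued:

```python
import numpy as np

# Convolution theorem: circular convolution in the signal domain equals
# pointwise multiplication in the frequency domain. A 1-D example stands
# in for the 2-D image case.
rng = np.random.default_rng(1)
x = rng.normal(size=64)   # "image" (1-D for simplicity)
k = rng.normal(size=64)   # "filter" of the same length (circular setting)

# Frequency-domain product, transformed back to the signal domain.
direct = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

# Reference circular convolution computed explicitly.
circular = np.array([sum(x[m] * k[(n - m) % 64] for m in range(64))
                     for n in range(64)])
```

The speed advantage comes from the FFT's O(N log N) cost versus O(N^2) for the explicit sum, which is why casting layers into the frequency domain pays off at test time.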
- Dec 11 2017 cs.CV arXiv:1712.03149v1 Aggregating context information from multiple scales has been proved to be effective for improving the accuracy of Single Shot Detectors (SSDs) on object detection. However, existing multi-scale context fusion techniques are computationally expensive, which unfavorably diminishes the advantageous speed of SSD. In this work, we propose a novel network topology, called WeaveNet, that can efficiently fuse multi-scale information and boost the detection accuracy with negligible extra cost. The proposed WeaveNet iteratively weaves context information from adjacent scales together to enable more sophisticated context reasoning while maintaining fast speed. Built by stacking light-weight blocks, WeaveNet is easy to train without requiring batch normalization and can be further accelerated by our proposed architecture simplification. Experimental results on the PASCAL VOC 2007 and PASCAL VOC 2012 benchmarks show a significant performance boost brought by WeaveNet. For 320x320 input with batch size = 8, WeaveNet reaches 79.5% mAP on the PASCAL VOC 2007 test set at 101 fps with only 4 fps extra cost, and further improves to 79.7% mAP with more iterations.
- A large-scale fully-digital receive antenna array can provide very high-resolution direction of arrival (DOA) estimation, but this results in a significantly high RF-chain circuit cost. Thus, a hybrid analog and digital (HAD) structure is preferred. Two phase alignment (PA) methods, HAD PA (HADPA) and hybrid digital and analog PA (HDAPA), are proposed to estimate DOA based on the parametric method. Compared to analog phase alignment (APA), they can significantly reduce the complexity in the PA phases. Subsequently, a fast root multiple signal classification HDAPA (Root-MUSIC-HDAPA) method is proposed specially for this hybrid structure to implement an approximately analytical solution. Due to the HAD structure, there exists an effect of direction-finding ambiguity. A smart strategy of maximizing the average receive power is adopted to delete the spurious solutions and preserve the true optimal solution by linear searching over a set of limited finite candidate directions. This results in a significant reduction in computational complexity. Finally, the Cramer-Rao lower bound (CRLB) of finding the emitter direction using the HAD structure is derived. Simulation results show that our proposed methods, Root-MUSIC-HDAPA and HDAPA, can achieve the hybrid CRLB with their complexities being significantly lower than those of pure linear searching-based methods, such as APA.
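A minimal, fully-digital MUSIC pseudospectrum illustrates the subspace idea that Root-MUSIC builds on (the hybrid structure and the ambiguity-resolution strategy are the paper's contributions and are not modeled here); the array geometry, noise level, and search grid below are illustrative assumptions:

```python
import numpy as np

def music_spectrum(X, n_sources, angles_deg, d=0.5):
    """MUSIC pseudospectrum for a uniform linear array with element
    spacing d in wavelengths. X is (n_antennas, n_snapshots)."""
    n = X.shape[0]
    R = X @ X.conj().T / X.shape[1]        # sample covariance
    _, vecs = np.linalg.eigh(R)            # eigenvalues ascending
    En = vecs[:, : n - n_sources]          # noise subspace
    spec = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * d * np.arange(n) * np.sin(theta))
        # Peak where the steering vector is orthogonal to the noise subspace.
        spec.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    return np.array(spec)

# One source at 20 degrees, 8-element array, high SNR, 200 snapshots.
rng = np.random.default_rng(2)
n, snaps, true_deg = 8, 200, 20.0
a = np.exp(-2j * np.pi * 0.5 * np.arange(n) * np.sin(np.deg2rad(true_deg)))
s = rng.normal(size=snaps) + 1j * rng.normal(size=snaps)
X = np.outer(a, s) + 0.01 * (rng.normal(size=(n, snaps))
                             + 1j * rng.normal(size=(n, snaps)))
grid = np.arange(-90.0, 90.5, 0.5)
est = grid[np.argmax(music_spectrum(X, 1, grid))]
```

Root-MUSIC replaces the grid search above with polynomial rooting, which is where the "approximately analytical solution" and the complexity savings come from.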
- In this work, an adaptive and robust null-space projection (AR-NSP) scheme is proposed for secure transmission with artificial noise (AN)-aided directional modulation (DM) in wireless networks. The proposed scheme is carried out in three steps. Firstly, the directions of arrival (DOAs) of the signals from the desired user and eavesdropper are estimated by the Root Multiple Signal Classification (Root-MUSIC) algorithm, and the related signal-to-noise ratios (SNRs) are estimated based on the ratio of the corresponding eigenvalue to the minimum eigenvalue of the covariance matrix of the received signals. In the second step, the value intervals of the DOA estimation errors are predicted based on the DOA and SNR estimations. Finally, a robust NSP beamforming DM system is designed according to the aforementioned estimations and predictions. Our examination shows that the proposed scheme can significantly outperform the conventional non-adaptive robust scheme and the non-robust NSP scheme in terms of achieving a much lower bit error rate (BER) at the desired user and a much higher secrecy rate (SR). In addition, the BER and SR performance gains achieved by the proposed scheme relative to other schemes increase with the value range of the DOA estimation error.
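The null-space projection idea can be sketched for a single-antenna desired user: project the artificial noise onto the orthogonal complement of the desired channel, so that it (ideally) cancels at that receiver while jamming other directions. This is a bare-bones simplification, not the paper's robust, estimation-error-aware design:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8  # transmit antennas
h = rng.normal(size=N) + 1j * rng.normal(size=N)  # desired user's channel

# Projector onto the orthogonal complement of h: P = I - h h^H / ||h||^2.
P = np.eye(N) - np.outer(h, h.conj()) / np.real(h.conj() @ h)

# Artificial noise shaped by the projector.
an = P @ (rng.normal(size=N) + 1j * rng.normal(size=N))

# AN amplitude arriving at the desired user: h^H P w = 0 by construction.
leak_desired = abs(h.conj() @ an)
```

With imperfect DOA/channel estimates the projection no longer nulls exactly, which is precisely the gap the adaptive robust design in the abstract targets.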
- Dec 04 2017 cs.CV arXiv:1712.00133v1Recently, with the enormous growth of online videos, fast video retrieval research has received increasing attention. As an extension of image hashing techniques, traditional video hashing methods mainly depend on hand-crafted features and transform the real-valued features into binary hash codes. As videos provide far more diverse and complex visual information than images, extracting features from videos is much more challenging than that from images. Therefore, high-level semantic features to represent videos are needed rather than low-level hand-crafted methods. In this paper, a deep convolutional neural network is proposed to extract high-level semantic features and a binary hash function is then integrated into this framework to achieve an end-to-end optimization. Particularly, our approach also combines triplet loss function which preserves the relative similarity and difference of videos and classification loss function as the optimization objective. Experiments have been performed on two public datasets and the results demonstrate the superiority of our proposed method compared with other state-of-the-art video retrieval methods.
- Probabilistic timed automata (PTAs) are timed automata (TAs) extended with discrete probability distributions. They serve as a mathematical model for a wide range of applications that involve both stochastic and timed behaviours. In this work, we consider the problem of model-checking linear dense-time properties over PTAs. In particular, we study linear dense-time properties that can be encoded by TAs with infinite acceptance criterion. First, we show that the problem of model-checking PTAs against deterministic-TA specifications can be solved through a product construction. Based on the product construction, we prove that the computational complexity of the problem with deterministic-TA specifications is EXPTIME-complete. Then we show that when relaxed to general (nondeterministic) TAs, the model-checking problem becomes undecidable. Our results substantially extend the state of the art with both the dense-time feature and the nondeterminism in TAs.
- Nov 29 2017 cs.CV arXiv:1711.10152v1 Generative adversarial networks (GANs) have received wide research interest in the field of deep learning. Variations of GAN have achieved competitive results on specific tasks. However, the stability of training and the diversity of generated instances are still worth studying further. Training of a GAN can be thought of as a greedy procedure, in which the generative net tries to make the locally optimal choice (minimizing the loss function of the discriminator) in each iteration. Unfortunately, this often makes the generated data resemble only a few modes of the real data and rotate between modes. To alleviate these problems, we propose a novel training strategy to restrict greed in the training of GANs. With the help of our method, the generated samples can cover more instance modes with a more stable training process. Evaluating our method on several representative datasets, we demonstrate the superiority of the improved training strategy on typical GAN models with different distance metrics.
- Nov 29 2017 cs.CL arXiv:1711.10136v1 Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion was shown to be a natural end-to-end model directly targeting words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue as it can only model a limited number of words in the output layer and maps all the remaining words into an OOV output node. Therefore, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. It also cannot easily handle hot-words which emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model which can predict both words and characters at the same time. With a shared-hidden-layer structure and modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized. Whenever the acoustic-to-word model emits an OOV token, we back off that OOV segment to the word output generated from the character-based CTC, hence solving the OOV or hot-words issue. Evaluated on a Microsoft Cortana voice assistant task, the proposed model can reduce the errors introduced by the OOV output token in the acoustic-to-word model by 30%.
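The back-off step itself is simple token surgery once the word-level and character-level hypotheses are time-synchronized. A hedged sketch, where the token name `<OOV>` and the one-character-segment-per-OOV-slot alignment are illustrative assumptions rather than the paper's exact interface:

```python
def backoff_oov(word_hyp, char_segments):
    """Replace each '<OOV>' token in the word-level hypothesis with the
    corresponding character-CTC segment, taken left to right.

    word_hyp: list of word tokens, some equal to '<OOV>'.
    char_segments: character-level decodings, one per OOV slot, in order.
    """
    segs = iter(char_segments)
    return [next(segs) if w == "<OOV>" else w for w in word_hyp]

out = backoff_oov(["play", "<OOV>", "on", "<OOV>"],
                  ["despacito", "spotify"])
```

The hard part in practice is the alignment (knowing which character span corresponds to each OOV emission), which the shared-hidden-layer design in the abstract is meant to guarantee.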
- Intelligent code completion has become an essential tool to accelerate modern software development. To facilitate effective code completion for dynamically-typed programming languages, we apply neural language models by learning from large codebases, and investigate the effectiveness of the attention mechanism on the code completion task. However, standard neural language models, even with an attention mechanism, cannot correctly predict out-of-vocabulary (OoV) words, thus restricting code completion performance. In this paper, inspired by the prevalence of locally repeated terms in program source code, and the recently proposed pointer networks which can reproduce words from the local context, we propose a pointer mixture network for better predicting OoV words in code completion. Based on the context, the pointer mixture network learns to either generate a within-vocabulary word through an RNN component, or copy an OoV word from the local context through a pointer component. Experiments on two benchmark datasets demonstrate the effectiveness of our attention mechanism and pointer mixture network on the code completion task.
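A pointer mixture can be sketched as a gated combination of the RNN's vocabulary softmax and an attention-derived copy distribution over the local context. The shapes, the extra OOV slots, and the fixed gate below are illustrative assumptions, not the paper's learned components:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pointer_mixture(rnn_logits, attn_scores, context_ids, vocab_size, gate):
    """Final distribution = gate * vocab softmax + (1 - gate) * pointer
    distribution. Attention weights over the local context are scattered
    onto token ids; OoV tokens occupy extra slots with ids >= vocab_size."""
    size = vocab_size + len(context_ids)       # room for OoV slots
    p_vocab = np.zeros(size)
    p_vocab[:vocab_size] = softmax(rnn_logits)
    p_ptr = np.zeros(size)
    for prob, tok in zip(softmax(attn_scores), context_ids):
        p_ptr[tok] += prob
    return gate * p_vocab + (1 - gate) * p_ptr

# Vocab of 5 words; local context holds token 2 and an OoV token (id 5).
p = pointer_mixture(np.zeros(5), np.zeros(2), [2, 5], 5, gate=0.3)
```

In the actual model the gate would itself be predicted from the context, letting the network decide per step whether to generate or to copy.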
- Kernel methods are powerful tools to capture nonlinear patterns behind data. They implicitly learn high (even infinite) dimensional nonlinear features in a Reproducing Kernel Hilbert Space (RKHS) while keeping the computation tractable by leveraging the kernel trick. Classic kernel methods learn a single layer of nonlinear features, whose representational power may be limited. Motivated by the recent success of deep neural networks (DNNs) that learn multi-layer hierarchical representations, we propose a Stacked Kernel Network (SKN) that learns a hierarchy of RKHS-based nonlinear features. SKN interleaves several layers of nonlinear transformations (from a linear space to a RKHS) and linear transformations (from a RKHS to a linear space). Similar to DNNs, a SKN is composed of multiple layers of hidden units, but each parameterized by a RKHS function rather than a finite-dimensional vector. We propose three ways to represent the RKHS functions in SKN: (1) nonparametric representation, (2) parametric representation and (3) random Fourier feature representation. Furthermore, we extend SKN to a convolutional architecture called Stacked Kernel Convolutional Network (SKCN), which is suitable for image inputs. SKCN learns a hierarchy of RKHS-based nonlinear features through convolution, with each filter parameterized by a RKHS function rather than a finite-dimensional matrix as in a CNN. Experiments on various datasets demonstrate the effectiveness of SKN and SKCN, which outperform competitive methods.
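Of the three RKHS-function representations, the random Fourier feature one is the easiest to sketch: a random cosine map whose inner products approximate the RBF kernel (Rahimi and Recht's construction, shown standalone here rather than stacked as in SKN):

```python
import numpy as np

def rff_map(X, n_features, gamma, rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2). Frequencies are drawn from the
    kernel's spectral density, W ~ N(0, 2*gamma*I)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))
Z = rff_map(X, 5000, gamma=0.5, rng=rng)

K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
err = np.abs(K_approx - K_exact).max()
```

The explicit finite-dimensional map is what makes this representation stackable: the output of one layer is an ordinary feature matrix that the next layer can consume.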
- Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to unlabeled data from a target domain. This can be achieved by adversarial training of deep neural network (DNN) acoustic models to learn an intermediate deep representation that is both senone-discriminative and domain-invariant. Specifically, the DNN is trained to jointly optimize the primary task of senone classification and the secondary task of domain classification with adversarial objective functions. In this work, instead of only focusing on learning a domain-invariant feature (i.e. the shared component between domains), we also characterize the difference between the source and target domain distributions by explicitly modeling the private component of each domain through a private component extractor DNN. The private component is trained to be orthogonal with the shared component and thus implicitly increases the degree of domain-invariance of the shared component. A reconstructor DNN is used to reconstruct the original speech feature from the private and shared components as a regularization. This domain separation framework is applied to the unsupervised environment adaptation task and achieves an 11.08% relative WER reduction over gradient reversal layer training, a representative adversarial training method, for automatic speech recognition on the CHiME-3 dataset.
- Nov 21 2017 cs.DS arXiv:1711.07454v1 We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Firstly, we study mixtures of $k$ distributions in $d$ dimensions, where the means of every pair of distributions are separated by at least $k^{\varepsilon}$. In the special case of spherical Gaussian mixtures, we give a $(dk)^{O(1/\varepsilon^2)}$-time algorithm that learns the means assuming separation at least $k^{\varepsilon}$, for any $\varepsilon > 0$. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation $k^{1/4}$. We also study robust estimation. When an unknown $(1-\varepsilon)$-fraction of $X_1,\ldots,X_n$ are chosen from a sub-Gaussian distribution with mean $\mu$ but the remaining points are chosen adversarially, we give an algorithm recovering $\mu$ to error $\varepsilon^{1-1/t}$ in time $d^{O(t^2)}$, so long as sub-Gaussian-ness up to $O(t)$ moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than $\varepsilon^{1/2}$. Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given $X_1,\ldots,X_n \in \mathbb{R}^d$ for large $d$ and $n = poly(d)$ with the promise that a subset of $X_1,\ldots,X_n$ were sampled from a probability distribution with bounded moments, recover some information about that distribution.
- Nov 17 2017 cs.CV arXiv:1711.06055v1 Face analytics benefits many multimedia applications. It consists of a number of tasks, such as facial emotion recognition and face parsing, and most existing approaches generally treat these tasks independently, which limits their deployment in real scenarios. In this paper we propose an integrated Face Analytics Network (iFAN), which is able to perform multiple tasks jointly for face analytics with a novel carefully designed network architecture to fully facilitate the informative interaction among different tasks. The proposed integrated network explicitly models the interactions between tasks so that the correlations between tasks can be fully exploited for performance boost. In addition, to solve the bottleneck of the absence of datasets with comprehensive training data for various tasks, we propose a novel cross-dataset hybrid training strategy. It allows "plug-in and play" of multiple datasets annotated for different tasks without the requirement of a fully labeled common dataset for all the tasks. We experimentally show that the proposed iFAN achieves state-of-the-art performance on multiple face analytics tasks using a single integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81% on the MTFL dataset for facial landmark localization and an accuracy of 45.73% on the BNU dataset for emotion recognition with a single model.
- We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time position follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of states. The distance is defined meaningfully even for two HMMs that are estimated from data of different dimensionality, a situation that can arise due to missing variables. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and that between the two transition matrices. Our new distance is tested on tasks of retrieval, classification, and t-SNE visualization of time series. Experiments on both synthetic and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
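The ground cost used to match GMM components has a closed form for Gaussians. In the univariate case (a simplification of the multivariate formula with covariance matrices that the component matching would use) it reduces to:

```python
import math

def w2_gaussian_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between univariate Gaussians
    N(m1, s1^2) and N(m2, s2^2):
        W2^2 = (m1 - m2)^2 + (s1 - s2)^2
    i.e. a Euclidean distance in (mean, std) space."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

d = w2_gaussian_1d(0.0, 1.0, 3.0, 5.0)  # sqrt(3^2 + 4^2)
```

These pairwise costs feed the optimal transport problem between the two marginal GMMs, whose optimal value gives the fast GMM-to-GMM approximation described in the abstract.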
- Nov 16 2017 cs.CL arXiv:1711.05380v2 Neural machine translation systems encode a source sequence into a vector from which a target sequence is generated via a decoder. Unlike in traditional statistical machine translation, source and target words are not directly mapped to each other by translation rules. They sit at the two ends of a long information channel in the encoder-decoder neural network, separated by source and target hidden states. This may lead to translations with implausible word alignments. In this paper, we bridge source and target word embeddings so as to shorten the distance between them. We propose three bridging strategies: 1) a source state bridging model that moves source word embeddings one step closer to their target counterparts, 2) a target state bridging model that exploits relevant source word embeddings for target state prediction, and 3) a direct link bridging model that directly connects source and target word embeddings so as to minimize their discrepancy. Experiments and analysis demonstrate that the proposed bridging models significantly improve the quality of both translation and word alignments.
- Nov 15 2017 cs.IR arXiv:1711.04725v1 In e-commerce scenarios where user profiles are invisible, session-based recommendation is used to generate recommendation results from short sessions. Previous work only considers the user's sequential behavior in the current session, whereas the user's main purpose in the current session is not emphasized. In this paper, we propose a novel neural network framework, the Neural Attentive Recommendation Machine (NARM), to tackle this problem. Specifically, we explore a hybrid encoder with an attention mechanism to model the user's sequential behavior and capture the user's main purpose in the current session, which are then combined into a unified session representation. We compute the recommendation score for each candidate item with a bi-linear matching scheme based on this unified session representation. We train NARM by jointly learning the item and session representations as well as their matchings. We carried out extensive experiments on two benchmark datasets. Our experimental results show that NARM outperforms state-of-the-art baselines on both datasets. Furthermore, we find that NARM achieves a significant improvement on long sessions, which demonstrates its advantage in modeling the user's sequential behavior and main purpose simultaneously.
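The bi-linear matching step described above is compact: each candidate item embedding is scored against the unified session representation through a learned matrix, and scores are normalized into recommendation probabilities. A minimal sketch (shapes and names are our assumptions, not the released NARM code):

```python
import numpy as np

def bilinear_scores(session_rep, item_embs, B):
    """Score each item i as s_i = e_i^T B c, with c the unified session representation."""
    return item_embs @ (B @ session_rep)

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# toy example: 4 candidate items with 3-dim embeddings, 2-dim session representation
rng = np.random.default_rng(0)
items = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 2))          # learned bi-linear matching matrix
c = rng.normal(size=2)               # unified session representation
probs = softmax(bilinear_scores(c, items, B))
```

In training, `B` is learned jointly with the item and session representations, as the abstract notes.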
- In this work, we propose a covert communication scheme in which the transmitter attempts to hide its transmission to a full-duplex receiver from a warden that detects covert transmissions using a radiometer. Specifically, we first derive the detection error rate at the warden, based on which the optimal detection threshold for its radiometer is analytically determined and its expected detection error rate over wireless fading channels is obtained in closed form. Our analysis indicates that the artificial noise deliberately produced by the receiver with a random transmit power, although it causes self-interference, enables a positive effective covert rate for any transmit power (which can be arbitrarily large) subject to any given covertness requirement on the expected detection error rate. This work is the first study on the use of a full-duplex receiver with controlled artificial noise for achieving covert communications, and it invites further investigation in this regard.
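The warden's radiometer test can be illustrated with a small Monte Carlo sketch: its detection error rate is the sum of the false-alarm and missed-detection probabilities, and the warden chooses the threshold minimizing that sum. The exponential received-power model below is a simplifying assumption of ours, not the paper's exact fading-channel analysis:

```python
import numpy as np

rng = np.random.default_rng(1)

def detection_error(threshold, noise_var=1.0, covert_var=0.5, n=200_000):
    """P(false alarm) + P(missed detection) for a radiometer with a given threshold."""
    power_h0 = rng.exponential(noise_var, n)               # no covert transmission
    power_h1 = rng.exponential(noise_var + covert_var, n)  # covert signal present
    p_fa = np.mean(power_h0 > threshold)                   # false alarm under H0
    p_md = np.mean(power_h1 <= threshold)                  # missed detection under H1
    return p_fa + p_md

# warden's optimal threshold: minimize the total detection error over a grid
thresholds = np.linspace(0.1, 6.0, 60)
best_error = min(detection_error(t) for t in thresholds)
```

A blind warden achieves error 1 (e.g., an infinite threshold always declares "no transmission"); the covertness requirement in the paper constrains how far below 1 this minimized error may drop.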
- Caching algorithms are usually described by their eviction method and analyzed using hit probability as the metric. Since contents differ in importance (e.g., popularity), both the utility of a high hit probability and the cost of transmission can vary across contents. In this paper, we consider timer-based (TTL) policies across a cache network, where contents have differentiated timers over which we optimize. Each content is associated with a utility measured in terms of its hit probability. We start our analysis from a linear cache network: we propose a utility maximization problem where the objective is to maximize the sum of utilities, and a cost minimization problem where the objective is to minimize the content transmission cost across the network. These frameworks enable us to design online algorithms for cache management, which we prove achieve optimal performance. Informed by the results of our analysis, we formulate a non-convex optimization problem for a general cache network. We show that the duality gap is zero, hence we can develop a distributed iterative primal-dual algorithm for content management in the network. Finally, we consider two applications of our cache network model: (i) a direct mapping to content distribution and (ii) a generalization to wireless sensor networks that jointly considers content caching and content compression. We characterize the tradeoff among caching, compression, and communication via a nonlinear non-convex optimization problem, and show that it can be transformed into an equivalent convex problem. The numerical results provide insights into how to optimize performance.
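For intuition on how a content's timer controls its hit probability: under Poisson requests of rate λ, the two standard TTL variants admit simple closed forms, which is what makes per-content utilities tractable to optimize over timers. A sketch of these textbook formulas (our illustration of the TTL mechanics, not the paper's full network model):

```python
import numpy as np

def hit_prob_reset(lam, ttl):
    """Timer refreshed on every hit: a request hits iff the previous one came within TTL."""
    return 1.0 - np.exp(-lam * ttl)

def hit_prob_nonreset(lam, ttl):
    """Timer set only on a miss: a renewal argument gives lam*TTL / (1 + lam*TTL)."""
    return lam * ttl / (1.0 + lam * ttl)

def ttl_for_target(lam, target_hit):
    """Timer needed to reach a target hit probability under the reset policy."""
    return -np.log(1.0 - target_hit) / lam
```

Differentiated timers then amount to assigning each content its own `ttl`, and the paper's utility/cost frameworks choose these values network-wide.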
- Nov 08 2017 cs.CL arXiv:1711.02212v1 Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for low-latency online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time consuming, and more importantly, is impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19% relative improvement in word error rate compared to a randomly-initialized baseline system.
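The core of the teacher-student transfer is a cross-entropy between the offline teacher's softened output distribution and the online student's predictions. A minimal sketch of that loss (the temperature and helper names are our assumptions; the paper's exact recipe may differ):

```python
import numpy as np

def softmax(logits, temp=1.0):
    z = logits / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_student_loss(student_logits, teacher_logits, temp=2.0):
    """Cross-entropy of student predictions against the teacher's soft targets."""
    soft_targets = softmax(teacher_logits, temp)          # teacher distribution
    log_student = np.log(softmax(student_logits, temp) + 1e-12)
    return float(-(soft_targets * log_student).sum(axis=-1).mean())
```

By Gibbs' inequality this loss is minimized when the student reproduces the teacher's distribution, which is exactly the knowledge-transfer objective described above.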
- Nov 07 2017 cs.DB arXiv:1711.01960v1 The current data explosion poses great challenges to approximate aggregation in terms of both efficiency and accuracy. To address this problem, we propose a novel approach that computes aggregation answers with high accuracy using only a small share of the data. We introduce leverages to reflect the individual differences of samples from a statistical perspective. Two kinds of estimators, the leverage-based estimator and the sketch estimator (a "rough picture" of the aggregation answer), are mutually constrained and are iteratively improved according to the actual conditions until their difference falls below a threshold. Owing to the iteration mechanism and the leverages, our approach achieves high accuracy. Moreover, several features, including not requiring sampled data to be recorded and being easy to extend to various execution modes (such as an online mode), make our approach well suited to big data. Experiments show that our approach performs extraordinarily well: compared with uniform sampling, it achieves high-quality answers using only 1/3 of the sample size.
- Nov 07 2017 physics.soc-ph cs.SI arXiv:1711.01404v1 Scientific coauthorship, generated by collaborations and competitions among researchers, reflects effective organization of human resources. Researchers, their expected benefits through collaborations, and their cooperative costs constitute the elements of a game. Hence we propose a cooperative game model to explore the evolution mechanisms of scientific coauthorship networks. The model generates geometric hypergraphs, where the costs are modelled by spatial distances and the benefits are expressed by node reputations, i.e., geometric zones that depend on node position in space and time. Modelled cooperative strategies conditioned on a positive benefit-minus-cost reflect the spatial reciprocity principle in collaborations, and generate high clustering and degree assortativity, two typical features of coauthorship networks. Modelled reputations generate the generalized Poisson parts and fat tails that appear in specific distributions of empirical data, e.g., the paper team size distribution. The combined effect of modelled costs and reputations reproduces the transitions observed in the degree distribution, in the correlation between degree and local clustering coefficient, etc. The model provides an example of how individual strategies induce network complexity, as well as an application of game theory to social affiliation networks.
- Falsely identifying different authors as one person is called a merging error in the name disambiguation of coauthorship networks. Research on the measurement and distribution of merging errors helps in collecting high-quality coauthorship networks. On the measurement side, we provide a Bayesian model that measures the errors through author similarity. We illustratively use the model and coauthor similarity to measure the errors caused by initial-based name disambiguation methods. The empirical result on large-scale coauthorship networks shows that using coauthor similarity cannot increase the accuracy of disambiguation based on the surname and the initial of the first given name. On the distribution side, expressing coauthorship data as hypergraphs and supposing the merging error rate is proportional to hyperdegree with an exponent, we find that hypergraphs with a range of network properties highly similar to those of low-merging-error hypergraphs can be constructed from high-merging-error hypergraphs. This implies that focusing on correcting the errors of high-hyperdegree nodes is a labor- and time-saving approach to improving data quality for coauthorship network analysis.
- Nov 03 2017 cs.CV arXiv:1711.00648v5 It is a difficult task to classify images with multiple class labels using only a small number of labeled examples, especially when the label (class) distribution is imbalanced. Emotion classification is such an example of imbalanced label distribution, because some classes of emotions, like "disgusted", are relatively rare compared to other labels, like "happy" or "sad". In this paper, we propose a data augmentation method using generative adversarial networks (GANs). It can complement and complete the data manifold and find better margins between neighboring classes. Specifically, we design a framework with a CNN model as the classifier and a cycle-consistent adversarial network (CycleGAN) as the generator. To avoid the vanishing gradient problem, we employ the least-squares loss as the adversarial loss. We also propose several evaluation methods on three benchmark datasets to validate the GAN's performance. Empirical results show that we obtain a 5%~10% increase in classification accuracy after employing the GAN-based data augmentation techniques.
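The least-squares adversarial loss mentioned above replaces the usual log loss with squared error against real/fake targets, which keeps gradients alive when the discriminator is confident. A minimal numpy sketch of the LSGAN objectives with 0/1 targets (the authors' full training code is not shown here):

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Push D's outputs on real data toward 1 and on fakes toward 0, via squared error."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Generator pushes D's outputs on fakes toward the 'real' target 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

Unlike the sigmoid cross-entropy loss, the generator gradient here grows with the distance of `d_fake` from 1, so well-separated fakes still receive a useful learning signal.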
- Nov 01 2017 cs.RO arXiv:1710.11319v1 Constructing a smart wheelchair on a commercially available powered wheelchair (PWC) platform avoids a host of seating, mechanical design and reliability issues but requires methods of predicting and controlling the motion of a device never intended for robotics. Analog joystick inputs are subject to black-box transformations which may produce intuitive and adaptable motion control for human operators, but complicate robotic control approaches; furthermore, installation of standard axle mounted odometers on a commercial PWC is difficult. In this work, we present an integrated hardware and software system for predicting the motion of a commercial PWC platform that does not require any physical or electronic modification of the chair beyond plugging into an industry standard auxiliary input port. This system uses an RGB-D camera and an Arduino interface board to capture motion data, including visual odometry and joystick signals, via ROS communication. Future motion is predicted using an autoregressive sparse Gaussian process model. We evaluate the proposed system on real-world short-term path prediction experiments. Experimental results demonstrate the system's efficacy when compared to a baseline neural network model.
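The paper's predictor is an autoregressive sparse Gaussian process; as a much simpler stand-in, a plain least-squares autoregressive fit illustrates the core idea of predicting future motion from a window of recent samples (this is our simplification, not the proposed model):

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares AR fit: x_t ~ sum_k coef[k] * x_{t-1-k}."""
    X = np.column_stack([series[order - k - 1: len(series) - k - 1]
                         for k in range(order)])
    y = series[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(series, coef):
    """One-step-ahead prediction from the most recent `order` samples."""
    order = len(coef)
    return float(coef @ series[-1:-order - 1:-1])
```

A Gaussian process version would additionally return predictive uncertainty, which matters for safe wheelchair control; the AR structure of conditioning on lagged observations is the same.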
- Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. Traditional RE models have relied heavily on human-annotated corpora for training, which makes generating labeled data costly and becomes an obstacle when dealing with more relation types. Thus, more RE systems have shifted to training data automatically acquired by linking to knowledge bases (distant supervision). However, due to the incompleteness of knowledge bases and context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy. In recent years, as increasing attention has been paid to question-answering (QA) tasks, user feedback and datasets for such tasks have become more accessible. In this paper, we propose a novel framework, ReQuest, that leverages question-answer pairs as an indirect source of supervision for relation extraction, and we study how to use such supervision to reduce the noise induced by DS. Our model jointly embeds relation mentions, types, QA entity mention pairs, and text features in two low-dimensional spaces (RE and QA), where objects with the same relation types or semantically similar question-answer pairs have similar representations. Shared features connect the two spaces, carrying clearer semantic knowledge from both sources. ReQuest then uses these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results show an average 11% improvement in F1 score on two public RE datasets combined with the TREC QA dataset.
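A margin-based loss of the kind referenced above can be sketched as a standard hinge over embedding similarities: a relation mention should score higher against a semantically matching question-answer pair than against a mismatched one, by at least a fixed margin. The exact form used in ReQuest may differ; this is an illustrative assumption:

```python
import numpy as np

def margin_qa_loss(rel_emb, matched_qa, mismatched_qa, margin=1.0):
    """Hinge loss: similarity to the matched QA pair should beat the
    mismatched one by at least `margin` (dot-product similarity)."""
    pos = float(rel_emb @ matched_qa)
    neg = float(rel_emb @ mismatched_qa)
    return max(0.0, margin - pos + neg)
```

The loss is zero once the matched pair wins by the margin, so only violating (i.e., likely noisy) examples contribute gradient, which is how the QA signal counteracts DS noise.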
- Oct 31 2017 cs.SI arXiv:1710.10738v1 Unveiling the general structural properties of various complex networks is one of the main tasks in network science. We define the degree of a set of nodes as the number of common neighbors they share in the network, namely the common neighbor based degree (CNBD). We provide a general model, the unified ring model, which unifies all ring-based models. We propose a general framework based on the generating function to calculate the CNBD distributions of complex networks. We find that in the ER network, the CNBD distribution obeys a Poisson law for node sets of any size. We also study the CNBD distribution for other types of complex networks, including the regular ring lattice, the small-world model, the scale-free model, and real-world networks.
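The CNBD itself is straightforward to compute on an adjacency list: intersect the neighbor sets of the chosen nodes and count the result. A minimal helper following the stated definition (our own sketch):

```python
def cnbd(adj, nodes):
    """Common neighbor based degree: number of neighbors shared by every node in `nodes`."""
    neighbor_sets = [set(adj[v]) for v in nodes]
    return len(set.intersection(*neighbor_sets))

# toy graph: a 4-cycle 0-1-2-3 with chord 1-3
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
```

For a single node this reduces to the ordinary degree, so the CNBD distribution generalizes the degree distribution that the paper's generating-function framework analyzes.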
- Oct 30 2017 cs.DC arXiv:1710.10090v1 We present an Edge-as-a-Service (EaaS) platform for realising distributed cloud architectures and integrating the edge of the network into the computing ecosystem. The EaaS platform is underpinned by (i) a lightweight discovery protocol that identifies edge nodes and makes them publicly accessible in a computing environment, and (ii) a scalable resource provisioning mechanism for offloading workloads from the cloud onto the edge to service multiple user requests. We validate the feasibility of EaaS on an online game use-case to highlight the improvement in the QoS of the application hosted on our cloud-edge platform. On this platform we demonstrate (i) low overheads of less than 6%, (ii) data traffic to the cloud reduced by up to 95%, and (iii) application latency reduced by 40%-60%.
- Millimeter-wave multi-input multi-output (mm-Wave MIMO) systems are one of the candidate schemes for 5G wireless standardization efforts. In this context, the main contributions of this article are three-fold. 1) We describe parallel sets of measurements at identical transmit-receive location pairs with 2.9, 29 and 61 GHz carrier frequencies in indoor office, shopping mall, and outdoor settings. These measurements provide insights on propagation, blockage and material penetration losses, and the key elements necessary in system design to make mm-Wave systems viable in practice. 2) One of these elements is hybrid beamforming necessary for better link margins by reaping the array gain with large antenna dimensions. From the class of fully-flexible hybrid beamformers, we describe a robust class of directional beamformers towards meeting the high data-rate requirements of mm-Wave systems. 3) Leveraging these design insights, we then describe an experimental prototype system at 28 GHz that realizes high data-rates on both the downlink and uplink and robustly maintains these rates in outdoor and indoor mobility scenarios. In addition to maintaining large signal constellation sizes in spite of radio frequency challenges, this prototype leverages the directional nature of the mm-Wave channel to perform seamless beam switching and handover across mm-Wave base-stations thereby overcoming the path losses in non-line-of-sight links and blockages encountered at mm-Wave frequencies.
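For intuition on the array gain that motivates the hybrid beamforming discussed above: steering a uniform linear array's phases to the channel's dominant direction yields a gain equal to the number of antennas. A minimal sketch of such a directional (phase-only) beamformer, with the half-wavelength ULA model as our own assumption:

```python
import numpy as np

def steering_vector(n_ant, theta, spacing=0.5):
    """Unit-norm ULA response for angle theta (radians); spacing in wavelengths."""
    k = np.arange(n_ant)
    return np.exp(2j * np.pi * spacing * k * np.sin(theta)) / np.sqrt(n_ant)

def beam_gain(weights, theta, spacing=0.5):
    """Array gain of unit-norm beamforming weights toward direction theta."""
    n = len(weights)
    response = np.sqrt(n) * steering_vector(n, theta, spacing)  # un-normalized response
    return abs(np.vdot(weights, response)) ** 2
```

Matching the weights to the steering vector of the true direction recovers the full gain of `n_ant`, which is the link-margin benefit the article attributes to large antenna dimensions; hybrid architectures approximate this with a small number of RF chains.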
- We present a technique to synthesize and analyze volume-rendered images using generative models. We use the Generative Adversarial Network (GAN) framework to compute a model from a large collection of volume renderings, conditioned on (1) viewpoint and (2) transfer functions for opacity and color. Our approach facilitates tasks for volume analysis that are challenging to achieve using existing rendering techniques such as ray casting or texture-based methods. We show how to guide the user in transfer function editing by quantifying expected change in the output image. Additionally, the generative model transforms transfer functions into a view-invariant latent space specifically designed to synthesize volume-rendered images. We use this space directly for rendering, enabling the user to explore the space of volume-rendered images. As our model is independent of the choice of volume rendering process, we show how to analyze volume-rendered images produced by direct and global illumination lighting, for a variety of volume datasets.
- Wireless communication systems, such as wireless sensor networks and RFIDs, are increasingly adopted to transfer potentially highly sensitive information. Since the wireless medium is shared by nature, adversaries have the opportunity to eavesdrop on confidential information in these communication systems. Adding artificial noise generated by friendly jammers has emerged as a feasible defensive technique against such adversaries. This paper studies scheduling strategies for friendly jammers, which are randomly and redundantly deployed in a circumscribed geographical area and may or may not be rechargeable, so as to maximize the lifetime of the jammer network and prevent eavesdroppers from defeating the jamming, under constraints on geographical area, energy consumption, transmission power, and threshold level. An approximation algorithm, serving as a baseline, is first proposed using an integer linear programming model. To further reduce the computational complexity, a heuristic algorithm based on the greedy strategy that less consumption leads to longer lifetime is also proposed. Finally, extensive simulation results show that the proposed algorithms are effective and efficient.
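The greedy heuristic's guiding rule ("less consumption leads to longer lifetime") can be sketched as follows. This is a deliberately simplified single-jammer-per-slot version that ignores the coverage, power, and threshold constraints treated in the paper:

```python
def greedy_schedule(energy, consumption, n_slots):
    """In each slot, activate the feasible jammer with the lowest per-slot consumption."""
    remaining = list(energy)
    schedule = []
    for _ in range(n_slots):
        feasible = [i for i in range(len(remaining)) if remaining[i] >= consumption[i]]
        if not feasible:
            break  # network lifetime exhausted
        pick = min(feasible, key=lambda i: consumption[i])
        remaining[pick] -= consumption[pick]
        schedule.append(pick)
    return schedule
```

Preferring cheap jammers stretches the total energy budget over more slots, which is the lifetime-maximization intuition behind the heuristic; the ILP baseline in the paper optimizes the full constrained problem exactly.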
- Oct 25 2017 cs.DS arXiv:1710.08488v1 In the $k$-Cut problem, we are given an edge-weighted graph $G$ and an integer $k$, and have to remove a set of edges with minimum total weight so that $G$ has at least $k$ connected components. Prior work on this problem gives, for all $h \in [2,k]$, a $(2-h/k)$-approximation algorithm for $k$-Cut that runs in time $n^{O(h)}$. Hence to get a $(2 - \varepsilon)$-approximation for some absolute constant $\varepsilon$, the best runtime using prior techniques is $n^{O(k\varepsilon)}$. Moreover, it was recently shown that getting a $(2 - \varepsilon)$-approximation for general $k$ is NP-hard, assuming the Small Set Expansion Hypothesis. If we use the size of the cut as the parameter, an FPT algorithm for finding the exact $k$-Cut is known, but solving the $k$-Cut problem exactly is $W[1]$-hard if we parameterize only by the natural parameter $k$. An immediate question is: can we approximate $k$-Cut better in FPT time, using $k$ as the parameter? We answer this question positively. We show that for some absolute constant $\varepsilon > 0$, there exists a $(2 - \varepsilon)$-approximation algorithm that runs in time $2^{O(k^6)} \cdot \widetilde{O}(n^4)$. This is the first FPT algorithm parameterized only by $k$ that strictly improves on the $2$-approximation.