results for au:Huang_C in:cs

- Dec 14 2017 cs.CV arXiv:1712.04616v1Hashing is widely applied to large-scale image retrieval due to the storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but relevant domains. However, the transfer retrieval setting will introduce two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points to be within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming Radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise $t$-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state of the art Hamming space retrieval performance on standard datasets.
- Stochastic Gradient Descent (SGD) is the central workhorse for training modern CNNs. Although giving impressive empirical performance it can be slow to converge. In this paper we explore a novel strategy for training a CNN using an alternation strategy that offers substantial speedups during training. We make the following contributions: (i) replace the ReLU non-linearity within a CNN with positive hard-thresholding, (ii) reinterpret this non-linearity as a binary state vector making the entire CNN linear if the multi-layer support is known, and (iii) demonstrate that under certain conditions a global optima to the CNN can be found through local descent. We then employ a novel alternation strategy (between weights and support) for CNN training that leads to substantially faster convergence rates, nice theoretical properties, and achieving state of the art results across large scale datasets (e.g. ImageNet) as well as other standard benchmarks.
- Nov 15 2017 cs.CV arXiv:1711.04661v1Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks. Nonetheless, the chosen CNN features are always pre-trained in different task and individual components in tracking systems are learned separately, thus the achieved tracking performance may be suboptimal. Besides, most of these trackers are not designed towards real-time applications because of their time-consuming feature extraction and complex optimization details.In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT). Specifically, The UCT treats feature extractor and tracking process both as convolution operation and trains them jointly, enabling learned CNN features are tightly coupled to tracking process. In online tracking, an efficient updating method is proposed by introducing peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into network. The proposed approach results in superior tracking performance, while maintaining real-time speed. The standard UCT and UCT-Lite can track generic objects at 41 FPS and 154 FPS without further optimization, respectively. Experiments are performed on four challenging benchmark tracking datasets: OTB2013, OTB2015, VOT2014 and VOT2015, and our method achieves state-of-the-art results on these benchmarks compared with other real-time trackers.
- While single measurement vector (SMV) models have been widely studied in signal processing, there is a surging interest in addressing the multiple measurement vectors (MMV) problem. In the MMV setting, more than one measurement vector is available and the multiple signals to be recovered share some commonalities such as a common support. Applications in which MMV is a naturally occurring phenomenon include online streaming, medical imaging, and video recovery. This work presents a stochastic iterative algorithm for the support recovery of jointly sparse corrupted MMV. We present a variant of the Sparse Randomized Kaczmarz algorithm for corrupted MMV and compare our proposed method with an existing Kaczmarz type algorithm for MMV problems. We also showcase the usefulness of our approach in the online (streaming) setting and provide empirical evidence that suggests the robustness of the proposed method to the distribution of the corruption and the number of corruptions occurring.
- Sparse representation of a single measurement vector (SMV) has been explored in a variety of compressive sensing applications. Recently, SMV models have been extended to solve multiple measurement vectors (MMV) problems, where the underlying signal is assumed to have joint sparse structures. To circumvent the NP-hardness of the $\ell_0$ minimization problem, many deterministic MMV algorithms solve the convex relaxed models with limited efficiency. In this paper, we develop stochastic greedy algorithms for solving the joint sparse MMV reconstruction problem. In particular, we propose the MMV Stochastic Iterative Hard Thresholding (MStoIHT) and MMV Stochastic Gradient Matching Pursuit (MStoGradMP) algorithms, and we also utilize the mini-batching technique to further improve their performance. Convergence analysis indicates that the proposed algorithms are able to converge faster than their SMV counterparts, i.e., concatenated StoIHT and StoGradMP, under certain conditions. Numerical experiments have illustrated the superior effectiveness of the proposed algorithms over their SMV counterparts.
- We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, We propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.
- Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.
- We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0,I)$, to a distribution $q(\theta) \doteq q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta | \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\theta)$. We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.
- In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as universal approximation to any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to use factorial Gaussians in the latent real space.
- In this paper we study the personalized text search problem. The keyword based search method in conventional algorithms has a low efficiency in understanding users' intention since the semantic meaning, user profile, user interests are not always considered. Firstly, we propose a novel text search algorithm using a inverse filtering mechanism that is very efficient for label based item search. Secondly, we adopt the Bayesian network to implement the user interest prediction for an improved personalized search. According to user input, it searches the related items using keyword information, predicted user interest. Thirdly, the word vectorization is used to discover potential targets according to the semantic meaning. Experimental results show that the proposed search engine has an improved efficiency and accuracy and it can operate on embedded devices with very limited computational resources.
- In this dissertation the practical speech emotion recognition technology is studied, including several cognitive related emotion types, namely fidgetiness, confidence and tiredness. The high quality of naturalistic emotional speech data is the basis of this research. The following techniques are used for inducing practical emotional speech: cognitive task, computer game, noise stimulation, sleep deprivation and movie clips. A practical speech emotion recognition system is studied based on Gaussian mixture model. A two-class classifier set is adopted for performance improvement under the small sample case. Considering the context information in continuous emotional speech, a Gaussian mixture model embedded with Markov networks is proposed. A further study is carried out for system robustness analysis. First, noise reduction algorithm based on auditory masking properties is fist introduced to the practical speech emotion recognition. Second, to deal with the complicated unknown emotion types under real situation, an emotion recognition method with rejection ability is proposed, which enhanced the system compatibility against unknown emotion samples. Third, coping with the difficulties brought by a large number of unknown speakers, an emotional feature normalization method based on speaker-sensitive feature clustering is proposed. Fourth, by adding the electrocardiogram channel, a bi-modal emotion recognition system based on speech signals and electrocardiogram signals is first introduced. The speech emotion recognition methods studied in this dissertation may be extended into the cross-language speech emotion recognition and the whispered speech emotion recognition.
- We study computing \em all-pairs shortest paths (APSP) on distributed networks (the CONGEST model). The goal is for every node in the (weighted) network to know the distance from every other node using communication. The problem admits $(1+o(1))$-approximation $\tilde O(n)$-time algorithms ~\citeLenzenP-podc15,Nanongkai-STOC14, which are matched with $\tilde \Omega(n)$-time lower bounds~\citeNanongkai-STOC14,LenzenP_stoc13,FrischknechtHW12\footnote$\tilde \Theta$, $\tilde O$ and $\tilde \Omega$ hide polylogarithmic factors. Note that the lower bounds also hold even in the unweighted case and in the weighted case with polynomial approximation ratios.. No $\omega(n)$ lower bound or $o(m)$ upper bound were known for exact computation. In this paper, we present an $\tilde O(n^{5/4})$-time randomized (Las Vegas) algorithm for exact weighted APSP; this provides the first improvement over the naive $O(m)$-time algorithm when the network is not so sparse. Our result also holds for the case where edge weights are \em asymmetric (a.k.a. the directed case where communication is bidirectional). Our techniques also yield an $\tilde O(n^{3/4}k^{1/2}+n)$-time algorithm for the \em $k$-source shortest paths problem where we want every node to know distances from $k$ sources; this improves Elkin's recent bound~\citeElkin-STOC17 when $k=\tilde \omega(n^{1/4})$.
- Aug 11 2017 cs.CV arXiv:1708.02973v2Visual object tracking is a fundamental and time-critical vision task. Recent years have seen many shallow tracking methods based on real-time pixel-based correlation filters, as well as deep methods that have top performance but need a high-end GPU. In this paper, we learn to improve the speed of deep trackers without losing accuracy. Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features. We formulate the adaptive tracking problem as a decision-making process, and learn an agent to decide whether to locate objects with high confidence on an early layer, or continue processing subsequent layers of a network. This significantly reduces the feed-forward cost for easy frames with distinct or slow-moving objects. We train the agent offline in a reinforcement learning fashion, and further demonstrate that learning all deep layers (so as to provide good features for adaptive tracking) can lead to near real-time average tracking speed of 23 fps on a single CPU while achieving state-of-the-art performance. Perhaps most tellingly, our approach provides a 100X speedup for almost 50% of the time, indicating the power of an adaptive approach.
- Aug 10 2017 cs.CV arXiv:1708.02760v1The ability to ask questions is a powerful tool to gather information in order to learn about the world and resolve ambiguities. In this paper, we explore a novel problem of generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement and new extension to the rich research studies on image captioning and question answering. We introduce the first large-scale dataset with over 10,000 carefully annotated images-question tuples to facilitate benchmarking. In particular, each tuple consists of a pair of images and 4.6 discriminative questions (as positive samples) and 5.9 non-discriminative questions (as negative samples) on average. In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner without discriminative images-question tuples but just existing visual question answering datasets. Promising results are shown against representative baselines through quantitative evaluations and user studies.
- Low-cost message passing (MP) algorithm has been recognized as a promising technique for sparse vector recovery. However, the existing MP algorithms either focus on mean square error (MSE) of the value recovery while ignoring the sparsity requirement, or support error rate (SER) of the sparse support (non-zero position) recovery while ignoring its value. A novel low-complexity Bernoulli-Gaussian MP (BGMP) is proposed to perform the value recovery as well as the support recovery. Particularly, in the proposed BGMP, support-related Bernoulli messages and value-related Gaussian messages are jointly processed and assist each other. In addition, a strict lower bound is developed for the MSE of BGMP via the genie-aided minimum mean-square-error (GA-MMSE) method. The GA-MMSE lower bound is shown to be tight in high signal-to-noise ratio. Numerical results are provided to verify the advantage of BGMP in terms of final MSE, SER and convergence speed.
- We present MoodSwipe, a soft keyboard that suggests text messages given the user-specified emotions utilizing the real dialog data. The aim of MoodSwipe is to create a convenient user interface to enjoy the technology of emotion classification and text suggestion, and at the same time to collect labeled data automatically for developing more advanced technologies. While users select the MoodSwipe keyboard, they can type as usual but sense the emotion conveyed by their text and receive suggestions for their message as a benefit. In MoodSwipe, the detected emotions serve as the medium for suggested texts, where viewing the latter is the incentive to correcting the former. We conduct several experiments to show the superiority of the emotion classification models trained on the dialog data, and further to verify good emotion cues are important context for text suggestion.
- Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years. In this paper, we propose some simple and fast algorithms for outlier detection and robust QoE evaluation based on the nonconvex optimization principle. Several iterative procedures are designed with or without knowing the number of outliers in samples. Theoretical analysis is given to show that such procedures can reach statistically good estimates under mild conditions. Finally, experimental results with simulated and real-world crowdsourcing datasets show that the proposed algorithms could produce similar performance to Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up, without or with a prior knowledge on the sparsity size of outliers, respectively. Therefore the proposed methodology provides us a set of helpful tools for robust QoE evaluation with crowdsourcing data.
- We describe a generalization of the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) which is able to encode prior information that state transitions are more likely between "nearby" states. This is accomplished by defining a similarity function on the state space and scaling transition probabilities by pair-wise similarities, thereby inducing correlations among the transition distributions. We present an augmented data representation of the model as a Markov Jump Process in which: (1) some jump attempts fail, and (2) the probability of success is proportional to the similarity between the source and destination states. This augmentation restores conditional conjugacy and admits a simple Gibbs sampler. We evaluate the model and inference method on a speaker diarization task and a "harmonic parsing" task using four-part chorale data, as well as on several synthetic datasets, achieving favorable comparisons to existing models.
- Although the recent progress in the deep neural network has led to the development of learnable local feature descriptors, there is no explicit answer for estimation of the necessary size of a neural network. Specifically, the local feature is represented in a low dimensional space, so the neural network should have more compact structure. The small networks required for local feature descriptor learning may be sensitive to initial conditions and learning parameters and more likely to become trapped in local minima. In order to address the above problem, we introduce an adaptive pruning Siamese Architecture based on neuron activation to learn local feature descriptors, making the network more computationally efficient with an improved recognition rate over more complex networks. Our experiments demonstrate that our learned local feature descriptors outperform the state-of-art methods in patch matching.
- Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variations, such as speaker and environment information in signals, for speech recognition. However, studies have suggested to favor a certain type of convolutional operations when building a deep convolutional neural network for speech applications although there has been promising results using different types of convolutional operations. In this work, we study four types of convolutional operations on different input features for speech emotion recognition in order to derive a comprehensive understanding. Since affective behavioral information has been shown to reflect temporally varying of mental state and convolutional operation are applied locally in time, all deep neural networks share a deep recurrent sub-network architecture for further temporal modeling. We present detailed quantitative module-wise performance analysis to gain insights into information flows within the proposed architectures. In particular, we demonstrate the interplay of affective information and the other irrelevant information during the progression from one module to another. Finally we show that all of our deep neural networks provide state-of-the-art performance on the eNTERFACE'05 corpus.
- From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, today, healthcare now generates an incredible amount of information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative clinical insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitalized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focused on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it related to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques.
- Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting held in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised of digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.
- Class imbalance is a challenging issue in practical classification problems for deep learning models as well as traditional models. Traditionally successful countermeasures such as synthetic over-sampling have had limited success with complex, structured data handled by deep learning models. In this paper, we propose Deep Over-sampling (DOS), a framework for extending the synthetic over-sampling method to exploit the deep feature space acquired by a convolutional neural network (CNN). Its key feature is an explicit, supervised representation learning, for which the training data presents each raw input sample with a synthetic embedding target in the deep feature space, which is sampled from the linear subspace of in-class neighbors. We implement an iterative process of training the CNN and updating the targets, which induces smaller in-class variance among the embeddings, to increase the discriminative power of the deep representation. We present an empirical study using public benchmarks, which shows that the DOS framework not only counteracts class imbalance better than the existing method, but also improves the performance of the CNN in the standard, balanced settings.
- Apr 26 2017 cs.CV arXiv:1704.07754v1Deep learning models such as convolutional neural net- work have been widely used in 3D biomedical segmentation and achieve state-of-the-art performance. However, most of them often adapt a single modality or stack multiple modalities as different input channels. To better leverage the multi- modalities, we propose a deep encoder-decoder structure with cross-modality convolution layers to incorporate different modalities of MRI data. In addition, we exploit convolutional LSTM to model a sequence of 2D slices, and jointly learn the multi-modalities and convolutional LSTM in an end-to-end manner. To avoid converging to the certain labels, we adopt a re-weighting scheme and two-phase training to handle the label imbalance. Experimental results on BRATS-2015 show that our method outperforms state-of-the-art biomedical segmentation approaches.
- Apr 24 2017 cs.NI arXiv:1704.06569v1Domain Name System (DNS), one of the important infrastructure in the Internet, was vulnerable to attacks, for the DNS designer didn't take security issues into consideration at the beginning. The defects of DNS may lead to users' failure of access to the websites, what's worse, users might suffer a huge economic loss. In order to correct the DNS wrong resource records, we propose a Self-Feedback Correction System for DNS (SFCSD), which can find and track a large number of common websites' domain name and IP address correct correspondences to provide users with a real-time auto-updated correct (IP, Domain) binary tuple list. By matching specific strings with SSL, DNS and HTTP traffic passively, filtering with the CDN CNAME and non-homepage URL feature strings, verifying with webpage fingerprint algorithm, SFCSD obtains a large number of highly possibly correct IP addresses to make an active manual correction in the end. Its self-feedback mechanism can expand search range and improve performance. Experiments show that, SFCSD can achieve 94.3% precision and 93.07% recall rate with the optimal threshold selection in the test dataset. It has 8Gbps processing speed stand-alone to find almost 1000 possibly correct (IP, Domain) per day for the each specific string and to correct almost 200.
- Apr 11 2017 cs.SY arXiv:1704.02641v1The paper provides simple formulas of Bayesian filtering for the exact recursive computation of state conditional probability density functions given quantized innovations signal measurements of a linear stochastic system. This is a topic of current interest because the innovations signal should be white and therefore efficient in its use of channel capacity and in the design and optimization of the quantizer. Earlier approaches, which we reexamine and characterize here, have relied on assumptions concerning densities or approximations to yield recursive solutions, which include the sign-of-innovations Kalman filter and a Particle filtering technique. Our approach uses the Kalman filter innovations at the transmitter side and provides a point of comparison for the other methods, since it is based on the Bayesian filter. Computational examples are provided.
- Mar 21 2017 cs.CV arXiv:1703.06256v1The histogram of oriented gradients (HOG) is a widely used feature descriptor in computer vision for the purpose of object detection. In the paper, a modified HOG descriptor is described, it uses a lookup table and the method of integral image to speed up the detection performance by a factor of 5~10. By exploiting the special hardware features of a given platform(e.g. a digital signal processor), further improvement can be made to the HOG descriptor in order to have real-time object detection and tracking.
- Mar 20 2017 cs.CV arXiv:1703.05884v2In this paper, we propose the first higher frame rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured with now commonly available higher frame rate (240 FPS) cameras from real world scenarios. All frames are annotated with axis aligned bounding boxes and all sequences are manually labelled with nine visual attributes - such as occlusion, fast motion, background clutter, etc. Our benchmark provides an extensive evaluation of many recent and state-of-the-art trackers on higher frame rate sequences. We ranked each of these trackers according to their tracking accuracy and real-time performance. One of our surprising conclusions is that at higher frame rates, simple trackers such as correlation filters outperform complex methods based on deep networks. This suggests that for practical applications (such as in robotics or embedded vision), one needs to carefully tradeoff bandwidth constraints associated with higher frame rate acquisition, computational costs of real-time analysis, and the required application accuracy. Our dataset and benchmark allows for the first time (to our knowledge) systematic exploration of such issues, and will be made available to allow for further research in this space.
- Feb 23 2017 cs.CY arXiv:1702.06830v3Electroencephalography (EEG) signal based intent recognition has recently attracted much attention in both academia and industries, due to helping the elderly or motor-disabled people controlling smart devices to communicate with outer world. However, the utilization of EEG signals is challenged by low accuracy, arduous and time- consuming feature extraction. This paper proposes a 7-layer deep learning model to classify raw EEG signals with the aim of recognizing subjects' intents, to avoid the time consumed in pre-processing and feature extraction. The hyper-parameters are selected by an Orthogonal Array experiment method for efficiency. Our model is applied to an open EEG dataset provided by PhysioNet and achieves the accuracy of 0.9553 on the intent recognition. The applicability of our proposed model is further demonstrated by two use cases of smart living (assisted living with robotics and home automation).
- Feb 13 2017 cs.ET arXiv:1702.03216v1Optical interconnect is a potential solution to attain the large bandwidth on-chip communications needed in high performance computers in a low power and low cost manner. Mode-division multiplexing (MDM) is an emerging technology that scales the capacity of a single wavelength carrier by the number of modes in a multimode waveguide, and is attractive as a cost-effective means for high bandwidth density on-chip communications. Advanced modulation formats with high spectral efficiency in MDM networks can further improve the data rates of the optical link. Here, we demonstrate an intra-chip MDM communications link employing advanced modulation formats with two waveguide modes. We demonstrate a compact single wavelength carrier link that is expected to support 2x100 Gb/s mode multiplexed capacity. The network comprised integrated microring modulators at the transmitter, mode multiplexers, multimode waveguide interconnect, mode demultiplexers and integrated germanium on silicon photodetectors. Each of the mode channels achieves 100 Gb/s line rate with 84 Gb/s net payload data rate at 7% overhead for hard-decision forward error correction (HD-FEC) in the OFDM/16-QAM signal transmission.
- Instant messaging is one of the major channels of computer mediated communication. However, humans are known to be very limited in understanding others' emotions via text-based communication. Aiming on introducing emotion sensing technologies to instant messaging, we developed EmotionPush, a system that automatically detects the emotions of the messages end-users received on Facebook Messenger and provides colored cues on their smartphones accordingly. We conducted a deployment study with 20 participants during a time span of two weeks. In this paper, we revealed five challenges, along with examples, that we observed in our study based on both user's feedback and chat logs, including (i)the continuum of emotions, (ii)multi-user conversations, (iii)different dynamics between different users, (iv)misclassification of emotions and (v)unconventional content. We believe this discussion will benefit the future exploration of affective computing for instant messaging, and also shed light on research of conversational emotion sensing.
- In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for the symmetric tridiagonal matrices based on ScaLAPACK and the hierarchically semiseparable (HSS) matrices. HSS is an important type of rank-structured matrices.Most time of the DC algorithm is cost by computing the eigenvectors via the matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC) algorithm, MMM is accelerated by using the HSS matrix techniques when the intermediate matrix is large. All the HSS algorithms are done via the package STRUMPACK. PHDC has been tested by using many different matrices. Compared with the DC implementation in MKL, PHDC can be faster for some matrices with few deflations when using hundreds of processes. However, the gains decrease as the number of processes increases. The comparisons of PHDC with ELPA (the Eigenvalue soLvers for Petascale Applications library) are similar. PHDC is usually slower than MKL and ELPA when using 300 or more processes on Tianhe-2 supercomputer.
- We propose an iterative channel estimation algorithm based on the Least Square Estimation (LSE) and Sparse Message Passing (SMP) algorithm for the Millimeter Wave (mmWave) MIMO systems. The channel coefficients of the mmWave MIMO are approximately modeled as a Bernoulli-Gaussian distribution since there are relatively fewer paths in the mmWave channel, i.e., the channel matrix is sparse and only has a few non-zero entries. By leveraging the advantage of sparseness, we proposed an algorithm that iteratively detects the exact location and value of non-zero entries of the sparse channel matrix. The SMP is used to detect the exact location of non-zero entries of the channel matrix, while the LSE is used for estimating its value at each iteration. We also analyze the Cramer-Rao Lower Bound (CLRB), and show that the proposed algorithm is a minimum variance unbiased estimator. Furthermore, we employ the Gaussian approximation for message densities under density evolution to simplify the analysis of the algorithm, which provides a simple method to predict the performance of the proposed algorithm. Numerical experiments show that the proposed algorithm has much better performance than the existing sparse estimators, especially when the channel is sparse. In addition, our proposed algorithm converges to the CRLB of the genie-aided estimation of sparse channels in just 5 turbo iterations.
- The emerging marketplace for online free services in which service providers earn revenue from using consumer data in direct and indirect ways has lead to significant privacy concerns. This leads to the following question: can the online marketplace sustain multiple service providers (SPs) that offer privacy-differentiated free services? This paper studies the problem of market segmentation for the free online services market by augmenting the classical Hotelling model for market segmentation analysis to include the fact that for the free services market, a consumer values service not in monetized terms but by its quality of service (QoS) and that the differentiator of services is not product price but the privacy risk advertised by a SP. Building upon the Hotelling model, this paper presents a parametrized model for SP profit and consumer valuation of service for both the two- and multi-SP problems to show that: (i) when consumers place a high value on privacy, it leads to a lower use of private data by SPs (i.e., their advertised privacy risk reduces), and thus, SPs compete on the QoS; (ii) SPs that are capable of differentiating on services that do not directly target consumers gain larger market share; and (iii) a higher valuation of privacy by consumers forces SPs with smaller untargeted revenue to offer lower privacy risk to attract more consumers. The work also illustrates the market segmentation problem for more than two SPs and highlights the instability of such markets.
- Existing deep embedding methods in vision tasks are capable of learning a compact Euclidean space from images, where Euclidean distances correspond to a similarity metric. To make learning more effective and efficient, hard sample mining is usually employed, with samples identified through computing the Euclidean feature distance. However, the global Euclidean distance cannot faithfully characterize the true feature similarity in a complex visual feature space, where the intraclass distance in a high-density region may be larger than the interclass distance in low-density regions. In this paper, we introduce a Position-Dependent Deep Metric (PDDM) unit, which is capable of learning a similarity metric adaptive to local feature structure. The metric can be used to select genuinely hard samples in a local neighborhood to guide the deep embedding learning in an online and robust manner. The new layer is appealing in that it is pluggable to any convolutional networks and is trained end-to-end. Our local similarity-aware feature embedding not only demonstrates faster convergence and boosted performance on two complex image retrieval datasets, its large margin nature also leads to superior generalization results under the large and open set scenarios of transfer learning and zero-shot learning on ImageNet 2010 and ImageNet-10K datasets.
- In this paper, we propose a novel channel estimation algorithm based on the Least Square Estimation (LSE) and Sparse Message Passing algorithm (SMP), which is of special interest for Millimeter Wave (mmWave) systems, since this algorithm can leverage the inherent sparseness of the mmWave channel. Our proposed algorithm will iteratively detect exact the location and the value of non-zero entries of sparse channel vector without its prior knowledge of distribution. The SMP is used to detect exact the location of non-zero entries of the channel vector, while the LSE is used for estimating its value at each iteration. Then, the analysis of the Cramer-Rao Lower Bound (CRLB) of our proposed algorithm is given. Numerical experiments show that our proposed algorithm has much better performance than the existing sparse estimators (e.g. LASSO), especially when mmWave systems have massive antennas at both the transmitters and receivers. In addition, we also find that our proposed algorithm converges to the CRLB of the genie-aided estimation of sparse channels in just a few turbo iterations.
- Aug 30 2016 cs.CL arXiv:1608.07738v2In Distributional Semantic Models (DSMs), Vector Cosine is widely used to estimate similarity between word vectors, although this measure was noticed to suffer from several shortcomings. The recent literature has proposed other methods which attempt to mitigate such biases. In this paper, we intend to investigate APSyn, a measure that computes the extent of the intersection between the most associated contexts of two target words, weighting it by context relevance. We evaluated this metric in a similarity estimation task on several popular test sets, and our results show that APSyn is in fact highly competitive, even with respect to the results reported in the literature for word embeddings. On top of it, APSyn addresses some of the weaknesses of Vector Cosine, performing well also on genuine similarity estimation.
- Aug 17 2016 cs.NI arXiv:1608.04411v1VPN service providers (VSP) and IP-VPN customers have traditionally maintained service demarcation boundaries between their routing and signaling entities. This has resulted in the VPNs viewing the VSP network as an opaque entity and therefore limiting any meaningful interaction between the VSP and the VPNs. A key challenge is to expose each VPN to information about available network resources through an abstraction (TA) [1] which is both accurate and fair. In [2] we proposed three decentralized schemes assuming that all the border nodes performing the abstraction have access to the entire core network topology. This assumption likely leads to over- or under-subscription. In this paper we develop centralized schemes to partition the core network capacities, and assign each partition to a specific VPN for applying the decentralized abstraction schemes presented in [2]. First, we present two schemes based on the maximum concurrent flow and the maximum multicommodity flow (MMCF) formulations. We then propose approaches to address the fairness concerns that arise when MMCF formulation is used. We present results based on extensive simulations on several topologies, and provide a comparative evaluation of the different schemes in terms of abstraction efficiency, fairness to VPNs and call performance characteristics achieved.
- Jul 21 2016 cs.CV arXiv:1607.05781v1In this paper, we develop a new approach of spatially supervised recurrent convolutional neural networks for visual object tracking. Our recurrent convolutional network exploits the history of locations as well as the distinctive visual features learned by the deep neural networks. Inspired by recent bounding box regression methods for object detection, we study the regression capability of Long Short-Term Memory (LSTM) in the temporal domain, and propose to concatenate high-level visual features produced by convolutional networks with region information. In contrast to existing deep learning based trackers that use binary classification for region candidates, we use regression for direct prediction of the tracking locations both at the convolutional layer and at the recurrent unit. Our extensive experimental results and performance comparison with state-of-the-art tracking methods on challenging benchmark video tracking datasets shows that our tracker is more accurate and robust while maintaining low computational cost. For most test video sequences, our method achieves the best tracking performance, often outperforms the second best by a large margin.
- Energy harvesting communication has raised great research interests due to its wide applications and feasibility of commercialization. In this paper, we investigate the multiuser energy diversity. Specifically, we reveal the throughput gain coming from the increase of total available energy harvested over time/space and from the combined dynamics of batteries. Considering both centralized and distributed access schemes, the scaling of the average throughput over the number of transmitters is studied, along with the scaling of corresponding available energy in the batteries.
- Several studies on sentence processing suggest that the mental lexicon keeps track of the mutual expectations between words. Current DSMs, however, represent context words as separate features, thereby loosing important information for word expectations, such as word interrelations. In this paper, we present a DSM that addresses this issue by defining verb contexts as joint syntactic dependencies. We test our representation in a verb similarity task on two datasets, showing that joint contexts achieve performances comparable to single dependencies or even better. Moreover, they are able to overcome the data sparsity problem of joint feature spaces, in spite of the limited size of our training corpus.
- This paper considers a low-complexity Gaussian Message Passing Iterative Detection (GMPID) algorithm for Multiple-Input Multiple-Output systems with Non-Orthogonal Multiple Access (MIMO-NOMA), in which a base station with $N_r$ antennas serves $N_u$ sources simultaneously. Both $N_u$ and $N_r$ are very large numbers and we consider the cases that $N_u>N_r$. The GMPID is based on a fully connected loopy graph, which is well understood to be not convergent in some cases. The large-scale property of the MIMO-NOMA is used to simplify the convergence analysis. Firstly, we prove that the variances of the GMPID definitely converge to that of Minimum Mean Square Error (MMSE) detection. Secondly, two sufficient conditions that the means of the GMPID converge to a higher MSE than that of the MMSE detection are proposed. However, the means of the GMPID may still not converge when $ N_u/N_r< (\sqrt{2}-1)^{-2}$. Therefore, a new convergent SA-GMPID is proposed, which converges to the MMSE detection for any $N_u> N_r$ with a faster convergence speed. Finally, numerical results are provided to verify the validity of the proposed theoretical results.
- This paper studies the throughput maximization problem for a three-node relay channel with non-ideal circuit power. In particular, the relay operates in a half-duplex manner, and the decode-and-forward (DF) relaying scheme is adopted. Considering the extra power consumption by the circuits, the optimal power allocation to maximize the throughput of the considered system over an infinite time horizon is investigated. First, two special scenarios, i.e., the direct link transmission (only use the direct link to transmit) and the relay assisted transmission (the source and the relay transmit with equal probability), are studied, and the corresponding optimal power allocations are obtained. By transforming two non-convex problems into quasiconcave ones, the closed-form solutions show that the source and the relay transmit with certain probability, which is determined by the average power budgets, circuit power consumptions, and channel gains. Next, based on the above results, the optimal power allocation for both the cases with and without direct link is derived, which is shown to be a mixed transmission scheme between the direct link transmission and the relay assisted transmission.
- Apr 26 2016 cs.CV arXiv:1604.06877v1The prevalent scene text detection approach follows four sequential steps comprising character candidate detection, false character candidate removal, text line extraction, and text line verification. However, errors occur and accumulate throughout each of these sequential steps which often lead to low detection performance. To address these issues, we propose a unified scene text detection system, namely Text Flow, by utilizing the minimum cost (min-cost) flow network model. With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively. The proposed technique has been tested on three public datasets, i.e, ICDAR2011 dataset, ICDAR2013 dataset and a multilingual dataset and it outperforms the state-of-the-art methods on all three datasets with much higher recall and F-score. The good performance on the multilingual dataset shows that the proposed technique can be used for the detection of texts in different languages.
- In virtualized data centers, consolidation of Virtual Machines (VMs) on minimizing the number of total physical machines (PMs) has been recognized as a very efficient approach. This paper considers the energy-efficient consolidation of VMs in a Cloud Data center. Concentrating on CPU-intensive applications, the objective is to schedule all requests non-preemptively, subjecting to constraints of PM capacities and running time interval spans, such that the total energy consumption of all PMs is minimized (called MinTE for abbreviation). The MinTE problem is NP-complete in general. We propose a self-adaptive approached called SAVE. The approach makes decisions of the assignment and migration of VMs by probabilistic processes and is based exclusively on local information, therefore it is very simple to implement. Both simulation and real environment test show that our proposed method SAVE can reduce energy consumption about 30% against VMWare DRS and 10-20% against EcoCloud on average.
- While deep convolutional neural networks (CNNs) have shown a great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification model
- Apr 06 2016 cs.OS arXiv:1604.01378v5This paper presents the "isolate first, then share" OS model in which the processor cores, memory, and devices are divided up between disparate OS instances and a new abstraction, subOS, is proposed to encapsulate an OS instance that can be created, destroyed, and resized on-the-fly. The intuition is that this avoids shared kernel states between applications, which in turn reduces performance loss caused by contention. We decompose the OS into the supervisor and several subOSes running at the same privilege level: a subOS directly manages physical resources, while the supervisor can create, destroy, resize a subOS on-the-fly. The supervisor and subOSes have few state sharing, but fast inter-subOS communication mechanisms are provided on demand. We present the first implementation, RainForest, which supports unmodified Linux binaries. Our comprehensive evaluation shows RainForest outperforms Linux with four different kernels, LXC, and Xen in terms of worst-case and average performance most of time when running a large number of benchmarks. The source code is available soon.
- Mar 31 2016 cs.CL arXiv:1603.09054v1In this paper, we claim that vector cosine, which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.
- Mar 30 2016 cs.CL arXiv:1603.08701v1In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but they share a larger portion of their most relevant contexts compared to other related words. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy in the TOEFL dataset, beating therefore the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
- Mar 30 2016 cs.CL arXiv:1603.08705v1In this paper, we describe ROOT13, a supervised system for the classification of hypernyms, co-hyponyms and random words. The system relies on a Random Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, against a baseline of 57.6% (vector cosine). When the classification is binary, ROOT13 achieves the following results: hypernyms-co-hyponyms (93.4% vs. 60.2%), hypernymsrandom (92.3% vs. 65.5%) and co-hyponyms-random (97.3% vs. 81.5%). Our results are competitive with stateof-the-art models.
- Mar 30 2016 cs.CL arXiv:1603.08702v1ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare the performance with the state-of-the-art, we have also evaluated ROOT9 in subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or it simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias.
- Mar 24 2016 cs.DM arXiv:1603.07168v1We are given a bipartite graph $G = (A \cup B, E)$ where each vertex has a preference list ranking its neighbors: in particular, every $a \in A$ ranks its neighbors in a strict order of preference, whereas the preference lists of $b \in B$ may contain ties. A matching $M$ is popular if there is no matching $M'$ such that the number of vertices that prefer $M'$ to $M$ exceeds the number of vertices that prefer $M$ to~$M'$. We show that the problem of deciding whether $G$ admits a popular matching or not is NP-hard. This is the case even when every $b \in B$ either has a strict preference list or puts all its neighbors into a single tie. In contrast, we show that the problem becomes polynomially solvable in the case when each $b \in B$ puts all its neighbors into a single tie. That is, all neighbors of $b$ are tied in $b$'s list and $b$ desires to be matched to any of them. Our main result is an $O(n^2)$ algorithm (where $n = |A \cup B|$) for the popular matching problem in this model. Note that this model is quite different from the model where vertices in $B$ have no preferences and do not care whether they are matched or not.
- The evolution of communication technology and the proliferation of electronic devices have rendered adversaries powerful means for targeted attacks via all sorts of accessible resources. In particular, owing to the intrinsic interdependency and ubiquitous connectivity of modern communication systems, adversaries can devise malware that propagates through intermediate hosts to approach the target, which we refer to as transmissive attacks. Inspired by biology, the transmission pattern of such an attack in the digital space much resembles the spread of an epidemic in real life. This paper elaborates transmissive attacks, summarizes the utility of epidemic models in communication systems, and draws connections between transmissive attacks and epidemic models. Simulations, experiments, and ongoing research challenges on transmissive attacks are also addressed.
- Feb 04 2016 cs.CV arXiv:1602.01197v1Data imbalance is common in many vision tasks where one or more classes are rare. Without addressing this issue conventional methods tend to be biased toward the majority class with poor predictive accuracy for the minority class. These methods further deteriorate on small, imbalanced data that has a large degree of class overlap. In this study, we propose a novel discriminative sparse neighbor approximation (DSNA) method to ameliorate the effect of class-imbalance during prediction. Specifically, given a test sample, we first traverse it through a cost-sensitive decision forest to collect a good subset of training examples in its local neighborhood. Then we generate from this subset several class-discriminating but overlapping clusters and model each as an affine subspace. From these subspaces, the proposed DSNA iteratively seeks an optimal approximation of the test sample and outputs an unbiased prediction. We show that our method not only effectively mitigates the imbalance issue, but also allows the prediction to extrapolate to unseen data. The latter capability is crucial for achieving accurate prediction on small dataset with limited samples. The proposed imbalanced learning method can be applied to both classification and regression tasks at a wide range of imbalance levels. It significantly outperforms the state-of-the-art methods that do not possess an imbalance handling mechanism, and is found to perform comparably or even better than recent deep learning methods by using hand-crafted features only.
- Machine learning algorithms, in conjunction with user data, hold the promise of revolutionizing the way we interact with our phones, and indeed their widespread adoption in the design of apps bear testimony to this promise. However, currently, the computationally expensive segments of the learning pipeline, such as feature extraction and model training, are offloaded to the cloud, resulting in an over-reliance on the network and under-utilization of computing resources available on mobile platforms. In this paper, we show that by combining the computing power distributed over a number of phones, judicious optimization choices, and contextual information it is possible to execute the end-to-end pipeline entirely on the phones at the edge of the network, efficiently. We also show that by harnessing the power of this combination, it is possible to execute a computationally expensive pipeline at near real-time. To demonstrate our approach, we implement an end-to-end image-processing pipeline -- that includes feature extraction, vocabulary learning, vectorization, and image clustering -- on a set of mobile phones. Our results show a 75% improvement over the standard, full pipeline implementation running on the phones without modification -- reducing the time to one minute under certain conditions. We believe that this result is a promising indication that fully distributed, infrastructure-less computing is possible on networks of mobile phones; enabling a new class of mobile applications that are less reliant on the cloud.
- Person names and location names are essential building blocks for identifying events and social networks in historical documents that were written in literary Chinese. We take the lead to explore the research on algorithmically recognizing named entities in literary Chinese for historical studies with language-model based and conditional-random-field based methods, and extend our work to mining the document structures in historical documents. Practical evaluations were conducted with texts that were extracted from more than 220 volumes of local gazetteers (Difangzhi). Difangzhi is a huge and the single most important collection that contains information about officers who served in local government in Chinese history. Our methods performed very well on these realistic tests. Thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become as new additions to CBDB.
- Distributed statistical inference has recently attracted enormous attention. Many existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the corresponding asymptotic properties of the newly proposed estimator. We find that the proposed one-step estimator enjoys the same asymptotic properties as the centralized estimator. The proposed one-step approach merely requires one additional round of communication in relative to the averaging estimator; so the extra communication burden is insignificant. In finite sample cases, numerical examples show that the proposed estimator outperforms the simple averaging estimator with a large margin in terms of the mean squared errors. A potential application of the one-step approach is that one can use multiple machines to speed up large scale statistical inference with little compromise in the quality of estimators. The proposed method becomes more valuable when data can only be available at distributed machines with limited communication bandwidth.
- Sep 29 2015 cs.NI arXiv:1509.08420v1The Pacific Rim Application and Grid Middleware Assembly (PRAGMA) is an international community of researchers that actively collaborate to address problems and challenges of common interest in eScience. The PRAGMA Experimental Network Testbed (PRAGMA-ENT) was established with the goal of constructing an international software-defined network (SDN) testbed to offer the necessary networking support to the PRAGMA cyberinfrastructure. PRAGMA-ENT is isolated, and PRAGMA researchers have complete freedom to access network resources to develop, experiment, and evaluate new ideas without the concerns of interfering with production networks. In the first phase, PRAGMA-ENT focused on establishing an international L2 backbone. With support from the Florida Lambda Rail (FLR), Internet2, PacificWave, JGN-X, and TWAREN, PRAGMA-ENT backbone connects Open\-Flow-enabled switches at University of Florida (UF), University of California San Diego (UCSD), Nara Institute of Science and Technology (NAIST, Japan), Osaka University (Japan), National Institute of Advanced Industrial Science and Technology (AIST, Japan), and National Center for High-Performance Computing (Taiwan). The second phase of PRAGMA-ENT consisted of evaluation of technologies for the control plane that enables multiple experiments (i.e., OpenFlow controllers) to co-exist. Preliminary experiments with FlowVisor revealed some limitations leading to the development of a new approach, called AutoVFlow. This paper will share our experience in the establishment of PRAGMA-ENT backbone (with international L2 links), its current status, and control plane plans. Discussion on preliminary application ideas, including optimization of routing control; multipath routing control; and remote visualization will also be discussed.
- We propose a virtualization architecture for NoC-based reconfigurable systems. The motivation of this work is to develop a service-oriented architecture that includes Partial Reconfigurable Region as a Service (PRRaaS) and Processing Element as a Service (PEaaS) for software applications. According to the requirements of software applications, new PEs can be created on-demand by (re)configuring the logic resource of the PRRs in the FPGA, while the configured PEs can also be virtualized to support multiple application tasks at the same time. As a result, such a two-level virtualization mechanism, including the gate-level virtualization and the PE-level virtualization, enables an SoC to be dynamically adapted to changing application requirements. Therefore, more software applications can be performed, and system performance can be further enhanced.
- Businesses (retailers) often wish to offer personalized advertisements (coupons) to individuals (consumers), but run the risk of strong reactions from consumers who want a customized shopping experience but feel their privacy has been violated. Existing models for privacy such as differential privacy or information theory try to quantify privacy risk but do not capture the subjective experience and heterogeneous expression of privacy-sensitivity. We propose a Markov decision process (MDP) model to capture (i) different consumer privacy sensitivities via a time-varying state; (ii) different coupon types (action set) for the retailer; and (iii) the action-and-state-dependent cost for perceived privacy violations. For the simple case with two states ("Normal" and "Alerted"), two coupons (targeted and untargeted) model, and consumer behavior statistics known to the retailer, we show that a stationary threshold-based policy is the optimal coupon-offering strategy for a retailer that wishes to minimize its expected discounted cost. The threshold is a function of all model parameters; the retailer offers a targeted coupon if their belief that the consumer is in the "Alerted" state is below the threshold. We extend this two-state model to consumers with multiple privacy-sensitivity states as well as coupon-dependent state transition probabilities. Furthermore, we study the case with imperfect (noisy) cost feedback from consumers and uncertain initial belief state.
- Aug 06 2015 cs.SI physics.soc-ph arXiv:1508.00987v1Identifying the most influential individuals can provide invaluable help in developing and deploying effective viral marketing strategies. Previous studies mainly focus on designing efficient algorithms or heuristics to find top-K influential nodes on a given static social network. While, as a matter of fact, real-world social networks keep evolving over time and a recalculation upon the changed network inevitably leads to a long running time, significantly affecting the efficiency. In this paper, we observe from real-world traces that the evolution of social network follows the preferential attachment rule and the influential nodes are mainly selected from high-degree nodes. Such observations shed light on the design of IncInf, an incremental approach that can efficiently locate the top-K influential individuals in evolving social networks based on previous information instead of calculation from scratch. In particular, IncInf quantitatively analyzes the influence spread changes of nodes by localizing the impact of topology evolution to only local regions, and a pruning strategy is further proposed to effectively narrow the search space into nodes experiencing major increases or with high degrees. We carried out extensive experiments on real-world dynamic social networks including Facebook, NetHEPT, and Flickr. Experimental results demonstrate that, compared with the state-of-the-art static heuristic, IncInf achieves as much as 21X speedup in execution time while maintaining matching performance in terms of influence spread.
- Jul 28 2015 cs.DS arXiv:1507.07396v3Makespan minimization in restricted assignment $(R|p_{ij}\in \{p_j, \infty\}|C_{\max})$ is a classical problem in the field of machine scheduling. In a landmark paper in 1990 [8], Lenstra, Shmoys, and Tardos gave a 2-approximation algorithm and proved that the problem cannot be approximated within 1.5 unless P=NP. The upper and lower bounds of the problem have been essentially unimproved in the intervening 25 years, despite several remarkable successful attempts in some special cases of the problem [2,4,12] recently. In this paper, we consider a special case called graph-balancing with light hyper edges, where heavy jobs can be assigned to at most two machines while light jobs can be assigned to any number of machines. For this case, we present algorithms with approximation ratios strictly better than 2. Specifically, Two job sizes: Suppose that light jobs have weight $w$ and heavy jobs have weight $W$, and $w < W$. We give a $1.5$-approximation algorithm (note that the current 1.5 lower bound is established in an even more restrictive setting [1,3]). Indeed, depending on the specific values of $w$ and $W$, sometimes our algorithm guarantees sub-1.5 approximation ratios. Arbitrary job sizes: Suppose that $W$ is the largest given weight, heavy jobs have weights in the range of $(\beta W, W]$, where $4/7\leq \beta < 1$, and light jobs have weights in the range of $(0,\beta W]$. We present a $(5/3+\beta/3)$-approximation algorithm. Our algorithms are purely combinatorial, without the need of solving a linear program as required in most other known approaches.
- Energy harvesting (EH) based communication has raised great research interests due to its wide application and the feasibility of commercialization. In this paper, we consider wireless communications with EH constraints at the transmitter. First, for delay-tolerant traffic, we investigate the long-term average throughput maximization problem and analytically compare the throughput performance against that of a system supported by conventional power supplies. Second, for delay-sensitive traffic, we analyze the outage probability by studying its asymptotic behavior in the high energy arrival rate regime, where the new concept of energy diversity is formally introduced. Moreover, we show that the speed of outage probability approaching zero, termed energy diversity gain, varies under different power supply models.
- Jun 25 2015 cs.CV arXiv:1506.07310v4Face Recognition has been studied for many decades. As opposed to traditional hand-crafted features such as LBP and HOG, much more sophisticated features can be learned automatically by deep learning methods in a data-driven way. In this paper, we propose a two-stage approach that combines a multi-patch deep CNN and deep metric learning, which extracts low dimensional but very discriminative features for face verification and recognition. Experiments show that this method outperforms other state-of-the-art methods on LFW dataset, achieving 99.77% pair-wise verification accuracy and significantly better accuracy under other two more practical protocols. This paper also discusses the importance of data size and the number of patches, showing a clear path to practical high-performance face recognition systems in real world.
- This paper considers a heterogeneous ad hoc network with multiple transmitter-receiver pairs, in which all transmitters are capable of harvesting renewable energy from the environment and compete for one shared channel by random access. In particular, we focus on two different scenarios: the constant energy harvesting (EH) rate model where the EH rate remains constant within the time of interest and the i.i.d. EH rate model where the EH rates are independent and identically distributed across different contention slots. To quantify the roles of both the energy state information (ESI) and the channel state information (CSI), a distributed opportunistic scheduling (DOS) framework with two-stage probing and save-then-transmit energy utilization is proposed. Then, the optimal throughput and the optimal scheduling strategy are obtained via one-dimension search, i.e., an iterative algorithm consisting of the following two steps in each iteration: First, assuming that the stored energy level at each transmitter is stationary with a given distribution, the expected throughput maximization problem is formulated as an optimal stopping problem, whose solution is proved to exist and then derived for both models; second, for a fixed stopping rule, the energy level at each transmitter is shown to be stationary and an efficient iterative algorithm is proposed to compute its steady-state distribution. Finally, we validate our analysis by numerical results and quantify the throughput gain compared with the best-effort delivery scheme.
- Feb 12 2015 cs.CV arXiv:1502.03240v3Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate mean-field approximate inference for the Conditional Random Fields with Gaussian pairwise potentials as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. We apply the proposed method to the problem of semantic image segmentation, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.
- Feb 03 2015 cs.DS arXiv:1502.00447v2The traveling salesman problem (TSP) is one of the most challenging NP-hard problems. It has widely applications in various disciplines such as physics, biology, computer science and so forth. The best known approximation algorithm for Symmetric TSP (STSP) whose cost matrix satisfies the triangle inequality (called $\triangle$STSP) is Christofides algorithm which was proposed in 1976 and is a $\frac{3}{2}$-approximation. Since then no proved improvement is made and improving upon this bound is a fundamental open question in combinatorial optimization. In this paper, for the first time, we propose Truncated Generalized Beta distribution (TGB) for the probability distribution of optimal tour lengths in a TSP. We then introduce an iterative TGB approach to obtain quality-proved near optimal approximation, i.e., (1+$\frac{1}{2}(\frac{\alpha+1}{\alpha+2})^{K-1}$)-approximation where $K$ is the number of iterations in TGB and $\alpha (>>1)$ is the shape parameters of TGB. The result can approach the true optimum as $K$ increases.
- A function computation problem in directed acyclic networks has been considered in the literature, where a sink node wants to compute a target function with the inputs generated at multiple source nodes. The network links are error-free but capacity-limited, and the intermediate network nodes perform network coding. The target function is required to be computed with zero error. The computing rate of a network code is measured by the average number of times that the target function can be computed for one use of the network, i.e., each link in the network is used at most once. In the papers [1], [2], two cut-set bounds were proposed on the computing rate. However, we show in this paper that these bounds are not valid for general network function computation problems. We analyze the arguments that lead to the invalidity of these bounds and fix the issue with a new cut-set bound, where a new equivalence relation associated with the inputs of the target function is used. Our bound is qualified for general target functions and network topologies. We also show that our bound is tight for some special cases where the computing capacity is known. Moreover, some results in [11], [12] were proved using the invalid upper bound in [1] and hence their correctness needs further justification. We also justify their validity in the paper.
- The study of degrees of freedom (DoF) of multiuser channels has led to the development of important interference managing schemes, such as interference alignment (IA) and interference neutralization. However, while the integer DoF have been widely studied in literatures, non-integer DoF are much less addressed, especially for channels with less variety. In this paper, we study the non-integer DoF of the time-invariant multiple-input multiple-output (MIMO) interfering multiple access channel (IMAC) in the simple setting of two cells, $K$ users per cell, and $M$ antennas at all nodes. We provide the exact characterization of the maximum achievable sum DoF under the constraint of using linear interference alignment (IA) scheme with symbol extension. Our results indicate that the integer sum DoF characterization $2MK/(K+1)$ achieved by the Suh-Ho-Tse scheme can be extended to the non-integer case only when $K \leq M^2$ for the circularly-symmetric-signaling systems and $K \leq 2M^2$ for the asymmetric-complex-signaling systems. These results are further extended to the time-invariant parallel MIMO IMAC with independent subchannels.
- A heterogeneous system, where small networks (e.g., small cell or WiFi) boost the system throughput under the umbrella of a large network (e.g., large cell), is a promising architecture for the 5G wireless communication networks, where green and sustainable communication is also a key aspect. Renewable energy based communication via energy harvesting (EH) devices is one of such green technology candidates. In this paper, we study an uplink transmission scenario under a heterogeneous network hierarchy, where each mobile user (MU) is powered by a sustainable energy supply, capable of both deterministic access to the large network via one private channel, and dynamic access to a small network with certain probability via one common channel shared by multiple MUs. Considering a general EH model, i.e., energy arrivals are time-correlated, we study an opportunistic transmission scheme and aim to maximize the average throughput for each MU, which jointly exploits the statistics and current states of the private channel, common channel, battery level, and EH rate. Applying a simple yet efficient "save-then-transmit" scheme, the throughput maximization problem is cast as a "rate-of-return" optimal stopping problem. The optimal stopping rule is proved to has a time-dependent threshold-based structure for the case with general Markovian system dynamics, and degrades to a pure threshold policy for the case with independent and identically distributed system dynamics. As performance benchmarks, the optimal power allocation scheme with conventional power supplies is also examined. Finally, numerical results are presented, and a new concept of "EH diversity" is discussed.