results for au:Chen_L in:cs

- Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question. However, the existing visual question answering approaches mainly tackle the problem of static image question, which may be ineffectively for video question answering due to the insufficiency of modeling the temporal dynamics of video contents. In this paper, we study the problem of video question answering by modeling its temporal dynamics with frame-level attention mechanism. We propose the attribute-augmented attention network learning framework that enables the joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate the multi-step reasoning process for our proposed attention network to further improve the performance. We construct a large-scale video question answering dataset. We conduct the experiments on both multiple-choice and open-ended video question answering tasks to show the effectiveness of the proposed method.
- Jul 20 2017 cs.CV arXiv:1707.05847v1Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders which decreases spatial resolution while learning a high-dimensional representation, followed by decoders who recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give a consistently high performance.
- Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain as the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary sized graphs. graph2vec's embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and are competitive with state-of-the-art graph kernels.
- Submodular functions are a broad class of set functions, which naturally arise in diverse areas. Many algorithms have been suggested for the maximization of these functions. Unfortunately, once the function deviates from submodularity, the known algorithms may perform arbitrarily poorly. Amending this issue, by obtaining approximation results for set functions generalizing submodular functions, has been the focus of recent works. One such class, known as weakly submodular functions, has received a lot of attention. A key result proved by Das and Kempe (2011) showed that the approximation ratio of the greedy algorithm for weakly submodular maximization subject to a cardinality constraint degrades smoothly with the distance from submodularity. However, no results have been obtained for maximization subject to constraints beyond cardinality. In particular, it is not known whether the greedy algorithm achieves any non-trivial approximation ratio for such constraints. In this paper, we prove that a randomized version of the greedy algorithm (previously used by Buchbinder et al. (2014) for a different problem) achieves an approximation ratio of $(1 + 1/\gamma)^{-2}$ for the maximization of a weakly submodular function subject to a general matroid constraint, where $\gamma$ is a parameter measuring the distance of the function from submodularity. Moreover, we also experimentally compare the performance of this version of the greedy algorithm on real world problems against natural benchmarks, and show that the algorithm we study performs well also in practice. To the best of our knowledge, this is the first algorithm with a non-trivial approximation guarantee for maximizing a weakly submodular function subject to a constraint other than the simple cardinality constraint. In particular, it is the first algorithm with such a guarantee for the important and broad class of matroid constraints.
- Demand shaping is a promising way to mitigate the wireless cellular capacity shortfall in the presence of ever-increasing wireless data demand. In this paper, we formulate demand shaping as an optimization problem that minimizes the variation in aggregate traffic. We design a distributed and randomized offline demand shaping algorithm under complete traffic information and prove its almost surely convergence. We further consider a more realistic setting where the traffic information is incomplete but the future traffic can be predicted to a certain degree of accuracy. We design an online demand shaping algorithm that updates the schedules of deferrable applications (DAs) each time when new information is available, based on solving at each timeslot an optimization problem over a shrinking horizon from the current time to the end of the day. We compare the performance of the online algorithm against the optimal offline algorithm, and provide numerical examples to complement the theoretical analysis.
- Jul 11 2017 cs.LG arXiv:1707.02727v1In machine learning, nonconvex optimization problems with multiple local optimums are often encountered. Graduated Optimization Algorithm (GOA) is a popular heuristic method to obtain global optimums of nonconvex problems through progressively minimizing a series of convex approximations to the nonconvex problems more and more accurate. Recently, such an algorithm GradOpt based on GOA is proposed with amazing theoretical and experimental results, but it mainly studies the problem which consists of one nonconvex part. This paper aims to find the global solution of a nonconvex objective with a convex part plus a nonconvex part based on GOA. By graduating approximating non-convex part of the problem and minimizing them with the Stochastic Variance Reduced Gradient (SVRG) or proximal SVRG, two new algorithms, SVRG-GOA and PSVRG-GOA, are proposed. We prove that the new algorithms have lower iteration complexity ($O(1/\varepsilon)$) than GradOpt ($O(1/\varepsilon^2)$). Some tricks, such as enlarging shrink factor, using project step, stochastic gradient, and mini-batch skills, are also given to accelerate the convergence speed of the proposed algorithms. Experimental results illustrate that the new algorithms with the similar performance can converge to 'global' optimums of the nonconvex problems, and they converge faster than the GradOpt and the nonconvex proximal SVRG.
- Graph similarity search is a common and fundamental operation in graph databases. One of the most popular graph similarity measures is the Graph Edit Distance (GED) mainly because of its broad applicability and high interpretability. Despite its prevalence, exact GED computation is proved to be NP-hard, which could result in unsatisfactory computational efficiency on large graphs. However, exactly accurate search results are usually unnecessary for real-world applications especially when the responsiveness is far more important than the accuracy. Thus, in this paper, we propose a novel probabilistic approach to efficiently estimate GED, which is further leveraged for the graph similarity search. Specifically, we first take branches as elementary structures in graphs, and introduce a novel graph similarity measure by comparing branches between graphs, i.e., Graph Branch Distance (GBD), which can be efficiently calculated in polynomial time. Then, we formulate the relationship between GED and GBD by considering branch variations as the result ascribed to graph edit operations, and model this process by probabilistic approaches. By applying our model, the GED between any two graphs can be efficiently estimated by their GBD, and these estimations are finally utilized in the graph similarity search. Extensive experiments show that our approach has better accuracy, efficiency and scalability than other comparable methods in the graph similarity search over real and synthetic data sets.
- Jun 20 2017 cs.CV arXiv:1706.05587v1In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
- Jun 15 2017 cs.CV arXiv:1706.04264v1A number of pattern recognition tasks, e.g., face verification, can be boiled down to classification or clustering of unit length directional feature vectors whose distance can be simply computed by their angle. In this paper, we propose the von Mises-Fisher (vMF) mixture model as the theoretical foundation for an effective deep-learning of such directional features and derive a novel vMF Mixture Loss and its corresponding vMF deep features. The proposed vMF features learning achieves a discriminative learning, i.e., compacting the instances of the same class while increasing the distance of instances from different classes, and subsumes a number of loss functions or deep learning practice, e.g., normalization. The experiments carried out on face verification using 4 different challenging face datasets, i.e., LFW, IJB-A, YouTube Faces and CACD, show the effectiveness of the proposed approach, which displays very competitive and state-of-the-art results.
- Jun 12 2017 cs.CR arXiv:1706.03016v1Electronic tickets (e-tickets) are electronic versions of paper tickets, which enable users to access intended services and improve services's ciency. Generally, to buy discounted tickets, users must convince ticket sellers that their attributes (i.e. age, profession, location) satisfy the ticket price policies. Hence, privacy issues have been the primary concerns of e-tickets users. In this paper, a privacy-preserving electronic ticket scheme with fine-grained pricing (PPETS-FGP) is proposed to protect users' privacy and implement fine-grained pricing. The proposed PPETS- FGP scheme captures the following features: (1) Users can buy different price tickets from ticket sellers without releasing their exact attributes; (2) Each user's transaction records cannot be linked; (3) Tickets cannot be transferred and double spent; (4) The security of the proposed PPETS-FGP is formally proven and reduced to well- known (q-strong Diffie-Hellman) complexity assumption. To the best of our knowledge, it is the first provable e-ticket system where both privacy-preserving and fine-grained pricing are addressed.
- It is well-known that Android malware constantly evolves so as to evade detection. This causes the entire malware population to be non-stationary. Contrary to this fact, most of the prior works on Machine Learning based Android malware detection have assumed that the distribution of the observed malware characteristics (i.e., features) does not change over time. In this work, we address the problem of malware population drift and propose a novel online learning based framework to detect malware, named CASANDRA (Contextaware, Adaptive and Scalable ANDRoid mAlware detector). In order to perform accurate detection, a novel graph kernel that facilitates capturing apps' security-sensitive behaviors along with their context information from dependency graphs is proposed. Besides being accurate and scalable, CASANDRA has specific advantages: i) being adaptive to the evolution in malware features over time ii) explaining the significant features that led to an app's classification as being malicious or benign. In a large-scale comparative analysis, CASANDRA outperforms two state-of-the-art techniques on a benchmark dataset achieving 99.23% F-measure. When evaluated with more than 87,000 apps collected in-the-wild, CASANDRA achieves 89.92% accuracy, outperforming existing techniques by more than 25% in their typical batch learning setting and more than 7% when they are continuously retained, while maintaining comparable efficiency.
- Jun 06 2017 cs.DS arXiv:1706.01172v1Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch weighted sets and has drawn increasing interest from the community. Due to its constant-time complexity independent of the values of the weights, Improved CWS (ICWS) is considered as the state-of-the-art CWS algorithm. In this paper, we revisit ICWS and analyze its underlying mechanism to show that there actually exists dependence between the two components of the hash-code produced by ICWS, which violates the condition of independence. To remedy the problem, we propose an Improved ICWS (I$^2$CWS) algorithm which not only shares the same theoretical computational complexity as ICWS but also abides by the required conditions of the CWS scheme. The experimental results on a number of synthetic data sets and real-world text data sets demonstrate that our I$^2$CWS algorithm can estimate the Jaccard similarity more accurately, and also compete with or outperform the compared methods, including ICWS, in classification and top-$K$ retrieval, after relieving the underlying dependence.
- We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families, we also provide polynomial time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex program, and approximate Pareto curves in multi-objective optimization. We also show that the $\ln|\mathcal{F}|$ factor is inevitable in general through a nontrivial lower bound construction. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general Best-Set problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$, and a collection $\mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $\mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.
- May 25 2017 cs.CV arXiv:1705.08620v1Domain adaptation (DA) is transfer learning which aims to leverage labeled data in a related source domain to achieve informed knowledge transfer and help the classification of unlabeled data in a target domain. In this paper, we propose a novel DA method, namely Robust Data Geometric Structure Aligned, Close yet Discriminative Domain Adaptation (RSA-CDDA), which brings closer, in a latent joint subspace, both source and target data distributions, and aligns inherent hidden source and target data geometric structures while performing discriminative DA in repulsing both interclass source and target data. The proposed method performs domain adaptation between source and target in solving a unified model, which incorporates data distribution constraints, in particular via a nonparametric distance, i.e., Maximum Mean Discrepancy (MMD), as well as constraints on inherent hidden data geometric structure segmentation and alignment between source and target, through low rank and sparse representation. RSA-CDDA achieves the search of a joint subspace in solving the proposed unified model through iterative optimization, alternating Rayleigh quotient algorithm and inexact augmented Lagrange multiplier algorithm. Extensive experiments carried out on standard DA benchmarks, i.e., 16 cross-domain image classification tasks, verify the effectiveness of the proposed method, which consistently outperforms the state-of-the-art methods.
- Knowledge bases (KBs) have attracted increasing attention due to its great success in various areas, such as Web and mobile search.Existing KBs are restricted to objective factual knowledge, such as city population or fruit shape, whereas,subjective knowledge, such as big city, which is commonly mentioned in Web and mobile queries, has been neglected. Subjective knowledge differs from objective knowledge in that it has no documented or observed ground truth. Instead, the truth relies on people's dominant opinion. Thus, we can use the crowdsourcing technique to get opinion from the crowd. In our work, we propose a system, called crowdsourced subjective knowledge acquisition (CoSKA),for subjective knowledge acquisition powered by crowdsourcing and existing KBs. The acquired knowledge can be used to enrich existing KBs in the subjective dimension which bridges the gap between existing objective knowledge and subjective queries.The main challenge of CoSKA is the conflict between large scale knowledge facts and limited crowdsourcing resource. To address this challenge, in this work, we define knowledge inference rules and then select the seed knowledge judiciously for crowdsourcing to maximize the inference power under the resource constraint. Our experimental results on real knowledge base and crowdsourcing platform verify the effectiveness of CoSKA system.
- Small cell base stations (SBSs) endowed with cloud-like computing capabilities are considered as a key enabler of edge computing (EC), which provides ultra-low latency and location-awareness for a variety of emerging mobile applications and the Internet of Things. However, due to the limited computation resources of an individual SBS, providing computation services of high quality to its users faces significant challenges when it is overloaded with an excessive amount of computation workload. In this paper, we propose collaborative edge computing among SBSs by forming SBS coalitions to share computation resources with each other, thereby accommodating more computation workload in the edge system and reducing reliance on the remote cloud. A novel SBS coalition formation algorithm is developed based on the coalitional game theory to cope with various new challenges in small-cell-based edge systems, including the co-provisioning of radio access and computing services, cooperation incentives, and potential security risks. To address these challenges, the proposed method (1) allows collaboration at both the user-SBS association stage and the SBS peer offloading stage by exploiting the ultra dense deployment of SBSs, (2) develops a payment-based incentive mechanism that implements proportionally fair utility division to form stable SBS coalitions, and (3) builds a social trust network for managing security risks among SBSs due to collaboration. Systematic simulations in practical scenarios are carried out to evaluate the efficacy and performance of the proposed method, which shows that tremendous edge computing performance improvement can be achieved.
- Deep neural networks (DNNs) have achieved great success in solving a variety of machine learning (ML) problems, especially in the domain of image recognition. However, recent research showed that DNNs can be highly vulnerable to adversarially generated instances, which look seemingly normal to human observers, but completely confuse DNNs. These adversarial samples are crafted by adding small perturbations to normal, benign images. Such perturbations, while imperceptible to the human eye, are picked up by DNNs and cause them to misclassify the manipulated instances with high confidence. In this work, we explore and demonstrate how systematic JPEG compression can work as an effective pre-processing step in the classification pipeline to counter adversarial attacks and dramatically reduce their effects (e.g., Fast Gradient Sign Method, DeepFool). An important component of JPEG compression is its ability to remove high frequency signal components, inside square blocks of an image. Such an operation is equivalent to selective blurring of the image, helping remove additive perturbations. Further, we propose an ensemble-based technique that can be constructed quickly from a given well-performing DNN, and empirically show how such an ensemble that leverages JPEG compression can protect a model from multiple types of adversarial attacks, without requiring knowledge about the model.
- This paper formulates a time-varying social-welfare maximization problem for distribution grids with distributed energy resources (DERs) and develops online distributed algorithms to identify (and track) its solutions. In the considered setting, network operator and DER-owners pursue given operational and economic objectives, while concurrently ensuring that voltages are within prescribed limits. The proposed algorithm affords an online implementation to enable tracking of the solutions in the presence of time-varying operational conditions and changing optimization objectives. It involves a strategy where the network operator collects voltage measurements throughout the feeder to build incentive signals for the DER-owners in real time; DERs then adjust the generated/consumed powers in order to avoid the violation of the voltage constraints while maximizing given objectives. The stability of the proposed schemes is analytically established and numerically corroborated.
- Cross-modal audio-visual perception has been a long-lasting topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite works in computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals, and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments. Our experiments using both classification and human evaluations demonstrate that our model has the ability to generate one modality, i.e., audio/visual, from the other modality, i.e., visual/audio, to a good extent. Our experiments on various design choices along with the datasets will facilitate future research in this new problem space.
- Apr 25 2017 cs.CR arXiv:1704.07216v1Research on vehicular networking (V2X) security has produced a range of security mechanisms and protocols tailored for this domain, addressing both security and privacy. Typically, the security analysis of these proposals has largely been informal. However, formal analysis can be used to expose flaws and ultimately provide a higher level of assurance in the protocols. This paper focusses on the formal analysis of a particular element of security mechanisms for V2X found in many proposals: the revocation of malicious or misbehaving vehicles from the V2X system by invalidating their credentials. This revocation needs to be performed in an unlinkable way for vehicle privacy even in the context of vehicles regularly changing their pseudonyms. The REWIRE scheme by Förster et al. and its subschemes BASIC and RTOKEN aim to solve this challenge by means of cryptographic solutions and trusted hardware. Formal analysis using the TAMARIN prover identifies two flaws with some of the functional correctness and authentication properties in these schemes. We then propose Obscure Token (OTOKEN), an extension of REWIRE to enable revocation in a privacy preserving manner. Our approach addresses the functional and authentication properties by introducing an additional key-pair, which offers a stronger and verifiable guarantee of successful revocation of vehicles without resolving the long-term identity. Moreover OTOKEN is the first V2X revocation protocol to be co-designed with a formal model.
- A growing number of threats to Android phones creates challenges for malware detection. Manually labeling the samples into benign or different malicious families requires tremendous human efforts, while it is comparably easy and cheap to obtain a large amount of unlabeled APKs from various sources. Moreover, the fast-paced evolution of Android malware continuously generates derivative malware families. These families often contain new signatures, which can escape detection when using static analysis. These practical challenges can also cause traditional supervised machine learning algorithms to degrade in performance. In this paper, we propose a framework that uses model-based semi-supervised (MBSS) classification scheme on the dynamic Android API call logs. The semi-supervised approach efficiently uses the labeled and unlabeled APKs to estimate a finite mixture model of Gaussian distributions via conditional expectation-maximization and efficiently detects malwares during out-of-sample testing. We compare MBSS with the popular malware detection classifiers such as support vector machine (SVM), $k$-nearest neighbor (kNN) and linear discriminant analysis (LDA). Under the ideal classification setting, MBSS has competitive performance with 98\% accuracy and very low false positive rate for in-sample classification. For out-of-sample testing, the out-of-sample test data exhibit similar behavior of retrieving phone information and sending to the network, compared with in-sample training set. When this similarity is strong, MBSS and SVM with linear kernel maintain 90\% detection rate while $k$NN and LDA suffer great performance degradation. When this similarity is slightly weaker, all classifiers degrade in performance, but MBSS still performs significantly better than other classifiers.
- Domain adaptation is transfer learning which aims to generalize a learning model across training and testing data with different distributions. Most previous research tackle this problem in seeking a shared feature representation between source and target domains while reducing the mismatch of their data distributions. In this paper, we propose a close yet discriminative domain adaptation method, namely CDDA, which generates a latent feature representation with two interesting properties. First, the discrepancy between the source and target domain, measured in terms of both marginal and conditional probability distribution via Maximum Mean Discrepancy is minimized so as to attract two domains close to each other. More importantly, we also design a repulsive force term, which maximizes the distances between each label dependent sub-domain to all others so as to drag different class dependent sub-domains far away from each other and thereby increase the discriminative power of the adapted domain. Moreover, given the fact that the underlying data manifold could have complex geometric structure, we further propose the constraints of label smoothness and geometric structure consistency for label propagation. Extensive experiments are conducted on 36 cross-domain image classification tasks over four public datasets. The comprehensive results show that the proposed method consistently outperforms the state-of-the-art methods with significant margins.
- Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall.
- Mobile Edge Computing (MEC) (a.k.a. fog computing) has recently emerged to enable low-latency and location-aware data processing at the edge of mobile networks. Providing grid power supply in support of MEC, however, is costly and even infeasible, thus mandating on-site renewable energy as a major or even sole power supply in many scenarios. Nonetheless, the high intermittency and unpredictability of energy harvesting creates many new challenges of performing effective MEC. In this paper, we develop an algorithm called GLOBE that performs joint geographical load balancing (GLB) (for computation workload) and admission control (for communication data traffic), for optimizing the system performance of a network of MEC-enabled base stations. By leveraging the Lyapunov optimization with perturbation technique, GLOBE operates online without requiring future system information and addresses significant challenges caused by battery state dynamics and energy causality constraints. We prove that GLOBE achieves a close-to-optimal system performance compared to the offline algorithm that knows full future information, and present a critical tradeoff between battery capacity and system performance. Simulation results validate our analysis and demonstrate the superior performance of GLOBE compared to benchmark algorithms.
- Mar 31 2017 cs.CV arXiv:1703.10304v1Reconstruction based on the stereo camera has received considerable attention recently, but two particular challenges still remain. The first concerns the need to aggregate similar pixels in an effective approach, and the second is to maintain as much of the available information as possible while ensuring sufficient accuracy. To overcome these issues, we propose a new 3D representation method, namely, planecell, that extracts planarity from the depth-assisted image segmentation and then projects these depth planes into the 3D world. An energy function formulated from Conditional Random Field that generalizes the planar relationships is maximized to merge coplanar segments. We evaluate our method with a variety of reconstruction baselines on both KITTI and Middlebury datasets, and the results indicate the superiorities compared to other 3D space representation methods in accuracy, memory requirements and further applications.
- Mar 28 2017 cs.CV arXiv:1703.08912v1In this paper, we will investigate the contribution of color names for salient object detection. Each input image is first converted to the color name space, which is consisted of 11 probabilistic channels. By exploring the topological structure relationship between the figure and the ground, we obtain a saliency map through a linear combination of a set of sequential attention maps. To overcome the limitation of only exploiting the surroundedness cue, two global cues with respect to color names are invoked for guiding the computation of another weighted saliency map. Finally, we integrate the two saliency maps into a unified framework to infer the saliency result. In addition, an improved post-processing procedure is introduced to effectively suppress the background while uniformly highlight the salient objects. Experimental results show that the proposed model produces more accurate saliency maps and performs well against 23 saliency models in terms of three evaluation metrics on three public datasets.
- Mar 27 2017 cs.CV arXiv:1703.08388v2Face recognition (FR) methods report significant performance by adopting the convolutional neural network (CNN) based learning methods. Although CNNs are mostly trained by optimizing the softmax loss, the recent trend shows an improvement of accuracy with different strategies, such as task-specific CNN learning with different loss functions, fine-tuning on target dataset, metric learning and concatenating features from multiple CNNs. Incorporating these tasks obviously requires additional efforts. Moreover, it demotivates the discovery of efficient CNN models for FR which are trained only with identity labels. We focus on this fact and propose an easily trainable and single CNN based FR method. Our CNN model exploits the residual learning framework. Additionally, it uses normalized features to compute the loss. Our extensive experiments show excellent generalization on different datasets. We obtain very competitive and state-of-the-art results on the LFW, IJB-A, YouTube faces and CACD datasets.
- The (ultra-)dense deployment of small-cell base stations (SBSs) endowed with cloud-like computing functionalities paves the way for pervasive mobile edge computing (MEC), enabling ultra-low latency and location-awareness for a variety of emerging mobile applications and the Internet of Things. To handle spatially uneven computation workloads in the network, cooperation among SBSs via workload peer offloading is essential to avoid large computation latency at overloaded SBSs and provide high quality of service to end users. However, performing effective peer offloading faces many unique challenges in small cell networks due to limited energy resources committed by self-interested SBS owners, uncertainties in the system dynamics and co-provisioning of radio access and computing services. This paper develops a novel online SBS peer offloading framework, called OPEN, by leveraging the Lyapunov technique, in order to maximize the long-term system performance while keeping the energy consumption of SBSs below individual long-term constraints. OPEN works online without requiring information about future system dynamics, yet provides provably near-optimal performance compared to the oracle solution that has the complete future information. In addition, this paper formulates a novel peer offloading game among SBSs, analyzes its equilibrium and efficiency loss in terms of the price of anarchy in order to thoroughly understand SBSs' strategic behaviors, thereby enabling decentralized and autonomous peer offloading decision making. Extensive simulations are carried out and show that peer offloading among SBSs dramatically improves the edge computing performance.
- Mobile edge computing (a.k.a. fog computing) has recently emerged to enable in-situ processing of delay-sensitive applications at the edge of mobile networks. Providing grid power supply in support of mobile edge computing, however, is costly and even infeasible (in certain rugged or under-developed areas), thus mandating on-site renewable energy as a major or even sole power supply in increasingly many scenarios. Nonetheless, the high intermittency and unpredictability of renewable energy make it very challenging to deliver a high quality of service to users in energy harvesting mobile edge computing systems. In this paper, we address the challenge of incorporating renewables into mobile edge computing and propose an efficient reinforcement learning-based resource management algorithm, which learns on-the-fly the optimal policy of dynamic workload offloading (to the centralized cloud) and edge server provisioning to minimize the long-term system cost (including both service delay and operational cost). Our online learning algorithm uses a decomposition of the (offline) value iteration and (online) reinforcement learning, thus achieving a significant improvement of learning rate and run-time performance when compared to standard reinforcement learning algorithms such as Q-learning. We prove the convergence of the proposed algorithm and analytically show that the learned policy has a simple monotone structure amenable to practical implementation. Our simulation results validate the efficacy of our algorithm, which significantly improves the edge computing performance compared to fixed or myopic optimization schemes and conventional reinforcement learning algorithms.
- Mar 14 2017 cs.CY arXiv:1703.04174v1Structured Peer Learning (SPL) is a form of peer-based supplemental instruction that focuses on mentoring, guidance, and development of technical, communication, and social skills in both the students receiving assistance and the students in teaching roles. This paper explores the methodology, efficacy, and reasoning behind the practical realization of a SPL program designed to increase student knowledge and success in undergraduate Computer Science courses. Students expressed an increased level of comfort when asking for help from student teachers versus traditional educational resources, historically showed an increased average grade in lower-level courses, and felt that the program positively impacted their desire to continue in or switch to a Computer major. Additionally, results indicated that advances in programming, analytical thinking, and abstract analysis skills were evident in not only the students but also the student teachers, suggesting a strong bidirectional flow of knowledge.
- Mar 06 2017 cs.CV arXiv:1703.01243v1One of the major challenges in Minimally Invasive Surgery (MIS) such as laparoscopy is the lack of depth perception. In recent years, laparoscopic scene tracking and surface reconstruction has been a focus of investigation to provide rich additional information to aid the surgical process and compensate for the depth perception issue. However, robust 3D surface reconstruction and augmented reality with depth perception on the reconstructed scene are yet to be reported. This paper presents our work in this area. First, we adopt a state-of-the-art visual simultaneous localization and mapping (SLAM) framework - ORB-SLAM - and extend the algorithm for use in MIS scenes for reliable endoscopic camera tracking and salient point mapping. We then develop a robust global 3D surface reconstruction frame- work based on the sparse point clouds extracted from the SLAM framework. Our approach is to combine an outlier removal filter within a Moving Least Squares smoothing algorithm and then employ Poisson surface reconstruction to obtain smooth surfaces from the unstructured sparse point cloud. Our proposed method has been quantitatively evaluated compared with ground-truth camera trajectories and the organ model surface we used to render the synthetic simulation videos. In vivo laparoscopic videos used in the tests have demonstrated the robustness and accuracy of our proposed framework on both camera tracking and surface reconstruction, illustrating the potential of our algorithm for depth augmentation and depth-corrected augmented reality in MIS with monocular endoscopes.
- Mar 03 2017 cs.DS arXiv:1703.00575v1In a recent paper, Braun, Chung and Graham [1] have addressed a single-processor scheduling problem with time restrictions. Given a fixed integer $B \geq 2$, there is a set of jobs to be processed by a single processor subject to the following B-constraint. For any real $x$, no unit time interval $[x, x+1)$ is allowed to intersect more than $B$ jobs. The problem has been shown to be NP-hard when $B$ is part of the input and left as an open question whether it remains NP-hard or not if $B$ is fixed [1, 5, 7]. This paper contributes to answering this question that we prove the problem is NP-hard even when $B=2$. A PTAS is also presented for any constant $B \geq 2$.
- Mar 01 2017 cs.CV arXiv:1702.08692v1Fine-grained recognition is a challenging task due to the small intra-category variances. Most of top-performing fine-grained recognition methods leverage parts of objects for better performance. Therefore, part annotations which are extremely computationally expensive are required. In this paper, we propose a novel cascaded deep CNN detection framework for fine-grained recognition which is trained to detect the whole object without considering parts. Nevertheless, most of current top-performing detection networks use the N+1 class (N object categories plus background) softmax loss, and the background category with much more training samples dominates the feature learning progress so that the features are not good for object categories with fewer samples. To bridge this gap, we introduce a cascaded structure to eliminate background and exploit a one-vs-rest loss to capture more minute variances among different subordinate categories. Experiments show that our proposed recognition framework achieves comparable performance with state-of-the-art, part-free, fine-grained recognition methods on the CUB-200-2011 Bird dataset. Moreover, our method even outperforms most of part-based methods while does not need part annotations at the training stage and is free from any annotations at test stage.
- In this paper we study the use of coding techniques to accelerate machine learning (ML). Coding techniques, such as prefix codes, have been extensively studied and used to accelerate low-level data processing primitives such as scans in a relational database system. However, there is little work on how to exploit them to accelerate ML algorithms. In fact, applying coding techniques for faster ML faces a unique challenge: one needs to consider both how the codes fit into the optimization algorithm used to train a model, and the interplay between the model structure and the coding scheme. Surprisingly and intriguingly, our study demonstrates that a slight variant of the classical Lempel-Ziv-Welch (LZW) coding scheme is a good fit for several popular ML algorithms, resulting in substantial runtime savings. Comprehensive experiments on several real-world datasets show that our LZW-based ML algorithms exhibit speedups of up to 31x compared to a popular and state-of-the-art ML library, with no changes to ML accuracy, even though the implementations of our LZW variants are not heavily tuned. Thus, our study reveals a new avenue for accelerating ML algorithms using coding techniques and we hope this opens up a new direction for more research.
- In the Best-$k$-Arm problem, we are given $n$ stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the $k$ arms with the largest means by taking as few samples as possible. In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-$k$-Arm problem. On the lower bound side, we obtain a novel complexity term to measure the sample complexity that every Best-$k$-Arm instance requires. This is derived by an interesting and nontrivial reduction from the Best-$1$-Arm problem. We also provide an elimination-based algorithm that matches the instance-wise lower bound within doubly-logarithmic factors. The sample complexity of our algorithm strictly dominates the state-of-the-art for Best-$k$-Arm (module constant factors).
- Feb 10 2017 cs.CL arXiv:1702.02584v2For the purpose of automatically evaluating speakers' humor usage, we build a presentation corpus containing humorous utterances based on TED talks. Compared to previous data resources supporting humor recognition research, ours has several advantages, including (a) both positive and negative instances coming from a homogeneous data set, (b) containing a large number of speakers, and (c) being open. Focusing on using lexical cues for humor recognition, we systematically compare a newly emerging text classification method based on Convolutional Neural Networks (CNNs) with a well-established conventional method using linguistic knowledge. The advantages of the CNN method are both getting higher detection accuracies and being able to learn essential features automatically.
- As enjoying the closed form solution, least squares support vector machine (LSSVM) has been widely used for classification and regression problems having the comparable performance with other types of SVMs. However, LSSVM has two drawbacks: sensitive to outliers and lacking sparseness. Robust LSSVM (R-LSSVM) overcomes the first partly via nonconvex truncated loss function, but the current algorithms for R-LSSVM with the dense solution are faced with the second drawback and are inefficient for training large-scale problems. In this paper, we interpret the robustness of R-LSSVM from a re-weighted viewpoint and give a primal R-LSSVM by the representer theorem. The new model may have sparse solution if the corresponding kernel matrix has low rank. Then approximating the kernel matrix by a low-rank matrix and smoothing the loss function by entropy penalty function, we propose a convergent sparse R-LSSVM (SR-LSSVM) algorithm to achieve the sparse solution of primal R-LSSVM, which overcomes two drawbacks of LSSVM simultaneously. The proposed algorithm has lower complexity than the existing algorithms and is very efficient for training large-scale problems. Many experimental results illustrate that SR-LSSVM can achieve better or comparable performance with less training time than related algorithms, especially for training large scale problems.
- Feb 03 2017 cs.DB arXiv:1702.00567v1Data fusion has played an important role in data mining because high-quality data is required in a lot of applications. As on-line data may be out-of-date and errors in the data may propagate with copying and referring between sources, it is hard to achieve satisfying results with merely applying existing data fusion methods to fuse Web data. In this paper, we make use of the crowd to achieve high-quality data fusion result. We design a framework selecting a set of tasks to ask crowds in order to improve the confidence of data. Since data are correlated and crowds may provide incorrect answers, how to select a proper set of tasks to ask the crowd is a very challenging problem. In this paper, we design an approximation solution to address this challenge since we prove that the problem is at NP-hard. To further improve the efficiency, we design a pruning strategy and a preprocessing method, which effectively improve the performance of the proposed approximation solution. Furthermore, we find that under certain scenarios, we are not interested in all the facts, but only a specific set of facts. Thus, for these specific scenarios, we also develop another approximation solution which is much faster than the general approximation solution. We verify the solutions with extensive experiments on a real crowdsourcing platform.
- Feb 01 2017 cs.DS arXiv:1701.08809v1We investigate the problem of scheduling the maintenance of edges in a network, motivated by the goal of minimizing outages in transportation or telecommunication networks. We focus on maintaining connectivity between two nodes over time; for the special case of path networks, this is related to the problem of minimizing the busy time of machines. We show that the problem can be solved in polynomial time in arbitrary networks if preemption is allowed. If preemption is restricted to integral time points, the problem is NP-hard and in the non-preemptive case we give strong non-approximability results. Furthermore, we give tight bounds on the power of preemption, that is, the maximum ratio of the values of non-preemptive and preemptive optimal solutions. Interestingly, the preemptive and the non-preemptive problem can be solved efficiently on paths, whereas we show that mixing both leads to a weakly NP-hard problem that allows for a simple 2-approximation.
- Merging Mobile Edge Computing (MEC), which is an emerging paradigm to meet the increasing computation demands from mobile devices, with the dense deployment of Base Stations (BSs), is foreseen as a key step towards the next generation mobile networks. However, new challenges arise for designing energy efficient networks since radio access resources and computing resources of BSs have to be jointly managed, and yet they are complexly coupled with traffic in both spatial and temporal domains. In this paper, we address the challenge of incorporating MEC into dense cellular networks, and propose an efficient online algorithm, called ENGINE (ENErgy constrained offloadINg and slEeping) which makes joint computation offloading and BS sleeping decisions in order to maximize the quality of service while keeping the energy consumption low. Our algorithm leverages Lyapunov optimization technique, works online and achieves a close-to-optimal performance without using future information. Our simulation results show that our algorithm can effectively reduce energy consumption without sacrificing the user quality of service.
- Merging mobile edge computing with the dense deployment of small cell base stations promises enormous benefits such as a real proximity, ultra-low latency access to cloud functionalities. However, the envisioned integration creates many new challenges and one of the most significant is mobility management, which is becoming a key bottleneck to the overall system performance. Simply applying existing solutions leads to poor performance due to the highly overlapped coverage areas of multiple base stations in the proximity of the user and the co-provisioning of radio access and computing services. In this paper, we develop a novel user-centric mobility management scheme, leveraging Lyapunov optimization and multi-armed bandits theories, in order to maximize the edge computation performance for the user while keeping the user's communication energy consumption below a constraint. The proposed scheme effectively handles the uncertainties present at multiple levels in the system and provides both short-term and long-term performance guarantee. Simulation results show that our proposed scheme can significantly improve the computation performance (compared to state of the art) while satisfying the communication energy constraint.
- Jan 17 2017 cs.CV arXiv:1701.04043v1Tensor principal component analysis (TPCA) is a multi-linear extension of principal component analysis which converts a set of correlated measurements into several principal components. In this paper, we propose a new robust TPCA method to extract the princi- pal components of the multi-way data based on tensor singular value decomposition. The tensor is split into a number of blocks of the same size. The low rank component of each block tensor is extracted using iterative tensor singular value thresholding method. The prin- cipal components of the multi-way data are the concatenation of all the low rank components of all the block tensors. We give the block tensor incoherence conditions to guarantee the successful decom- position. This factorization has similar optimality properties to that of low rank matrix derived from singular value decomposition. Ex- perimentally, we demonstrate its effectiveness in two applications, including motion separation for surveillance videos and illumination normalization for face images.
- Energy internet is one of the most promising future energy infrastructures which could enhance energy efficiency and improve system operating flexibility. In analog to the micro-grid, micro energy internet puts more emphasis on distribution level and consumer side. The concept and design principles of smart micro energy internet are proposed to accommodate micro-grids, distributed poly-generation systems, energy storage facilities, and related energy distribution infrastructures. The dispatch and control system of the smart micro energy internet should be responsive to external disturbance and able to approach a satisfactory operating point which compromises multiple criteria, such as safety, economy, and environment protection. To realize the vision of the smart micro energy internet, engineering game theory based dispatch and control strategies are investigated. Based on the proposed architecture and energy management system, a prototype of the first domestic solar based smart micro energy internet is initially established in Qinghai University.
- Dec 30 2016 cs.NI arXiv:1612.08778v1Cellular communication networks are plagued with redundant capacity, which results in low utilization and cost-effectiveness of network capital investments. The redundant capacity can be exploited to deliver secondary traffic that is ultra-elastic and delay-tolerant. In this paper, we propose an analytical framework to study the capacity-delay tradeoff of elastic/secondary traffic in large scale cellular networks with spectrum aggregation. Our framework integrates stochastic geometry and queueing theory models and gives analytical insights into the capacity-delay performance in the interference limited regime. Closed-form results are obtained to characterize the mean delay and delay distribution as functions of per user throughput capacity. The impacts of spectrum aggregation, user and base station (BS) densities, traffic session payload, and primary traffic dynamics on the capacity-delay tradeoff relationship are investigated. The fundamental capacity limit is derived and its scaling behavior is revealed. Our analysis shows the feasibility of providing secondary communication services over cellular networks and highlights some critical design issues.
- Differential space time block coding (STBC) achieves full spatial diversity and avoids channel estimation overhead. Over highly frequency-selective channels, STBC is integrated with orthogonal frequency division multiplexing (OFDM) to achieve high performance. However, low-cost implementation of differential STBC-OFDM using direct-conversion transceivers is sensitive to In-phase/Quadrature-phase imbalance (IQI). In this paper, we quantify the performance impact of IQI at the receiver front-end on differential STBC-OFDM systems and propose a compensation algorithm to mitigate its effect. The proposed receiver IQI compensation works in an adaptive decision-directed manner without using known pilots or training sequences, which reduces the rate loss due to training overhead. Our numerical results show that our proposed compensation algorithm can effectively mitigate receive IQI in differential STBC-OFDM.
- Differential space time block coding (STBC) achieves full spatial diversity and avoids channel estimation overhead. Over highly frequency-selective channels, STBC is integrated with orthogonal frequency division multiplexing (OFDM) to efficiently mitigate intersymbol interference effects. However, low-cost implementation of STBC-OFDM with direct-conversion transceivers is sensitive to In-phase/Quadrature-phase imbalance (IQI). In this paper, we quantify the performance impact of IQI at both the transmitter and receiver radio frequency front-ends on differential STBC-OFDM systems which has not been investigated before in the literature. In addition, we propose a widely-linear compensation algorithm at the receiver to mitigate the performance degradation caused by the IQI at the transmitter and receiver ends. Moreover, a parameter-based generalized algorithm is proposed to extract the IQI parameters and improve the performance under high-mobility. The adaptive compensation algorithms are blind and work in a decision-directed manner without using known pilots or training sequences. Numerical results show that our proposed compensation algorithms can effectively mitigate IQI in differential STBC-OFDM.
- Dec 23 2016 cs.DB arXiv:1612.07448v6Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all the ML tools require the input data to be presented as a single table, many datasets are multi-table, which forces data scientists to join those tables first, leading to data redundancy and runtime waste. Recent works on "factorized" ML mitigate this issue for a few specific ML algorithms by pushing ML through joins. But their approaches require a manual rewrite of ML implementations. Such piecemeal methods create a massive development overhead when extending such ideas to other ML algorithms. In this paper, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. We introduce a new logical data type to represent normalized data and devise a framework of algebraic rewrite rules to convert a large set of linear algebra operations over denormalized data into operations over normalized data. We show how this enables us to automatically "factorize" several popular ML algorithms, thus unifying and generalizing several prior works. We prototype our framework in the popular ML environment R and an industrial R-over-RDBMS tool. Experiments with both synthetic and real normalized data show that our framework also yields significant speed-ups, up to 36x on real data.
- In the near future, there will likely be special-purpose quantum computers with 40-50 high-quality qubits. This paper lays general theoretical foundations for how to use such devices to demonstrate "quantum supremacy": that is, a clear quantum speedup for some task, motivated by the goal of overturning the Extended Church-Turing Thesis as confidently as possible. First, we study the hardness of sampling the output distribution of a random quantum circuit, along the lines of a recent proposal by the the Quantum AI group at Google. We show that there's a natural hardness assumption, which has nothing to do with sampling, yet implies that no efficient classical algorithm can pass a statistical test that the quantum sampling procedure's outputs do pass. Compared to previous work, the central advantage is that we can now talk directly about the observed outputs, rather than about the distribution being sampled. Second, in an attempt to refute our hardness assumption, we give a new algorithm, for simulating a general quantum circuit with n qubits and m gates in polynomial space and m^O(n) time. We then discuss why this and other known algorithms fail to refute our assumption. Third, resolving an open problem of Aaronson and Arkhipov, we show that any strong quantum supremacy theorem--of the form "if approximate quantum sampling is classically easy, then PH collapses"--must be non-relativizing. Fourth, refuting a conjecture by Aaronson and Ambainis, we show that the Fourier Sampling problem achieves a constant versus linear separation between quantum and randomized query complexities. Fifth, we study quantum supremacy relative to oracles in P/poly. Previous work implies that, if OWFs exist, then quantum supremacy is possible relative to such oracles. We show that some assumption is needed: if SampBPP=SampBQP and NP is in BPP, then quantum supremacy is impossible relative to such oracles.
- Dec 09 2016 cs.DB arXiv:1612.02564v3Huge amount of data with both space and text information, e.g., geo-tagged tweets, is flooding on the Internet. Such spatio-textual data stream contains valuable information for millions of users with various interests on different keywords and locations. Publish/subscribe systems enable efficient and effective information distribution by allowing users to register continuous queries with both spatial and textual constraints. However, the explosive growth of data scale and user base has posed challenges to the existing centralized publish/subscribe systems for spatio-textual data streams. In this paper, we propose our distributed publish/subscribe system, called PS2Stream, which digests a massive spatio-textual data stream and directs the stream to target users with registered interests. Compared with existing systems, PS2Stream achieves a better workload distribution in terms of both minimizing the total amount of workload and balancing the load of workers. To achieve this, we propose a new workload distribution algorithm considering both space and text properties of the data. Additionally, PS2Stream supports dynamic load adjustments to adapt to the change of the workload, which makes PS2Stream adaptive. Extensive empirical evaluation, on commercial cloud computing platform with real data, validates the superiority of our system design and advantages of our techniques on system performance improvement.
- Heart diseases constitute a global health burden, and the problem is exacerbated by the error-prone nature of listening to and interpreting heart sounds. This motivates the development of automated classification to screen for abnormal heart sounds. Existing machine learning-based systems achieve accurate classification of heart sound recordings but rely on expert features that have not been thoroughly evaluated on noisy recordings. Here we propose a segmental convolutional neural network architecture that achieves automatic feature learning from noisy heart sound recordings. Our experiments show that our best model, trained on noisy recording segments acquired with an existing hidden semi-markov model-based approach, attains a classification accuracy of 87.5% on the 2016 PhysioNet/CinC Challenge dataset, compared to the 84.6% accuracy of the state-of-the-art statistical classifier trained and evaluated on the same dataset. Our results indicate the potential of using neural network-based methods to increase the accuracy of automated classification of heart sound recordings for improved screening of heart diseases.
- There exist multitudes of cloud performance metrics, including workload performance, application placement, software/hardware optimization, scalability, capacity, reliability, agility and so on. In this paper, we consider jointly optimizing the performance of the software applications in the cloud. The challenges lie in bringing a diversity of raw data into tidy data format, unifying performance data from multiple systems based on timestamps, and assessing the quality of the processed performance data. Even after verifying the quality of cloud performance data, additional challenges block optimizing cloud computing. In this paper, we identify the challenges of cloud computing from the perspectives of computing environment, data collection, performance analytics and production environment.
- Nov 22 2016 cs.NI arXiv:1611.06302v1With the dense deployment of small cell networks, low-cost backhaul schemes for small cell base stations (SBSs) have attracted great attentions. Self-backhaul using cellular communication technology is considered as a promising solution. Although some excellent works have been done on self-backhaul in small cell networks, most of them do not consider the recent advances of full-duplex (FD) and massive multiple-input and multiple-output (MIMO) technologies. In this paper, we propose a self-backhaul scheme for small cell networks by combining FD and massive MIMO technologies. In our proposed scheme, the macro base station (MBS) is equipped with massive MIMO antennas, and the SBSs have the FD communication ability. By treating the SBSs as \textitspecial macro users, we can achieve the simultaneous transmissions of the access link of users and the backhaul link of SBSs in the same frequency. Furthermore, considering the existence of inter-tier and intra-tier interference, we formulate the power allocation problem of the MBS and SBSs as an optimization problem. Because the formulated power allocation problem is a non-convex problem, we transform the original problem into a difference of convex program (DCP) by successive convex approximation method (SCAM) and variable transformation, and then solve it using a constrained concave convex procedure (CCCP) based iterative algorithm. Finally, extensive simulations are conducted with different system configurations to verify the effectiveness of the proposed scheme.
- Nov 18 2016 cs.CV arXiv:1611.05594v2Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism --- a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.
- Device-to-device (D2D) computation offloading has recently been proposed to enhance mobile edge computing (MEC) performance by exploiting spare computing resources in proximity user devices, thereby alleviating computation burdens from the network infrastructure and enabling truly pervasive edge computing. A key challenge in this new mobile computing paradigm is how to provide self-interested users with incentives to participate in D2D computing. Although incentive mechanism design has been intensively studied in the literature, this paper considers a much more challenging yet much under-investigated problem in which user incentives are complexly coupled with security risks, which is extremely important since D2D-enhanced MEC systems are vulnerable to distributed attacks, such as distributed denial of service (DDoS) attacks, due to its autonomous nature. In this paper, we build a novel mathematical framework incorporating game theory and classic epidemic models to investigate the interplay between user incentives and security risks in D2D-enhanced MEC systems under infectious DDoS attacks. A key result derived from this analysis is a phase change effect between persistent and non-persistent DDoS attacks, which is significantly different in nature from classic epidemic results for non-strategic users. Based on this, we determine the optimal reward in a contract-based incentive mechanism that the network operator should offer to users in order to maximize the operator's utility. The optimal solution exhibits an interesting "Less is More" effect: although giving users a higher reward promote more participation, it may harm the operator's utility. This is because too much participation fosters persistent DDoS attack and as a result, the \textiteffective participation level does not improve. Extensive simulations are carried out to verify the analytic conclusions.
- A main focus of machine learning research has been improving the generalization accuracy and efficiency of prediction models. Many models such as SVM, random forest, and deep neural nets have been proposed and achieved great success. However, what emerges as missing in many applications is actionability, i.e., the ability to turn prediction results into actions. For example, in applications such as customer relationship management, clinical prediction, and advertisement, the users need not only accurate prediction, but also actionable instructions which can transfer an input to a desirable goal (e.g., higher profit repays, lower morbidity rates, higher ads hit rates). Existing effort in deriving such actionable knowledge is few and limited to simple action models which restricted to only change one attribute for each action. The dilemma is that in many real applications those action models are often more complex and harder to extract an optimal solution. In this paper, we propose a novel approach that achieves actionability by combining learning with planning, two core areas of AI. In particular, we propose a framework to extract actionable knowledge from random forest, one of the most widely used and best off-the-shelf classifiers. We formulate the actionability problem to a sub-optimal action planning (SOAP) problem, which is to find a plan to alter certain features of a given input so that the random forest would yield a desirable output, while minimizing the total costs of actions. Technically, the SOAP problem is formulated in the SAS+ planning formalism, and solved using a Max-SAT based approach. Our experimental results demonstrate the effectiveness and efficiency of the proposed approach on a personal credit dataset and other benchmarks. Our work represents a new application of automated planning on an emerging and challenging machine learning paradigm.
- Oct 27 2016 cs.DB arXiv:1610.08411v1For decades, the crowdsourcing has gained much attention from both academia and industry, which outsources a number of tasks to human workers. Existing works considered improving the task accuracy through voting or learning methods, they usually did not fully take into account reducing the latency of the task completion. When a task requester posts a group of tasks (e.g., sentiment analysis), and one can only obtain answers of all tasks after the last task is accomplished. As a consequence, the time delay of even one task in this group could delay the next step of the task requester's work from minutes to days, which is quite undesirable for the task requester. Inspired by the importance of the task accuracy and latency, in this paper, we will propose a novel crowdsourcing framework, namely Fast and Reliable crOwdsourcinG framework (FROG), which intelligently assigns tasks to workers, such that the latencies of tasks are reduced and the expected accuracies of tasks are met. Specifically, our FROG framework consists of two important components, task scheduler and notification modules. For the task scheduler module, we formalize a FROG task scheduling (FROG-TS) problem, in which the server actively assigns workers with tasks with high reliability and low latency. We prove that the FROG-TS problem is NP-hard. Thus, we design two heuristic approaches, request-based and batch-based scheduling. For the notification module, we define an efficient worker notifying (EWN) problem, which only sends task invitations to those workers with high probabilities of accepting the tasks. To tackle the EWN problem, we propose a smooth kernel density estimation approach to estimate the probability that a worker accepts the task invitation. Through extensive experiments, we demonstrate the effectiveness and efficiency of our proposed FROG platform on both real and synthetic data sets.
- Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V;E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that correctly estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. We conclude with extensions and directions for future work.
- With the dense deployment of the remote radio heads (RRHs), the huge network power consumption has become a great challenge for green cloud radio access networks (Cloud-RANs), and multiuser downlink beamforming has been proposed as a promising solution. Moreover, the increasing number of mobile users (MUs) causes that admission control is essential for Cloud-RAN with limited fronthaul capacity and predefined power budget. In this paper, we consider the problem of joint multiuser downlink beamforming and admission control (JBAC) to enhance the admitted MUs in the network and reduce the network power consumption, while taking into account the Quality of Service requirements of the MUs, the power budget constraints and fronthaul limitation. It is shown that the JBAC problem is a mixed integer nonlinear problem, and still non-convex even though the continuous relaxation is adopted. Therefore, we first transform the JBAC problem into a Mixed-Integer Semidefinite Program. Then, we propose a bound improving Branch and Bound algorithm to yield the near-optimal solution. For practical application, a polynomial-time heuristic algorithm is proposed to derive the sub-optimal solution. Extensive simulations are conducted with different system configurations to show the effectiveness of the proposed two schemes.
- As the use of crowdsourcing increases, it is important to think about performance optimization. For this purpose, it is possible to think about each worker as a HPU(Human Processing Unit), and to draw inspiration from performance optimization on traditional computers or cloud nodes with CPUs. However, as we characterize HPUs in detail for this purpose, we find that there are important differences between CPUs and HPUs, leading to the need for completely new optimization algorithms. In this paper, we study the specific optimization problem of obtaining results fastest for a crowd sourced job with a fixed total budget. In crowdsourcing, jobs are usually broken down into sets of small tasks, which are assigned to workers one at a time. We consider three scenarios of increasing complexity: Identical Round Homogeneous tasks, Multiplex Round Homogeneous tasks, and Multiple Round Heterogeneous tasks. For each scenario, we analyze the stochastic behavior of the HPU clock-rate as a function of the remuneration offered. After that, we develop an optimum Budget Allocation strategy to minimize the latency for job completion. We validate our results through extensive simulations and experiments on Amazon Mechanical Turk.
- Frequency estimation of multiple sinusoids is significant in both theory and application. In some application scenarios, only sub-Nyquist samples are available to estimate the frequencies. A conventional approach is to sample the signals at several lower rates. In this paper, we address frequency estimation of the signals in the time domain through undersampled data. We analyze the impact of undersampling and demonstrate that three sub-Nyquist channels are generally enough to estimate the frequencies provided the undersampling ratios are pairwise coprime. We deduce the condition that leads to the failure of resolving frequency ambiguity when two coprime undersampling channels are utilized. When three-channel sub-Nyquist samples are used jointly, the frequencies can be determined uniquely and the correct frequencies are estimated. Numerical experiments verify the correctness of our analysis and conclusion.
- Sep 28 2016 cs.CV arXiv:1609.08436v7The detection of road and free space remains challenging for non-flat plane, especially with the varying latitudinal and longitudinal slope or in the case of multi-ground plane. In this paper, we propose a framework of the ground plane detection with stereo vision. The main contribution of this paper is a newly proposed descriptor which is implemented in the disparity image to obtain a disparity texture image. The ground plane regions can be distinguished from their surroundings effectively in the disparity texture image. Because the descriptor is implemented in the local area of the image, it can address well the problem of non-flat plane. And we also present a complete framework to detect the ground plane regions base on the disparity texture image with convolutional neural network architecture.
- Sep 12 2016 cs.CC arXiv:1609.02888v4We examine the power of statistical zero knowledge proofs (captured by the complexity class SZK) and their variants. First, we give the strongest known relativized evidence that SZK contains hard problems, by exhibiting an oracle relative to which SZK (indeed, even NISZK) is not contained in the class UPP, containing those problems solvable by randomized algorithms with unbounded error. This answers an open question of Watrous from 2002 [Aar]. Second, we "lift" this oracle separation to the setting of communication complexity, thereby answering a question of Göös et al. (ICALP 2016). Third, we give relativized evidence that perfect zero knowledge proofs (captured by the class PZK) are weaker than general zero knowledge proofs. Specifically, we exhibit oracles relative to which SZK is not contained in PZK, NISZK is not contained in NIPZK, and PZK is not equal to coPZK. The first of these results answers a question raised in 1991 by Aiello and Håstad (Information and Computation), and the second answers a question of Lovett and Zhang (2016). We also describe additional applications of these results outside of structural complexity. The technical core of our results is a stronger hardness amplification theorem for approximate degree, which roughly says that composing the gapped-majority function with any function of high approximate degree yields a function with high threshold degree.
- Aug 29 2016 cs.CV arXiv:1608.07444v1It is well known that clothing fashion is a distinctive and often habitual trend in the style in which a person dresses. Clothing fashions are usually expressed with visual stimuli such as style, color, and texture. However, it is not clear which visual stimulus places higher/lower influence on the updating of clothing fashion. In this study, computer vision and machine learning techniques are employed to analyze the influence of different visual stimuli on clothing-fashion updates. Specifically, a classification-based model is proposed to quantify the influence of different visual stimuli, in which each visual stimulus's influence is quantified by its corresponding accuracy in fashion classification. Experimental results demonstrate that, on clothing-fashion updates, the style holds a higher influence than the color, and the color holds a higher influence than the texture.
- Multi-view data clustering refers to categorizing a data set by making good use of related information from multiple representations of the data. It becomes important nowadays because more and more data can be collected in a variety of ways, in different settings and from different sources, so each data set can be represented by different sets of features to form different views of it. Many approaches have been proposed to improve clustering performance by exploring and integrating heterogeneous information underlying different views. In this paper, we propose a new multi-view fuzzy clustering approach called MinimaxFCM by using minimax optimization based on well-known Fuzzy c means. In MinimaxFCM the consensus clustering results are generated based on minimax optimization in which the maximum disagreements of different weighted views are minimized. Moreover, the weight of each view can be learned automatically in the clustering process. In addition, there is only one parameter to be set besides the fuzzifier. The detailed problem formulation, updating rules derivation, and the in-depth analysis of the proposed MinimaxFCM are provided here. Experimental studies on nine multi-view data sets including real world image and document data sets have been conducted. We observed that MinimaxFCM outperforms related multi-view clustering approaches in terms of clustering accuracy, demonstrating the great potential of MinimaxFCM for multi-view data analysis.
- Incremental clustering approaches have been proposed for handling large data when given data set is too large to be stored. The key idea of these approaches is to find representatives to represent each cluster in each data chunk and final data analysis is carried out based on those identified representatives from all the chunks. However, most of the incremental approaches are used for single view data. As large multi-view data generated from multiple sources becomes prevalent nowadays, there is a need for incremental clustering approaches to handle both large and multi-view data. In this paper we propose a new incremental clustering approach called incremental minimax optimization based fuzzy clustering (IminimaxFCM) to handle large multi-view data. In IminimaxFCM, representatives with multiple views are identified to represent each cluster by integrating multiple complementary views using minimax optimization. The detailed problem formulation, updating rules derivation, and the in-depth analysis of the proposed IminimaxFCM are provided. Experimental studies on several real world multi-view data sets have been conducted. We observed that IminimaxFCM outperforms related incremental fuzzy clustering in terms of clustering accuracy, demonstrating the great potential of IminimaxFCM for large multi-view data analysis.
- In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. Understanding the sample complexity of Best-$1$-Arm has attracted significant attention since the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance sample complexity of Best-$1$-Arm. Given an instance $I$, let $\mu_{[i]}$ be the $i$th largest mean and $\Delta_{[i]}=\mu_{[1]}-\mu_{[i]}$ be the corresponding gap. $H(I)=\sum_{i=2}^n\Delta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and there is a $\delta$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)⋅\left(\ln\delta^-1 +\mathsfEnt(I)\right)+\Delta_[2]^-2\ln\ln\Delta_[2]^-1\mathrmpolylog(n,\delta^-1)\right)\]samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.
- Jul 19 2016 cs.DS arXiv:1607.04913v2Depth first search (DFS) tree is one of the most well-known data structures for designing efficient graph algorithms. Given an undirected graph $G=(V,E)$ with $n$ vertices and $m$ edges, the textbook algorithm takes $O(n+m)$ time to construct a DFS tree. In this paper, we study the problem of maintaining a DFS tree when the graph is undergoing updates. Formally, we show the following results. Incremental DFS tree: Given any arbitrary online sequence of edge or vertex insertions, we can report a DFS tree in $O(n)$ worst case time per operation, and $O\left(\min\{m \log n + n \log^2 n, n^2\}\right)$ preprocessing time. Fault tolerant DFS tree: There exists a data structure of size $O(m \log n)$, such that given any set $\mathcal{F}$ of failed vertices or edges, it can report a DFS tree of the graph $G \setminus \mathcal{F}$ in $O(n|\mathcal{F}|\log^2 n)$ time. Fully dynamic DFS tree: Given any arbitrary online sequence of edge or vertex updates, we can report a DFS tree in $O(\sqrt{nm}\log^{1.5} n)$ worst case time per operation. Our result for incremental DFS tree improves the previous $O(n \log^3 n)$ worst case update time by Baswana et al., and matches the trivial $\Omega(n)$ lower bound when it is required to explicitly output the DFS tree. Our other results also improve the corresponding state-of-the-art results by Baswana et al. as well. Our work builds on the framework of the breakthrough work by Baswana et al., together with a novel use of a tree-partition lemma by Duan and Zhang, and the celebrated fractional cascading technique by Chazelle and Guibas.
- Jul 05 2016 cs.CC arXiv:1607.00945v2The width measure treedepth, also known as vertex ranking, centered coloring and elimination tree height, is a well-established notion which has recently seen a resurgence of interest. Since graphs of bounded treedepth are more restricted than graphs of bounded tree- or pathwidth, we are interested in the algorithmic utility of this additional structure. On the negative side, we show that every dynamic programming algorithm on treedepth decompositions of depth~$t$ cannot solve Dominating Set with $O((3-\epsilon)^t \cdot \log n)$ space for any $\epsilon > 0$. This result implies the same space lower bound for dynamic programming algorithms on tree and path decompositions. We supplement this result by showing a space lower bound of $O((3-\epsilon)^t \cdot \log n)$ for 3-Coloring and $O((2-\epsilon)^t \cdot \log n)$ for Vertex Cover. This formalizes the common intuition that dynamic programming algorithms on graph decompositions necessarily consume a lot of space and complements known results of the time-complexity of problems restricted to low-treewidth classes. We then show that treedepth lends itself to the design of branching algorithms. This class of algorithms has in general distinct advantages over dynamic programming algorithms: a) They use less space than algorithms based on dynamic programming, b) they are easy to parallelize and c) they provide possible solutions before terminating. Specifically, we design for Dominating Set a pure branching algorithm that runs in time $t^{O(t^2)}\cdot n$ and uses space $O(t^3 \log t + t \log n)$ and a hybrid of branching and dynamic programming that achieves a running time of $O(3^t \log t \cdot n)$ while using $O(2^t t \log t + t \log n)$ space. Algorithms for 3-Coloring and Vertex Cover with space complexity $O(t \cdot \log n)$ and time complexity $O(3^t \cdot n)$ and $O(2^t\cdot n)$, respectively, are included for completeness.
- Jul 01 2016 cs.CV arXiv:1606.09349v1Zero-shot learning (ZSL) extends the conventional image classification technique to a more challenging situation where the test image categories are not seen in the training samples. Most studies on ZSL utilize side information such as attributes or word vectors to bridge the relations between the seen classes and the unseen classes. However, existing approaches on ZSL typically exploit a shared space for each type of side information independently, which cannot make full use of the complementary knowledge of different types of side information. To this end, this paper presents an MBFA-ZSL approach to embed different types of side information as well as the visual feature into one shared space. Specifically, we first develop an algorithm named Multi-Battery Factor Analysis (MBFA) to build a unified semantic space, and then employ multiple types of side information in it to achieve the ZSL. The close-form solution makes MBFA-ZSL simple to implement and efficient to run on large datasets. Extensive experiments on the popular AwA, CUB, and SUN datasets show its significant superiority over the state-of-the-art approaches.
- In this paper, we present subgraph2vec, a novel approach for learning latent representations of rooted subgraphs from large graphs inspired by recent advancements in Deep Learning and Graph Kernels. These latent representations encode semantic substructure dependencies in a continuous vector space, which is easily exploited by statistical models for tasks such as graph classification, clustering, link prediction and community detection. subgraph2vec leverages on local information obtained from neighbourhoods of nodes to learn their latent representations in an unsupervised fashion. We demonstrate that subgraph vectors learnt by our approach could be used in conjunction with classifiers such as CNNs, SVMs and relational data clustering algorithms to achieve significantly superior accuracies. Also, we show that the subgraph vectors could be used for building a deep learning variant of Weisfeiler-Lehman graph kernel. Our experiments on several benchmark and large-scale real-world datasets reveal that subgraph2vec achieves significant improvements in accuracies over existing graph kernels on both supervised and unsupervised learning tasks. Specifically, on two realworld program analysis tasks, namely, code clone and malware detection, subgraph2vec outperforms state-of-the-art kernels by more than 17% and 4%, respectively.