This paper investigates an energy-efficient non-orthogonal transmission design problem for two downlink receivers that have strict reliability and finite blocklength (latency) constraints. The Shannon capacity formula widely used in traditional designs relies on the assumption of infinite blocklength and is therefore no longer appropriate. We adopt the recently developed finite-blocklength coding capacity formula, which explicitly captures the trade-off between reliability and code blocklength. However, conventional successive interference cancellation (SIC) may become infeasible due to heterogeneous blocklengths. We thus consider several scenarios with different channel conditions, both with and without SIC. By carefully examining the problem structure, we derive in closed form the optimal power and code blocklength for energy-efficient transmission. Simulation results provide interesting insights into the conditions under which non-orthogonal transmission is more energy efficient than orthogonal transmission schemes such as TDMA.
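The finite-blocklength capacity referred to here is commonly given by the normal approximation of Polyanskiy, Poor, and Verdú; the sketch below evaluates it for an AWGN-style channel. The exact formula, channel model, and constants used in the paper may differ:

```python
from math import log, log2, sqrt
from statistics import NormalDist

def fbl_rate(snr, n, eps):
    """Normal approximation to the maximal coding rate (bits per channel
    use) at blocklength n and block error probability eps for an AWGN
    channel with linear SNR `snr`."""
    cap = log2(1.0 + snr)                                  # Shannon capacity
    # channel dispersion V, in squared bits per channel use
    disp = (snr * (snr + 2) / (2 * (snr + 1) ** 2)) * (1 / log(2)) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - eps)                # Q^{-1}(eps)
    return cap - sqrt(disp / n) * q_inv + log2(n) / (2 * n)
```

As n grows the penalty term vanishes and the expression recovers the Shannon capacity; shortening the blocklength or tightening eps lowers the achievable rate, which is exactly the reliability-latency trade-off the design exploits.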
Dec 07 2017 cs.CV
While deep learning has led to significant advances in visual recognition over the past few years, such advances often require large amounts of annotated data. Unsupervised domain adaptation has emerged as an alternative approach that does not require as much annotated data, but prior evaluations of domain adaptation have been limited to relatively simple datasets. This work pushes the state of the art in unsupervised domain adaptation through an in-depth evaluation of AlexNet, DenseNet and Residual Transfer Networks (RTN) on multimodal benchmark datasets, identifying which layers transfer features across domains more effectively. We also modify the existing RTN architecture and propose a novel domain adaptation architecture called "Deep MagNet" that combines Deep Convolutional Blocks with multiple Maximum Mean Discrepancy losses. Our experiments show quantitative and qualitative improvements in the performance of our method on benchmark datasets for complex data domains.
In the last decade, the development of electric taxis has motivated rapidly growing research interest in the efficient allocation of electric charging stations. To account for the driving patterns of electric taxis, we adopt the perspective of a transport energy supply chain to capture charging demand and transform the charging station allocation problem into a facility location problem. Based on the P-median and Min-max models, we develop a data-driven method to evaluate system efficiency and service quality. We also conduct a case study using GPS trajectory data from Beijing, in which various location strategies are evaluated from the perspectives of system efficiency and service quality. Situations with and without congestion are also compared.
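The P-median model mentioned above picks P station sites minimizing total demand-weighted travel distance; a minimal greedy heuristic sketch (illustrative only; the paper's data-driven method and the Min-max variant are not reproduced here):

```python
def greedy_p_median(dist, p):
    """Greedy heuristic for the P-median problem: dist[i][j] is the travel
    distance from demand point i to candidate station site j.  Each step
    opens the site that most reduces the total assignment distance."""
    n_demand, n_sites = len(dist), len(dist[0])
    best = [1e9] * n_demand          # distance to nearest open site so far
                                     # (assumes real distances are far below 1e9)
    opened = []
    for _ in range(p):
        candidates = [j for j in range(n_sites) if j not in opened]
        # improvement in total distance if site j were opened
        gain = lambda j: sum(max(best[i] - dist[i][j], 0.0) for i in range(n_demand))
        j_star = max(candidates, key=gain)
        opened.append(j_star)
        best = [min(best[i], dist[i][j_star]) for i in range(n_demand)]
    return opened, sum(best)
```

On trajectory data, the demand points would come from observed pick-up/charging locations and the distances from the road network.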
Nov 28 2017 cs.CL
Recurrent neural language models are the state-of-the-art models for language modeling. When the vocabulary is large, the space required to store the model parameters becomes the bottleneck for the use of recurrent neural language models. In this paper, we introduce a simple space compression method that randomly shares structured parameters at both the input and output embedding layers of the recurrent neural language model, significantly reducing the number of model parameters while still compactly representing the original input and output embedding layers. The method is easy to implement and tune. Experiments on several data sets show that the new method achieves similar perplexity and BLEU scores while using only a tiny fraction of the parameters.
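A toy illustration of random structured sharing in an embedding layer (the pool size, assignment scheme, and dimensions here are illustrative, not the paper's):

```python
import numpy as np

def make_shared_embedding(vocab, dim, pool_size, parts=4, seed=0):
    """Each word vector is the concatenation of `parts` sub-vectors drawn,
    by a fixed random assignment, from a small shared pool.  Storage drops
    from vocab*dim floats to pool_size*(dim//parts) floats plus an
    integer assignment table."""
    rng = np.random.default_rng(seed)
    sub_dim = dim // parts
    pool = rng.standard_normal((pool_size, sub_dim))        # shared weights
    assign = rng.integers(0, pool_size, size=(vocab, parts))
    def embed(word_id):
        return np.concatenate([pool[assign[word_id, p]] for p in range(parts)])
    return embed, pool.size

embed, stored_floats = make_shared_embedding(vocab=10000, dim=64, pool_size=256)
```

Because the assignment is fixed, every lookup of the same word reconstructs the same vector, yet distinct words still get (with high probability) distinct combinations of shared sub-vectors.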
Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. Clustering information plays an important role in solving the link prediction problem, and the node clustering coefficient appears frequently in the previous literature on link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish a node's different clustering abilities with respect to different node pairs. In this paper, we shift our focus from nodes to links and propose the concept of the asymmetric link clustering (ALC) coefficient. Further, we improve three node-clustering-based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, achieving especially remarkable improvements on food web, hamster friendship and Internet networks. Besides, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
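For reference, the node clustering coefficient these methods build on can be computed as follows (a plain-Python sketch; the ALC coefficient proposed here refines this idea per node pair):

```python
def clustering_coefficient(adj, v):
    """Local clustering coefficient of node v: the fraction of pairs of
    v's neighbours that are themselves linked.  adj maps each node to a
    set of its neighbours (undirected graph)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # count each neighbour pair once via the a < b ordering
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

The limitation noted above is visible here: the value depends only on v's own neighbourhood, so it is the same number no matter which candidate node pair v serves as a common neighbour for.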
Nov 10 2017 cs.CV
Video classification is highly important and has wide applications, such as video search and intelligent surveillance. Video naturally consists of static and motion information, which can be represented by frames and optical flow. Recently, researchers have generally adopted deep networks to capture the static and motion information separately, which has two main limitations: (1) ignoring the coexistence relationship between spatial and temporal attention, although they should be jointly modelled as the spatial and temporal evolution of video so that discriminative video features can be extracted; (2) ignoring the strong complementarity between the static and motion information that coexist in video, although they should be collaboratively learned to boost each other. To address these two limitations, this paper proposes a two-stream collaborative learning with spatial-temporal attention (TCLSTA) approach, which consists of two models: (1) Spatial-temporal attention model: the spatial-level attention emphasizes the salient regions in a frame, and the temporal-level attention exploits the discriminative frames in a video. They are jointly learned and mutually boosted to learn discriminative static and motion features for better classification performance. (2) Static-motion collaborative model: it not only achieves mutual guidance between static and motion information to boost feature learning, but also adaptively learns the fusion weights of the static and motion streams, so as to exploit the strong complementarity between static and motion information to promote video classification. Experiments on 4 widely used datasets show that our TCLSTA approach achieves the best performance compared with more than 10 state-of-the-art methods.
Nov 06 2017 cs.CL
While neural machine translation (NMT) has become the new paradigm, its parameter optimization requires large-scale parallel data, which is scarce in many domains and language pairs. In this paper, we address a new translation scenario in which only monolingual corpora and phrase pairs exist. We propose a new method for translation with partially aligned sentence pairs, which are derived from the phrase pairs and monolingual corpora. To make full use of the partially aligned corpora, we adapt the conventional NMT training method in two aspects. On the one hand, different generation strategies are designed for aligned and unaligned target words. On the other hand, a different objective function is designed to model the partially aligned parts. The experiments demonstrate that our method achieves relatively good results in such a translation scenario, and that tiny bitexts can boost translation quality to a large extent.
As one of the most plausible convex optimization methods for sparse data reconstruction, $\ell_1$-minimization plays a fundamental role in the development of sparse optimization theory. The stability of this method has been addressed in the literature under various assumptions such as the restricted isometry property (RIP), the null space property (NSP), and mutual coherence. In this paper, we propose a unified means to develop the so-called weak stability theory for $\ell_1$-minimization methods under the condition called the weak range space property of a transposed design matrix, which turns out to be a necessary and sufficient condition for the standard $\ell_1$-minimization method to be weakly stable in sparse data reconstruction. The reconstruction error bounds established in this paper are measured by the so-called Robinson's constant. We also provide a unified weak stability result for standard $\ell_1$-minimization under several existing compressed-sensing matrix properties. In particular, the weak stability of $\ell_1$-minimization under the constant-free range space property of order $k$ of the transposed design matrix is established for the first time in this paper. Different from the existing analysis, we utilize the classic Hoffman Lemma concerning error bounds for linear systems, as well as Dudley's theorem concerning the polytope approximation of the unit $\ell_2$-ball, to show that $\ell_1$-minimization is robustly and weakly stable in recovering sparse data from inaccurate measurements.
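The standard $\ell_1$-minimization (basis pursuit) problem discussed here can be posed as a linear program via the split x = u - v with u, v >= 0; a SciPy sketch (illustrative of the method being analyzed, not of the paper's stability machinery):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Solve min ||x||_1 subject to A x = b as a linear program:
    write x = u - v with u, v >= 0 and minimize sum(u) + sum(v)."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])                # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v
```

Any sparse x0 with A x0 = b is feasible for this LP, so the returned solution never has a larger $\ell_1$ norm than x0; the stability question studied in the paper is how close the solution stays to x0 when b is inaccurate.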
Oct 31 2017 cs.RO
Dexterous manipulation has broad applications in assembly lines, warehouses and agriculture. To perform large-scale manipulation tasks for various objects, a multi-fingered robotic hand sometimes has to sequentially adjust its grasping gestures, i.e., the finger gaits, to address workspace limits and guarantee object stability. However, realizing finger gaits planning in dexterous manipulation is challenging due to complicated grasp quality metrics, uncertainties in object shapes and dynamics (mass and moment of inertia), and unexpected slippage under uncertain contact dynamics. In this paper, a dual-stage optimization based planner is proposed to handle these challenges. In the first stage, a velocity-level finger gaits planner is introduced that combines object grasp quality with hand manipulability. The proposed finger gaits planner is computationally efficient and realizes finger gaiting without a 3D model of the object. In the second stage, a robust manipulation controller using robust control and force optimization is proposed to address object dynamics uncertainties and external disturbances. The dual-stage planner is able to guarantee stability under unexpected slippage caused by uncertain contact dynamics. Moreover, it does not require velocity measurement or expensive 3D/6D tactile sensors. The proposed dual-stage optimization based planner is verified by simulations in MuJoCo.
In recent years, analyzing task-based fMRI (tfMRI) data has become an essential tool for understanding brain function and networks. However, due to the sheer size of tfMRI data, its intrinsically complex structure, and the lack of ground truth for the underlying neural activities, modeling tfMRI data is challenging. Previously proposed data-modeling methods, including Independent Component Analysis (ICA) and Sparse Dictionary Learning, provide only a weakly established model based on blind source separation, under the strong assumption that original fMRI signals can be linearly decomposed into time series components with corresponding spatial maps. Meanwhile, analyzing and learning from large amounts of tfMRI data across a variety of subjects remains very demanding even with recent advances in computational hardware. Given that the Convolutional Neural Network (CNN) is a robust method for learning high-level abstractions from low-level data such as tfMRI time series, in this work we propose a fast and scalable framework for a distributed deep Convolutional Autoencoder model. This model aims both to learn the complex hierarchical structure of tfMRI data and to leverage the processing power of multiple GPUs in a distributed fashion. To implement the model, we have created an enhanced processing pipeline on top of Apache Spark and the TensorFlow library, leveraging a very large cluster of GPU machines. Experimental results from applying the model to Human Connectome Project (HCP) data show that the proposed model is efficient and scalable for tfMRI big data analytics, thus enabling data-driven extraction of hierarchical neuroscientific information from massive fMRI data in the future.
Identifying arbitrary topologies of power networks in real time is a computationally hard problem due to the number of hypotheses, which grows exponentially with the network size. A new "Learning-to-Infer" variational inference method is developed for efficient inference of every line status in the network. Optimizing the variational model is transformed into, and solved as, a discriminative learning problem based on Monte Carlo samples generated with power flow simulations. A major advantage of the developed Learning-to-Infer method is that the labeled data used for training can be generated quickly, at very little cost, and in arbitrarily large amounts. As a result, the power of offline training is fully exploited to learn very complex classifiers for effective real-time topology identification. The proposed methods are evaluated on the IEEE 30-, 118- and 300-bus systems. Excellent performance in identifying arbitrary power network topologies in real time is achieved even with relatively simple variational models and a reasonably small amount of data.
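The simulate-then-learn pattern described above can be caricatured in a few lines: a cheap simulator generates unlimited labeled (measurement, line-status) pairs offline, and a discriminative classifier is fit to them. The toy "simulator" and plain logistic classifier below are hypothetical stand-ins for the power-flow simulations and variational model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical stand-in for the power-flow simulator: a hidden
    line status (0/1) produces a noisy 2-D measurement vector."""
    status = rng.integers(0, 2, n)
    meas = status[:, None] * np.array([1.0, -0.5]) + 0.1 * rng.standard_normal((n, 2))
    return meas, status

def train_logistic(X, y, steps=2000, lr=0.5):
    """Plain logistic regression fit by gradient descent, standing in
    for the discriminative variational model."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

X, y = simulate(2000)                           # cheap offline labeled data
w = train_logistic(X, y)
Xt, yt = simulate(500)                          # "real-time" queries
pred = (np.hstack([Xt, np.ones((len(Xt), 1))]) @ w) > 0
acc = (pred == (yt == 1)).mean()
```

Once trained, classifying a new measurement is a single inner product, which is what makes real-time identification feasible.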
The emergence of smartwatches poses new challenges to information security. Although there are mature touch-based authentication methods for smartphones, the effectiveness of using these methods on smartwatches is still unclear. We conducted a user study (n=16) to evaluate how authentication methods (PIN and Pattern), UIs (Square and Circular), and display sizes (38mm and 42mm) affect authentication accuracy, speed, and security. Circular UIs are tailored to smartwatches and have fewer UI elements. Results show that 1) PIN is more accurate and secure than Pattern; 2) Pattern is much faster than PIN; 3) Square UIs are more secure but less accurate than Circular UIs; 4) display size affects security but not accuracy or speed; and 5) Square PIN is the most secure method of all. The study also reveals a security concern: participants' favorite method is not the best on any of the measures. We finally discuss implications for future touch-based smartwatch authentication design.
Recent breakthroughs in cancer research have come via the up-and-coming field of pathway analysis. By applying statistical methods to prior known gene and protein regulatory information, pathway analysis provides a meaningful way to interpret genomic data. While many gene/protein regulatory relationships have been studied, never before has such a significant amount of data been made available in the organized form of gene/protein regulatory networks and pathways. However, pathway analysis research is still in its infancy, especially when applied to practical problems. In this paper we propose a new method of studying biological pathways, one that cross-analyzes mutation information, transcriptome and proteomics data. Using this outcome, we identify routes of aberrant pathways potentially responsible for the etiology of disease. Each pathway route is encoded as a Bayesian network, initialized with a sequence of conditional probabilities specifically designed to encode the directionality of the regulatory relationships encoded in the pathways. Far more complex interactions in the pathways, such as phosphorylation and methylation, among others, can be modeled with this approach. The effectiveness of our model is demonstrated through its ability to distinguish real pathways from decoys on TCGA mRNA-seq, mutation, Copy Number Variation and phosphorylation data for both breast cancer and ovarian cancer studies. The majority of the pathways distinguished can be confirmed by the biological literature. Moreover, the proportion of correctly identified pathways is \% higher than in previous work, where only mRNA-seq and mutation data were incorporated for breast cancer patients. Consequently, such an in-depth pathway analysis incorporating more diverse data can improve the accuracy of perturbed pathway detection.
In order for autonomous robots to support people's well-being in homes and everyday environments, new interactive capabilities will be required, as exemplified by the soft design used for Disney's recent robot character Baymax in popular fiction. Home robots will need to be easy to interact with, intelligent (adaptive, fun, unobtrusive and requiring little effort to power and maintain), and capable of carrying out useful tasks both on an everyday level and during emergencies. The current article adopts an exploratory medium-fidelity prototyping approach to test some new robotic capabilities for recognizing people's activities and intentions and behaving in a way that is transparent to people. Results are discussed with the aim of informing future designs.
Randomized experiments have been critical tools of decision making for decades. However, in many important applications subjects can show significant heterogeneity in their response to treatments, so it is not enough to simply know which treatment is optimal for the entire population. What we need is a model that correctly customizes treatment assignment based on subject characteristics. The problem of constructing such models from randomized experiment data is known in the literature as uplift modeling. Many algorithms have been proposed for uplift modeling, and some have generated promising results on various data sets, yet little is known about the theoretical properties of these algorithms. In this paper, we propose a new tree-based ensemble algorithm for uplift modeling. Experiments show that our algorithm can achieve competitive results on both synthetic and industry-provided data. In addition, by properly tuning the "node size" parameter, our algorithm is proved to be consistent under mild regularity conditions. This is the first consistent algorithm for uplift modeling that we are aware of.
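The core quantity in uplift modeling, and the kind of split criterion a tree-based method might use, can be sketched as follows (illustrative only; the paper's actual criterion, "node size" tuning, and ensemble are more involved):

```python
def uplift(treated, control):
    """Estimated uplift in a node: response rate under treatment minus
    response rate under control (lists of 0/1 outcomes)."""
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(treated) - rate(control)

def best_binary_split(rows, n_features):
    """Pick the binary feature whose split maximizes the absolute
    difference in uplift between the two children.
    rows: list of (features, treated_flag, outcome) tuples."""
    def node_uplift(subset):
        t = [o for feats, tr, o in subset if tr]
        c = [o for feats, tr, o in subset if not tr]
        return uplift(t, c)
    best_f, best_gap = None, -1.0
    for f in range(n_features):
        left = [r for r in rows if r[0][f] == 0]
        right = [r for r in rows if r[0][f] == 1]
        if not left or not right:
            continue                       # degenerate split, skip
        gap = abs(node_uplift(left) - node_uplift(right))
        if gap > best_gap:
            best_f, best_gap = f, gap
    return best_f, best_gap
```

Note the contrast with ordinary decision trees: the split targets heterogeneity of the *treatment effect*, not predictive accuracy of the outcome itself.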
Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications. Although existing studies have demonstrated the effectiveness and feasibility of running deep neural network inference operations on mobile and embedded devices, they overlook the reliability of mobile computing models. Reliability measurements such as predictive uncertainty estimations are key factors for improving decision accuracy and user experience. In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. RDeepSense enables predictive uncertainty by adopting a tunable proper scoring rule as the training criterion and dropout as an implicit Bayesian approximation, a combination whose correctness we establish theoretically. To reduce the computational complexity, RDeepSense employs efficient dropout and predictive distribution estimation instead of model ensembles or sampling-based methods for inference operations. We evaluate RDeepSense with four mobile sensing applications using Intel Edison devices. Results show that RDeepSense can reduce energy consumption by around 90% while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods.
Sep 12 2017 cs.RO
Advanced motor skills are essential for robots to physically coexist with humans. Much research on robot dynamics and control has achieved success in advanced robot motor capabilities, but mostly through heavily case-specific engineering. Meanwhile, in terms of robots acquiring skills in a ubiquitous manner, robot learning from human demonstration (LfD) has achieved great progress, but still has limitations in handling dynamic skills and compound actions. In this paper, we present a composite learning scheme which goes beyond LfD and integrates robot learning from human definition, demonstration, and evaluation. The method tackles advanced motor skills that require dynamic time-critical maneuvers, complex contact control, and the handling of partly soft, partly rigid objects. We also introduce the "nunchaku flipping challenge", an extreme test that places hard requirements on all three of these aspects. Continuing from our previous presentations, this paper introduces the latest update of the composite learning scheme and the physical success of the nunchaku flipping challenge.
Aug 31 2017 cs.CV
Recently, the Generative Adversarial Network (GAN) has found wide application in style transfer, image-to-image translation and image super-resolution. In this paper, a color-depth conditional GAN is proposed to concurrently resolve the problems of depth super-resolution and color super-resolution in 3D videos. First, given a low-resolution depth image and a low-resolution color image, a generative network is proposed that leverages the mutual information of the color and depth images to enhance each other, taking into account the geometric structural dependency of color and depth images of the same scene. Second, three loss functions, including a data loss, a total variation loss, and an 8-connected gradient difference loss, are introduced to train this generative network, in addition to the adversarial loss, in order to keep the generated images close to the real ones. Experimental results demonstrate that the proposed approach produces high-quality color and depth images from a low-quality image pair, and that it is superior to several other leading methods. Besides, we use the same neural network framework to resolve the problems of image smoothing and edge detection at the same time.
Human activity recognition (HAR) has become a popular research topic because of its wide range of applications. With the development of deep learning, new ideas have appeared to address HAR problems. Here, a deep network architecture using residual bidirectional long short-term memory (LSTM) cells is proposed. The new network has two advantages. First, a bidirectional connection concatenates the positive time direction (forward state) and the negative time direction (backward state). Second, residual connections between stacked cells act as highways for gradients, which can pass underlying information directly to the upper layers, effectively avoiding the vanishing gradient problem. Overall, the proposed network shows improvements in both the temporal dimension (using bidirectional cells) and the spatial dimension (deeply stacked residual connections), aiming to enhance the recognition rate. When tested on the Opportunity data set and the public-domain UCI data set, accuracy was increased by 4.78% and 3.68%, respectively, compared with previously reported results. Finally, the confusion matrix for the public-domain UCI data set was analyzed.
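The residual "highway" idea described above reduces to adding each cell's transformation to its own input, so identity information always survives the stack; a minimal sketch with generic layer functions standing in for the LSTM cells:

```python
import numpy as np

def residual_stack(x, layers):
    """Apply a stack of layer functions with residual connections:
    each layer's output is added to its input, so information (and,
    during training, gradients) can pass straight through the stack."""
    for f in layers:
        x = x + f(x)
    return x
```

If every layer outputs zero, the input passes through unchanged, which is exactly why deep residual stacks remain trainable where plain stacks would suffer vanishing gradients.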
This study presents a theoretical method for planning and controlling agile bipedal locomotion based on robustly tracking a set of non-periodic keyframe states. Based on centroidal momentum dynamics, we formulate a hybrid phase-space planning and control method which includes the following key components: (i) a step transition solver that enables dynamic tracking of non-periodic keyframe states over various types of terrain, (ii) a robust hybrid automaton to effectively formulate planning and control algorithms, (iii) a steering direction model to control the robot's heading, (iv) a phase-space metric to measure the distance to the planned locomotion manifolds, and (v) a hybrid control method based on the preceding distance metric to produce robust dynamic locomotion under external disturbances. Compared to other locomotion methodologies, we focus heavily on non-periodic gait generation and robustness metrics for dealing with disturbances. This focus enables the proposed control method to robustly track non-periodic keyframe states over various challenging terrains and under external disturbances, as illustrated through several simulations.
Aug 22 2017 cs.AI
Data analytics helps basketball teams create tactics. However, manual data collection and analysis are costly and ineffective. Therefore, we applied a deep bidirectional long short-term memory (BLSTM) and mixture density network (MDN) approach. This model is not only capable of predicting a basketball trajectory based on real data, but can also generate new trajectory samples, making it a useful tool for helping coaches and players decide when and where to shoot. Its structure is particularly suitable for dealing with time series problems. The BLSTM receives forward and backward information at the same time, while stacking multiple BLSTMs further increases the learning ability of the model. Combined with the BLSTMs, the MDN is used to generate a multi-modal distribution of outputs; thus, the proposed model can, in principle, represent arbitrary conditional probability distributions of the output variables. We tested our model in two experiments on three-pointer datasets from NBA SportVu data. In the hit-or-miss classification experiment, the proposed model outperformed other models in terms of convergence speed and accuracy. In the trajectory generation experiment, eight model-generated trajectories at a given time closely matched real trajectories.
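An MDN head outputs mixture parameters, and training minimizes the negative log-likelihood of the observed point under that mixture; a one-dimensional sketch (the paper's trajectories are of course multi-dimensional, and the parameters come from the BLSTM rather than being given):

```python
import numpy as np

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of target y under a 1-D Gaussian mixture
    with weights pi, means mu, and standard deviations sigma (the
    quantities an MDN output head would produce)."""
    comp = pi * np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log(comp.sum())
```

Because the likelihood is a sum over components, the loss stays low whenever *any* sufficiently weighted mode covers the target, which is what lets the model represent multi-modal trajectory distributions.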
Aug 08 2017 cs.CV
Visual attention, which assigns weights to image regions according to their relevance to a question, is considered indispensable by most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on 3 datasets, on which it surpasses the best baseline model on the newly released CLEVR dataset by 9.5% and the best published model on the VQA dataset by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
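A toy version of an unrolled mean-field sweep for a binary "attended / not attended" field over regions, in the spirit of treating each inference iteration as a recurrent layer (a hypothetical simplification; the paper's CRF is a multivariate attention distribution over a 2-D grid):

```python
import numpy as np

def mean_field(unary, neighbours, coupling=1.0, iters=5):
    """q[i] is the probability that region i is attended.  Each sweep
    refines q from the unary evidence plus agreement messages from the
    region's neighbours; one sweep plays the role of one recurrent layer."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    q = sigmoid(unary)
    for _ in range(iters):
        # message to i: sum of neighbours' (2q - 1) "votes" in [-1, 1]
        msg = np.array([sum(2.0 * q[j] - 1.0 for j in nbrs) for nbrs in neighbours])
        q = sigmoid(unary + coupling * msg)
    return q
```

The effect the abstract is after is visible even in this caricature: a region with ambiguous unary evidence is pulled toward agreement with its neighbours, encoding a cross-region relation that independent per-region attention cannot.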
Aug 07 2017 cs.CV
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogleNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector representation with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further used to cluster attributes into multiple groups to form spatially-aware concepts (e.g., the neckline concept might consist of attributes like v-neck, round-neck, etc.). Finally, we decompose the visual-semantic embedding space into multiple concept-specific subspaces, which facilitates structured browsing and attribute-feedback product retrieval by exploiting multimodal linguistic regularities. We conducted extensive experiments on our newly collected Fashion200K dataset, and results on clustering quality evaluation and the attribute-feedback product retrieval task demonstrate the effectiveness of our automatically discovered spatially-aware concepts.
Aug 03 2017 cs.CV
Visual tracking is intrinsically a temporal problem. Discriminative Correlation Filters (DCF) have demonstrated excellent performance for high-speed generic visual object tracking. Building upon this seminal work, a plethora of recent improvements rely on convolutional neural networks (CNN) pretrained on ImageNet as feature extractors for visual tracking. However, most of these works rely on ad hoc analysis to design the weights for different layers, using either boosting or hedging techniques to form an ensemble tracker. In this paper, we go beyond the conventional DCF framework and propose a Kernalised Multi-resolution Convnet (KMC) formulation that utilises hierarchical response maps to directly output the target movement. When the learnt network is deployed directly, without any weight adjustment, on an unseen and challenging UAV tracking dataset, the proposed model consistently achieves excellent tracking performance. Moreover, the transferred multi-resolution CNN can be integrated into an RNN temporal learning framework, thereby opening the door to end-to-end temporal deep learning (TDL) for visual tracking.
Jul 14 2017 cs.SI
Cities are living systems whose urban infrastructures and functions are defined and evolve through population behaviors. Profiling cities and functional regions has been an important topic in urban design and planning. This paper studies a unique big data set which includes daily movement data of tens of millions of city residents, and develops a visual analytics system, namely UrbanFACET, to discover and visualize the dynamic profiles of multiple cities and their residents. This big user movement data set, acquired from mobile users' check-ins across thousands of phone apps, is utilized in an integrative study and visualized together with urban structure (e.g., road networks) and POI (Point of Interest) distributions. In particular, we develop a novel set of information-theoretic metrics to characterize the mobility patterns of city areas and groups of residents. These multifaceted metrics, namely Fluidity, vibrAncy, Commutation, divErsity, and densiTy (FACET), categorize and manifest hidden urban functions and behaviors. The UrbanFACET system further allows users to visually analyze and compare the metrics over different areas and cities at metropolitan scales. The system is evaluated through case studies on several large, heavily populated cities as well as user studies involving real-world users.
Jul 12 2017 cs.CV
Foreground detection has been widely studied for decades due to its importance in many practical applications. Most existing methods assume that foreground and background show visually distinct characteristics, so that the foreground can be detected once a good background model is obtained. However, there are many situations where this is not the case. Of particular interest in video surveillance is the camouflage case: for example, an active attacker camouflages by intentionally wearing clothes that are visually similar to the background. In such cases, even given a decent background model, it is not trivial to detect foreground objects. This paper proposes a texture-guided weighted voting (TGWV) method which can efficiently detect foreground objects in camouflaged scenes. The proposed method employs the stationary wavelet transform to decompose the image into frequency bands. We show that the small and hardly noticeable differences between foreground and background in the image domain can be effectively captured in certain wavelet frequency bands. To make the final foreground decision, a weighted voting scheme is developed based on the intensity and texture of all the wavelet bands, with carefully designed weights. Experimental results demonstrate that the proposed method achieves superior performance compared to the current state of the art.
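The final decision step amounts to a weighted vote across band-wise difference maps; a simplified sketch operating on precomputed |frame - background| maps (the stationary wavelet decomposition itself, and the paper's actual weight design and thresholds, are omitted):

```python
import numpy as np

def weighted_vote(band_diffs, weights, thresh=0.5):
    """Foreground decision by weighted voting: band_diffs is a list of
    per-band absolute difference maps.  Each band votes 1 where its
    difference exceeds that band's mean level, and the weighted
    majority yields the final foreground mask."""
    votes = [(d > d.mean()).astype(float) for d in band_diffs]
    score = sum(w * v for w, v in zip(weights, votes)) / sum(weights)
    return score > thresh
```

The point of voting across bands is that a camouflaged object may be invisible in some bands but still leave a consistent trace in others, and the weights let the reliable bands dominate.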
Jul 11 2017 cs.CV
In this paper, two local activity-tuned filtering frameworks are proposed for noise removal and image smoothing, where the local activity measurement is given by the clipped and normalized local variance or standard deviation. The first framework is a modified anisotropic diffusion for noise removal of piece-wise smooth images. The second framework is a local activity-tuned Relative Total Variation (LAT-RTV) method for image smoothing. Both frameworks divide the gradient by the local activity measurement to achieve noise removal. In addition, to better capture local information, the proposed LAT-RTV uses the product of the gradient and the local activity measurement to boost image smoothing performance. Experimental results are presented to demonstrate the efficiency of the proposed methods in various applications, including depth image filtering, clip-art compression artifact removal, image smoothing, and image denoising.
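A sketch of the local activity measurement as described: the local variance in a sliding window, clipped and normalized to [0, 1]. The window handling and normalization choices below are illustrative, not necessarily the paper's:

```python
import numpy as np

def local_activity(img, radius=1, clip=None):
    """Clipped, normalized local variance: the variance over a
    (2*radius+1)^2 window around each pixel, clipped at `clip`
    (defaults to the maximum observed variance) and scaled to [0, 1]."""
    pad = np.pad(img.astype(float), radius, mode="reflect")
    h, w = img.shape
    # stack all shifted views of the window, then take variance per pixel
    win = np.stack([pad[dy:dy + h, dx:dx + w]
                    for dy in range(2 * radius + 1)
                    for dx in range(2 * radius + 1)])
    var = win.var(axis=0)
    if clip is None:
        clip = var.max() if var.max() > 0 else 1.0
    return np.minimum(var, clip) / clip
```

Flat regions score 0 and busy regions score near 1, so dividing the gradient by this measurement damps the filter in smooth areas while the product form amplifies it near structure.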
Cloud radio access network (C-RAN) has become a promising network architecture to support the massive data traffic of next-generation cellular networks. In a C-RAN, a massive number of low-cost remote antenna ports (RAPs) are connected to a single baseband unit (BBU) pool via high-speed, low-latency fronthaul links, which enables efficient resource allocation and interference management. As the RAPs are geographically distributed, group sparse beamforming schemes have attracted extensive study, in which a subset of RAPs is assigned to be active and a high spectral efficiency can be achieved. However, most studies assume that each user is equipped with a single antenna. How to design the group sparse precoder for multi-antenna users remains little understood, as it requires the joint optimization of the mutually coupled transmit and receive beamformers. This paper formulates an optimal joint RAP selection and precoding design problem in a C-RAN with multiple antennas at each user. Specifically, we assume a fixed transmit power constraint for each RAP, and investigate the optimal tradeoff between the sum rate and the number of active RAPs. Motivated by compressive sensing theory, this paper formulates the group sparse precoding problem by introducing the $\ell_0$-norm as a penalty and then uses the reweighted $\ell_1$ heuristic to find a solution. By adopting the idea of block diagonalization precoding, the problem can be cast as a convex optimization problem, and an efficient algorithm is proposed based on its Lagrangian dual. Simulation results verify that the proposed algorithm achieves almost the same sum rate as exhaustive search.
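The reweighted $\ell_1$ heuristic can be illustrated in isolation. The sketch below is a simplification, not the paper's Lagrangian-dual algorithm: it applies the heuristic to a generic group-sparse least-squares problem via a proximal-gradient inner loop with block soft-thresholding, re-deriving the per-group weights as $w_g = 1/(\|x_g\| + \epsilon)$ after each outer pass. The groups (standing in for per-RAP precoder blocks) are assumed to partition the variables.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the whole group toward zero."""
    n = np.linalg.norm(v)
    return np.zeros_like(v) if n <= t else (1 - t / n) * v

def reweighted_group_l1(A, b, groups, lam=0.1, eps=1e-3, outer=5, inner=200):
    """Reweighted l1 heuristic for group sparsity (illustrative sketch).

    Minimizes ||A x - b||^2 + lam * sum_g w_g ||x_g||_2 by proximal gradient,
    updating w_g = 1 / (||x_g|| + eps) after each outer pass so that small
    groups are pushed harder toward zero, mimicking the l0 penalty.
    """
    x = np.zeros(A.shape[1])
    w = np.ones(len(groups))
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant
    for _ in range(outer):
        for _ in range(inner):
            z = x - step * (A.T @ (A @ x - b))
            for g, idx in enumerate(groups):
                x[idx] = group_soft_threshold(z[idx], step * lam * w[g])
        w = 1.0 / (np.array([np.linalg.norm(x[idx]) for idx in groups]) + eps)
    return x
```

On a synthetic problem with one truly active group, the reweighting drives the inactive groups to (near) zero while leaving the active group almost unbiased.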
Recent advances in deep learning motivate the use of deep neural networks in sensing applications, but their excessive resource needs on constrained embedded devices remain an important impediment. A recently explored solution space lies in compressing (approximating or simplifying) deep neural networks in some manner before use on the device. We propose a new compression solution, called DeepIoT, that makes two key contributions in that space. First, unlike current solutions geared for compressing specific types of neural networks, DeepIoT presents a unified approach that compresses all commonly used deep learning structures for sensing applications, including fully-connected, convolutional, and recurrent neural networks, as well as their combinations. Second, unlike solutions that either sparsify weight matrices or assume linear structure within weight matrices, DeepIoT compresses neural network structures into smaller dense matrices by finding the minimum number of non-redundant hidden elements, such as filters and dimensions, required by each layer, while keeping the performance of sensing applications the same. Importantly, it does so using an approach that obtains a global view of parameter redundancies, which is shown to produce superior compression. We conduct experiments with five different sensing-related tasks on Intel Edison devices. DeepIoT outperforms all compared baseline algorithms with respect to execution time and energy consumption by a significant margin. It reduces the size of deep neural networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to 94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements are achieved without loss of accuracy. The results underscore the potential of DeepIoT for advancing the exploitation of deep neural networks on resource-constrained embedded devices.
May 25 2017 cs.AI
Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling, differential response analysis, or personalized treatment learning in the literature. A key feature of uplift modeling is that the data is unlabeled: it is impossible to know whether the chosen treatment is optimal for an individual subject because the response under alternative treatments is unobserved. This presents a challenge to both the training and the evaluation of uplift models. In this paper we describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. We present a new uplift algorithm which creates a forest of randomized trees. The trees are built with a splitting criterion designed to directly optimize their uplift performance based on the proposed evaluation method. Both the evaluation method and the algorithm apply to an arbitrary number of treatments and general response types. Experimental results on synthetic data and industry-provided data show that our algorithm leads to significant performance improvement over other applicable methods.
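The unbiased expected-response estimate can be sketched with the classical inverse-propensity form, which follows the same idea the abstract describes (the paper's exact estimator may differ): on randomized data, a subject's observed response counts toward the policy's estimate only when the experiment happened to assign the treatment the policy would choose, reweighted by the assignment probability.

```python
import numpy as np

def expected_response_ipw(y, t, policy, propensity):
    """Inverse-propensity estimate of E[response] under an uplift policy.

    y:          observed responses from the randomized experiment
    t:          treatment actually assigned to each subject
    policy:     treatment the uplift model would assign to each subject
    propensity: probability with which each subject received its treatment t
    Unbiased because E[1{t == policy} / propensity] = 1 under randomization.
    """
    y = np.asarray(y, dtype=float)
    match = (np.asarray(t) == np.asarray(policy)).astype(float)
    return np.mean(y * match / np.asarray(propensity, dtype=float))
```

For a two-arm experiment with 50/50 assignment, subjects whose assignment matches the policy are simply upweighted by a factor of two.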
May 02 2017 cs.CL
Deep latent variable models have been shown to facilitate response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes, which can be either manually assigned or automatically detected. Moreover, the dialog states of the two speakers are modeled separately in order to reflect personal features. We validate this framework in two different scenarios, where the attribute refers to genericness and sentiment state, respectively. Experimental results demonstrate the potential of our model: meaningful responses can be generated in accordance with the specified attributes.
Apr 21 2017 cs.CV
Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping (TAG) is devised to generate high quality action proposals. On two challenging benchmarks, THUMOS14 and ActivityNet, our method remarkably outperforms previous state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling actions with various temporal structures.
In this paper, a distributed average tracking problem is studied for Lipschitz-type nonlinear dynamical systems. The objective is to design distributed average tracking algorithms for locally interacting agents to track the average of multiple reference signals. Here, both the agents' and the reference signals' dynamics contain a nonlinear term satisfying a Lipschitz-type condition. Three types of distributed average tracking algorithms are designed. First, based on a state-dependent-gain design approach, a robust distributed average tracking algorithm is developed that does not require identical initial conditions. Second, by using a gain adaptation scheme, an adaptive distributed average tracking algorithm is proposed to remove the requirement that the Lipschitz constant be known to the agents. Third, to reduce chattering and make the algorithms easier to implement, a continuous distributed average tracking algorithm based on a time-varying boundary layer is further designed as a continuous approximation of the preceding discontinuous algorithms.
Apr 10 2017 cs.MM
Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly focused on single-media retrieval. However, user requirements are highly flexible, such as retrieving relevant audio clips with an image query. Challenges stemming from the "media gap", which means that representations of different media types are inconsistent, have therefore attracted increasing attention. Cross-media retrieval is designed for scenarios where the queries and retrieval results are of different media types. As a relatively new research topic, its concepts, methodologies and benchmarks are still not clear in the literature. To address these issues, we review more than 100 references and give an overview including the concepts, methodologies, major challenges and open issues, as well as build up benchmarks including datasets and experimental results. Researchers can directly adopt the benchmarks to promptly evaluate their proposed methods, helping them focus on algorithm design rather than time-consuming reimplementation of compared methods and results. It is noted that we have constructed a new dataset, XMedia, which is the first publicly available dataset with up to five media types (text, image, video, audio and 3D model). We believe this overview will attract more researchers to cross-media retrieval and be helpful to them.
Mar 27 2017 cs.CV
We investigate a principled way to progressively mine discriminative object regions using classification networks to address weakly-supervised semantic segmentation. Classification networks are only responsive to small and sparse discriminative regions of the object of interest, which deviates from the requirement of the segmentation task to localize dense, interior and integral regions for pixel-wise inference. To mitigate this gap, we propose a new adversarial erasing approach for localizing and expanding object regions progressively. Starting with a single small object region, our proposed approach drives the classification network to sequentially discover new and complementary object regions by erasing the currently mined regions in an adversarial manner. These localized regions eventually constitute a dense and complete object region for learning semantic segmentation. To further enhance the quality of the regions discovered by adversarial erasing, an online prohibitive segmentation learning approach is developed to collaborate with adversarial erasing by providing auxiliary segmentation supervision modulated by the more reliable classification scores. Despite its apparent simplicity, the proposed approach achieves 55.0% and 55.7% mean Intersection-over-Union (mIoU) scores on the PASCAL VOC 2012 val and test sets, setting the new state of the art.
Mar 24 2017 cs.CV
Video classification is useful in many practical applications, and recent deep learning has greatly improved its accuracy. However, existing works often model video frames indiscriminately, although from the view of motion, video frames can be naturally decomposed into salient and non-salient areas. Salient and non-salient areas should be modeled with different networks, since the former presents both appearance and motion information while the latter presents static background information. To address this problem, in this paper, video saliency is first predicted from optical flow without supervision. Then two streams of 3D CNNs are trained individually on raw frames and optical flow in salient areas, and another 2D CNN is trained on raw frames in non-salient areas. Because these three streams play different roles for each class, the weights of the streams are adaptively learned per class. Experimental results show that saliency-guided modeling and adaptively weighted learning reinforce each other, and we achieve state-of-the-art results.
Mar 16 2017 cs.CV
Source camera identification remains a hard task in the forensics community, especially for small query images. In this paper, we propose a solution for identifying the source camera of small-size images: a content-adaptive fusion network. To learn better feature representations from the input data, content-adaptive convolutional neural networks (CA-CNNs) are constructed, with a convolutional layer added in the preprocessing stage. Moreover, to capture more comprehensive information, we parallelize three CA-CNNs (CA3-CNN, CA5-CNN, CA7-CNN), which differ in the convolutional kernel size of the preprocessing layer, to obtain the content-adaptive fusion network. Experimental results show that the proposed method is practical and effective.
Mar 14 2017 cs.CV
In this paper, we propose an efficient super-resolution (SR) method based on a deep convolutional neural network (CNN), namely the gradual upsampling network (GUN). Recent CNN-based SR methods either first magnify the low-resolution (LR) input to high resolution (HR) and then refine it, or directly process the LR input and recover the HR result at the last layer. The proposed GUN uses a gradual process instead of either framework. The GUN consists of an input layer, multi-step upsampling and convolutional layers, and an output layer. By means of the gradual process, the network simplifies the difficult direct SR problem into multiple easier upsampling steps, each with a very small magnification factor. Furthermore, a gradual training strategy is presented for the GUN: an initial network is first trained with edge-like samples, and the weights are then gradually tuned with more complex samples. The GUN recovers fine and vivid results and is easy to train. Experimental results on several image sets demonstrate the effectiveness of the proposed network.
Mar 14 2017 cs.CV
Recent learning-based super-resolution (SR) methods often focus on dictionary learning or network training. In this paper, we discuss in detail a new SR framework based on local classification instead of traditional dictionary learning. The proposed efficient and extendable SR framework is named the local patch classification (LPC) framework. It consists of a learning stage and a reconstruction stage. In the learning stage, image patches are classified into different classes by means of the proposed local patch encoding (LPE), and a projection matrix is then computed for each class under a simple constraint. In the reconstruction stage, an input LR patch is reconstructed simply by computing its LPE code and multiplying by the corresponding projection matrix. Furthermore, we establish the relationship between the proposed method and anchored neighborhood regression based methods, and we analyze the extendability of the proposed framework. Experimental results on several image sets demonstrate the effectiveness of the proposed framework.
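The per-class projection idea can be sketched as follows. This is an illustrative simplification: ridge-regularized least squares stands in for the paper's "simple constraint", and the class labels are taken as given (in the paper they would come from the LPE codes).

```python
import numpy as np

def learn_projections(lr_patches, hr_patches, labels, n_classes, reg=1e-3):
    """Learn one projection matrix P_c per class mapping LR to HR patches.

    lr_patches, hr_patches: arrays whose rows are vectorized patches.
    labels: class index of each patch (from a classifier such as LPE).
    reg:    ridge regularizer standing in for the paper's constraint.
    """
    P = []
    for c in range(n_classes):
        X = lr_patches[labels == c]
        Y = hr_patches[labels == c]
        A = X.T @ X + reg * np.eye(X.shape[1])
        P.append(np.linalg.solve(A, X.T @ Y).T)   # so that hr ≈ P_c @ lr
    return P

def reconstruct(lr_patch, label, P):
    """Reconstruct an HR patch: classify, then multiply by the class matrix."""
    return P[label] @ lr_patch
```

When the LR-to-HR mapping within a class is (close to) linear, the learned matrix recovers it and reconstruction reduces to one matrix-vector product per patch.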
Mar 13 2017 cs.CV
This paper proposes a new evaluation protocol for cross-media retrieval which better fits real-world applications. Both image-text and text-image retrieval modes are considered. Traditionally, class labels in the training and testing sets are identical; that is, it is usually assumed that the query falls into some pre-defined classes. However, in practice, the content of a query image/text may vary extensively, and the retrieval system does not necessarily know in advance the class label of a query. Considering the inconsistency between real-world applications and laboratory assumptions, we think that the existing protocol that works under identical train/test classes can be modified and improved. This work is dedicated to addressing this problem by considering the protocol under an extendable scenario, i.e., the training and testing classes do not overlap. We provide extensive benchmarking results obtained by the existing protocol and the proposed new protocol on several commonly used datasets. We demonstrate a noticeable performance drop when the testing classes are unseen during training. Additionally, a trivial solution, i.e., directly using the predicted class label for cross-media retrieval, is tested. We show that the trivial solution is very competitive in traditional non-extendable retrieval, but becomes less so under the new settings. The train/test split, evaluation code, and benchmarking results are publicly available on our website.
Mar 09 2017 cs.CV
Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g., they often meet difficulties in locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
We propose a novel random triggering based modulated wideband compressive sampling (RT-MWCS) method to facilitate efficient realization of sub-Nyquist-rate compressive sampling systems for sparse wideband signals. Under the assumption that the signal is repetitively (not necessarily periodically) triggered, RT-MWCS uses random modulation to obtain measurements of the signal at randomly chosen positions. It uses a multiple-measurement-vector method to estimate the non-zero support of the signal in the frequency domain, and the signal spectrum is then solved for using least-squares estimation. The distinct ability to estimate sparse multiband signals is facilitated by the use of level-triggering and time-to-digital converter devices previously used in the random equivalent sampling (RES) scheme. Compared to existing compressive sampling (CS) techniques, such as the modulated wideband converter (MWC), RT-MWCS has a simple system architecture and can be implemented with one channel at the cost of more sampling time. Experimental results indicate that, for sparse multiband signals with unknown spectral support, RT-MWCS requires a sampling rate much lower than the Nyquist rate while achieving high-quality signal reconstruction.
The radio interferometric positioning system (RIPS) is an accurate node localization method featuring a novel phase-based ranging process. Multipath is the limiting error source for RIPS in ground-deployed scenarios or indoor applications. There are four distinct channels involved in the ranging process for RIPS. Multipath reflections affect both the phase and amplitude of the ranging signal for each channel. By exploiting untapped amplitude information, we put forward a scheme to estimate each channel's multipath profile, which is then subsequently used to correct corresponding errors in phase measurements. Simulations show that such a scheme is very effective in reducing multipath phase errors, which are essentially brought down to the level of receiver noise under moderate multipath conditions. It is further demonstrated that ranging errors in RIPS are also greatly reduced via the proposed scheme.
Assurance of information-flow security by formal methods is mandated in security certification of separation kernels. As an industrial standard for improving safety, ARINC 653 has been complied with by mainstream separation kernels. Due to the new trend of integrating safe and secure functionalities into one separation kernel, security analysis of ARINC 653 as well as a formal specification with security proofs are thus significant for the development and certification of ARINC 653 compliant Separation Kernels (ARINC SKs). This paper presents a specification development and security analysis method for ARINC SKs based on refinement. We propose a generic security model and a stepwise refinement framework. Two levels of functional specification are developed by the refinement. A major part of separation kernel requirements in ARINC 653 are modeled, such as kernel initialization, two-level scheduling, partition and process management, and inter-partition communication. The formal specification and its security proofs are carried out in the Isabelle/HOL theorem prover. We have reviewed the source code of one industrial and two open-source ARINC SK implementations, i.e. VxWorks 653, XtratuM, and POK, in accordance with the formal specification. During the verification and code review, six security flaws, which can cause information leakage, are found in the ARINC 653 standard and the implementations.
Feb 07 2017 cs.DC
This paper presents a short review of Cloud Computing and Big Data, and discusses the portability of general data mining algorithms to the Cloud Computing platform. It reveals that a Map-Reduce-based Cloud Computing platform cannot solve all Big Data and data mining problems. Transplanting general data mining algorithms to a real-time Cloud Computing platform will be one of the research focuses in Cloud Computing and Big Data.
The cospark of a matrix is the cardinality of the sparsest vector in the column space of the matrix. Computing the cospark of a matrix is well known to be an NP-hard problem. Given the sparsity pattern (i.e., the locations of the non-zero entries) of a matrix, if the non-zero entries are drawn from independently distributed continuous probability distributions, we prove that the cospark of the matrix equals, with probability one, a particular number termed the generic cospark of the matrix. The generic cospark also equals the maximum cospark of matrices consistent with the given sparsity pattern. We prove that the generic cospark of a matrix can be computed in polynomial time, and offer an algorithm that achieves this.
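For intuition, the cospark can be checked by brute force on tiny matrices (this exponential-time check is only illustrative; the paper's contribution is a polynomial-time algorithm for the *generic* cospark). A nonzero column-space vector supported on a row set $S$ exists iff the complementary rows fail to have full column rank, so the cospark is the smallest such $|S|$. The sketch assumes $A$ has full column rank, so that $Ac \neq 0$ for $c \neq 0$.

```python
import numpy as np
from itertools import combinations

def cospark_bruteforce(A, tol=1e-9):
    """Cospark = min ||A c||_0 over nonzero c, by exhaustive support search.

    For each candidate support S (smallest first), a vector supported on S
    exists iff rank(A with rows S deleted) < number of columns.
    Assumes A has full column rank. Exponential time; for intuition only.
    """
    m, n = A.shape
    for k in range(1, m + 1):
        for S in combinations(range(m), k):
            rest = np.delete(A, list(S), axis=0)
            if np.linalg.matrix_rank(rest, tol=tol) < n:
                return k
    return m  # reached only in degenerate cases
```

For a generic $m \times n$ matrix, any $n$ rows are full-rank, so this returns $m - n + 1$, matching the generic cospark.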
Humanoid robots are increasingly demanded to operate in interactive and human-surrounded environments while achieving sophisticated locomotion and manipulation tasks. To accomplish these tasks, roboticists unremittingly seek advanced methods that generate whole-body coordination behaviors while fulfilling various planning and control objectives. Undoubtedly, these goals pose fundamental challenges to the robotics and control community. To take an incremental step towards reducing the performance gap between theoretical foundations and real implementations, we present a planning and control framework for humanoid and other legged robots for achieving high performance and generating agile motions, with a particular concentration on robust, optimal and real-time performance. This framework constitutes three hierarchical layers. First, we present a robust optimal phase-space planning framework for dynamic legged locomotion over rough terrain; this framework is a hybrid motion planner incorporating a series of pivotal components. Second, we take a step toward formally synthesizing high-level reactive planners for whole-body locomotion in constrained environments, formulating a two-player temporal logic game between the contact planner and its possibly-adversarial environment. Third, we propose a distributed control architecture for latency-prone humanoid robotic systems. A central experimental phenomenon is observed: the stability of high-impedance distributed controllers is highly sensitive to damping feedback delay but much less so to stiffness feedback delay. We pursue a detailed analysis of distributed controllers in which damping feedback is executed in proximity to the control plant, while stiffness feedback is implemented in a latency-prone centralized control process.
Jan 11 2017 cs.NI
Today's WiFi networks deliver a large fraction of traffic. However, the performance and quality of WiFi networks are still far from satisfactory. Among many popular quality metrics (throughput, latency), the probability of successfully connecting to WiFi APs and the time cost of the WiFi connection set-up process are two of the most critical metrics that affect WiFi users' experience. To understand the WiFi connection set-up process in real-world settings, we carry out measurement studies of $5$ million mobile users from $4$ representative cities associating with $7$ million APs in $0.4$ billion WiFi sessions, collected from a mobile "WiFi Manager" App that tops the Android/iOS App markets. To the best of our knowledge, we are the first to conduct such a large-scale study of: how large the WiFi connection set-up time cost is, what factors affect the WiFi connection set-up process, and what can be done to reduce the WiFi connection set-up time cost. Based on the measurement analysis, we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance over the conventional strategy based purely on signal strength, reducing connection set-up failures from $33\%$ to $3.6\%$ and reducing the time costs of $80\%$ of connection set-up processes by more than $10$ times.
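The AP selection idea can be sketched with a simple learned scoring rule. The feature set and logistic form below are assumptions for illustration only; the paper's model is trained offline on the measurement data, and the conventional baseline would instead pick the AP with the strongest signal.

```python
import numpy as np

def select_ap(features, weights, bias=0.0):
    """Rank candidate APs by predicted connection set-up success (sketch).

    features: (n_aps, n_feats) rows, e.g. [signal strength, historical
              success rate, ...] -- hypothetical features, not the paper's.
    weights:  coefficients from an offline-trained model (assumed given).
    Returns the index of the best AP and all predicted probabilities.
    """
    logits = features @ np.asarray(weights, dtype=float) + bias
    prob = 1.0 / (1.0 + np.exp(-logits))   # logistic model
    return int(np.argmax(prob)), prob
```

Unlike the strongest-signal heuristic, such a model can down-rank an AP with good signal but a poor historical connection success rate.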
Jan 09 2017 cs.SE
Separation kernels provide temporal/spatial separation and controlled information flow to their hosted applications. They are introduced to decouple the analysis of applications in partitions from the analysis of the kernel itself. More than 20 implementations of separation kernels have been developed and widely applied in critical domains, e.g., avionics/aerospace, military/defense, and medical devices. Formal methods are mandated by the security/safety certification of separation kernels and have been carried out since this concept emerged. However, this field lacks a survey to systematically study, compare, and analyze related work. On the other hand, high-assurance separation kernels by formal methods still face big challenges. In this paper, an analytical framework is first proposed to clarify the functionalities, implementations, properties and standards, and formal methods application of separation kernels. Based on the proposed analytical framework, a taxonomy is designed according to formal methods application, functionalities, and properties of separation kernels. Research works in the literature are then categorized and overviewed by the taxonomy. In accordance with the analytical framework, a comprehensive analysis and discussion of related work are presented. Finally, four challenges and their possible technical directions for future research are identified, e.g. specification bottleneck, multicore and concurrency, and automation of full formal verification.
Jan 09 2017 cs.NI
A virtual network (VN) contains a collection of virtual nodes and links assigned to underlying physical resources in a network substrate. VN migration is the process of remapping a VN's logical topology to a new set of physical resources to provide failure recovery, energy savings, or defense against attack. Providing VN migration that is transparent to running applications is a significant challenge. Efficient migration mechanisms are highly dependent on the technology deployed in the physical substrate. Prior work has considered migration in data centers and in the PlanetLab infrastructure. However, there has been little effort targeting an SDN-enabled wide-area networking environment - an important building block of future networking infrastructure. In this work, we are interested in the design, implementation and evaluation of VN migration in GENI as a working example of such a future network. We identify and propose techniques to address key challenges: the dynamic allocation of resources during migration, managing hosts connected to the VN, and flow table migration sequences to minimize packet loss. We find that GENI's virtualization architecture makes transparent and efficient migration challenging. We suggest alternatives that might be adopted in GENI and are worthy of adoption by virtual network providers to facilitate migration.
A new type of end-to-end system for text-dependent speaker verification is presented in this paper. Previously, using phonetically discriminative/speaker discriminative DNNs as feature extractors for speaker verification has shown promising results. The extracted frame-level (DNN bottleneck, posterior or d-vector) features are equally weighted and aggregated to compute an utterance-level speaker representation (d-vector or i-vector). In this work we use speaker discriminative CNNs to extract noise-robust frame-level features, which are then combined into an utterance-level speaker vector through an attention mechanism. The proposed attention model uses the speaker discriminative information and the phonetic information to learn the weights. The whole system, including the CNN and attention model, is jointly optimized using an end-to-end criterion. The training algorithm exactly imitates the evaluation process: it directly maps a test utterance and a few target-speaker utterances into a single verification score. The algorithm can automatically select the most similar impostor for each target speaker to train the network. We demonstrate the effectiveness of the proposed end-to-end system on the Windows $10$ "Hey Cortana" speaker verification task.
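The attention-based aggregation can be sketched compactly. In this simplified sketch the attention logits are taken as given; in the paper they are produced from speaker-discriminative and phonetic cues and learned jointly with the CNN.

```python
import numpy as np

def attention_pool(frames, scores):
    """Aggregate frame-level features into one utterance-level speaker vector.

    frames: (T, D) frame-level feature vectors from the feature extractor.
    scores: (T,) unnormalized attention logits (assumed given here).
    """
    w = np.exp(scores - scores.max())  # stable softmax
    w = w / w.sum()                    # attention weights sum to 1
    return w @ frames                  # weighted sum over frames
```

With uniform scores this reduces to the equal-weight averaging of prior work; peaked scores let informative frames dominate the utterance vector.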
Aggregating statistically diverse renewable power producers (RPPs) is an effective way to reduce the uncertainty of the RPPs. The key question in aggregation of RPPs is how to allocate payoffs among the RPPs. In this paper, a payoff allocation mechanism (PAM) with a simple closed-form expression is proposed: It achieves stability (in the core) and fairness both in the "ex-post" sense, i.e., for all possible realizations of renewable power generation. Furthermore, this PAM can in fact be derived from the competitive equilibrium in a market. The proposed PAM is evaluated in a simulation study with ten wind power producers in the PJM interconnection.
This paper considers a multi-pair two-way amplify-and-forward relaying system, where multiple pairs of full-duplex users are served via a full-duplex relay with massive antennas, and the relay adopts maximum-ratio combining/maximum-ratio transmission (MRC/MRT) processing. The orthogonal pilot scheme and the least-squares method are first exploited to estimate the channel state information (CSI). When the number of relay antennas is finite, we derive an approximate sum-rate expression that is shown to be a good predictor of the ergodic sum rate, especially for large numbers of antennas. The corresponding achievable rate expression is then obtained by adopting another pilot scheme, which estimates the composite CSI for each user pair to reduce the pilot overhead of channel estimation. We analyze the achievable rates of the two pilot schemes and then show their relative merits. Furthermore, power allocation strategies for the users and the relay are proposed based on sum-rate maximization and a max-min fairness criterion, respectively. Finally, numerical results verify the accuracy of the analytical results and show the performance gains achieved by the proposed power allocation.
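The MRC/MRT processing at the relay boils down to matching the combiner/precoder to the (estimated) channel. A minimal single-user sketch, ignoring the multi-pair interference and self-interference terms the paper analyzes:

```python
import numpy as np

def mrc_combine(h, y):
    """Maximum-ratio combining: project the received vector onto the channel."""
    return np.vdot(h, y)                           # h^H y

def mrt_precode(h, s):
    """Maximum-ratio transmission: steer the symbol along the conjugate channel."""
    return np.conj(h) / np.linalg.norm(h) * s      # unit-power beamformer
```

In the noise-free case, MRC of $y = h s$ yields $\|h\|^2 s$ and MRT delivers $\|h\| s$ at the receiver, i.e., both collect the full array gain of the channel.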
In the era of social networking, it is becoming a big trend that people produce and upload user-generated content to the Internet via wireless networks, bringing a significant burden on wireless uplink networks. In this paper, we contribute to the design and theoretical understanding of wireless cache-enabled upload transmission in a delay-tolerant small cell network to relieve this burden, and then propose the corresponding scheduling policies for the small base station (SBS) under infinite and finite cache sizes. Specifically, the caching ability introduced at the SBS enables it to eliminate redundancy among the uploaded contents from users. This strategy not only alleviates wireless backhaul traffic congestion from the SBS to a macro base station (MBS), but also improves the transmission efficiency of the SBS. We then investigate scheduling schemes for the SBS to offload more data traffic under a cache size constraint. Moreover, two operational regions for the wireless cache-enabled upload network, namely the delay-limited region and the cache-limited region, are established to reveal the fundamental tradeoff between delay tolerance and caching ability. Finally, numerical results demonstrate the significant performance gains of the proposed wireless cache-enabled upload network.
Dec 13 2016 cs.NI
In this paper, we consider a stochastic network with dynamic traffic. The spatial distributions of access points (APs) and users are first modeled as mutually independent Poisson point processes (PPPs). Different from most previous works, which assume all APs are fully loaded, we account for the fact that APs with no data to transmit do not generate interference to users. The APs opportunistically share the channel according to the existence of packets to be transmitted and the proposed interference suppression strategy. In the interference suppression region, only one AP can be active at a time to transmit a packet on the channel, while the other adjacent APs keep silent to avoid severe interference. The idle probability of any AP, influenced by the traffic load and the availability of the channels, is analyzed. The density of simultaneously active APs in the network is obtained, and the packet loss rate is further elaborated. We reveal the impacts of network features (e.g., AP density, user density, and channel state) and service features (e.g., user requests and packet size) on the network performance. Simulation results validate our proposed model.
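As background for the PPP model of AP and user locations above, a homogeneous PPP can be sampled directly: draw a Poisson-distributed point count for the observation window, then place that many points uniformly. The sketch below is a minimal illustration; the densities, window size, and function names are our own choices, not the paper's.

```python
import numpy as np

def sample_ppp(density, width, height, rng):
    """Sample a homogeneous Poisson point process on a width x height window."""
    # The number of points is Poisson with mean density * area.
    n = rng.poisson(density * width * height)
    # Conditioned on n, the points are i.i.d. uniform over the window.
    xs = rng.uniform(0.0, width, size=n)
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(0)
aps = sample_ppp(density=0.1, width=50.0, height=50.0, rng=rng)   # access points
users = sample_ppp(density=0.5, width=50.0, height=50.0, rng=rng) # users
```

Because the two processes are drawn independently, they are mutually independent PPPs as the model requires.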
Quantum computing has undergone rapid development in recent years. Owing to limitations on scalability, personal quantum computers are still unlikely to appear in the near future. The first practical quantum computer for ordinary users is likely to be on the cloud. However, the adoption of cloud computing is possible only if security is ensured. Homomorphic encryption is a cryptographic protocol that allows computation to be performed on encrypted data without decrypting them, so it is well suited to cloud computing. Here, we present the first application of homomorphic encryption on IBM's cloud quantum computing platform. In our experiments, we successfully implemented a quantum algorithm for linear equations while protecting our privacy. This demonstration opens a feasible path to the next stage of development of cloud quantum information technology.
The omnipresence of deep learning architectures such as deep convolutional neural networks (CNNs) is fueled by the synergistic combination of ever-increasing labeled datasets and specialized hardware. Despite this indisputable success, the reliance on huge amounts of labeled data and specialized hardware can be a limiting factor when approaching new applications. To help alleviate these limitations, we propose an efficient learning strategy for layer-wise unsupervised training of deep CNNs on conventional hardware in acceptable time. Our proposed strategy consists of randomly convexifying the reconstruction contractive auto-encoding (RCAE) learning objective and solving the resulting large-scale convex minimization problem in the frequency domain via coordinate descent (CD). The main advantages of our proposed learning strategy are: (1) a single tunable optimization parameter; (2) fast and guaranteed convergence; (3) the possibility of full parallelization. Numerical experiments show that our proposed learning strategy scales (in the worst case) linearly with the image size, the number of filters, and the filter size.
In this work, we abstract some key ingredients of previous LWE- and RLWE-based key exchange protocols by introducing and formalizing the building tool, referred to as key consensus (KC), and its asymmetric variant AKC. KC and AKC allow two communicating parties to reach consensus from close values obtained by some secure information exchange. We then derive upper bounds on the parameters of any KC and AKC. KC and AKC are fundamental to lattice-based cryptography, in the sense that a list of cryptographic primitives based on LWR, LWE, and RLWE (including key exchange, public-key encryption, and more) can be modularly constructed from them. As a conceptual contribution, this greatly simplifies the design and analysis of these cryptosystems in the future. We then design and analyze both general and highly practical KC and AKC schemes, referred to as OKCN and AKCN respectively for presentation simplicity. Based on KC and AKC, we present generic constructions of key exchange (KE) from LWR, LWE, and RLWE. The generic construction allows versatile instantiations with our OKCN and AKCN schemes, for which we elaborate on evaluating and choosing the concrete parameters in order to achieve an optimally balanced performance among security, computational cost, bandwidth efficiency, error rate, and operation simplicity.
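The core idea of a key consensus mechanism, where two parties holding close values derive the same key after one sends a short hint that reveals only low-order information, can be illustrated with a toy integer reconciliation. This is our own simplified sketch under the assumption that the two values stay within [0, q) without wrapping around the modulus boundary; it is not the paper's OKCN/AKCN construction.

```python
def kc_send(sigma_a, q, m):
    """Party A: derive a key from sigma_a and a hint revealing only its low bits.

    Assumes m divides q and |sigma_a - sigma_b| < m/2 (no modular wraparound).
    """
    assert q % m == 0
    hint = sigma_a % m                 # low-order information only
    key = (sigma_a // m) % (q // m)    # high-order part becomes the key
    return key, hint

def kc_recv(sigma_b, hint, q, m):
    """Party B: recover sigma_a exactly from the hint, then derive the same key."""
    # Centered representative of (hint - sigma_b) mod m, in [-m/2, m/2).
    e = ((hint - sigma_b + m // 2) % m) - m // 2
    sigma_a = sigma_b + e              # exact when |sigma_a - sigma_b| < m/2
    return (sigma_a // m) % (q // m)
```

The hint alone never determines the key, since the key lives entirely in the high-order bits that the hint omits.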
In this paper, we propose to exploit the limited cached packets as side information to cancel incoming interference at the receiver side. We consider a stochastic network where the random locations of base stations and users are modeled using Poisson point processes. Caching schemes that reap both the local caching gain and the interference cancellation gain for the users are developed based on two factors: the density of different user subsets and the packets cached in the corresponding subsets. The packet loss rate (PLR) is analyzed, which depends on both the cached packets and the channel state information (CSI) available at the receiver. Theoretical results reveal the tradeoff between caching resources and wireless resources. The performance of different caching schemes is analyzed, and the minimum achievable PLR for distributed caching is derived.
Mobile sensing applications usually require time-series inputs from sensors. Some applications, such as tracking, can use sensed acceleration and rate of rotation to calculate displacement based on physical system models. Other applications, such as activity recognition, extract manually designed features from sensor inputs for classification. Such applications face two challenges. On one hand, on-device sensor measurements are noisy. For many mobile applications, it is hard to find a distribution that exactly describes the noise in practice. Unfortunately, calculating target quantities based on physical system and noise models is only as accurate as the noise assumptions. On the other hand, in classification applications, although manually designed features have proven to be effective, it is not always straightforward to find the most robust features to accommodate diverse sensor noise patterns and user behaviors. To this end, we propose DeepSense, a deep learning framework that directly addresses the aforementioned noise and feature customization challenges in a unified manner. DeepSense integrates convolutional and recurrent neural networks to exploit local interactions among similar mobile sensors, merge local interactions of different sensory modalities into global interactions, and extract temporal relationships to model signal dynamics. DeepSense thus provides a general signal estimation and classification framework that accommodates a wide range of applications. We demonstrate the effectiveness of DeepSense using three representative and challenging tasks: car tracking with motion sensors, heterogeneous human activity recognition, and user identification with biometric motion analysis. DeepSense significantly outperforms the state-of-the-art methods for all three tasks. In addition, DeepSense is feasible to implement on smartphones due to its moderate energy consumption and low latency.
Scalable and automatic formal verification for concurrent systems is in high demand but has yet to be fully developed. In this paper, we propose a verification framework to support automated compositional reasoning for concurrent programs with shared variables. Our framework models concurrent programs as succinct automata and supports the verification of multiple important properties. Safety verification and simulations of succinct automata are parallel compositional, and safety properties of succinct automata are preserved under refinement. Formal verification of finite state succinct automata can be automated. Furthermore, we propose the first automated approach to checking rely-guarantee based simulations between infinite state concurrent programs. We have prototyped our algorithm and applied our tool to the verification of multiple refinements.
Nov 03 2016 cs.DC
Users of cloud computing platforms pose different types of demands for multiple resources on servers (physical or virtual machines). Besides differences in their resource capacities, servers may be additionally heterogeneous in their ability to serve users: certain users' tasks may only be serviced by a subset of the servers. We identify important shortcomings in existing multi-resource fair allocation mechanisms, namely Dominant Resource Fairness (DRF) and its follow-up work, when used in such environments. We develop a new fair allocation mechanism called Per-Server Dominant-Share Fairness (PS-DSF), which we show offers all desirable sharing properties that DRF is able to offer in the case of a single "resource pool" (i.e., if the resources of all servers were pooled together into one hypothetical server). We evaluate the performance of PS-DSF through simulations. Our evaluation shows the enhanced efficiency of PS-DSF compared to the existing allocation mechanisms. We argue that our proposed allocation mechanism is applicable to cloud computing networks and especially large-scale data centers.
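For context, the dominant share that DRF equalizes (and that PS-DSF evaluates per server) can be computed and progressively filled as in the sketch below. This is a generic single-server DRF illustration in our own notation, not the paper's PS-DSF mechanism.

```python
import numpy as np

def dominant_share(tasks, demand, capacity):
    """Fraction of the server's bottleneck resource consumed by a user's tasks."""
    return float(np.max(tasks * demand / capacity))

def progressive_filling(demands, capacity):
    """DRF-style progressive filling: repeatedly grant one task to the user
    with the smallest dominant share, while the server can still fit it."""
    tasks = np.zeros(len(demands), dtype=int)
    used = np.zeros_like(capacity, dtype=float)
    while True:
        # Users ordered by their current dominant shares.
        order = np.argsort([dominant_share(t, d, capacity)
                            for t, d in zip(tasks, demands)])
        for i in order:
            if np.all(used + demands[i] <= capacity):
                tasks[i] += 1
                used += demands[i]
                break
        else:
            return tasks  # no user's next task fits: allocation is final
```

On the classic example of 9 CPUs and 18 GB with demands <1 CPU, 4 GB> and <3 CPU, 1 GB>, this filling yields 3 and 2 tasks respectively, equalizing both users' dominant shares at 2/3.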
Oct 27 2016 cs.CV
A head-mounted display (HMD) can be an important component of an augmented reality system. However, as the upper face region is severely occluded by the device, the user experience can be degraded in applications such as telecommunication and multi-player video games. In this paper, we first present a novel experimental setup that consists of two near-infrared (NIR) cameras pointed at the eye regions and one visible-light RGB camera that captures the visible face region. The main purpose of this paper is to synthesize realistic face images without occlusions based on the images captured by these cameras. To this end, we propose a novel synthesis framework that contains four modules: 3D head reconstruction, face alignment and tracking, face synthesis, and eye synthesis. In face synthesis, we propose a novel algorithm that can robustly align and track a personalized 3D head model given a face that is severely occluded by the HMD. In eye synthesis, in order to generate accurate eye movements and dynamic wrinkle variations around the eye regions, we propose another novel algorithm to colorize the NIR eye images and further remove the "red eye" effects caused by the colorization. Results show that both the hardware setup and the system framework are robust enough to synthesize realistic face images in video sequences.
Oct 26 2016 cs.CV
Identifying a user's identity is a key problem in many data mining applications, such as product recommendation, customized content delivery, and criminal identification. Given a set of accounts from the same or different social network platforms, user identification attempts to identify all accounts belonging to the same person. A commonly used solution is to build relationships among different accounts by exploring their collective patterns, e.g., user profiles, writing style, or similar comments. However, such methods do not work well in many practical scenarios, since the information posted explicitly by users may be false for various reasons. In this paper, we revisit the user identification problem from a novel perspective, i.e., identifying a user's identity by matching his/her cameras. The underlying assumption is that multiple accounts belonging to the same person contain the same or similar camera fingerprint information. The proposed framework, called User Camera Identification (UCI), is based on camera fingerprints and fully accounts for the problems of multiple cameras and reposting behaviors.
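Camera fingerprinting typically estimates the sensor's photo-response non-uniformity (PRNU) by averaging noise residuals over a camera's images and compares fingerprints by normalized correlation. Below is a minimal sketch of that general idea, with a crude mean-filter denoiser standing in for the wavelet denoising used in practice; all names and parameters are illustrative and not UCI's actual pipeline.

```python
import numpy as np

def noise_residual(img):
    """Crude denoiser: residual = image minus its 3x3 local mean (a stand-in
    for the wavelet denoising typically used for PRNU extraction)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode='edge')
    smooth = sum(padded[i:i + h, j:j + w]
                 for i in range(3) for j in range(3)) / 9.0
    return img - smooth

def fingerprint(images):
    """Camera fingerprint estimate: the average of the noise residuals."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def correlation(fp1, fp2):
    """Normalized cross-correlation between two fingerprint estimates."""
    a, b = fp1 - fp1.mean(), fp2 - fp2.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Two fingerprints estimated from disjoint image sets of the same camera correlate strongly, while fingerprints of different cameras correlate near zero, which is what links accounts to a shared device.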
Oct 20 2016 cs.CV
For most hyperspectral remote sensing applications, removing bad bands, such as water absorption bands, is a required preprocessing step. Currently, the commonly applied method is visual inspection, which is very time-consuming, and it is easy to overlook some noisy bands. In this study, we find an inherent connection between target detection algorithms and corrupted band removal. As an example, for the matched filter (MF), which is the most widely used target detection method for hyperspectral data, we present an automatic MF-based algorithm for bad band identification. The MF detector is a filter vector, and the resulting filter output is the sum of all bands weighted by the MF coefficients. Therefore, we can identify bad bands using only the MF filter vector itself, the absolute value of whose entries accounts for the importance of each band to the target detection. For a specific target of interest, the bands with small MF weights correspond to the noisy or bad ones. Based on this fact, we develop an automatic bad band pre-removal algorithm by utilizing the average absolute value of MF weights for multiple targets within a scene. Experiments with three well-known hyperspectral datasets show that our method can always identify the water absorption and other low signal-to-noise-ratio (SNR) bands that are usually chosen as bad bands manually.
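The described procedure, i.e., form the matched-filter vector w = Σ^{-1}(t − μ) / ((t − μ)^T Σ^{-1}(t − μ)), average |w| over several targets, and flag the bands with the smallest weights, can be sketched as follows. The data layout, the covariance regularization, and the `frac` threshold are our illustrative choices, not the paper's exact settings.

```python
import numpy as np

def mf_weights(X, target):
    """Matched-filter vector for one target spectrum.

    X: (num_pixels, num_bands) data matrix; target: (num_bands,) spectrum."""
    mu = X.mean(axis=0)
    # Small diagonal loading keeps the covariance invertible.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    d = target - mu
    w = np.linalg.solve(cov, d)        # Sigma^{-1} (t - mu)
    return w / (d @ w)                 # normalize by (t - mu)^T Sigma^{-1} (t - mu)

def bad_bands(X, targets, frac=0.1):
    """Flag the fraction of bands with the smallest mean absolute MF weight."""
    mean_abs = np.mean([np.abs(mf_weights(X, t)) for t in targets], axis=0)
    k = max(1, int(frac * X.shape[1]))
    return np.argsort(mean_abs)[:k]
```

High-variance (noisy) bands inflate the corresponding covariance entries, so the matched filter automatically down-weights them, which is exactly what the ranking exploits.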
Sep 13 2016 cs.SI
In medical research, economics, and the social sciences, data frequently appear as subsets of a set of objects. Over the past century, a number of descriptive statistics have been developed to construct network structure from such data. However, these measures lack a generating mechanism that links the inferred network structure to the observed groups. To address this issue, we propose a model-based approach called the Hub Model, which assumes that every observed group has a leader and that the leader has brought together the other members of the group. The performance of Hub Models is demonstrated by simulation studies. We apply this model to infer the relationships among Senators serving in the 110th United States Congress, the characters in a famous 18th-century Chinese novel, and the distribution of flora in North America.
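The Hub Model's generating mechanism, in which each observed group is formed by a leader who brings in the other members, can be written as a short generative simulation. The parameterization below (leader probabilities `rho`, recruitment matrix `A`) is an illustrative reading of such a mechanism, not the paper's exact model or estimator.

```python
import numpy as np

def sample_groups(A, rho, num_groups, rng):
    """Generate groups under a Hub-Model-like mechanism:
    pick a leader i with probability rho[i], then include each other node j
    independently with probability A[i, j]."""
    n = A.shape[0]
    groups = np.zeros((num_groups, n), dtype=int)
    for g in range(num_groups):
        leader = rng.choice(n, p=rho)
        groups[g, leader] = 1                       # the leader is always present
        for j in range(n):
            if j != leader and rng.random() < A[leader, j]:
                groups[g, j] = 1                    # recruited member
    return groups
```

Fitting such a model reverses this process: given only the observed membership matrix, one infers the recruitment probabilities A that link the network structure to the groups.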
Aug 02 2016 cs.CV
In this paper, we propose a novel face alignment method that trains a deep convolutional network from coarse to fine. It divides the given landmarks into a principal subset and an elaborate subset. We first assign a large weight to the principal subset so that our network primarily predicts their locations while only slightly taking the elaborate subset into account. Next, the weight of the principal subset is gradually decreased until the two subsets have equivalent weights. This process helps learn a good initial model and smoothly search for the optimal model, avoiding the loss of fairly good intermediate models in subsequent procedures. On the challenging COFW dataset, our method achieves a 6.33% mean error, a reduction of 21.37% compared with the best previously reported result.
Jul 01 2016 cs.SY
This paper studies the distributed average tracking problem for multiple time-varying signals generated by linear dynamics, whose reference inputs are nonzero and not available to any agent in the network. In the edge-based framework, a pair of continuous algorithms with, respectively, static and adaptive coupling strengths are designed. Based on the boundary layer concept, the proposed continuous algorithm with static coupling strengths can asymptotically track the average of multiple reference signals without the chattering phenomenon. Furthermore, for the case of algorithms with adaptive coupling strengths, the average tracking errors are uniformly ultimately bounded and exponentially converge to a small adjustable bounded set. Finally, a simulation example is presented to show the validity of the theoretical results.
Jun 24 2016 cs.NI
There is an increasing trend for enterprises to outsource their network functions to the cloud for lower cost and ease of management. However, network function outsourcing brings threats to the privacy of enterprises, since the cloud is able to access the traffic and rules of in-cloud network functions. Current tools for secure network function outsourcing either incur large performance overhead or do not support real-time updates. In this paper, we present SICS, a secure service function chain outsourcing framework. SICS encrypts each packet header and uses a label for in-cloud rule matching, which enables the cloud to perform its functionalities correctly with minimum header information leakage. Evaluation results show that SICS achieves higher throughput, faster construction and update speed, and lower resource overhead on both the enterprise and cloud sides, compared to existing solutions.
This paper investigates the fixed-time consensus problem under directed topologies. By using a motion-planning approach, a class of distributed fixed-time algorithms is developed for a multi-agent system with double-integrator dynamics. In the context of fixed-time consensus, we focus on both directed fixed and switching topologies. Under a directed fixed topology, a novel class of distributed algorithms is designed, which guarantees the consensus of the multi-agent system with a fixed settling time if the topology has a directed spanning tree. Under directed periodically switching topologies, the fixed-time consensus is solved via the proposed algorithms if the topologies jointly have a directed spanning tree. In particular, the fixed settling time can be pre-assigned offline according to task requirements. Compared with existing results, to the best of our knowledge, this is the first work to solve the fixed-time consensus problem for double-integrator systems under directed topologies. Finally, a numerical example is given to illustrate the effectiveness of the analytical results.