May 15 2018 cs.CV
In this paper, we propose a general dual convolutional neural network (DualCNN) for low-level vision problems, e.g., super-resolution, edge-preserving filtering, deraining and dehazing. These problems usually involve the estimation of two components of the target signals: structures and details. Motivated by this, our proposed DualCNN consists of two parallel branches, which respectively recover the structures and details in an end-to-end manner. The recovered structures and details can generate the target signals according to the formation model for each particular application. The DualCNN is a flexible framework for low-level vision tasks and can be easily incorporated into existing CNNs. Experimental results show that the DualCNN can be effectively applied to numerous low-level vision tasks with favorable performance against state-of-the-art methods.
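Assuming an additive formation model (target = structures + details, a natural choice for tasks such as super-resolution), the recombination step can be sketched as below. This is a toy illustration, not DualCNN itself: a hypothetical box filter stands in for the structure branch and the residual stands in for the detail branch, whereas DualCNN predicts both with learned branches and other applications use their own formation models.

```python
def box_filter(signal, radius):
    """Moving average with clamped borders; a stand-in for the structure branch."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, radius=1):
    # Hypothetical split: smooth structures plus residual details.
    structure = box_filter(signal, radius)
    detail = [x - s for x, s in zip(signal, structure)]
    return structure, detail


def compose(structure, detail):
    # Additive formation model: target = structures + details.
    return [s + d for s, d in zip(structure, detail)]


x = [0.0, 0.0, 1.0, 1.0, 0.2, 1.0]
S, D = decompose(x)
print(compose(S, D))  # matches x up to floating-point rounding
```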
May 08 2018 cs.CV
Artistic style transfer can be thought of as a process that generates different abstracted versions of the original image. However, most artistic style transfer operators are not optimized for human faces and thus suffer from two main drawbacks when applied to selfies. First, the edges of human faces may deviate unpleasantly from those in the original image. Second, the skin color is far from faithful to the original, which is usually problematic when producing quality selfies. In this paper, we take a different approach and formulate this abstraction process as a gradient-domain learning problem. We aim to learn a type of abstraction that not only achieves the specified artistic style but also circumvents the two aforementioned drawbacks, and is thus highly applicable to selfie photography. We also show that our method can be directly generalized to videos with high inter-frame consistency. Our method is also robust to non-selfie images, and its generalization to various kinds of real-life scenes is discussed. We will make our code publicly available.
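A gradient-domain objective compares image gradients rather than raw intensities. The sketch below shows one such penalty on grayscale images stored as nested lists; the forward-difference scheme and the mean-squared error are assumptions of this sketch, not the paper's loss.

```python
def gradients(img):
    """Forward-difference gradients (gx, gy) of a 2D list; the last column/row get 0."""
    h, w = len(img), len(img[0])
    gx = [[img[i][j + 1] - img[i][j] if j + 1 < w else 0.0 for j in range(w)] for i in range(h)]
    gy = [[img[i + 1][j] - img[i][j] if i + 1 < h else 0.0 for j in range(w)] for i in range(h)]
    return gx, gy


def gradient_loss(output, reference):
    """Mean squared difference between the gradient fields of two images."""
    ox, oy = gradients(output)
    rx, ry = gradients(reference)
    h, w = len(output), len(output[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += (ox[i][j] - rx[i][j]) ** 2 + (oy[i][j] - ry[i][j]) ** 2
    return total / (h * w)


reference = [[1, 2], [3, 4]]
shifted = [[11, 12], [13, 14]]   # same edges, brighter overall
print(gradient_loss(shifted, reference), gradient_loss([[1, 2], [3, 5]], reference))
```

Adding a constant to every pixel leaves this loss unchanged, so on its own it constrains edge structure rather than absolute intensity.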
Mar 28 2018 cs.CV
Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images thanks to powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in constrained lab environments, it is difficult to obtain 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild remains a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground truth, which helps enforce that the pose estimator generates anthropometrically valid poses even for images in the wild. We also observe that a carefully designed information source for the discriminator is essential for boosting performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves performance compared with previous state-of-the-art approaches.
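The geometric descriptor can be sketched as follows for a pose given as a list of (x, y, z) joint coordinates. Using each unordered pair once (i < j) and concatenating the relative offset with the distance are assumptions of this sketch; the paper's exact layout and normalization may differ.

```python
import math


def geometric_descriptor(joints):
    """joints: list of (x, y, z) positions. For each unordered joint pair (i < j),
    return the relative location of j with respect to i and their distance."""
    features = []
    for i in range(len(joints)):
        for j in range(i + 1, len(joints)):
            dx, dy, dz = (joints[j][k] - joints[i][k] for k in range(3))
            features.append((dx, dy, dz, math.sqrt(dx * dx + dy * dy + dz * dz)))
    return features


pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 1.0, 4.0)]
for pair_feature in geometric_descriptor(pose):
    print(pair_feature)
```

With $J$ joints this yields $J(J-1)/2$ entries, so a hypothetical 16-joint skeleton gives 120 pair features.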
Mar 20 2018 cs.CV
Despite the recent success of stereo matching with convolutional neural networks (CNNs), it remains arduous to generalize a pre-trained deep stereo model to a novel domain. A major difficulty is collecting accurate ground-truth disparities for stereo pairs in the target domain. In this work, we propose a self-adaptation approach for CNN training, utilizing both synthetic training data (with ground-truth disparities) and stereo pairs in the new domain (without ground truth). Our method is driven by two empirical observations. By feeding real stereo pairs of different domains to stereo models pre-trained with synthetic data, we see that: i) a pre-trained model does not generalize well to the new domain, producing artifacts at boundaries and in ill-posed regions; however, ii) feeding an up-sampled stereo pair leads to a disparity map with extra details. To avoid i) while exploiting ii), we formulate an iterative optimization problem with graph Laplacian regularization. At each iteration, the CNN adapts itself better to the new domain: we let the CNN learn its own higher-resolution output, and meanwhile a graph Laplacian regularization is imposed to discriminatively keep the desired edges while smoothing out the artifacts. We demonstrate the effectiveness of our method in two domains: daily scenes collected by smartphone cameras, and street views captured from a moving car.
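Graph Laplacian regularization penalizes $x^\top L x = \sum_{(i,j)} w_{ij}(x_i - x_j)^2$ with $L = D - W$. The minimal 1D sketch below assumes a scanline graph whose edge weights decay with the intensity difference in a guide image (a common choice, assumed here rather than taken from the paper). Disparity jumps across strong image edges are then cheap, so edges aligned with the image survive while unsupported jumps are penalized.

```python
import math


def edge_weights_1d(guide, sigma):
    """Edges between neighbouring pixels on one scanline; the weight is small
    where the guide image has a strong intensity edge."""
    return [(i, i + 1, math.exp(-((guide[i] - guide[i + 1]) ** 2) / sigma ** 2))
            for i in range(len(guide) - 1)]


def laplacian_penalty(x, edges):
    # x^T L x written as a sum over undirected edges of w_ij * (x_i - x_j)^2
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def laplacian_matrix(n, edges):
    # L = D - W, built explicitly to check the identity above
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def quadratic_form(x, M):
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


disparity = [1.0, 1.0, 5.0, 5.0]
aligned = edge_weights_1d([0.0, 0.0, 1.0, 1.0], sigma=0.1)  # image edge at the jump
flat = edge_weights_1d([0.0, 0.0, 0.0, 0.0], sigma=0.1)     # no image edge
print(laplacian_penalty(disparity, aligned), laplacian_penalty(disparity, flat))
```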
Mar 08 2018 cs.CV
Previous monocular depth estimation methods take a single view and directly regress the expected results. Though recent advances have been made by applying geometrically inspired loss functions during training, the inference procedure does not explicitly impose any geometrical constraint. Therefore, these models rely purely on the quality of data and the effectiveness of learning to generalize. This leads either to suboptimal results or to the demand for huge amounts of expensive ground-truth labelled data to generate reasonable results. In this paper, we show for the first time that the monocular depth estimation problem can be reformulated as two sub-problems, a view synthesis procedure followed by stereo matching, with two intriguing properties: i) geometrical constraints can be explicitly imposed during inference; ii) the demand for labelled depth data can be greatly alleviated. We show that the whole pipeline can still be trained in an end-to-end fashion and that this new formulation plays a critical role in advancing the performance. The resulting model outperforms all previous monocular depth estimation methods, as well as the stereo block matching method, on the challenging KITTI dataset using only a small amount of real training data. The model also generalizes well to other monocular depth estimation benchmarks. We also discuss the implications and advantages of solving monocular depth estimation using stereo methods.
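The stereo-matching half of this reformulation can be illustrated with a simple block-matching scheme of the kind the abstract's stereo baseline refers to, reduced to one rectified scanline. This is a plain SAD sketch, not the paper's learned stereo network; the window radius, border handling, focal length, and baseline are assumed values.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-take-all disparity for one rectified scanline.
    A pixel at x in the left row is matched to x - d in the right row, and the
    cost is the sum of absolute differences (SAD) over a (2*radius+1) window."""
    n = len(left)

    def clamp(i):
        return min(max(i, 0), n - 1)   # replicate border pixels

    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = sum(abs(left[clamp(x + k)] - right[clamp(x + k - d)])
                       for k in range(-radius, radius + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


def depth_from_disparity(d, focal_px, baseline_m):
    # Rectified stereo: Z = f * B / d; zero disparity means a point at infinity.
    return float("inf") if d == 0 else focal_px * baseline_m / d


left = [0, 0, 0, 9, 1, 7, 3, 0, 0, 0, 0, 0]
right = left[2:] + [0, 0]   # the textured pattern appears 2 pixels to the left
disp = sad_disparity(left, right, max_disp=4)
print(disp[4], depth_from_disparity(disp[4], focal_px=700.0, baseline_m=0.5))
```

The last line converts disparity to metric depth with $Z = fB/d$, which is how a stereo formulation ties the predicted depth to explicit geometry.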
Mar 02 2018 cs.SY
This paper studies a model of the probe-drogue aerial refueling system under aerodynamic disturbances and proposes a docking control method based on terminal iterative learning control to compensate for the docking errors caused by these disturbances. The designed controller works as an add-on unit to the trajectory generation function of the original autopilot system. Simulations based on our previously published simulation environment show that the proposed control method learns quickly and achieves successful docking under aerodynamic disturbances, including the bow wave effect.
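The terminal ILC idea can be sketched on a scalar toy model, assuming the aerodynamic disturbance (e.g., the bow wave) shifts the terminal probe-drogue offset by the same unknown amount on every docking attempt. The linear plant, the gains, and all names are hypothetical; the paper's controller acts on the autopilot's trajectory generation, and its model is not reproduced here.

```python
def terminal_ilc(target, plant_gain, disturbance, learning_gain, trials):
    """Toy terminal iterative learning control for one scalar docking offset.

    Each trial applies a trajectory offset u, and only the terminal miss
    e = target - y(T) is measured. The same (unknown) disturbance recurs on
    every trial, so the update u <- u + learning_gain * e learns to cancel it.
    """
    u = 0.0
    errors = []
    for _ in range(trials):
        y_terminal = plant_gain * u + disturbance   # simulated outcome of one attempt
        e = target - y_terminal
        errors.append(e)
        u += learning_gain * e                      # u_{k+1} = u_k + gamma * e_k
    return u, errors


u, errors = terminal_ilc(target=0.0, plant_gain=1.0, disturbance=0.8,
                         learning_gain=0.5, trials=30)
print(round(u, 6), abs(errors[-1]) < 1e-6)
```

In this toy the terminal error shrinks by a factor of $|1 - \gamma g|$ per trial, so it converges whenever $0 < \gamma g < 2$.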
We present AliMe Assist, an intelligent assistant designed to create an innovative online shopping experience in E-commerce. Based on question answering (QA), AliMe Assist offers assistance, customer, and chatting services. It is able to take voice and text input, incorporate context into QA, and support multi-round interaction. Currently, it serves millions of customer questions per day and is able to address 85% of them. In this paper, we demonstrate the system, present the underlying techniques, and share our experience in dealing with real-world QA in the E-commerce field.
Jan 10 2018 cs.CR
Cloud computing has changed the way enterprises store, access and share data. Data is constantly being uploaded to the cloud and shared within an organization built on a hierarchy of many different individuals who are given certain data access privileges. As more data storage moves to the cloud, finding a secure and efficient data access structure has become a major research issue. With different access privileges, individuals with more privileges (at higher levels of the hierarchy) are granted access to more sensitive data than those with fewer privileges (at lower levels of the hierarchy). In this paper, a Privilege-based Multilevel Organizational Data-sharing scheme (P-MOD) is proposed that incorporates a privilege-based access structure into an attribute-based encryption mechanism to handle these concerns. Each level of the privilege-based access structure is affiliated with an access policy that is uniquely defined by specific attributes. Data is then encrypted under each access policy at every level to grant access to specific data users based on their data access privileges. An individual ranked at a certain level can decrypt the ciphertext (at that specific level) if and only if that individual owns a correct set of attributes that satisfies the access policy of that level. The user may also decrypt the ciphertexts at the levels below the user's own level. Security analysis shows that P-MOD is secure against adaptively chosen plaintext attacks under the DBDH assumption. The comprehensive performance analysis demonstrates that P-MOD is more efficient in computational complexity and storage space than existing schemes for secure data sharing within an organization.
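The access semantics (not the cryptography) can be sketched as follows, assuming each level's policy is a plain AND of required attributes and level 0 is the lowest. In P-MOD these checks are enforced by attribute-based encryption rather than by code that a server could bypass; the attribute names below are hypothetical.

```python
def satisfies(policy, attributes):
    # Policy modeled as a set of required attributes (an AND gate).
    return policy <= attributes


def readable_levels(policies, attributes):
    """policies[i] is the access policy of level i (0 = lowest privilege).
    A user who satisfies the policy of level i may read level i and every level below it."""
    top = max((i for i, p in enumerate(policies) if satisfies(p, attributes)), default=-1)
    return list(range(top + 1))


policies = [{"staff"}, {"staff", "manager"}, {"staff", "manager", "executive"}]
print(readable_levels(policies, {"staff", "manager"}))
```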
Dec 19 2017 cs.CV
We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding their superior performance on static images, applying these models to videos is not only computationally intensive but also suffers from performance degradation and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency, to handle severe image quality degradation (e.g., motion blur and occlusion), and to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we impose a weight-sharing scheme on the multi-stage CNN, it can be rewritten as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed when invoking the network on videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found such a memory-augmented RNN very effective in imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such a mechanism benefits prediction for video-based pose estimation.
Nov 16 2017 cs.CR
The proliferation of online biometric authentication has necessitated security requirements for biometric templates. Existing secure biometric authentication schemes feature a server-centric model, where a service provider maintains a biometric database and is fully responsible for the security of the templates. The end-users have to fully trust the server in storing, processing and managing their private templates. As a result, the end-users' templates could be compromised by outside attackers or even by the service provider itself. In this paper, we propose a user-centric biometric authentication scheme (PassBio) that enables end-users to encrypt their own templates with our proposed lightweight encryption scheme. During authentication, all the templates remain encrypted such that the server never sees them directly. However, the server is able to determine whether the distance between two encrypted templates is within a predefined threshold. Our security analysis shows that no critical information about the templates can be revealed under both passive and active attacks. PassBio follows a "compute-then-compare" computational model over encrypted data. More specifically, our proposed Threshold Predicate Encryption (TPE) scheme can encrypt two vectors x and y in such a manner that the inner product of x and y can be evaluated and compared to a predefined threshold. TPE guarantees that only the comparison result is revealed and no key information about x and y can be learned. Furthermore, we show that TPE can be utilized as a flexible building block to evaluate different distance metrics, such as Hamming distance and Euclidean distance, over encrypted data. Such a compute-then-compare computational model, enabled by TPE, can be widely applied in many interesting applications, such as searching over encrypted data while ensuring data security and privacy.
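The "compute-then-compare" reduction can be checked on plaintext vectors: both distance metrics named above become a single inner product of suitably transformed vectors, which is the quantity TPE evaluates under encryption. The augmentation below is a standard algebraic trick assumed for illustration; PassBio's actual encoding and its encryption layer are not shown.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment_query(x):
    # x' = (x, ||x||^2, 1)
    return list(x) + [inner(x, x), 1]


def augment_template(y):
    # y' = (-2y, 1, ||y||^2), so <x', y'> = ||x||^2 - 2<x, y> + ||y||^2 = ||x - y||^2
    return [-2 * b for b in y] + [1, inner(y, y)]


def within_threshold(x, y, threshold):
    """Plaintext compute-then-compare: squared Euclidean distance via one inner product."""
    return inner(augment_query(x), augment_template(y)) <= threshold


def hamming_via_inner(x_bits, y_bits):
    # Map bits to +/-1; then <x, y> = n - 2 * Hamming(x, y).
    xs = [1 if b else -1 for b in x_bits]
    ys = [1 if b else -1 for b in y_bits]
    return (len(xs) - inner(xs, ys)) // 2


print(within_threshold([1, 2, 3], [1, 0, 0], 13), hamming_via_inner([1, 0, 1, 1], [0, 0, 1, 0]))
```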
Oct 11 2017 cs.DC
Web browsing is an activity that billions of mobile users perform on a daily basis. Battery life is a primary concern for many mobile users, who often find that their phone has died at the most inconvenient times. Heterogeneous multi-core architectures are a solution for energy-efficient processing. However, current mobile web browsers rely on the operating system, which has no knowledge of individual web contents, to exploit the underlying hardware, and this often leads to poor energy efficiency. This paper describes an automatic approach to rendering mobile web workloads for performance and energy efficiency. It achieves this by developing a machine-learning-based approach to predict which processor should run the web rendering engine and at what frequencies the processors should operate. Our predictor learns offline from a set of training web workloads. The built predictor is then integrated into the browser to predict the optimal processor configuration at runtime, taking into account the web workload characteristics and the optimisation goal: load time, energy consumption, or a trade-off between them. We evaluate our approach on a representative ARM big.LITTLE mobile architecture using the 500 most popular webpages. Our approach achieves 80% of the performance delivered by an ideal predictor. We obtain, on average, 45%, 63.5% and 81% improvement for load time, energy consumption and the energy delay product, respectively, when compared to the Linux heterogeneous multi-processing scheduler.
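Once per-configuration estimates are available, runtime selection reduces to minimizing the chosen goal. The sketch assumes a predictor has already produced (load time, energy) estimates for each candidate; the configuration names and numbers are hypothetical, and the paper's predictor may instead output the configuration directly.

```python
def choose_config(predictions, goal):
    """predictions maps a configuration to predicted (load_time_s, energy_j).
    Return the configuration minimizing the selected optimization goal."""
    objectives = {
        "load_time": lambda t, e: t,
        "energy": lambda t, e: e,
        "edp": lambda t, e: e * t,   # energy-delay product
    }
    score = objectives[goal]
    return min(predictions, key=lambda cfg: score(*predictions[cfg]))


# Hypothetical predictions for one webpage: (cluster, frequency in GHz) -> (s, J)
predictions = {
    ("big", 1.8): (1.0, 3.0),
    ("big", 1.2): (1.3, 2.2),
    ("little", 1.4): (3.0, 1.0),
}
for goal in ("load_time", "energy", "edp"):
    print(goal, choose_config(predictions, goal))
```

Each goal picks a different configuration here, which is why the optimisation goal is an explicit input to the runtime decision.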
Oct 10 2017 cs.CV
For hyperspectral image (HSI) datasets, each class has its own salient features, and classifiers classify HSI data according to these class-specific salient features; however, different normalization methods yield different salient features. In this letter, we report the effect of different normalization methods on classifiers and, after analyzing this impact, recommend the best normalization methods for each classifier. The Pavia University, Indian Pines and Kennedy Space Center datasets are used with several typical classifiers to evaluate and analyze the impact of different normalization methods.
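Three common normalization choices are sketched below in plain Python. Which methods the letter actually compares is not listed here, so these are illustrative; applying min-max and z-score per band across pixels and L2 per pixel spectrum is an assumption of this sketch.

```python
import math


def min_max(values):
    """Rescale one band (values across pixels) to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def z_score(values):
    """Zero mean, unit (population) standard deviation for one band."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def l2_normalize(spectrum):
    """Scale one pixel's spectrum to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)


print(min_max([2.0, 4.0, 6.0]))
print(z_score([1.0, 3.0]))
print(l2_normalize([1.0, 2.0, 2.0]), l2_normalize([2.0, 4.0, 4.0]))
```

The last line shows one consequence: L2 normalization maps a spectrum and a uniformly brighter copy of it to the same vector, discarding the overall brightness that band-wise methods keep.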
Oct 03 2017 cs.CV
In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing. Previous methods in image dehazing use a two-stage approach, which first estimates the transmission map and then estimates the clear image. The drawback of a two-stage method is that it tends to amplify local image artifacts such as noise, aliasing and blocking. This is especially the case for heavily hazy images captured with a low-quality device. Our method is based on convolutional neural networks. Unique to our method is the bilinear composition loss function, which directly models the correlations between the transmission map, clear image, and atmospheric light. This allows errors to be back-propagated to each sub-network concurrently, while maintaining the composition constraint to avoid overfitting of each sub-network. We evaluate the effectiveness of our proposed method using both synthetic and real-world examples. Extensive experiments show that our method outperforms state-of-the-art methods, especially for hazy images with severe noise and compression.
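A composition loss of this kind can be sketched on flattened pixel lists, assuming the standard atmospheric scattering model $I = J\,t + A(1-t)$, a scalar atmospheric light, and a mean-squared error; the paper's exact loss is not reproduced. The gradient function shows why errors reach every sub-network at once: the same reconstruction residual appears in the gradients with respect to $J$, $t$, and $A$.

```python
def compose_hazy(J, t, A):
    # Atmospheric scattering model per pixel: I = J * t + A * (1 - t)
    return [j * ti + A * (1 - ti) for j, ti in zip(J, t)]


def composition_loss(J_hat, t_hat, A_hat, I):
    """Mean squared error between the observed hazy image I and the image
    re-composed from the three estimates (J * t is bilinear in J and t)."""
    recon = compose_hazy(J_hat, t_hat, A_hat)
    return sum((r - i) ** 2 for r, i in zip(recon, I)) / len(I)


def composition_grads(J_hat, t_hat, A_hat, I):
    """Analytic gradients: one residual drives updates to all three estimates."""
    n = len(I)
    res = [r - i for r, i in zip(compose_hazy(J_hat, t_hat, A_hat), I)]
    grad_J = [2 * e * ti / n for e, ti in zip(res, t_hat)]
    grad_t = [2 * e * (j - A_hat) / n for e, j in zip(res, J_hat)]
    grad_A = sum(2 * e * (1 - ti) / n for e, ti in zip(res, t_hat))
    return grad_J, grad_t, grad_A


I = compose_hazy([0.2, 0.6], [0.5, 0.8], 1.0)             # synthetic hazy pixels
print(composition_loss([0.2, 0.6], [0.5, 0.8], 1.0, I))   # exact estimates -> 0.0
print(composition_loss([0.3, 0.6], [0.5, 0.8], 1.0, I))   # wrong clear image -> > 0
```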
This paper explores discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting-state and stimulus-based conditions. Our numerical analysis is consistent with that reported in a previous study.
Sep 13 2017 cs.CV
Although the extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is the random weights and bias of ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper we propose a new framework for ELM-based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCFs) are employed, respectively, to derive the optimized output weights and to extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using maximum a posteriori (MAP) estimation. Second, the sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum in order to solve the ill-posed problem of ELM. Variable splitting and the augmented Lagrangian are subsequently used to further reduce the computational complexity of the proposed algorithm, which has proven to be an efficient way to improve speed. Third, the spatial information is extracted using the WCFs to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.
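For reference, the baseline ELM that this framework improves upon can be sketched as follows: random, untrained hidden weights and biases, with output weights obtained by regularized least squares. The ridge penalty here is a swapped-in regularizer for illustration, not the paper's sparse Laplacian-prior MAP estimate, and no spatial features are used; the sizes, seed, and toy data are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def hidden_layer(x, W, b):
    # Sigmoid activations of the randomly weighted hidden nodes.
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bias)))
            for w, bias in zip(W, b)]


def train_elm(X, labels, n_classes, n_hidden=10, reg=1e-2, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]  # never trained
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden_layer(x, W, b) for x in X]
    # Ridge-regularized normal equations (H^T H + reg * I) beta = H^T T, one class at a time
    HtH = [[sum(h[i] * h[j] for h in H) + (reg if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    beta = []
    for c in range(n_classes):
        target = [1.0 if y == c else 0.0 for y in labels]
        Ht_t = [sum(h[i] * t for h, t in zip(H, target)) for i in range(n_hidden)]
        beta.append(solve(HtH, Ht_t))
    return W, b, beta


def predict(model, x):
    W, b, beta = model
    h = hidden_layer(x, W, b)
    scores = [sum(hi * wi for hi, wi in zip(h, col)) for col in beta]
    return max(range(len(scores)), key=scores.__getitem__)


X = [[-3.0], [-2.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
model = train_elm(X, labels, n_classes=2)
print([predict(model, x) for x in X])
```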
Sep 11 2017 cs.CV
Although sparse multinomial logistic regression (SMLR) provides a useful tool for sparse classification, it is ineffective at dealing with high-dimensional features and relies on manually set initial regressor values. This has significantly constrained its applications to hyperspectral image (HSI) classification. To tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weights and biases. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR by minimizing the training error and the regressor value. Furthermore, extended multi-attribute profiles (EMAPs) are utilized for extracting both spectral and spatial features. A combinational linear multiple feature learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, logistic regression via variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework to reduce the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, and demonstrate the fast and robust performance of the proposed ESMLR framework.
As a new machine learning approach, the extreme learning machine (ELM) has received wide attention due to its good performance. However, when ELM is directly applied to hyperspectral image (HSI) classification, its recognition rate is low. This is because ELM does not use spatial information, which is very important for HSI classification. In view of this, this paper proposes a new framework for spectral-spatial classification of HSI by combining ELM with loopy belief propagation (LBP). The original ELM is linear, and the nonlinear ELMs (or kernel ELMs) are improvements of linear ELM (LELM). However, based on extensive experiments and analysis, we found that LELM is a better choice than nonlinear ELM for spectral-spatial classification of HSI. Furthermore, we exploit the marginal probability distribution, which uses the whole information in the HSI, and learn this distribution using LBP. The proposed method not only maintains the fast speed of ELM but also greatly improves classification accuracy. The experimental results on the well-known HSI data sets Indian Pines and Pavia University demonstrate the good performance of the proposed method.
We propose a communication- and computation-efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data, and the master machine solves one shifted $l_1$-regularized minimization problem. Via a Two-way Truncation procedure, the communication cost is reduced from a constant multiple of the dimension (for the state-of-the-art algorithm) to a constant multiple of the sparsity level. Theoretically, we prove that the estimation error of the proposed algorithm decreases exponentially and matches that of the centralized method under mild assumptions. Extensive experiments on both simulated and real data verify that the proposed algorithm is efficient and performs comparably to the centralized method on high-dimensional sparse learning problems.
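The abstract does not spell out Two-way Truncation, so this sketch shows only two generic building blocks related to the description: soft-thresholding, the proximal step commonly used for $l_1$-regularized problems, and a top-$s$ truncation that shrinks a message from $d$ numbers to $s$ index/value pairs. How the paper combines them is not shown, and the vectors below are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1: shrink every entry toward zero by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def truncate_top_s(v, s):
    """Keep only the s largest-magnitude entries as a sparse {index: value} message,
    so a message costs O(s) numbers instead of O(d)."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s]
    return {i: v[i] for i in sorted(keep)}


def densify(message, d):
    # Receiver side: rebuild a length-d vector from the sparse message.
    out = [0.0] * d
    for i, value in message.items():
        out[i] = value
    return out


grad = [0.1, -5.0, 0.0, 2.0, -0.3]
msg = truncate_top_s(grad, s=2)
print(msg, soft_threshold(densify(msg, len(grad)), lam=1.0))
```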
Aug 31 2017 cs.CV
Leveraging recent developments in convolutional neural networks (CNNs), dense correspondence matching from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composed of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. At the time of submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing prior works by a noteworthy margin.
Aug 03 2017 cs.CV
Deep learning, e.g., convolutional neural networks (CNNs), has achieved great success in image processing and computer vision, especially in high-level vision applications such as recognition and understanding. However, it is rarely used to solve low-level vision problems such as image compression, which is studied in this paper. Here, we take a step forward and propose a novel compression framework based on CNNs. To achieve high-quality image compression at low bit rates, two CNNs are seamlessly integrated into an end-to-end compression framework. The first CNN, named compact convolutional neural network (ComCNN), learns an optimal compact representation from an input image, which preserves the structural information and is then encoded using an image codec (e.g., JPEG, JPEG2000 or BPG). The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end. To make the two CNNs collaborate effectively, we develop a unified end-to-end learning algorithm to simultaneously learn ComCNN and RecCNN, which facilitates the accurate reconstruction of the decoded image using RecCNN. Such a design also makes the proposed compression framework compatible with existing image coding standards. Experimental results validate that the proposed compression framework greatly outperforms several compression frameworks that use existing image coding standards with state-of-the-art deblocking or denoising post-processing methods.
May 31 2017 cs.CV
Recent advances in visual tracking have shown that deep Convolutional Neural Networks (CNNs) trained for image classification can be strong feature extractors for discriminative trackers. However, due to the drastic difference between image classification and tracking, extra treatments such as model ensembles and feature engineering must be carried out to bridge the two domains. Such procedures are either time-consuming or hard to generalize well across datasets. In this paper, we discover that the internal structure of the top-layer feature of a Region Proposal Network (RPN) can be utilized for robust visual tracking. We show that this property has to be unleashed by a novel loss function that simultaneously considers classification accuracy and bounding box quality. Without ensembles or any extra treatment of feature maps, our proposed method achieves state-of-the-art results on several large-scale benchmarks, including OTB50, OTB100 and VOT2016. We will make our code publicly available.
A key issue in the control of distributed discrete systems modeled as Markov decision processes is that the state of the system is often not directly observable at any single location in the system. The participants in the control scheme must share information with one another regarding the state of the system in order to collectively make informed control decisions, but this information sharing can be costly. Harnessing recent results from information theory regarding distributed function computation, in this paper we derive, for several information sharing model structures, the minimum amount of control information that must be exchanged to enable local participants to derive the same control decisions as an imaginary omniscient controller having full knowledge of the global state. Incorporating this amount of exchanged information into the reward enables one to trade off the competing objectives of minimizing control information exchange and maximizing controller performance. An alternating optimization framework is then provided to help find efficient controllers and messaging schemes. A series of running examples from wireless resource allocation illustrates the ideas and design tradeoffs.
Apr 20 2017 cs.CV
Most recent successful methods in accurate object detection and localization use some variant of R-CNN-style two-stage Convolutional Neural Networks (CNNs), where plausible regions are proposed in the first stage and then refined in a second decision stage. Despite their simplicity of training and efficiency in deployment, single-stage detection methods have not been as competitive when evaluated on benchmarks that consider mAP at high IoU thresholds. In this paper, we propose a novel single-stage end-to-end trainable object detection network to overcome this limitation. We achieve this by introducing a Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors that are "deep in context". We evaluated our method on the challenging KITTI dataset, which evaluates methods at an IoU threshold of 0.7. We showed that with RRC, a single reduced VGG-16-based model already significantly outperformed all previously published results. At the time this paper was written, our models ranked first in KITTI car detection (the hard level), first in cyclist detection, and second in pedestrian detection. These results were not reached by previous single-stage methods. The code is publicly available.
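The IoU criterion that makes this benchmark demanding can be stated in a few lines. Representing boxes as (x1, y1, x2, y2) corner tuples is an assumption of this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (0, 0, 10, 10)
print(iou(gt, (0, 0, 10, 8)))   # 0.8: accepted at a 0.7 threshold
print(iou(gt, (0, 0, 10, 5)))   # 0.5: a miss at a 0.7 threshold
```

A box that overlaps the object only half-correctly still counts as a miss at 0.7, so localization quality, not just classification, drives the score at this threshold.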
By offloading intensive computation tasks to the edge cloud located at cellular base stations, mobile-edge computation offloading (MECO) has been regarded as a promising means to meet the ambitious millisecond-scale end-to-end latency requirement of fifth-generation networks. In this paper, we investigate the latency-minimization problem in a multi-user time-division multiple access MECO system with joint communication and computation resource allocation. Three different computation models are studied, i.e., local compression, edge cloud compression, and partial compression offloading. First, closed-form expressions of the optimal resource allocation and minimum system delay for both the local and edge cloud compression models are derived. Then, for the partial compression offloading model, we formulate a piecewise optimization problem and prove that the optimal data segmentation strategy has a piecewise structure. Based on this result, an optimal joint communication and computation resource allocation algorithm is developed. To gain more insight, we also analyze a specific scenario where communication resources are adequate while computation resources are limited. In this special case, the closed-form solution of the piecewise optimization problem can be derived. Our proposed algorithms are finally verified by numerical results, which show that the novel partial compression offloading model can significantly reduce the end-to-end latency.
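Why a partial split can beat both extremes is visible in a much simpler single-user toy, assuming the local and edge parts run in parallel with linear costs, and ignoring the compression models, TDMA sharing among users, and resource allocation studied in the paper. This is not the paper's algorithm; the parameter names and values are hypothetical.

```python
def latency(s, bits, cycles_per_bit, f_local, f_edge, rate):
    """Toy single-user latency when a fraction s of the data is offloaded.
    Local and edge branches run in parallel; the task finishes when both do."""
    local = (1 - s) * bits * cycles_per_bit / f_local
    remote = s * bits / rate + s * bits * cycles_per_bit / f_edge
    return max(local, remote)


def optimal_split(bits, cycles_per_bit, f_local, f_edge, rate):
    a = bits * cycles_per_bit / f_local               # all-local time
    b = bits / rate + bits * cycles_per_bit / f_edge  # all-offloaded time
    s = a / (a + b)                                   # balances the two branches
    return s, a * b / (a + b)


params = dict(bits=1e6, cycles_per_bit=100, f_local=1e9, f_edge=1e10, rate=1e7)
s_opt, t_opt = optimal_split(**params)
print(round(s_opt, 3), round(t_opt, 4))
```

With these values, computing everything locally takes 0.1 s and offloading everything takes 0.11 s, while offloading about 48% of the data finishes in about 0.052 s.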
Mar 20 2017 cs.CV
Paleness or pallor is a manifestation of blood loss or low hemoglobin concentration in the human blood that can be caused by pathologies such as anemia. This work presents the first automated screening system that utilizes pallor site images, segments them, and extracts color and intensity-based features for multi-class classification of patients with high pallor due to anemia-like pathologies, normal patients, and patients with other abnormalities. This work analyzes the pallor sites of the conjunctiva and tongue for anemia screening purposes. First, for the eye pallor site images, the sclera and conjunctiva regions are automatically segmented as regions of interest. Similarly, for the tongue pallor site images, the inner and outer tongue regions are segmented. Then, color-plane-based feature extraction is performed, followed by machine learning algorithms for feature reduction and image-level classification for anemia. In this work, a suite of classification algorithms performs image-level classification into normal (class 0), pallor (class 1) and other abnormalities (class 2). The proposed method achieves 86% accuracy, 85% precision and 67% recall on eye pallor site images, and 98.2% accuracy and precision with 100% recall on tongue pallor site images, for classification of images with pallor. The proposed pallor screening system can be further fine-tuned to detect the severity of anemia-like pathologies using a controlled set of local images that can then be used for future benchmarking purposes.
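The reported figures can be computed from predictions with a few counts. The sketch assumes precision and recall are computed one-vs-rest for the pallor class (class 1) while accuracy covers all three classes; the paper may define them differently, and the labels below are hypothetical.

```python
def metrics(y_true, y_pred, positive):
    """Accuracy over all classes; precision and recall treating `positive` as the
    class of interest (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [0, 1, 1, 2, 1, 0]   # 0 = normal, 1 = pallor, 2 = other abnormality
y_pred = [0, 1, 0, 2, 1, 1]
print(metrics(y_true, y_pred, positive=1))
```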
Mar 10 2017 cs.NI
The Internet of Things (IoT) is rapidly changing our daily lives. Although new technologies are emerging every day and expanding their influence in this rapidly growing area, many classic theories can still find their place. In this paper, we study important applications of classic network coding theory in two important components of the Internet of Things: the IoT core network, where data is sensed and transmitted, and distributed cloud storage, where the data generated by the IoT core network is stored. First, we propose an adaptive network coding (ANC) scheme in the IoT core network to improve transmission efficiency. We demonstrate the efficacy of the scheme and its performance advantage over existing schemes through simulations. Next, we introduce the optimal storage allocation problem in network-coding-based distributed cloud storage, which aims to find the most reliable allocation that distributes the $n$ data components into $N$ data centers, given the failure probability $p$ of each data center. Then we propose a polynomial-time optimal storage allocation (OSA) scheme to solve the problem. Both the theoretical analysis and the simulation results show that storage reliability can be greatly improved by the OSA scheme.
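Storage reliability for a given allocation can be evaluated by brute force, assuming an MDS-style property where any $k$ of the $n$ coded components recover the data and data centers fail independently with probability $p$. The recovery threshold $k$ is an assumption of this sketch (the abstract does not state it), and the enumeration is exponential in $N$, unlike the polynomial-time OSA scheme.

```python
from itertools import product


def reliability(allocation, k, p):
    """allocation[j] = number of coded components stored in data center j.
    Returns the probability that at least k components survive, assuming
    independent data-center failures with probability p."""
    total = 0.0
    for alive in product((True, False), repeat=len(allocation)):
        prob = 1.0
        survivors = 0
        for up, count in zip(alive, allocation):
            prob *= (1 - p) if up else p
            survivors += count if up else 0
        if survivors >= k:
            total += prob
    return total


# n = 4 components with k = 2 needed, placed in different ways
print(reliability([4, 0], k=2, p=0.1), reliability([2, 2], k=2, p=0.1),
      reliability([1, 1, 1, 1], k=2, p=0.1))
```

Spreading the same four components over more data centers raises reliability in this example, which is the kind of trade-off the allocation problem optimizes.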
Top-$N$ recommender systems have been extensively studied. However, the sparsity of user-item activities has not been well resolved. While many hybrid systems have been proposed to address the cold-start problem, profile information has not been sufficiently leveraged. Furthermore, the heterogeneity of profiles between users and items intensifies the challenge. In this paper, we propose a content-based top-$N$ recommender system that learns the global term weights in profiles. To achieve this, we bring in PathSim, which measures node similarity well under heterogeneous relations (between users and items). Starting from the original TF-IDF values, the global term weights gradually converge and eventually reflect both profile and activity information. To facilitate training, the derivative is reformulated in matrix form, which can easily be parallelized. Extensive experiments demonstrate the superiority of the proposed method.
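PathSim scores two objects x and y of the same type under a symmetric meta path as s(x, y) = 2 M_xy / (M_xx + M_yy), where M is the commuting matrix of the meta path. The sketch below assumes, for illustration, that users and items are both represented as profile rows over a shared term vocabulary, so the meta path Profile-Term-Profile is symmetric and M = A diag(w) A^T. Plugging global term weights w into M this way is a hypothetical hook, not necessarily how the paper's learning objective uses them.

```python
import numpy as np

def pathsim(profile_term, term_weights=None):
    """PathSim over the symmetric meta path Profile-Term-Profile.

    profile_term: P x T nonnegative matrix (rows: user and item profiles, cols: terms).
    term_weights: optional length-T global term weights (hypothetical hook).
    """
    A = np.asarray(profile_term, dtype=float)
    w = np.ones(A.shape[1]) if term_weights is None else np.asarray(term_weights, dtype=float)
    M = (A * w) @ A.T                           # commuting matrix M = A diag(w) A^T
    diag = np.diag(M)
    denom = diag[:, None] + diag[None, :]       # M_xx + M_yy for every pair
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.where(denom > 0, 2.0 * M / denom, 0.0)
    return S
```

Unlike a raw path count, the normalization keeps a profile with many terms from looking similar to everything.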
Feb 29 2016 cs.CR
Discrete exponential operations, such as modular exponentiation and scalar multiplication on elliptic curves, are basic operations of many public-key cryptosystems. However, exponential operations are considered prohibitively expensive for resource-constrained mobile devices. In this paper, we address the problem of securely outsourcing exponentiation operations to a single untrusted server. Our proposed scheme (ExpSOS) requires only a very limited number of modular multiplications in the local mobile environment, so it can achieve impressive computational gains. ExpSOS also provides a secure verification scheme, effective with probability approximately 1, to ensure that mobile end-users always receive valid results. Comprehensive analysis, together with simulation results on a real mobile device, demonstrates that ExpSOS significantly improves on existing schemes in efficiency, security, and result verifiability. We apply ExpSOS to securely outsource several cryptographic protocols, showing that it is widely applicable to many cryptographic computations.
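To see why exponentiation is costly on a mobile device, the standard left-to-right square-and-multiply method below counts its modular multiplications: for a t-bit exponent it performs roughly t squarings plus one multiplication per 1-bit, about 1.5t on average for a random exponent. This is only the local baseline that outsourcing tries to avoid; it is not the ExpSOS protocol, whose details the abstract does not give.

```python
def modexp_count(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (base**exponent mod modulus, number of modular multiplications performed).
    """
    if exponent < 0:
        raise ValueError("negative exponents are not handled")
    result, mults = 1 % modulus, 0
    for bit in bin(exponent)[2:]:               # most significant bit first
        result = (result * result) % modulus    # square for every bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus  # multiply for every 1-bit
            mults += 1
    return result, mults
```

For a 2048-bit exponent this comes to roughly 3,000 modular multiplications, which is the local cost an outsourcing scheme aims to replace.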
Feb 16 2016 cs.LG
Speaker identification refers to the task of localizing the face of the person whose identity matches the ongoing voice in a video. This task requires not only joint perception of visual and auditory signals but also robustness to severe quality degradations and unconstrained content variations. In this paper, we describe a novel multimodal Long Short-Term Memory (LSTM) architecture that seamlessly unifies the visual and auditory modalities from the beginning of each input sequence. The key idea is to extend the conventional LSTM by sharing weights not only across time steps but also across modalities. We show that modeling the temporal dependency across face and voice can significantly improve robustness to content quality degradations and variations. We also found that our multimodal LSTM is robust to distractors, namely non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory dataset and showed that our system outperforms state-of-the-art speaker identification systems, with a lower false alarm rate and higher recognition accuracy.
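The weight-sharing idea can be sketched with a single NumPy LSTM cell whose parameters are reused for both the face stream and the voice stream, while each stream keeps its own hidden and cell state. The sketch assumes both modalities have already been projected to feature vectors of the same size; the dimensions, the initialization, and the absence of any cross-modal state coupling are simplifications, not the authors' architecture. Training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SharedLSTMCell:
    """One LSTM cell whose parameters are shared by every modality stream."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Gate weights stacked as [input, forget, output, candidate].
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g                       # update cell state
        h = o * np.tanh(c)                      # expose gated hidden state
        return h, c

def run_multimodal(cell, face_seq, voice_seq):
    """Feed aligned face and voice sequences through the same cell.
    Each modality keeps its own state; only the weights are shared."""
    H = cell.hidden_dim
    face_state = (np.zeros(H), np.zeros(H))
    voice_state = (np.zeros(H), np.zeros(H))
    for x_face, x_voice in zip(face_seq, voice_seq):
        face_state = cell.step(x_face, *face_state)
        voice_state = cell.step(x_voice, *voice_state)
    return face_state[0], voice_state[0]
```

Because the same W and b serve both streams, any gradient from either modality updates one shared set of parameters.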
Regenerating codes are a class of codes well suited to distributed storage systems, since they can maintain optimal repair bandwidth and storage space. Two important types of regenerating codes have been constructed: the minimum storage regeneration (MSR) code and the minimum bandwidth regeneration (MBR) code. However, in hostile networks where adversaries can compromise storage nodes, the storage capacity of the network can be significantly affected. In this paper, we propose two optimal constructions of regenerating codes based on rate matching that can combat this kind of adversary in hostile networks: the 2-layer rate-matched regenerating code and the $m$-layer rate-matched regenerating code. For the 2-layer code, we can achieve the optimal storage efficiency for given system requirements. Our comprehensive analysis shows that our code can detect and correct malicious nodes with higher storage efficiency than the universally resilient regenerating code, which is a straightforward extension of the regenerating code with error detection and correction capability. We then propose the $m$-layer code by extending the 2-layer code, and achieve the optimal error correction efficiency by matching the code rate of each layer's regenerating code. We also demonstrate that the optimized parameters can achieve the maximum storage capacity under the same constraint. Compared to the universally resilient regenerating code, our code achieves much higher error correction efficiency.
Nov 10 2015 cs.CR
Computation outsourcing is an integral part of cloud computing. It enables end-users to outsource their computational tasks to the cloud and use shared cloud resources in a pay-per-use manner. However, once the tasks are outsourced, end-users lose control of their data, which may result in severe security issues, especially when the data is sensitive. To address this problem, secure outsourcing mechanisms have been proposed to ensure the security of end-users' outsourced data. In this paper, we investigate the outsourcing of general computational problems, which constitute the mathematical basis for problems arising in various fields such as engineering and finance. Specifically, we propose affine mapping based schemes for problem transformation and outsourcing, so that the cloud is unable to learn any key information from the transformed problem. Meanwhile, the overhead of the transformation is kept at an acceptable level compared to the computational savings introduced by the outsourcing itself. Furthermore, we develop cost-aware schemes to balance the trade-offs between end-users' various security demands and computational overhead. We also propose a verification scheme to ensure that end-users always receive a valid solution from the cloud. Our extensive complexity and security analysis shows that the proposed Cost-Aware Secure Outsourcing (CASO) scheme is both practical and effective.
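As a toy illustration of affine-mapping-based disguise, consider outsourcing a linear system Ax = b. The client substitutes x = z - r for a secret random vector r and hides the system behind secret row and column permutations and scalings, so the cloud solves A'y = b' with A' = D1 Π1 A Π2 D2. Permutation-and-scaling masks keep the client's work at O(n^2), but on their own they leak structure (such as the zero pattern), so this is not the CASO scheme, only a sketch of the transformation, recovery, and verification steps. All function names are hypothetical.

```python
import numpy as np

def mask_system(A, b, rng):
    """Disguise Ax = b as A'y = b' before outsourcing (client side, O(n^2) work)."""
    n = A.shape[0]
    row_perm, col_perm = rng.permutation(n), rng.permutation(n)
    row_scale = rng.uniform(1.0, 2.0, n)        # secret nonzero row scalings (D1)
    col_scale = rng.uniform(1.0, 2.0, n)        # secret nonzero column scalings (D2)
    r = rng.standard_normal(n)                  # secret shift: x = z - r
    # A'[i, j] = d1_i * A[row_perm[i], col_perm[j]] * d2_j, built by indexing.
    A_masked = row_scale[:, None] * A[row_perm][:, col_perm] * col_scale[None, :]
    b_masked = row_scale * (b + A @ r)[row_perm]
    return A_masked, b_masked, (col_perm, col_scale, r)

def unmask_solution(y, secret):
    """Map the cloud's answer y back to a solution of the original system."""
    col_perm, col_scale, r = secret
    z = np.empty_like(y)
    z[col_perm] = col_scale * y                 # undo column permutation and scaling
    return z - r                                # undo the affine shift

def verify(A, b, x, tol=1e-8):
    """Client-side residual check: one matrix-vector product."""
    return np.linalg.norm(A @ x - b) <= tol * max(1.0, np.linalg.norm(b))
```

The client keeps the secret tuple, sends only `A_masked` and `b_masked`, and checks the recovered x itself before trusting it.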
Distributed storage plays a crucial role in the current cloud computing framework. After the theoretical bound for distributed storage was derived in the pioneering work on regenerating codes, Reed-Solomon (RS) code based regenerating codes were developed. The RS code based minimum storage regeneration code (RS-MSR) and minimum bandwidth regeneration code (RS-MBR) achieve the theoretical bounds at the MSR point and the MBR point, respectively, in code regeneration. They also maintain the MDS property in code reconstruction. However, in a hostile network where storage nodes can be compromised and packets can be tampered with, the storage capacity of the network can be significantly affected. In this paper, we propose a Hermitian code based minimum storage regenerating (H-MSR) code and minimum bandwidth regenerating (H-MBR) code. We first prove that our proposed Hermitian code based regenerating codes achieve the theoretical bounds at the MSR point and the MBR point, respectively. We then propose data regeneration and reconstruction algorithms for the H-MSR and H-MBR codes in both error-free and hostile networks. Theoretical evaluation shows that our proposed schemes can detect erroneous decodings and correct more errors in hostile networks than the RS-MSR and RS-MBR codes with the same code rate. Our analysis also demonstrates that the proposed H-MSR and H-MBR codes have lower computational complexity than the RS-MSR/RS-MBR codes in both code regeneration and code reconstruction.
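For reference, the MSR and MBR points are the two endpoints of the storage versus repair-bandwidth trade-off for regenerating codes: a file of size B is stored on n nodes, any k nodes suffice to reconstruct it, and a failed node is repaired by downloading β from each of d helpers (k ≤ d). At the MSR point α = B/k and γ = dB/(k(d - k + 1)); at the MBR point α = γ = 2dB/(k(2d - k + 1)). The helpers below compute the per-node storage α and repair bandwidth γ = dβ with exact fractions; they are independent of the Hermitian or RS construction used to reach those points.

```python
from fractions import Fraction

def _check(k, d):
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")

def msr_point(B, k, d):
    """(alpha, gamma) at the minimum storage regeneration point."""
    _check(k, d)
    beta = Fraction(B, k * (d - k + 1))          # data downloaded from each helper
    return Fraction(B, k), d * beta

def mbr_point(B, k, d):
    """(alpha, gamma) at the minimum bandwidth regeneration point; alpha equals gamma."""
    _check(k, d)
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, d * beta
```

For example, with B = 12, k = 3, d = 4, the MSR point stores 4 units per node and repairs with 8, while the MBR point stores and repairs with 16/3 each.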
Aug 04 2015 cs.SE
Good software cost prediction is important for effective project management such as budgeting, project planning and control. In this paper, we present an intelligent approach to software cost prediction. By integrating the neuro-fuzzy technique with the well-accepted COCOMO model, our approach can make the best use of both expert knowledge and historical project data. Its major advantages include learning ability, good interpretability, and robustness to imprecise and uncertain inputs. The validation using industry project data shows that the model greatly improves prediction accuracy in comparison with the COCOMO model.
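The COCOMO baseline that the neuro-fuzzy approach builds on can be written in a few lines. The sketch below uses the published Basic COCOMO 81 coefficients, where effort in person-months is a · KLOC^b for each project mode. The neuro-fuzzy layer that calibrates the model from expert knowledge and historical data is not shown, and the intermediate COCOMO cost drivers are omitted.

```python
# Basic COCOMO 81: effort (person-months) = a * KLOC ** b, per project mode.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}

def basic_cocomo_effort(kloc, mode="organic"):
    """Nominal effort estimate in person-months for a project of `kloc` thousand lines."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b
```

Because b > 1 in every mode, effort grows faster than size, which is the diseconomy of scale the model encodes.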
Accurate estimation, such as cost estimation, quality estimation, and risk analysis, is a major issue in management. We propose a patent-pending soft computing framework to tackle this challenging problem. Our generic framework is independent of the nature and type of estimation. It combines a neural network, fuzzy logic, and an algorithmic estimation model. We used the Constructive Cost Model (COCOMO), Analysis of Variance (ANOVA), and Function Point Analysis as the algorithmic models, and validated the accuracy of the resulting Neuro-Fuzzy Algorithmic (NFA) Model in software cost estimation using industrial project data. Our model produces more accurate estimates than an algorithmic model alone. We also discuss prototypes of our tools that implement the NFA Model. We conclude with our roadmap and directions for enriching the model to tackle different estimation challenges.
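The abstract does not specify the NFA Model's fuzzy-logic layer. As a generic illustration of how fuzzy logic absorbs imprecise inputs, the sketch below maps a numeric cost-driver rating to overlapping linguistic levels with triangular membership functions. The rating scale and the level boundaries in `LEVELS` are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at `peak`."""
    if x <= left or x >= right:
        return 1.0 if x == peak else 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical linguistic levels for a cost-driver rating on a 1-5 scale.
LEVELS = {
    "low": (0.0, 1.0, 3.0),
    "nominal": (1.0, 3.0, 5.0),
    "high": (3.0, 5.0, 6.0),
}

def fuzzify(rating):
    """Degree to which `rating` belongs to each linguistic level."""
    return {name: triangular(rating, *abc) for name, abc in LEVELS.items()}
```

A rating of 2.0 is then half "low" and half "nominal" rather than being forced into one crisp category.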
Jul 23 2015 cs.NI
Wireless sensor networks (WSNs) operating in the license-free spectrum suffer from uncontrolled interference as those spectrum bands become increasingly crowded. Emerging cognitive radio sensor networks (CRSNs) offer a promising solution to this challenge by enabling sensor nodes to opportunistically access licensed channels. However, since sensor nodes consume considerable energy to support CR functionalities such as channel sensing and switching, opportunistic channel access should be carefully designed to improve energy efficiency in CRSNs. To this end, we investigate the dynamic channel access problem for improving the energy efficiency of a clustered CRSN. Under the primary users' protection requirement, we study the resource allocation issues that maximize the energy efficiency of using a licensed channel for intra-cluster and inter-cluster data transmission, respectively. Taking into account the energy consumed in channel sensing and switching, we further determine, based on the packet loss rate of the license-free channel, the condition under which sensor nodes should sense and switch to a licensed channel to improve energy efficiency. In addition, two dynamic channel access schemes are proposed to identify the channel sensing and switching sequences for intra-cluster and inter-cluster data transmission, respectively. Extensive simulation results demonstrate that the proposed channel access schemes can significantly reduce energy consumption in CRSNs.
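The switching condition can be illustrated with a deliberately simple energy model, which is an assumption rather than the paper's formulation: each transmission attempt costs e_tx, a packet is retransmitted until it gets through, so delivering it costs e_tx / (1 - loss) in expectation, and sensing plus switching costs e_sense + e_switch once, amortized over the packets sent on the licensed channel. Primary-user protection and channel availability are ignored.

```python
def expected_energy_per_packet(e_tx, loss_rate):
    """Expected energy to deliver one packet, retransmitting until success."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return e_tx / (1.0 - loss_rate)

def licensed_cost(loss_licensed, e_tx, e_sense, e_switch, packets):
    """Per-packet energy on the licensed channel, amortizing sensing and switching."""
    return (e_sense + e_switch) / packets + expected_energy_per_packet(e_tx, loss_licensed)

def should_switch(loss_free, loss_licensed, e_tx, e_sense, e_switch, packets):
    """True if moving to the licensed channel lowers per-packet energy in this model."""
    stay = expected_energy_per_packet(e_tx, loss_free)
    return licensed_cost(loss_licensed, e_tx, e_sense, e_switch, packets) < stay

def loss_threshold(loss_licensed, e_tx, e_sense, e_switch, packets):
    """License-free loss rate above which switching saves energy:
    e_tx / (1 - q) > move  <=>  q > 1 - e_tx / move."""
    move = licensed_cost(loss_licensed, e_tx, e_sense, e_switch, packets)
    return max(0.0, 1.0 - e_tx / move)
```

In this model, with e_tx = 1, e_sense = e_switch = 0.5, 10 packets, and a loss-free licensed channel, switching pays off once the license-free loss rate exceeds 1 - 1/1.1 ≈ 0.091.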
Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV, movie, or live show video. This problem is challenging mainly because of its multimodal nature: the face cue alone is insufficient to achieve good performance. Previous multimodal approaches to this problem usually process the data of different modalities individually and merge them using handcrafted heuristics. Such approaches work well for simple scenes but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel convolutional neural network (CNN) based learning framework that automatically learns the fusion function of both face and audio cues. We show that, without using face tracking, facial landmark localization, or subtitles/transcripts, our system with robust multimodal feature extraction achieves state-of-the-art speaker naming performance on two diverse TV series. The dataset and the implementation of our algorithm are publicly available online.
Jul 14 2015 cs.AR
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) have been proposed to exploit both the capacity advantage of NVM and the latency and dynamic energy advantages of DRAM. An important problem for such systems is how to place data between DRAM and NVM to improve system performance. In this paper, we devise the first mechanism, called UBM (page Utility Based hybrid Memory management), that systematically estimates the system performance benefit of placing a page in DRAM versus NVM and uses this estimate to guide data placement. UBM's estimation method consists of two major components. First, it estimates how much an application's stall time can be reduced if the accessed page is placed in DRAM. To do this, UBM comprehensively considers access frequency, row buffer locality, and memory level parallelism (MLP) to estimate the application's stall time reduction. Second, UBM estimates how much each application's stall time reduction contributes to overall system performance. Based on this estimation method, UBM can determine and place the most critical data in DRAM to directly optimize system performance. Experimental results show that UBM improves system performance by 14% on average (and up to 39%) compared to the best of three state-of-the-art mechanisms for a large number of data-intensive workloads from the SPEC CPU2006 and Yahoo Cloud Serving Benchmark (YCSB) suites.
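The two-part utility estimate can be sketched as follows. All numbers and the exact formula are hypothetical: the sketch assumes row-buffer hits cost the same in DRAM and NVM, so only row-buffer misses benefit from migration; it divides the saved latency by the page's memory-level parallelism to account for overlapped misses; and it scales each application's stall-time reduction by an assumed per-application weight standing in for its contribution to system performance.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    page: int
    app: int
    row_misses: int     # row-buffer misses to this page in the last interval
    mlp: float          # average number of overlapping outstanding misses

def stall_reduction(stats, dram_miss_ns, nvm_miss_ns):
    """Hypothetical estimate of stall time saved by moving the page to DRAM.
    Overlapped misses (MLP > 1) hide part of the latency difference."""
    return stats.row_misses * (nvm_miss_ns - dram_miss_ns) / max(stats.mlp, 1.0)

def choose_dram_pages(all_stats, app_weight, dram_capacity,
                      dram_miss_ns=50.0, nvm_miss_ns=150.0):
    """Rank pages by utility = stall reduction x assumed app weight; keep the top ones."""
    utility = {
        s.page: stall_reduction(s, dram_miss_ns, nvm_miss_ns) * app_weight[s.app]
        for s in all_stats
    }
    return sorted(utility, key=utility.get, reverse=True)[:dram_capacity]
```

Note how a page with many misses can still rank low if those misses overlap heavily (high MLP), which a frequency-only policy would miss.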
It is well known that apps running on mobile devices extensively track and leak users' personally identifiable information (PII); however, these users have little visibility into PII leaked through the network traffic generated by their devices, and have poor control over how, when and where that traffic is sent and handled by third parties. In this paper, we present the design, implementation, and evaluation of ReCon: a cross-platform system that reveals PII leaks and gives users control over them without requiring any special privileges or custom OSes. ReCon leverages machine learning to reveal potential PII leaks by inspecting network traffic, and provides a visualization tool to empower users with the ability to control these leaks via blocking or substitution of PII. We evaluate ReCon's effectiveness with measurements from controlled experiments using leaks from the 100 most popular iOS, Android, and Windows Phone apps, and via an IRB-approved user study with 92 participants. We show that ReCon is accurate, efficient, and identifies a wider range of PII than previous approaches.
May 19 2015 cs.CY
Against the background of the new media era and the rapid development of interactive advertising, this paper uses the case study method, building on a review of domestic and foreign research on the communication effect of interactive advertising. The paper examines three types of interactive advertising: interactive ads on official websites, interactive ads based on social networking services (SNS), and interactive ads based on mobile media. Furthermore, it derives and summarizes a self-enhancing dissemination mechanism of interactive advertising (IA) with three parts: a micro-level, a meso-level, and a macro-level mechanism. The micro level comprises core interaction, inner interaction, and outer interaction, which together reveal the whole process of interacting with content, with people, and with computers. The meso level shows the communication approach and spread speed, namely self-fission-type spread. Finally, at the macro level, the communication effect of IA increases in a spiral. In sum, this article enriches research on the communication effects of interactive advertising.
In many resource allocation problems, a centralized controller needs to award some resource to a user selected from a collection of distributed users, with the goal of maximizing the utility the user would receive from the resource. This can be modeled as the controller computing an extremum of the distributed users' utilities. The overhead rate necessary to enable the controller to reproduce the users' local state can be prohibitively high. One approach to reducing this overhead is interactive communication, in which rate savings are achieved by tolerating an increase in delay. In this paper, we consider the design of a simple achievable scheme based on successive refinements of scalar quantization at each user. The optimal quantization policy is computed via a dynamic program, and we demonstrate that tolerating a small increase in delay can yield significant rate savings. We then consider two simpler quantization policies to investigate the scaling properties of the rate-delay trade-offs. Using a combination of these simpler policies, the performance of the optimal policy can be closely approximated at lower computational cost.
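A minimal sketch of interactive arg max computation by successive refinement, assuming utilities in [0, 1) and a fixed 1-bit quantizer per round (not the dynamic-programming-optimal policy): in each round every still-active user reports whether its utility lies in the upper half of the current interval, and the controller keeps the upper group if it is nonempty. After R rounds the chosen user's utility is within 2^-R of the maximum; more rounds mean more delay, and users that drop out stop transmitting, which is where the rate savings come from. The controller's broadcast of the interval is not counted.

```python
def interactive_argmax(values, rounds):
    """Return (index of chosen user, total uplink bits sent).

    Invariant: every active user's value lies in [lo, hi), and the true maximum is
    held by an active user, so the final choice is within hi - lo = 2**-rounds of it.
    """
    lo, hi = 0.0, 1.0
    active = list(range(len(values)))
    bits = 0
    for _ in range(rounds):
        mid = (lo + hi) / 2
        upper = [i for i in active if values[i] >= mid]
        bits += len(active)              # each active user sends one bit
        if upper:
            active, lo = upper, mid      # keep users in the upper half
        else:
            hi = mid                     # everyone is below mid: shrink from above
    return active[0], bits
```

With 50 users and 8 rounds, the bit count stays well below the 400 bits of every user reporting an 8-bit quantized utility, because most users drop out after the first few rounds.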
Jan 30 2015 cs.CV
We have recently witnessed many ground-breaking results in machine learning and computer vision generated by deep convolutional neural networks (CNNs). While this success stems mainly from large volumes of training data and deep network architectures, vector processing hardware (e.g., GPUs) undisputedly plays a vital role in modern CNN implementations by supporting massive computation. Though much attention has been paid in the extant literature to the algorithmic side of deep CNNs, little research has been dedicated to vectorization for scaling up CNNs. In this paper, we study the vectorization of key building blocks in deep CNNs, in order to better understand and facilitate parallel implementation. Key steps in training and testing deep CNNs are abstracted as matrix and vector operators, upon which parallelism can easily be achieved. We developed and compared six implementations with various degrees of vectorization, with which we illustrate the impact of vectorization on the speed of model training and testing. In addition, we provide a unified CNN framework for both high-level and low-level vision tasks, along with a vectorized Matlab implementation with state-of-the-art speed.
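The core vectorization trick for convolution layers can be shown in NumPy: rearranging every input patch into a row (im2col) turns a bank of F filters into a single matrix product. The sketch compares a naive loop implementation with the im2col version for a single-channel 2-D input; it is a generic illustration of the technique, not the authors' Matlab implementation. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_loops(x, filters):
    """Naive 'valid' cross-correlation of a 2-D input with F filters (F x kh x kw)."""
    F, kh, kw = filters.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((F, oh, ow))
    for f in range(F):
        for i in range(oh):
            for j in range(ow):
                out[f, i, j] = np.sum(x[i:i + kh, j:j + kw] * filters[f])
    return out

def conv_im2col(x, filters):
    """Same result as one matrix product: patches (oh*ow x kh*kw) @ filters (kh*kw x F)."""
    F, kh, kw = filters.shape
    patches = sliding_window_view(x, (kh, kw))       # oh x ow x kh x kw view
    oh, ow = patches.shape[:2]
    cols = patches.reshape(oh * ow, kh * kw)         # the im2col matrix
    out = cols @ filters.reshape(F, kh * kw).T       # (oh*ow) x F
    return out.T.reshape(F, oh, ow)
```

The loop version does F·oh·ow small reductions; the im2col version hands one large matrix multiplication to an optimized BLAS routine, which is where GPUs and vector units shine.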
Dec 01 2014 cs.NI
Mobile sensing has become a promising paradigm for mobile users to obtain information through task crowdsourcing. However, due to the social preferences of mobile users, the quality of sensing reports may be affected by the underlying social attributes and selfishness of individuals. Therefore, it is crucial to consider the social impacts and trustworthiness of mobile users when selecting task participants in mobile sensing. In this paper, we propose a Social Aware Crowdsourcing with Reputation Management (SACRM) scheme to select well-suited participants and allocate task rewards in mobile sensing. Specifically, we consider social attributes, task delay, and reputation in crowdsourcing, and propose a participant selection scheme that chooses well-suited participants for the sensing task under a fixed task budget. A report assessment and rewarding scheme is also introduced to measure the quality of sensing reports and allocate task rewards based on the assessed report quality. In addition, we develop a reputation management scheme to evaluate the trustworthiness and cost-performance ratio of mobile users for participant selection. Theoretical analysis and extensive simulations demonstrate that SACRM can efficiently improve crowdsourcing utility and effectively motivate participants to improve the quality of their sensing reports.
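The abstract does not give SACRM's selection rule, so the sketch below uses a standard greedy knapsack heuristic as a stand-in: each candidate has a cost and a hypothetical score that would combine reputation, social attributes, and expected delay, and users are picked by score per unit cost until the fixed task budget is exhausted.

```python
def select_participants(candidates, budget):
    """Greedy budget-constrained selection by score per unit cost.

    candidates: list of (user_id, cost, score) with cost > 0; the score is a
    hypothetical aggregate of reputation, social attributes, and expected delay.
    Returns (chosen user ids, total cost spent).
    """
    if any(cost <= 0 for _, cost, _ in candidates):
        raise ValueError("costs must be positive")
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen, spent = [], 0.0
    for user, cost, score in ranked:
        if spent + cost <= budget:       # skip anyone who would exceed the budget
            chosen.append(user)
            spent += cost
    return chosen, spent
```

Greedy selection is fast and simple but not optimal in general; an exact solution would require solving the 0/1 knapsack problem.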
A key aspect of many resource allocation problems is the need for the resource controller to compute a function, such as the max or arg max, of the competing users' metrics. Information must be exchanged between the competing users and the resource controller for this function to be computed. In many practical resource controllers, the competing users' metrics are communicated to the resource controller, which then computes the desired extremization function. However, this paper shows that information rate savings can be obtained by recognizing that the controller only needs to determine the result of this extremization function. If the extremization function is to be computed losslessly, the rate savings are shown in most cases to be at most 2 bits, independent of the number of competing users. Motivated by the small savings in the lossless case, simple achievable schemes for both the lossy and interactive variants of this problem are considered. Both of these approaches are shown to have the potential to realize large rate savings, especially when the number of competing users is large. For the lossy variant, the proposed simple achievable schemes are shown to be close to the fundamental limit given by the rate distortion function.
Oct 30 2013 cs.LG
Sophisticated automatic incident detection (AID) technology plays a key role in contemporary transportation systems. Though many papers have been devoted to incident classification algorithms, few studies have investigated how to enhance the feature representation of incidents to improve AID performance. In this paper, we propose to use an unsupervised feature learning algorithm to generate higher-level features to represent incidents. We used real incident data in the experiments and found that an effective feature mapping function can be learned from data across the test sites. With the enhanced features, the detection rate (DR), false alarm rate (FAR), and mean time to detect (MTTD) are significantly improved in all three representative cases. Because the feature learning is unsupervised, this approach also offers a way to reduce the amount of labeled data, which is expensive to obtain, required to train better incident classifiers.
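The three evaluation metrics can be computed from per-interval detector output. The definitions below are common ones in the AID literature but are stated here as assumptions: DR is the fraction of incidents raising at least one alarm at or after onset, FAR is the fraction of incident-free intervals with an alarm, and MTTD is the mean delay from onset to the first alarm. The 30-second interval length is hypothetical.

```python
def aid_metrics(cases, interval_s=30.0):
    """Compute (DR, FAR, MTTD in seconds).

    cases: list of (alarms, onset), where alarms is a list of per-interval booleans
    and onset is the index of the first interval affected by the incident.
    """
    detected, delays = 0, []
    false_alarms, normal_intervals = 0, 0
    for alarms, onset in cases:
        before, after = alarms[:onset], alarms[onset:]
        false_alarms += sum(before)          # alarms raised while traffic was normal
        normal_intervals += len(before)
        if any(after):
            detected += 1
            delays.append(after.index(True) * interval_s)
    dr = detected / len(cases)
    far = false_alarms / normal_intervals if normal_intervals else 0.0
    mttd = sum(delays) / len(delays) if delays else float("nan")
    return dr, far, mttd
```

Since DR, FAR, and MTTD pull in different directions (a more sensitive detector raises DR and lowers MTTD but also raises FAR), better features are what allow all three to improve at once.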
With the proliferation of its applications in various industries, sentiment analysis using publicly available web data has become an active research area in text classification in recent years. Researchers have argued that semi-supervised learning is an effective approach to this problem because it can reduce the manual labeling effort, which is usually expensive and time-consuming. However, there has been a long-standing debate on the effectiveness of unlabeled data in text classification, partly because many assumptions in theoretical analyses often do not hold in practice. We argue that this problem can be better understood by adding a further dimension to the experiments, which allows us to address it from the broader perspective of bias and variance. We show that the well-known performance degradation caused by unlabeled data can be reproduced as a subset of the whole scenario. We argue that if a more effective feature selection method better balances the bias-variance trade-off, unlabeled data is very likely to boost classification performance. We then propose a feature selection framework in which both labeled and unlabeled training samples are considered, and discuss its potential for achieving such a balance. We chose financial sentiment analysis as the application because it is not only an important application in its own right, but its data also has better illustrative power. The implications of this study for both text classification and financial sentiment analysis are discussed.
Continuous motorization and urbanization around the globe lead to population growth in major cities. The resulting ever-growing pressure on existing mass transit systems calls for better technology, namely Intelligent Transportation Systems (ITS), to solve many new and demanding management issues. Many studies in the extant ITS literature have attempted to address these issues using various research methodologies. However, very few papers have summarized what optimal control theory (OCT), one of the sharpest tools for tackling management issues in engineering, contributes to solving these issues. It is both important and interesting to answer the following two questions. (1) How does OCT contribute to ITS research objectives? (2) What are the research gaps and possible future research directions? We searched 11 top transportation and control journals and reviewed 41 research articles in the ITS area in which OCT was used as the main research methodology. We categorized the articles in four different ways to address our research questions. We conclude from the review that OCT is widely used to address various aspects of management issues in ITS, with a large portion of the studies aiming to reduce traffic congestion. We also critically discuss these studies and point out possible future research directions in which OCT can be used.
The outstanding problem of controlling complex networks is relevant to many areas of science and engineering, and has the potential to generate technological breakthroughs as well. We address the physically important issue of the energy required for achieving control by deriving and validating scaling laws for the lower and upper energy bounds. These bounds represent a reasonable estimate of the energy cost associated with control, and provide a step forward from the current research on controllability toward ultimate control of complex networked dynamical systems.
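A standard way to quantify control energy for linear dynamics is through the controllability Gramian. The sketch below uses a discrete-time system x_{t+1} = A x_t + B u_t for simplicity (the same idea applies in continuous time): the minimum energy to steer the state from 0 to x_T in T steps is E_min = x_T^T W_T^{-1} x_T, with W_T = Σ_{k<T} A^k B B^T (A^T)^k. It illustrates the quantity whose bounds are studied, not the scaling laws themselves.

```python
import numpy as np

def gramian(A, B, T):
    """Discrete-time controllability Gramian W_T = sum_{k<T} A^k B B^T (A^T)^k."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(T):
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return W

def min_energy_control(A, B, x_target, T):
    """Minimum-energy inputs steering x_{t+1} = A x_t + B u_t from 0 to x_target in T steps."""
    W = gramian(A, B, T)
    lam = np.linalg.solve(W, x_target)          # requires W invertible (controllable)
    inputs = [B.T @ np.linalg.matrix_power(A.T, T - 1 - t) @ lam for t in range(T)]
    energy = float(x_target @ lam)              # E_min = x_T^T W^{-1} x_T
    return inputs, energy
```

Simulating the returned inputs on a double integrator reaches the target exactly, and their summed squared norm equals E_min; a nearly singular Gramian signals directions that are very expensive to reach.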
The challenge of self-optimization for orthogonal frequency-division multiple-access (OFDMA) interference channels is that users inherently compete harmfully, and simultaneous water-filling (WF) would lead to a Pareto-inefficient equilibrium. To overcome this, we first introduce the role of the environmental interference derivative in the WF optimization of the interactive OFDMA game, and then study the environmental interference derivative properties of the Stackelberg equilibrium (SE). These properties provide important insights for devising free OFDMA games that achieve various SEs, realizable by simultaneous WF regulated by specifically chosen operational interference derivatives. We also define an all-Stackelberg-leader equilibrium (ASE), in which all users are foresighted toward each other, albeit each with only local channel state information (CSI), and can thus most effectively reconcile their competition to maximize the user rates. We show that under certain environmental conditions, the free games are both unique and optimal. Simulation results reveal that our distributed ASE game achieves performance very close to that of the near-optimal centralized iterative spectrum balancing (ISB) method.
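For context, the water-filling step each user performs has a closed form. Given effective noise-plus-interference-to-gain levels n_i on the subcarriers and a power budget P, the allocation is p_i = max(0, μ - n_i), with the water level μ chosen so that the powers sum to P. In the interactive OFDMA game each user water-fills against the interference created by the others; the interference-derivative regulation that moves the game toward a Stackelberg equilibrium is not shown.

```python
def water_filling(noise_levels, total_power):
    """Return (powers, water level mu) with p_i = max(0, mu - n_i) and sum(p_i) = total_power."""
    if total_power < 0:
        raise ValueError("total_power must be nonnegative")
    levels = sorted(noise_levels)
    mu = levels[0] + total_power                # also covers the zero-power case
    # Largest k whose water level over the k best subcarriers stays above the k-th level.
    for k in range(len(levels), 0, -1):
        candidate = (total_power + sum(levels[:k])) / k
        if candidate > levels[k - 1]:
            mu = candidate
            break
    return [max(0.0, mu - n) for n in noise_levels], mu
```

Subcarriers whose level sits above the water line get no power, which is why one user's water-filling reshapes the interference every other user sees.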
The h-extra connectivity is an important parameter for measuring the reliability and fault tolerance of large interconnection networks. The k-ary n-cube is an important interconnection network for parallel computing systems. The 1-restricted connectivity of k-ary n-cubes was obtained by Chen et al. for k > 3 in [Y.-C. Chen, J. J. M. Tan, Restricted connectivity for three families of interconnection networks, Applied Mathematics and Computation 188 (2) (2007) 1848--1855]. Nevertheless, the h-extra connectivity of 3-ary n-cubes has not yet been obtained. In this paper, we prove that the 1-extra connectivity of a 3-ary n-cube is 4n-3 for n > 1 and the 2-extra connectivity of a 3-ary n-cube is 6n-7 for n > 2.
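The definitions can be checked by brute force on small cases. The sketch below builds a k-ary n-cube (vertices in Z_k^n, adjacent when they differ by ±1 mod k in exactly one coordinate) and finds the h-extra connectivity as the smallest vertex cut that disconnects the graph while leaving every component with more than h vertices. It is exponential and only practical for tiny graphs; for the 3-ary 2-cube it returns 5, matching 4n - 3 at n = 2.

```python
from itertools import combinations, product

def kary_ncube(k, n):
    """Adjacency sets of the k-ary n-cube."""
    nodes = list(product(range(k), repeat=n))
    adj = {v: set() for v in nodes}
    for v in nodes:
        for i in range(n):
            for step in (1, -1):
                u = list(v)
                u[i] = (u[i] + step) % k
                adj[v].add(tuple(u))
    return adj

def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def extra_connectivity(adj, h):
    """Smallest |S| such that G - S is disconnected and every component has > h vertices."""
    nodes = list(adj)
    for size in range(len(nodes) + 1):
        for cut in combinations(nodes, size):
            comps = components(adj, cut)
            if len(comps) >= 2 and all(len(c) > h for c in comps):
                return size
    return None
```

With h = 0 the same search returns the ordinary vertex connectivity, which for the 3-ary 2-cube equals its degree, 4.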
Body Sensor Networks (BSNs) provide continuous health monitoring and analysis of physiological parameters. A high degree of Quality of Service (QoS) is essential for BSNs. Inter-user interference arises from the simultaneous communication of BSNs congregating in the same area. In this paper, a decentralized inter-user interference suppression algorithm for BSNs, namely DISG, is proposed. Each BSN measures the SINR from other BSNs and then adaptively selects a suitable channel and transmission power. By using non-cooperative game theory and a no-regret learning algorithm, DISG provides an adaptive inter-user interference suppression strategy. The correctness and effectiveness of DISG are theoretically proved, and experimental results show that DISG can effectively reduce the effect of inter-user interference.
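No-regret learning for channel selection can be sketched with regret matching, a standard no-regret algorithm; whether DISG uses this exact rule is not stated in the abstract. In the toy simulation below, each BSN picks a channel with probability proportional to its positive cumulative regret, and the payoff is a hypothetical 1 if no other BSN shares the channel and 0 otherwise. Power control and SINR measurement are left out.

```python
import random

def regret_matching_strategy(cum_regret):
    """Play each action with probability proportional to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [r / total for r in positive]

def payoff(user, channel, choices):
    """Hypothetical utility: 1 if no other BSN transmits on `channel`, else 0."""
    return 1.0 if all(c != channel for v, c in enumerate(choices) if v != user) else 0.0

def simulate(num_bsns=2, num_channels=3, rounds=2000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * num_channels for _ in range(num_bsns)]
    choices = [0] * num_bsns
    for _ in range(rounds):
        choices = [
            rng.choices(range(num_channels), weights=regret_matching_strategy(regrets[u]))[0]
            for u in range(num_bsns)
        ]
        for u in range(num_bsns):
            realized = payoff(u, choices[u], choices)
            for ch in range(num_channels):
                # Regret: gain from having used `ch` instead, others' choices held fixed.
                regrets[u][ch] += payoff(u, ch, choices) - realized
    return choices
```

Each BSN only needs its own payoffs, which fits the decentralized setting: no BSN has to know the others' strategies.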
Nov 16 2010 cs.NI
A high degree of reliability for critical data transmission is required in body sensor networks (BSNs). However, BSNs are usually vulnerable to channel impairments caused by the body fading effect and RF interference, which can make data transmission unreliable. In this paper, an adaptive and flexible fault-tolerant communication scheme for BSNs, namely AFTCS, is proposed. AFTCS adopts a channel bandwidth reservation strategy to provide reliable data transmission when channel impairments occur. To fulfill the reliability requirements of critical sensors, fault-tolerant priorities and queues are employed to adaptively adjust the channel bandwidth allocation. Simulation results show that AFTCS can alleviate the effect of channel impairments while yielding lower packet loss rate and latency for critical sensors at runtime.
Information overload in modern society calls for highly efficient recommendation algorithms. In this letter, we present a novel diffusion-based recommendation model with users' ratings built into a transition matrix. To speed up computation, we introduce a Green function method. Numerical tests on a benchmark database show that our predictions are superior to those of standard recommendation methods.
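One way to read the Green function idea is that, instead of iterating a diffusion step by step, all path lengths are summed at once with the resolvent G = (I - αP)^{-1} = Σ_t α^t P^t, which converges for 0 < α < 1 when P is row-stochastic. The sketch below builds P from rating-weighted item co-occurrences; that construction, the damping α, and the scoring rule are assumptions, not the model from the letter.

```python
import numpy as np

def transition_matrix(ratings):
    """Row-stochastic item-to-item matrix from a users x items rating matrix (hypothetical)."""
    R = np.asarray(ratings, dtype=float)
    co = R.T @ R                                    # rating-weighted item co-occurrence
    np.fill_diagonal(co, 0.0)                       # ignore self-transitions
    row_sums = co.sum(axis=1, keepdims=True)
    return np.divide(co, row_sums, out=np.zeros_like(co), where=row_sums > 0)

def green_scores(ratings, user, alpha=0.5):
    """Scores from all diffusion lengths at once: f (I - alpha P)^{-1} = sum_t alpha^t f P^t."""
    P = transition_matrix(ratings)
    G = np.linalg.inv(np.eye(P.shape[0]) - alpha * P)   # Green function (resolvent)
    f = np.asarray(ratings, dtype=float)[user]
    scores = f @ G
    scores[f > 0] = -np.inf                         # never recommend already-rated items
    return scores
```

The inverse is computed once and reused for every user, which is where the speed-up over repeated diffusion steps comes from.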