Apr 20 2018 cs.NI
This letter characterizes the optimal policies for bandwidth and storage use in distributed storage for Internet of Things (IoT) scenarios, where lost nodes cannot be replaced by new nodes, as is typically assumed in data center and cloud scenarios. We develop an information flow model that captures the overall process of data transmission between IoT devices, from the initial preparation stage (generating redundancy from the original data) to the successive repair stages with fewer and fewer devices. Our numerical results show that in a system with 10 nodes, the proposed optimal scheme saves up to 10.3% of bandwidth use and up to 44% of storage use with respect to the closest suboptimal approach.
Apr 19 2018 cs.MM
This paper revisits the reversible data hiding scheme of Liu et al. (2018) for JPEG images. It then proposes a novel reversible data hiding scheme in which the modification directions of partial nonzero quantized alternating current (AC) coefficients are utilized to reduce the distortion and file size increase caused by data hiding. Experimental results show that, at the same embedding payload, the proposed scheme outperforms the state-of-the-art scheme in both visual quality and file size increase of marked JPEG images.
Unmanned aerial vehicle (UAV) systems are vulnerable to control signal spoofing attacks due to the openness of wireless communications. In this correspondence, a physical layer approach is proposed to combat the control signal spoofing attack, i.e., to determine whether a received control signal packet is from the ground control station (GCS) or a potential malicious attacker (MA), without the need to share any secret key. We consider the worst case, in which the UAV has no prior knowledge about the MA. Utilizing the channel features of the angles of arrival, the distance-based path loss, and the Rician-$\kappa$ factor, we construct a generalized log-likelihood ratio (GLLR) test framework to handle the problem. Accurate approximations of the false alarm and successful detection rates are provided to efficiently evaluate the performance.
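As an illustration of the detection principle only (not the paper's exact statistic, which combines angle of arrival, path loss, and the Rician-$\kappa$ factor), a minimal GLLR test on a single Gaussian channel feature might look like the sketch below; all numbers are hypothetical:

```python
import numpy as np

def gllr_spoofing_test(samples, mu0, sigma, threshold):
    """Illustrative generalized log-likelihood ratio test.

    H0: features come from the legitimate GCS with known mean mu0.
    H1: features come from an attacker with unknown mean, replaced by
        its maximum-likelihood estimate (the sample mean).
    Gaussian noise with known std `sigma` is assumed for simplicity.
    Returns True if the packet is flagged as spoofed.
    """
    x = np.asarray(samples, dtype=float)
    mu1_hat = x.mean()                       # MLE of the unknown attacker mean
    ll0 = -np.sum((x - mu0) ** 2) / (2 * sigma ** 2)
    ll1 = -np.sum((x - mu1_hat) ** 2) / (2 * sigma ** 2)
    gllr = ll1 - ll0                         # always >= 0
    return bool(gllr > threshold)

rng = np.random.default_rng(0)
mu_gcs, sigma = 30.0, 1.0                    # e.g., angle of arrival in degrees
legit = rng.normal(mu_gcs, sigma, size=50)
spoofed = rng.normal(45.0, sigma, size=50)   # attacker at a different angle
print(gllr_spoofing_test(legit, mu_gcs, sigma, threshold=5.0))
print(gllr_spoofing_test(spoofed, mu_gcs, sigma, threshold=5.0))
```

The threshold trades false alarms against missed detections, which is exactly the trade-off the paper's approximations are meant to characterize analytically.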
Apr 19 2018 cs.MM
Many zero quantized discrete cosine transform (QDCT) coefficients exist in the transform domain of video compression standards such as H.264/AVC. Moreover, modifying zero QDCT coefficients in the high-frequency area, i.e., increasing or decreasing them by 1, has an insignificant effect on the visual quality of H.264/AVC video. Thus, in this paper, we propose a novel video-based reversible data hiding method that uses a mapping rule and pairs of zero QDCT coefficients from the high-frequency area. The proposed method obtains high embedding capacity and low distortion in terms of peak signal-to-noise ratio (PSNR). Experimental results demonstrate the payload-distortion performance of the proposed reversible data hiding method, which has a clear advantage in embedding capacity when compared with related schemes.
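The flavor of such a coefficient-pair scheme can be sketched as follows. Both the mapping rule (2 bits per zero pair via 0/+1 values) and the assumption that embedder and extractor share the positions of the zero coefficient pairs are illustrative simplifications, not the paper's actual design:

```python
def embed_in_zero_pairs(coeffs, zero_pair_positions, bits):
    """Toy reversible embedding into pairs of zero high-frequency QDCT
    coefficients.  Each zero pair (0, 0) carries 2 bits via a hypothetical
    mapping rule: 00->(0,0), 01->(0,1), 10->(1,0), 11->(1,1)."""
    out = list(coeffs)
    it = iter(bits)
    for (i, j) in zero_pair_positions:
        b1 = next(it, None)
        if b1 is None:
            break
        b2 = next(it, None)
        out[i] = b1
        out[j] = 0 if b2 is None else b2
    return out

def extract_and_restore(marked, zero_pair_positions, nbits):
    """Read the embedded bits and restore the original zero coefficients."""
    bits, out = [], list(marked)
    for (i, j) in zero_pair_positions:
        bits += [out[i], out[j]]
        out[i] = out[j] = 0            # the pairs were zero originally
    return bits[:nbits], out

coeffs = [5, 0, 0, -3, 0, 0]           # toy coefficient block
positions = [(1, 2), (4, 5)]           # shared zero-pair locations
marked = embed_in_zero_pairs(coeffs, positions, [1, 0, 1, 1])
bits, restored = extract_and_restore(marked, positions, 4)
print(marked, bits, restored == coeffs)
```

Reversibility here is trivial because the modified positions are known; the paper's mapping rule must additionally keep embedding distinguishable from genuine nonzero coefficients.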
Apr 18 2018 cs.CE
This study proposes a coupled uncertainty analysis method to investigate the stiffness characteristics of variable stiffness (VS) composites. The D-vine copula function is used to address the coupling of random variables. To identify the copula relation between random variables, a novel one-step Bayesian copula model selection (OBCS) method is proposed to obtain a suitable copula function as well as the marginal CDFs of the random variables. The entire process is carried out by Monte Carlo simulation (MCS). However, due to the expensive computational cost of complete finite element analysis (FEA) in MCS, a fast solver, the reanalysis method, is introduced. To further improve the efficiency of the entire procedure, a back propagation neural network (BPNN) model is also introduced on top of the reanalysis method. Compared with the reanalysis method alone, the BPNN shows higher efficiency with sufficient accuracy. Finally, the fiber angle deviation of VS composites is investigated by the suggested strategy. Two numerical examples are presented to verify the feasibility of the method.
Apr 17 2018 cs.NI
The modularization of Service Function Chains (SFCs) in Network Function Virtualization (NFV) can introduce significant performance overhead and degrade resource efficiency, because it causes frequent packet transfers and consumes much more hardware resources. In response, we exploit the lightweight and individually scalable features of elements in Modularized SFCs (MSFCs) and propose CoCo, a compact and optimized consolidation framework for MSFCs in NFV. CoCo addresses these problems in two ways. First, the CoCo Optimized Placer decides which elements to consolidate, providing a performance-aware placement algorithm that places MSFCs compactly and optimizes the global packet transfer cost. Second, the CoCo Individual Scaler introduces a novel push-aside scaling-up strategy that avoids degrading performance or occupying new CPU cores. To support MSFC consolidation, CoCo also provides an automatic runtime scheduler to ensure fairness when elements are consolidated on the same CPU core. Our evaluation results show that CoCo achieves significant performance improvement and efficient resource utilization.
Apr 13 2018 cs.CY
Given a large collection of urban datasets, how can we find their hidden correlations? For example, New York City (NYC) provides open access to taxi data from 2012 to 2015, with about half a million taxi trips generated per day. In the meantime, we have a rich set of urban data in NYC, including points-of-interest (POIs), geo-tagged tweets, weather, vehicle collisions, etc. Is it possible that these ubiquitous datasets can be used to explain city traffic? Understanding the hidden correlation between external data and traffic data would allow us to answer many important questions in urban computing, such as: If we observe a high traffic volume at Madison Square Garden (MSG) in NYC, is it because of the regular peak hour or a big event being held at MSG? If severe weather such as a hurricane or a snow storm hits the city, how would the traffic be affected? While existing studies may utilize external datasets for prediction tasks, they do not explicitly seek direct explanations from the external datasets. In this paper, we present our results in attempting to understand taxi traffic dynamics in NYC from multiple external data sources. We use four real-world ubiquitous urban datasets, including POI, weather, geo-tagged tweet, and collision records. To address the heterogeneity of ubiquitous urban data, we present carefully designed feature representations for the various datasets. Extensive experiments on real data demonstrate the explanatory power of the external datasets on taxi traffic. More specifically, our analysis suggests that POIs can describe the regular traffic patterns well, geo-tagged tweets can explain irregular traffic caused by big events, and weather can explain abnormal traffic drops.
Apr 13 2018 cs.CL
Unlike previous unknown-noun tagging tasks (Curran, 2005; Ciaramita and Johnson, 2003), this is the first attempt to focus on out-of-vocabulary (OOV) lexical evaluation tasks that do not require any prior knowledge. OOV words are words that only appear in test samples. The goal of the tasks is to provide solutions for OOV lexical classification and prediction. The tasks require annotators to conclude the attributes of the OOV words based on their related contexts. We then utilize unsupervised word embedding methods such as Word2Vec (Mikolov et al., 2013) and Word2GM (Athiwaratkun and Wilson, 2017) to perform baseline experiments on the categorical classification task and the OOV word attribute prediction tasks.
Domestic violence against women is now recognized as a serious and widespread problem worldwide. Domestic violence and abuse is at the root of many issues in society and is considered a societal taboo. Fortunately, with the popularity of social media, social welfare communities and victim support groups enable victims to share their stories of abuse and allow others to give advice and help. Hence, in order to offer immediate resources to those in need, the specific messages from victims need to be distinguished from other messages. In this paper, we treat intention mining as a binary classification problem (abuse or advice) with the use case of abuse discourse. To address this problem, we extract rich feature sets from the raw corpus, using psycholinguistic clues and textual features obtained by a term-class interaction method. Machine learning algorithms are used to compare the accuracy of classifiers trained on the two different feature sets. Our experimental results, with high classification accuracy, offer a promising way to understand a big social problem through big social media data and to serve the information needs of various community welfare organizations.
Apr 05 2018 cs.CV
We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
Apr 04 2018 cs.MM
Data in mobile cloud environments are mainly transmitted via noisy wireless channels, which may introduce transmission errors with high probability due to unreliable connectivity. For video transmission, unreliable connectivity may cause significant degradation of the content. Improving or preserving video quality over lossy channels is therefore an important research topic. Error concealment with data hiding (ECDH) is an effective way to conceal the errors introduced by channels. It can reduce error propagation between neighboring blocks/frames compared with methods exploiting temporal/spatial correlations. Existing video ECDH methods often embed the motion vectors (MVs) into specific locations. Nevertheless, specific embedding locations cannot resist random errors. To compensate for the unreliable connectivity of mobile cloud environments, in this paper we present a video ECDH scheme using 3D reversible data hiding (RDH), in which each MV is repeated multiple times and the repeated MVs are embedded into different, randomly chosen macroblocks (MBs). Although the multiple embedding requires more embedding space, a satisfactory trade-off between the introduced distortion and the reconstructed video quality can be achieved by tuning the number of times the MVs are repeated. With random embedding, the loss probability of the MVs decreases rapidly, resulting in better error concealment performance. Experimental results show that the PSNR gains at least about 5 dB compared with existing ECDH methods, while the proposed method also improves video quality significantly.
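The benefit of repeating each MV can be conveyed with a small Monte Carlo estimate. The loss model below (each macroblock lost independently with a fixed probability) is an illustrative assumption, not the paper's channel model:

```python
import random

def mv_loss_probability(p_block_lost, repeats, trials=100_000, seed=1):
    """Monte Carlo estimate of the probability that a motion vector is
    unrecoverable when it is embedded into `repeats` randomly chosen
    macroblocks and each macroblock is lost independently with
    probability `p_block_lost`.  Analytically this is p_block_lost**repeats:
    the MV is lost only if every copy is lost."""
    rng = random.Random(seed)
    lost = sum(all(rng.random() < p_block_lost for _ in range(repeats))
               for _ in range(trials))
    return lost / trials

for k in (1, 2, 3):
    print(k, mv_loss_probability(0.1, k))
```

The exponential decay in the repeat count is what makes random redundant embedding attractive despite the extra embedding space it consumes.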
Apr 03 2018 cs.CV
Although Faster R-CNN and its variants have shown promising performance in object detection, they only exploit a simple first-order representation of object proposals for final classification and regression. Recent classification methods demonstrate that integrating high-order statistics into deep convolutional neural networks can achieve impressive improvement, but their goal is to model whole images by discarding location information, so they cannot be directly adopted for object detection. In this paper, we attempt to exploit high-order statistics in object detection, aiming at generating more discriminative representations for proposals to enhance the performance of detectors. To this end, we propose a novel Multi-scale Location-aware Kernel Representation (MLKP) to capture high-order statistics of deep features in proposals. Our MLKP can be efficiently computed on a modified multi-scale feature map using a low-dimensional polynomial kernel approximation. Moreover, different from existing orderless global representations based on high-order statistics, our proposed MLKP is location retentive and sensitive, so it can be flexibly adopted for object detection. Integrated into the Faster R-CNN schema, the proposed MLKP achieves very competitive performance with state-of-the-art methods, improving Faster R-CNN by 4.9% (mAP), 4.7% (mAP), and 5.0% (AP at IoU=[0.5:0.05:0.95]) on the PASCAL VOC 2007, VOC 2012, and MS COCO benchmarks, respectively. Code is available at: https://github.com/Hwang64/MLKP.
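For intuition on kernel representations of high-order statistics, the identity below shows the exact explicit feature map of the degree-2 homogeneous polynomial kernel; MLKP itself works with low-dimensional approximations of such maps rather than this quadratic-size exact one:

```python
import numpy as np

def second_order_map(x):
    """Exact explicit feature map for the homogeneous degree-2 polynomial
    kernel: (x . y)**2 == <phi(x), phi(y)> with phi(x) = vec(x x^T).
    Its dimension grows as d**2, which is why low-dimensional polynomial
    kernel approximations are used in practice."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
lhs = np.dot(x, y) ** 2
rhs = np.dot(second_order_map(x), second_order_map(y))
print(np.isclose(lhs, rhs))  # True
```

Because phi keeps one entry per coordinate pair, applying it per spatial location (rather than pooling globally) is what preserves the location information that detection needs.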
Histological analysis of tissue samples is one of the most widely used methods for disease diagnosis. After a sample is taken from a patient, it goes through a lengthy and laborious preparation that stains the tissue to visualize different histological features under a microscope. Here, we demonstrate a label-free approach to create a virtually stained microscopic image from a single wide-field auto-fluorescence image of an unlabeled tissue sample, bypassing the standard histochemical staining process and saving time and cost. This method is based on deep learning and uses a convolutional neural network, trained with a generative adversarial network model, to transform an auto-fluorescence image of an unlabeled tissue section into an image that is equivalent to the bright-field image of the stained version of the same sample. We validated this method by successfully creating virtually stained microscopic images of human tissue samples, including sections of salivary gland, thyroid, kidney, liver, and lung tissue, covering three different stains. This label-free virtual-staining method eliminates cumbersome and costly histochemical staining procedures and would significantly simplify tissue preparation in the pathology and histology fields.
While recent progress in neural network approaches to single-channel speech separation, or more generally the cocktail party problem, has achieved significant improvement, their performance for complex mixtures is still not satisfactory. In this work, we propose a novel multi-channel framework for multi-talker separation. In the proposed model, an input multi-channel mixture signal is first converted to a set of beamformed signals using fixed beam patterns. For this beamforming, we propose to use differential beamformers, as they are more suitable for speech separation. Each beamformed signal is then fed into a single-channel anchored deep attractor network to generate separated signals, and the final separation is acquired by post-selecting the separation output for each beam. To evaluate the proposed system, we create a challenging dataset comprising mixtures of 2, 3, or 4 speakers. Our results show that the proposed system largely improves the state of the art in speech separation, achieving 11.5 dB, 11.76 dB, and 11.02 dB average signal-to-distortion ratio improvement for 4, 3, and 2 overlapped speaker mixtures, respectively, which is comparable to the performance of a minimum variance distortionless response beamformer that uses oracle location, source, and noise information. We also run speech recognition with a clean-trained acoustic model on the separated speech, achieving relative word error rate (WER) reductions of 45.76\%, 59.40\%, and 62.80\% on fully overlapped speech of 4, 3, and 2 speakers, respectively. With a far-talk acoustic model, the WER is further reduced.
Distributed model training is vulnerable to worst-case system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To tolerate node failures and adversarial attacks, recent work suggests using variants of the geometric median to aggregate distributed updates at the PS, in place of bulk averaging. Although median-based update rules are robust to adversarial nodes, their computational cost can be prohibitive in large-scale settings and their convergence guarantees often require relatively strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are then used by the parameter server to eliminate the effects of adversarial updates. We present problem-independent robustness guarantees for DRACO and show that the model it produces is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
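The simplest instance of DRACO's idea is a repetition code: with r = 2s + 1 replicas of each gradient, a coordinate-wise median removes the influence of up to s adversarial copies exactly, because the honest replicas are identical duplicates. A minimal sketch (not DRACO's full encoding scheme) follows:

```python
import numpy as np

def recover_gradient(redundant, s):
    """Repetition-code decoding: `redundant` has shape (2s + 1, d), one row
    per compute node.  Since at most s rows are corrupted and the remaining
    s + 1 rows are identical copies of the true gradient, the coordinate-wise
    median always equals the true gradient, no matter what the adversaries send."""
    assert redundant.shape[0] == 2 * s + 1
    return np.median(redundant, axis=0)

rng = np.random.default_rng(0)
true_grad = rng.normal(size=4)
s = 1                                   # tolerate one adversarial node
replicas = np.tile(true_grad, (2 * s + 1, 1))
replicas[0] = 1e6                       # adversarial node sends garbage
print(np.allclose(recover_gradient(replicas, s), true_grad))  # True
```

This illustrates the "identical to adversary-free training" guarantee: the decoded gradient is exact, not merely robust in expectation, at the cost of 2s + 1 times redundant computation.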
Mar 28 2018 cs.CV
Current face or object detection methods based on convolutional neural networks (such as OverFeat, R-CNN, and DenseNet) explicitly extract multi-scale features from an image pyramid. However, such a strategy increases the computational burden of face detection. In this paper, we propose a fast face detection method based on discriminative complete features (DCFs) extracted by an elaborately designed convolutional neural network, where face detection is performed directly on the complete feature maps. DCFs exhibit scale invariance, which is beneficial for face detection with high speed and promising performance. Therefore, extracting multi-scale features from an image pyramid, as employed in conventional methods, is not required in the proposed method, which greatly improves its efficiency for face detection. Experimental results on several popular face detection datasets show the efficiency and effectiveness of the proposed method.
Mar 28 2018 cs.CV
Object proposal generation methods have been widely applied to many computer vision tasks. However, existing methods often suffer from problems such as motion blur, low contrast, and deformation when applied to video-related tasks. In this paper, we propose an effective and highly accurate target-specific object proposal generation (TOPG) method, which takes full advantage of the context information of a video to alleviate these problems. Specifically, we propose to generate target-specific object proposals by integrating the information of two important objectness cues, colors and edges, which are complementary to each other under the different challenging conditions encountered in generating object proposals. As a result, the recall of the proposed TOPG method is significantly increased. Furthermore, we propose an object proposal ranking strategy to increase the ranking accuracy of the generated object proposals. The proposed TOPG method yields significant recall gains (about 20%-60% higher) compared with several state-of-the-art object proposal methods on several challenging visual tracking datasets. We then apply the proposed TOPG method to the task of visual tracking and propose a TOPG-based tracker (called TOPGT), where TOPG is used as a sample selection strategy to select a small number of high-quality target candidates from the generated object proposals. Since the object proposals generated by TOPG cover many hard negative samples and positive samples, they can be used not only for training an effective classifier, but also as target candidates for visual tracking. Experimental results show the superior performance of TOPGT for visual tracking compared with several other state-of-the-art visual trackers (about 3%-11% higher than the winner of the VOT2015 challenge in terms of distance precision).
Mar 28 2018 cs.GR
DeepWarp is an efficient and highly re-usable deep neural network (DNN) based nonlinear deformable simulation framework. Unlike other deep learning applications such as image recognition, where different inputs have a uniform and consistent format (e.g., an array of all the pixels in an image), the input for deformable simulation is quite variable, high-dimensional, and parametrization-unfriendly. Consequently, even though DNNs are known for their rich expressivity of nonlinear functions, directly using a DNN to reconstruct the force-displacement relation for general deformable simulation is nearly impossible. DeepWarp obviates this difficulty by partially restoring the force-displacement relation via warping the nodal displacement simulated using a simplistic constitutive model -- linear elasticity. In other words, DeepWarp yields an incremental displacement fix based on a simplified (and therefore incorrect) simulation result rather than returning the unknown displacement directly. We contrive a compact yet effective feature vector, including geodesic, potential, and digression features, to sort training pairs of per-node linear and nonlinear displacements. DeepWarp is robust under different model shapes and tessellations. With the assistance of deformation substructuring, one DNN training is able to handle a wide range of 3D models of various geometries, including most examples shown in the paper. Thanks to linear elasticity and its constant system matrix, the underlying simulator only needs to perform one pre-factorized matrix solve at each time step, and DeepWarp is able to simulate large models in real time.
Social awareness and social ties are becoming increasingly popular with emerging mobile and handheld devices. The social trust degree, which describes the strength of social ties, has drawn much research interest in many fields of wireless communications, such as resource sharing and cooperative communication. In this paper, we propose a hybrid cooperative beamforming and jamming scheme to secure communication based on the social trust degree under a stochastic geometry framework. The friendly nodes are categorized into relays and jammers according to their locations and their social trust degrees with the source node. We aim to analyze the connection outage probability (COP) and secrecy outage probability (SOP) of the networks. To achieve this target, we propose a double Gamma ratio (DGR) approach based on Gamma approximation, with which the COP and SOP are tractably obtained in closed form. We further consider the SOP in the presence of eavesdroppers distributed as a Poisson Point Process (PPP) and derive an upper bound. The simulation results verify our theoretical findings and validate that the social trust degree has a dramatic influence on the security performance of the networks.
Existing research on vision and language grounding for robot navigation focuses on improving model-free deep reinforcement learning (DRL) models in synthetic environments. However, model-free DRL models do not consider the dynamics of real-world environments, and they often fail to generalize to new scenes. In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task. Our look-ahead module tightly integrates a look-ahead policy model with an environment model that predicts the next state and the reward. Experimental results suggest that our proposed method significantly outperforms the baselines and achieves the best result on the real-world Room-to-Room dataset. Moreover, our scalable method generalizes better when transferring to unseen environments, increasing the relative success rate by 15.5% on the unseen test set.
The rapid development of deep learning methods has enabled fast and accurate medical decision making from complex structured data, such as CT images or MRI. However, some problems still exist in such applications that may lead to imperfect predictions. Previous observations have shown that confounding factors, if handled inappropriately, will bias prediction results towards some major properties of the data distribution. In other words, naively applying deep learning methods in these applications will lead to unfair prediction results for minority groups defined by characteristics such as age, gender, or even the hospital that collects the data. In this paper, extending previous successes in correcting confounders, we propose a more stable method, namely Confounder Filtering, that can effectively reduce the influence of confounding factors, leading to better generalizability of trained discriminative deep neural networks and, therefore, fairer prediction results. Our experimental results indicate that the Confounder Filtering method is able to improve the performance of different neural networks, including CNNs, LSTMs, and other arbitrary architectures; different data types, including CT scans, MRI, and EEG brain wave data; as well as different confounding factors, including age, gender, and physical factors of medical devices.
Mar 21 2018 cs.CV
As we move towards large-scale object detection, it is unrealistic to expect annotated training data for all object classes at sufficient scale, so methods capable of unseen object detection are required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test time. Our method retains the efficiency and effectiveness of YOLO for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-the-art detection methods to learn discriminative object features for rejecting background proposals also limits their performance on unseen objects. We posit that, to detect unseen objects, we must incorporate semantic information into the visual domain so that the learned visual features reflect this information and lead to improved recall rates for unseen objects. We test our method on the PASCAL VOC and MS COCO datasets and observe significant improvements in the average precision of unseen classes.
This paper discusses recognizing the number of communities and detecting the community structure of a complex network. As a visual and feasible approach, the block model has been successfully applied to detect community structures in complex networks. To measure the quality of a block model, we first define an objective function, the WQ value. To obtain the block model B of a network, the GSA algorithm is applied to optimize WQ with the help of random keys. After executing the processes AO (Adding Ones) and RO (Removing Ones) on the block model B, the number of communities of a network can be recognized distinctly. Furthermore, exploiting the advantage of the block model that its sort order of nodes corresponds to the sort order of communities, a new fuzzy boundary algorithm for detecting community structures is proposed and successfully applied to some representative networks. Finally, experimental results demonstrate the feasibility of the proposed algorithm.
Data quality issues have attracted widespread attention due to the negative impact of dirty data on data mining and machine learning results. Understanding the relationship between data quality and the accuracy of results could inform the selection of an appropriate algorithm under given data quality conditions and the determination of how much data to clean. However, little research has focused on exploring this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent, and conflicting data on classification, clustering, and regression algorithms. Based on the experimental findings, we provide guidelines for algorithm selection and data cleaning.
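A toy version of such an experiment can be sketched as follows: missing values are injected at a given rate into synthetic two-class data, mean-imputed, and the accuracy of a simple nearest-centroid classifier is measured. The classifier and data are illustrative stand-ins for the algorithms and datasets compared in the paper:

```python
import numpy as np

def accuracy_with_missing(miss_rate, n=2000, seed=0):
    """Inject missing values at `miss_rate`, mean-impute, and return the
    accuracy of a nearest-centroid classifier on two Gaussian classes."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, 5)),
                   rng.normal(+1.0, 1.0, size=(n // 2, 5))])
    y = np.array([0] * (n // 2) + [1] * (n // 2))
    Xd = X.copy()
    Xd[rng.random(X.shape) < miss_rate] = np.nan   # inject missingness
    col_mean = np.nanmean(Xd, axis=0)
    Xd = np.where(np.isnan(Xd), col_mean, Xd)      # mean imputation
    c0, c1 = Xd[y == 0].mean(axis=0), Xd[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xd - c1, axis=1)
            < np.linalg.norm(Xd - c0, axis=1)).astype(int)
    return (pred == y).mean()

for rate in (0.0, 0.3, 0.6):
    print(rate, round(accuracy_with_missing(rate), 3))
```

Plotting accuracy against the injected error rate for each algorithm family is the kind of quality-accuracy curve that supports the paper's guidelines.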
Anomaly detection on road networks can serve emergency response and is of great importance to traffic management. However, none of the existing approaches can deal with the diversity of anomaly types. In this paper, we propose a novel framework to detect multiple types of anomalies. The framework incorporates real-time and historical traffic into a tensor model and captures the spatial and multi-scale temporal patterns of traffic in a unified model using tensor factorization. Furthermore, we propose a sliding-window tensor factorization to improve computational efficiency. Based on this, we can identify different anomaly types by measuring the deviation from the different spatial and temporal patterns. Then, to promote a deeper understanding of the detected anomalies, we use an optimization method to discover path-level anomalies. The core idea is that anomalous path inference is formulated as an L1 inverse problem that considers the sparsity of anomalies and of flows on paths simultaneously. We conduct synthetic experiments and real case studies based on a real-world dataset of taxi trajectories. Experiments verify that the proposed framework outperforms all baseline methods in efficiency and effectiveness, and that it provides a better understanding of anomalous events.
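The deviation-from-pattern idea can be sketched in matrix form (the paper uses tensors and a sliding-window factorization); the traffic model and injected anomaly below are synthetic:

```python
import numpy as np

def anomaly_scores(history, current, rank=1):
    """Score each time slice in `current` by its distance to the low-rank
    spatial subspace learned from `history` (a matrix stand-in for the
    paper's tensor factorization).  Shapes are (n_links, n_times)."""
    U = np.linalg.svd(history, full_matrices=False)[0][:, :rank]
    resid = current - U @ (U.T @ current)   # remove the normal pattern
    return np.linalg.norm(resid, axis=0)

rng = np.random.default_rng(0)
links = rng.uniform(1, 2, size=(20, 1))                 # per-link scale
pattern = np.sin(np.linspace(0, 3, 96)) + 2             # daily rhythm
traffic = links * pattern + 0.05 * rng.normal(size=(20, 96))
history, current = traffic[:, :48], traffic[:, 48:].copy()
current[:, 10] += 3.0                                   # inject an anomaly
print(int(anomaly_scores(history, current).argmax()))   # 10
```

A sliding window corresponds to refitting the factors as `history` advances, which is where the paper's incremental factorization saves computation.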
To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
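The hop-by-hop expansion of "ripple sets" from a user's clicked items can be sketched as follows; the tiny knowledge graph and entity names are illustrative, and the learned attention over triples that forms the preference distribution is omitted:

```python
def ripple_sets(kg_triples, seed_entities, n_hops=2):
    """Starting from the entities of a user's historically clicked items,
    iteratively expand one hop along knowledge-graph links, yielding one
    "ripple set" of (head, relation, tail) triples per hop."""
    ripples, heads = [], set(seed_entities)
    for _ in range(n_hops):
        hop = [(h, r, t) for (h, r, t) in kg_triples if h in heads]
        ripples.append(hop)
        heads = {t for (_, _, t) in hop}   # next ripple starts at the tails
    return ripples

kg = [("Forrest Gump", "directed_by", "Robert Zemeckis"),
      ("Forrest Gump", "starring", "Tom Hanks"),
      ("Robert Zemeckis", "directed", "Cast Away"),
      ("Tom Hanks", "starred_in", "Cast Away")]
hops = ripple_sets(kg, {"Forrest Gump"})
print([len(h) for h in hops])   # [2, 2]
```

In the full model, each hop's triples are weighted by their relevance to the candidate item and the weighted tail embeddings are superposed into the user's preference representation.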
Mar 12 2018 cs.CG
Origami structures enabled by folding and unfolding can create complex 3D shapes. However, even a small 3D shape can have large 2D unfoldings. The huge initial dimension of the 2D flattened structure makes fabrication difficult and defeats the main purpose, namely compactness, of much origami-inspired engineering. In this work, we propose a novel algorithmic kirigami method, called "algorithmic stacking", that provides super compaction of an arbitrary 3D shape with non-negligible surface thickness. Our approach computationally finds a way of cutting the thick surface of the shape into a strip. This strip forms a Hamiltonian cycle that covers the entire surface and can realize the transformation between two target shapes: from a super compact stacked shape to the input 3D shape. Depending on the surface thickness, the stacked structure takes merely 0.001% to 6% of the original volume. This super compacted structure not only can be manufactured in a workspace that is significantly smaller than the provided 3D shape, but also makes packing and transportation easier for deployable applications. We further demonstrate that the proposed stackable structure provides high pluripotency: it can transform into multiple 3D target shapes if these shapes can be dissected in specific ways to form a common stacked structure. In contrast to many origami structure designs that usually target a particular shape, our results provide a universal platform for pluripotent 3D transformable structures.
Mar 06 2018 cs.CL
Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text. To tackle this task, recent studies have tried several deep learning methods, but few of them exploit syntactic information. In this work, we explore the idea of incorporating syntactic parse trees into neural networks. Specifically, we employ the Tree-LSTM and Tree-GRU models, which are based on the tree structure, to encode the arguments of a relation. Moreover, we further leverage the constituent tags to control the semantic composition process in these tree-structured neural networks. Experimental results show that our method achieves state-of-the-art performance on the PDTB corpus.
This paper considers an application of model predictive control to the automotive air conditioning (A/C) system in future connected and automated vehicles (CAVs) with battery electric or hybrid electric powertrains. A control-oriented prediction model for the A/C system is proposed, identified, and validated against a higher-fidelity simulation model (CoolSim). Based on the developed prediction model, a nonlinear model predictive control (NMPC) problem is formulated and solved online to minimize the energy consumption of the A/C system. Simulation results illustrate the desirable characteristics of the proposed NMPC solution, such as its ability to enforce physical constraints of the A/C system and maintain cabin temperature within a specified range. Moreover, it is shown that by utilizing the vehicle speed preview and through coordinated adjustment of the cabin temperature constraints, energy efficiency improvements of up to 9% can be achieved.
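The receding-horizon pattern behind such a controller can be sketched with a toy scalar cabin-temperature model. Every constant below (thermal coefficients, comfort band, control grid) is invented for illustration; the paper identifies its prediction model against CoolSim and solves a nonlinear program rather than this grid search.

```python
# Toy receding-horizon MPC sketch for a cabin-cooling loop: at each
# step, pick the cheapest control whose predicted trajectory stays
# inside the comfort band, apply only its first move, then re-plan.
def step(temp, cooling, ambient=35.0, a=0.1, b=1.0):
    # first-order model: cabin temp relaxes to ambient; cooling pulls it down
    return temp + a * (ambient - temp) - b * cooling

def mpc_control(temp, horizon=5, band=(20.0, 24.0),
                grid=(0.0, 0.4, 0.8, 1.2, 1.6)):
    """Pick the smallest (least energy) constant cooling level whose
    predicted trajectory stays inside the comfort band."""
    for u in grid:  # grid is ordered by increasing energy use
        t, feasible = temp, True
        for _ in range(horizon):
            t = step(t, u)
            if not band[0] <= t <= band[1]:
                feasible = False
                break
        if feasible:
            return u
    return grid[-1]  # fall back to maximum cooling

temp = 23.0
for _ in range(20):            # closed loop: apply first move, re-plan
    u = mpc_control(temp)
    temp = step(temp, u)
print(round(temp, 2))  # → 23.0
```

The comfort-band constraint plays the role of the physical and cabin-temperature constraints enforced by the NMPC formulation.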
Feb 28 2018 cs.CR
In this work, we present our early-stage results on a Conflicts Check Protocol (CCP) that prevents potential attacks on the Bitcoin system. Based on the observation of a common symptom that many attacks may generate, CCP refines the current Bitcoin system with a novel arbitration mechanism that determines whether to approve or abandon transactions involved in a conflict. This work examines the security of Bitcoin from a new perspective, which may extend to a larger scope of attack analysis and prevention.
Feb 27 2018 cs.CV
Deep convolutional neural networks have achieved significant improvements in accuracy and speed for single-image super-resolution. However, as the depth of a network grows, the information flow is weakened and training becomes harder. On the other hand, most models adopt a single-stream structure, which makes it difficult to integrate complementary contextual information under different receptive fields. To improve information flow and to capture sufficient knowledge for reconstructing high-frequency details, we propose a cascaded multi-scale cross network (CMSC) in which a sequence of subnetworks is cascaded to infer high-resolution features in a coarse-to-fine manner. In each cascaded subnetwork, we stack multiple multi-scale cross (MSC) modules to fuse complementary multi-scale information efficiently and to improve information flow across the layers. Meanwhile, by introducing residual-feature learning in each stage, the relative information between high-resolution and low-resolution features is fully utilized to further boost reconstruction performance. We train the proposed network with cascaded supervision and then assemble the intermediate predictions of the cascade to achieve high-quality image reconstruction. Extensive quantitative and qualitative evaluations on benchmark datasets illustrate the superiority of our proposed method over state-of-the-art super-resolution methods.
A steganographer network corresponds to a graph structure in which the vertices (or nodes) denote social entities, such as data encoders and data decoders, and the edges represent real communicable channels or other social links that could be utilized for steganography. Unlike traditional steganographic algorithms, a steganographer network models steganographic communication in an abstract way, such that the underlying characteristics of steganography are quantized as analyzable parameters in the network. In this paper, we analyze two problems in a steganographer network. The first is a passive attack on a steganographer network, where a network monitor has collected a list of suspicious vertices corresponding to data encoders or decoders. The network monitor aims to break down (disconnect) the steganographic communication between the suspicious vertices while keeping the cost as low as possible. The second is to determine a set of vertices corresponding to data encoders (senders) such that all vertices can share a message via their neighbors. We point out that the two problems are equivalent to the minimum cut problem and the minimum-weight dominating set problem, respectively.
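The first problem's reduction to minimum cut can be illustrated with a standard max-flow computation (by the max-flow/min-cut theorem, the max-flow value equals the cheapest set of unit-cost links to sever). The tiny graph below is a stand-in for the monitored steganographer network.

```python
# Minimal max-flow/min-cut sketch (Edmonds-Karp): the value returned is
# the minimum number of unit-cost links the monitor must sever to
# disconnect suspicious encoder s from suspicious decoder t.
from collections import deque

def max_flow(cap, s, t):
    """cap: dict-of-dicts of residual capacities (modified in place)."""
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                       # max flow = min cut value
        path, v = [], t                       # trace the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:                     # update residual capacities
            cap[u][v] -= aug
            cap[v].setdefault(u, 0)
            cap[v][u] += aug
        flow += aug

# unit-cost links: s can reach t through two intermediate entities
cap = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"t": 1}, "t": {}}
print(max_flow(cap, "s", "t"))  # → 2: the monitor must sever two links
```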
Feb 26 2018 cs.AI
Real-time bidding (RTB) is arguably the most important mechanism in online display advertising, where a proper bid for each page view plays a vital role in good marketing results. Budget-constrained bidding is a typical scenario in RTB where advertisers hope to maximize the total value of winning impressions under a pre-set budget constraint. However, the optimal strategy is hard to derive due to the complexity and volatility of the auction environment. To address these challenges, in this paper, we formulate budget-constrained bidding as a Markov Decision Process. Quite different from prior model-based work, we propose a novel framework based on model-free reinforcement learning which sequentially regulates the bidding parameter rather than directly producing bids. Along this line, we further design a reward function which deploys a deep neural network to learn an appropriate reward and thus leads the agent to deliver the optimal policy effectively; we also design an adaptive $\epsilon$-greedy strategy which adjusts the exploration behaviour dynamically and further improves performance. Experimental results on a real dataset demonstrate the effectiveness of our framework.
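The adaptive ε-greedy idea can be sketched generically. The abstract does not specify the adaptation rule, so the one below (shrink ε when episode reward improves, grow it when learning stalls) is an invented illustration of the general mechanism, not the paper's actual strategy.

```python
# Generic epsilon-greedy action selection with an adaptive exploration
# rate.  The decay/growth rule here is a hypothetical stand-in for the
# paper's (unspecified) adaptation scheme.
import random

class AdaptiveEpsilonGreedy:
    def __init__(self, n_actions, eps=0.5, lo=0.01, hi=0.5, rate=0.9):
        self.n, self.eps, self.lo, self.hi, self.rate = n_actions, eps, lo, hi, rate
        self.best_reward = float("-inf")

    def act(self, q_values):
        if random.random() < self.eps:
            return random.randrange(self.n)                      # explore
        return max(range(self.n), key=q_values.__getitem__)      # exploit

    def update(self, episode_reward):
        if episode_reward > self.best_reward:    # progress: exploit more
            self.best_reward = episode_reward
            self.eps = max(self.lo, self.eps * self.rate)
        else:                                    # stalled: explore more
            self.eps = min(self.hi, self.eps / self.rate)

agent = AdaptiveEpsilonGreedy(n_actions=3)
for r in [1.0, 2.0, 3.0]:   # improving rewards shrink epsilon
    agent.update(r)
print(agent.eps)            # 0.5 shrunk three times by factor 0.9
```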
Feb 23 2018 cs.CV
In this paper, we propose a novel feature learning framework for video person re-identification (re-ID). The proposed framework aims to exploit the rich temporal information of video sequences and to tackle the poor spatial alignment of moving pedestrians. More specifically, for exploiting the temporal information, we design a temporal residual learning (TRL) module to simultaneously extract the generic and specific features of consecutive frames. The TRL module is equipped with two bi-directional LSTMs (BiLSTMs), which are respectively responsible for describing a moving person in different aspects, providing complementary information for better feature representations. To deal with the poor spatial alignment in video re-ID datasets, we propose a spatial-temporal transformer network (ST^2N) module. Transformation parameters in the ST^2N module are learned by leveraging the high-level semantic information of the current frame as well as the temporal context knowledge from other frames. The proposed ST^2N module, with fewer learnable parameters, allows effective person alignment under significant appearance changes. Extensive experimental results on the large-scale MARS, PRID2011, ILIDS-VID and SDU-VID datasets demonstrate that the proposed method achieves consistently superior performance and outperforms most of the very recent state-of-the-art methods.
Feb 23 2018 cs.CV
In this paper we propose an effective non-rigid object tracking method based on spatial-temporal consistent saliency detection. In contrast to most existing trackers that use a bounding box to specify the tracked target, the proposed method extracts the accurate regions of the target as tracking output, which achieves a better description of non-rigid objects while reducing background pollution of the target model. Furthermore, our model has several unique features. First, a tailored deep fully convolutional neural network (TFCN) is developed to model the local saliency prior for a given image region, which not only provides pixel-wise outputs but also integrates semantic information. Second, a multi-scale multi-region mechanism is proposed to generate local region saliency maps that effectively consider visual perceptions with different spatial layouts and scale variations. Subsequently, these saliency maps are fused via a weighted entropy method, resulting in a final discriminative saliency map. Finally, we present a non-rigid object tracking algorithm based on the proposed saliency detection method by utilizing a spatial-temporal consistent saliency map (STCSM) model to conduct target-background classification, and by using a simple fine-tuning scheme for online updating. Numerous experimental results demonstrate that the proposed algorithm achieves competitive performance in comparison with state-of-the-art methods for both saliency detection and visual tracking, especially outperforming other related trackers on non-rigid object tracking datasets.
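One plausible reading of entropy-weighted fusion is sketched below: maps with lower entropy (a more decisive foreground/background split) receive larger fusion weights. This interpretation and the toy maps are assumptions for illustration; the paper's exact weighting may differ.

```python
# Hypothetical sketch of entropy-weighted fusion of saliency maps.
import math

def entropy(saliency_map):
    """Shannon entropy of a map normalised to a probability distribution."""
    total = sum(sum(row) for row in saliency_map)
    h = 0.0
    for row in saliency_map:
        for v in row:
            p = v / total
            if p > 0:
                h -= p * math.log(p)
    return h

def fuse(maps):
    # lower entropy -> larger weight; weights normalised to sum to 1
    weights = [1.0 / (entropy(m) + 1e-9) for m in maps]
    z = sum(weights)
    weights = [w / z for w in weights]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, maps))
             for j in range(cols)] for i in range(rows)]

peaked = [[0.0, 1.0], [0.0, 0.0]]       # confident map: low entropy
flat = [[0.25, 0.25], [0.25, 0.25]]     # uncertain map: high entropy
fused = fuse([peaked, flat])            # dominated by the confident map
```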
Feb 22 2018 cs.CV
We propose a lightweight neural network model, the Deformable Volume Network (Devon), for learning optical flow. Devon benefits from a multi-stage framework to iteratively refine its prediction. Each stage is itself a neural network with an identical architecture. The optical flow between two stages is propagated with a newly proposed module, the deformable cost volume. The deformable cost volume does not distort the original images or their feature maps and therefore avoids the artifacts associated with warping, a common drawback of previous models. Devon has only one million parameters. Experiments show that Devon achieves comparable results to previous neural network models, despite its small size.
Feb 19 2018 cs.AI
Recently, interest in reinforcement learning for game playing has been renewed. This is evidenced by the groundbreaking results achieved by AlphaGo. General Game Playing (GGP) provides a good testbed for reinforcement learning, currently one of the hottest fields of AI. In GGP, a specification of game rules is given. The description specifies a reinforcement learning problem, leaving programs to find strategies for playing well. Q-learning is one of the canonical reinforcement learning methods and has been used as a baseline in previous work (Banerjee & Stone, IJCAI 2007). We implement Q-learning in GGP for three small board games (Tic-Tac-Toe, Connect Four, Hex). We find that Q-learning converges, and thus that this general reinforcement learning method is indeed applicable to General Game Playing. However, convergence is slow compared to MCTS (a reinforcement learning method reported to achieve good results). We enhance Q-learning with Monte Carlo Search. This enhancement improves the performance of pure Q-learning, although it does not yet outperform MCTS. Future work is needed on the relation between MCTS and Q-learning, and on larger problem instances.
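Tabular Q-learning, the baseline the paper evaluates, can be demonstrated on a tiny deterministic MDP. The two-state chain below is a stand-in for the real games (Tic-Tac-Toe, Connect Four, Hex); the learning-rate, discount and exploration constants are illustrative.

```python
# Tabular Q-learning sketch: Q(s,a) converges to the optimal values
# (1.0 for the winning move, 0.9 = gamma * 1.0 one step earlier).
import random

random.seed(0)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
# states 0 and 1; action 1 = "right"; reaching state 2 from state 1
# pays reward 1 and ends the episode; action 0 returns to the start.
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def transition(s, a):
    if a == 0:
        return 0, 0.0, False          # go (back) to the start, no reward
    if s == 0:
        return 1, 0.0, False          # move right
    return 2, 1.0, True               # terminal with reward 1

for _ in range(500):                  # episodes
    s, done, steps = 0, False, 0
    while not done and steps < 20:
        a = random.choice((0, 1)) if random.random() < EPS else \
            max((0, 1), key=lambda x: Q[(s, x)])
        s2, r, done = transition(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, steps = s2, steps + 1

print(round(Q[(1, 1)], 2), round(Q[(0, 1)], 2))  # ≈ 1.0 and 0.9
```

The same update rule applies unchanged to board-game states; only the state space and transition function grow.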
Feb 15 2018 cs.CR
Passwords are ubiquitous and most commonly used to authenticate users when logging into online services. Using high-entropy passwords is critical to preventing unauthorized access, and password policies emerged to enforce this requirement. However, with current methods of password storage, poor practices and server breaches have leaked many passwords to the public. To protect one's sensitive information in case of such events, passwords should be hidden from servers. Verifier-based password authenticated key exchange, proposed by Bellovin and Merritt (IEEE S&P, 1992), allows authenticated secure channels to be established with a hash of a password (verifier). Unfortunately, this restricts password policies, as passwords cannot be checked from their verifiers. To address this issue, Kiefer and Manulis (ESORICS 2014) proposed the zero-knowledge password policy check (ZKPPC). A ZKPPC protocol allows users to prove in zero knowledge that a hash of the user's password satisfies the password policy required by the server. Unfortunately, their proposal is not quantum-resistant due to its use of discrete-logarithm-based cryptographic tools, and there are currently no other viable alternatives. In this work, we construct the first post-quantum ZKPPC using lattice-based tools. To this end, we introduce a new randomised password hashing scheme for ASCII-based passwords and design an accompanying zero-knowledge protocol for policy compliance. Interestingly, our proposal does not follow the framework established by Kiefer and Manulis, and offers an alternative construction without homomorphic commitments. Although our protocol is not yet ready to be used in practice, we believe it is an important first step towards a quantum-resistant, privacy-preserving, password-based authentication and key exchange system.
Feb 13 2018 cs.CY
Personalized health care services utilize relational patient data and big data analytics to tailor medication recommendations. However, most health care data are in unstructured form, and it takes a lot of time and effort to pull them into relational form. This study proposes a novel data lake architecture to reduce data ingestion time and improve the precision of healthcare analytics. It also removes data silos and enhances analytics by allowing connectivity to third-party data providers (such as clinical lab results, chemists, insurance companies, etc.). The data lake architecture uses the Hadoop Distributed File System (HDFS) to provide storage for both structured and unstructured data. This study uses the K-means clustering algorithm to find patient clusters with similar health conditions. Subsequently, it employs a support vector machine to find the most successful healthcare recommendations for each cluster. Our experimental results demonstrate the ability of the data lake to reduce the time for ingesting data from various data vendors regardless of format. Moreover, the data lake has the potential to generate clusters of patients more precisely than existing approaches. The data lake provides a unified storage location for data in its native format, and it can also improve personalized medication recommendations by removing data silos.
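The clustering step of the pipeline can be sketched with a minimal K-means over a single toy "patient vital" feature. The data, feature choice and k below are invented; the study clusters real multi-feature patient records and then trains an SVM per cluster.

```python
# Minimal 1-D K-means: alternate assignment and centroid update until
# the two patient groups separate.
def kmeans_1d(values, k, iters=50):
    # crude initialisation: evenly spaced points from the sorted data
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:                       # assignment step
            i = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[i].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]  # update step
                     for i, c in enumerate(clusters)]
    return centroids, clusters

vitals = [36.5, 36.6, 36.7, 39.0, 39.2, 39.4]  # two obvious groups
centroids, clusters = kmeans_1d(vitals, k=2)
print(sorted(round(c, 2) for c in centroids))  # → [36.6, 39.2]
```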
Feb 07 2018 cs.CV
In this paper, we propose a geometry-contrastive generative adversarial network (GC-GAN) for generating facial expression images conditioned on geometry information. Specifically, given an input face and a target expression designated by a set of facial landmarks, an identity-preserving face can be generated guided by the target expression. In order to embed facial geometry onto a semantic manifold, we incorporate contrastive learning into conditional GANs. Experimental results demonstrate that the manifold is sensitive to changes of facial geometry both globally and locally. Benefiting from the semantic manifold, dynamic smooth transitions between different facial expressions are exhibited via geometry interpolation. Furthermore, our method can also be applied to facial expression transfer, even when there are large differences in face shape between the target and driving faces.
Feb 06 2018 cs.CV
In this paper, we propose a simple and effective geometric model fitting method to fit and segment multi-structure data even in the presence of severe outliers. We cast the task of geometric model fitting as a representative mode-seeking problem on hypergraphs. Specifically, a hypergraph is first constructed, where the vertices represent model hypotheses and the hyperedges denote data points. The hypergraph involves higher-order similarities (instead of the pairwise similarities used in a simple graph), and it can characterize complex relationships between model hypotheses and data points. In addition, we develop a hypergraph reduction technique to remove "insignificant" vertices while retaining as many "significant" vertices as possible. Based on the simplified hypergraph, we then propose a novel mode-seeking algorithm to search for representative modes within reasonable time. The proposed algorithm detects modes according to two key elements, i.e., the weighting scores of vertices and the similarity analysis between vertices. Overall, the proposed fitting method is able to efficiently and effectively estimate the number and the parameters of model instances in the data simultaneously. Experimental results demonstrate that the proposed method achieves significant superiority over several state-of-the-art model fitting methods on both synthetic data and real images.
The phase retrieval problem has been studied in various applications. It is an inverse problem without a standard uniqueness guarantee, and developing complete theoretical analyses and efficient recovery algorithms is challenging. In this paper, we propose a model called phase retrieval with background information, which recovers the signal, given known background information, from the intensity of their combined Fourier transform spectrum. We prove that the uniqueness of phase retrieval can be guaranteed, even accounting for trivial solutions, when the background information is sufficient. Under this condition, we construct a loss function and utilize the projected gradient descent method to search for the ground truth. We prove that the stationary point is the global optimum with probability 1. Numerical simulations demonstrate that the projected gradient descent method performs well for both 1-D and 2-D signals. Furthermore, this method is quite robust to Gaussian noise and to bias in the background information.
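The projected-gradient-descent template the paper applies can be shown on a toy problem. The quadratic objective and non-negativity projection below are stand-ins; the paper's actual loss and feasible set are specific to its Fourier-intensity model.

```python
# Generic projected gradient descent: gradient step, then projection
# back onto the constraint set.  Toy objective: min ||x - b||^2 s.t.
# x >= 0, whose solution is b with negatives clipped to zero.
def projected_gradient_descent(grad, project, x0, lr=0.1, iters=200):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = project([xi - lr * gi for xi, gi in zip(x, g)])
    return x

b = [1.0, -2.0, 3.0]
grad = lambda x: [2 * (xi - bi) for xi, bi in zip(x, b)]
project = lambda x: [max(0.0, xi) for xi in x]   # onto x >= 0

x = projected_gradient_descent(grad, project, x0=[0.0, 0.0, 0.0])
print([round(v, 3) for v in x])  # → [1.0, 0.0, 3.0]
```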
Multi-person articulated pose tracking in complex unconstrained videos is an important and challenging problem. In this paper, following the top-down approach, we propose a decent and efficient pose tracker based on pose flows. First, we design an online optimization framework to associate cross-frame poses and form pose flows. Second, a novel pose flow non-maximum suppression (NMS) is designed to robustly reduce redundant pose flows and re-link temporally disjoint ones. Extensive experiments show our method significantly outperforms the best reported results on two standard pose tracking datasets (the PoseTrack dataset and the PoseTrack Challenge dataset), by 13 mAP / 25 MOTA and 6 mAP / 3 MOTA respectively. Moreover, when working on detected poses in individual frames, the extra computation of the proposed pose tracker is very minor, requiring only 0.01 seconds per frame.
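The score-ordered suppression pattern that pose flow NMS extends from single detections to whole flows can be sketched in one dimension. The "flows" here are toy intervals and the overlap threshold is invented; the paper's NMS operates on pose flows with pose-specific similarity.

```python
# Classical greedy NMS: keep the highest-scoring candidates, dropping
# any candidate that overlaps a kept one beyond a threshold.
def interval_iou(a, b):
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms(candidates, thresh=0.5):
    """candidates: list of (interval, score)."""
    kept = []
    for interval, score in sorted(candidates, key=lambda c: -c[1]):
        if all(interval_iou(interval, k[0]) < thresh for k in kept):
            kept.append((interval, score))
    return kept

cands = [((0.0, 10.0), 0.9),   # strong detection
         ((1.0, 10.5), 0.6),   # near-duplicate of the first -> suppressed
         ((20.0, 30.0), 0.8)]  # distinct detection -> kept
result = nms(cands)
print(result)
```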
Advancements in genomic research, such as high-throughput sequencing techniques, have driven modern genomic studies into "big data" disciplines. This data explosion constantly challenges conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields, such as vision, speech, and text processing. Yet genomics entails unique challenges for deep learning, since we expect from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective, so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out potential opportunities and obstacles for future genomics applications.
Feb 05 2018 cs.CY
As a consequence of the huge advancement of Electronic Health Records (EHR) in healthcare settings, the My Health Record (MHR) system has been introduced in Australia. However, security and privacy concerns have been encumbering the development of the system. Even though the MHR system is claimed to be patient-centred and patient-controlled, there are several instances where healthcare providers (other than the usual provider) and the system operators who maintain the system can easily access it, and these unauthorised accesses can lead to a breach of patient privacy. This is one of the main consumer concerns affecting the uptake of the system. In this paper, we propose a patient-centred MHR framework which requests authorisation from the patient before their sensitive health information is accessed. The proposed model increases the involvement and satisfaction of patients in their healthcare, and also suggests a mobile security system for granting online permission to access the MHR system.
An Electronic Health Record (EHR) system must enable efficient availability of meaningful, accurate and complete data to assist improved clinical administration through the development, implementation and optimisation of clinical pathways. Data integrity is therefore the driving force in EHR systems and an essential aspect of service delivery at all levels. However, preserving data integrity in EHR systems has become a major problem because of its consequences for promoting high standards of patient care. In this paper, we review and address the impact of data integrity on the use of EHR systems and its associated issues. We determine and analyse three phases of data integrity in an EHR system. Finally, we present an appropriate method to preserve integrity in EHR systems. To analyse and evaluate data integrity, one of the major clinical systems in Australia is considered, which demonstrates the impact on the quality and safety of patient care.
Though big data benchmark suites like BigDataBench and CloudSuite have been used in architecture and system research, we have not yet answered the fundamental question: what are the abstractions of frequently appearing units of computation in big data analytics, which we call big data dwarfs? For the first time, we identify eight big data dwarfs, each of which captures the common requirements of a class of units of computation, while being reasonably divorced from individual implementations, among a wide variety of big data analytics workloads. We implement the eight dwarfs on different software stacks as dwarf components. We present the application of the big data dwarfs to construct big data proxy benchmarks, using directed-acyclic-graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks shorten the execution time by hundreds of times on real systems, while remaining suitable for both early architecture design and later system evaluation across different architectures.
Jan 31 2018 cs.MA
In this paper we present a novel crowd simulation method that models the generation and contagion of panic emotion under multi-hazard circumstances. Specifically, we first classify hazards into different types (transient and persistent, concurrent and non-concurrent, static and dynamic) based on their inherent characteristics. Then, we introduce the concept of a perilous field for each hazard and further transform the critical level of the field into its invoked panic emotion. After that, we propose an emotional contagion model to simulate the evolving process of panic emotion caused by multiple hazards. Finally, we introduce an Emotional Reciprocal Velocity Obstacles (ERVO) model to simulate crowd behaviors by augmenting the traditional RVO model with emotional contagion, combining the emotional impact and local avoidance together for the first time. Our experimental results show that this method can soundly generate realistic group behaviors as well as panic emotion dynamics in a crowd in multi-hazard environments.
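A numeric sketch of the perilous-field idea: each hazard's criticality decays with distance, and an agent's invoked panic is the saturated sum over hazards. The exponential decay law, intensities and saturation below are all invented for illustration; the paper defines distinct fields per hazard type.

```python
# Hypothetical perilous-field sketch: distance-decayed hazard
# criticality mapped to a panic level in [0, 1].
import math

def field_level(agent_pos, hazard_pos, intensity, decay=1.0):
    d = math.dist(agent_pos, hazard_pos)
    return intensity * math.exp(-decay * d)

def panic(agent_pos, hazards):
    level = sum(field_level(agent_pos, h, s) for h, s in hazards)
    return min(1.0, level)          # panic saturates at 1

hazards = [((0.0, 0.0), 2.0), ((5.0, 0.0), 1.0)]  # (position, intensity)
near = panic((0.5, 0.0), hazards)   # close to the strong hazard
far = panic((10.0, 0.0), hazards)   # far from both hazards
print(near, far)
```

In the full model this per-agent panic value would then spread between neighbouring agents via the emotional contagion step before feeding the ERVO avoidance model.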
Artificial neural networks show powerful inference ability, but they are still criticized for their lack of interpretability and their need for large datasets. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome these shortcomings. ReNN first makes local inferences to detect local patterns, and then applies rules based on domain knowledge about these patterns to generate a rule-modulated map. After that, ReNN makes global inferences that synthesize the local patterns and the rule-modulated map. To solve the optimization problem introduced by the rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies that are difficult to learn from limited empirical data, thus improving inference accuracy. The complexity of the neural network can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize the network is also reduced. Besides, inferences from ReNN can be analyzed with both local patterns and rules, and thus have better interpretability. In this paper, ReNN is validated on a time-series detection problem.
Millimeter wave offers a sensible solution to the capacity crunch faced by 5G wireless communications. This paper comprehensively studies physical layer security in a multiple-input single-output (MISO) millimeter wave system where multiple single-antenna eavesdroppers are randomly located. Concerning the specific propagation characteristics of millimeter wave, we investigate two secure transmission schemes, namely maximum ratio transmission (MRT) beamforming and artificial noise (AN) beamforming. Specifically, we first derive closed-form expressions of the connection probability for both schemes. We then analyze the secrecy outage probability (SOP) in both the non-colluding and colluding eavesdropper scenarios. Also, we maximize the secrecy throughput under an SOP constraint, and obtain the optimal transmission parameters, especially the power allocation between AN and the information signal for AN beamforming. Numerical results are provided to verify our theoretical analysis. We observe that the density of eavesdroppers and the spatially resolvable paths of the destination and eavesdroppers all contribute to the secrecy performance and the parameter design of millimeter wave systems.
In the fifth-generation era, the pervasive applications of the Internet of Things and massive machine-type communications have initiated increasing research interest in the backscatter wireless powered communication (B-WPC) technique, due to its ultra-high energy efficiency and low cost. The ubiquitous B-WPC network is characterized by nodes with dynamic spatial positions and sporadic short packets, whose performance has not been fully investigated. In this paper, we give a comprehensive analysis of a multi-antenna B-WPC network with sporadic short packets under a stochastic geometry framework. By exploiting a time-space Poisson point process model, the behavior of the network is well captured in a decentralized and asynchronous transmission way. We then analyze the energy and information outage performance in the energy-harvest and backscatter-modulation phases of the backscatter network, respectively. The optimal transmission slot length and division are obtained by maximizing the network-wide spatial throughput. Moreover, we find the interesting result that there exists an optimal tradeoff between the durations of the energy-harvest and backscatter-modulation phases for spatial throughput maximization. Numerical results verify our analytical findings and show that this tradeoff region shrinks when the outage constraints become more stringent.
Jan 30 2018 cs.CV
Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks discriminative power. To address this problem, several loss functions, such as center loss, large margin softmax loss, and angular softmax loss, have recently been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by $L_2$-normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets, such as MegaFace Challenge, YouTube Faces (YTF), and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.
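The LMCL computation itself (normalize, subtract the margin from the target-class cosine, scale, apply softmax cross-entropy) can be sketched on toy vectors. The scale s and margin m values below are illustrative hyperparameters; CosFace learns features and weights inside a deep CNN.

```python
# Large margin cosine loss on a single toy sample.
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def lmcl(feature, class_weights, target, s=30.0, m=0.35):
    f = l2_normalize(feature)
    # cosine similarity to each class prototype (radial variation removed)
    cosines = [sum(a * b for a, b in zip(f, l2_normalize(w)))
               for w in class_weights]
    # subtract margin m from the target class only, then scale by s
    logits = [s * (c - m if j == target else c)
              for j, c in enumerate(cosines)]
    z = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[target]) / z)

weights = [[1.0, 0.0], [0.0, 1.0]]   # two class prototypes
loss = lmcl([0.9, 0.1], weights, target=0)
print(loss)  # small: the feature already aligns with class 0
```

Because the margin is charged only against the target class, the loss with m > 0 is always at least the plain cosine-softmax loss, which is what forces a larger angular decision margin during training.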
Jan 29 2018 cs.CR
In this work, we provide the first lattice-based group signature that offers full dynamicity (i.e., users have the flexibility of joining and leaving the group), and thus resolve a prominent open problem posed by previous works. Moreover, we achieve this non-trivial feat in a relatively simple manner. Our design approach consists of upgrading Libert et al.'s fully static construction (EUROCRYPT 2016) - which is arguably the most efficient lattice-based group signature to date - into the fully dynamic setting via simple but insightful tweaks. Somewhat surprisingly, our scheme even produces slightly shorter signatures than the former, thanks to an adaptation of a technique proposed by Ling et al. (PKC 2013) that allows proving inequalities in zero-knowledge without relying on any inequality check. The scheme satisfies the strong security requirements of Bootle et al.'s model (ACNS 2016), under the Short Integer Solution (SIS) and Learning With Errors (LWE) assumptions. Furthermore, we demonstrate how to equip the obtained group signature scheme with the deniability functionality in a simple way. This attractive functionality, put forward by Ishida et al. (CANS 2016), enables the tracing authority to provide evidence that a given user is not the owner of a signature in question. In the process, we design a zero-knowledge protocol for proving that a given LWE ciphertext does not decrypt to a particular message.
Jan 26 2018 cs.CR
Group signature is a fundamental cryptographic primitive, aiming to protect anonymity and ensure accountability of users. It allows group members to anonymously sign messages on behalf of the whole group, while incorporating a tracing mechanism to identify the signer of any suspected signature. Most of the existing group signature schemes, however, do not guarantee security once users' secret keys are exposed. To reduce the potential damage caused by key exposure attacks, Song (CCS 2001) put forward the concept of forward-secure group signatures (FSGS). To date, all known secure FSGS schemes are based on number-theoretic assumptions, and are vulnerable to quantum computers. In this work, we construct the first lattice-based FSGS scheme. In Nakanishi et al.'s model, our scheme achieves forward-secure traceability under the Short Integer Solution (SIS) assumption, and offers full anonymity under the Learning With Errors (LWE) assumption. At the heart of our construction is a scalable lattice-based key-evolving mechanism, allowing users to periodically update their secret keys and to efficiently prove in zero-knowledge that the key-evolution process is done correctly. To realize this essential building block, we first employ the Bonsai-tree structure by Cash et al. (EUROCRYPT 2010) to handle the key-evolution process, and then develop Langlois et al.'s construction (PKC 2014) to design its supporting zero-knowledge protocol. In comparison to all known lattice-based group signatures (that are not forward-secure), our scheme only incurs a very reasonable overhead: the bit-sizes of keys and signatures grow by factors of at most O(log N), where N is the number of group users, and at most O(log^3 T), where T is the number of time periods.
Online news recommender systems aim to address the information explosion of news and make personalized recommendations for users. In general, news language is highly condensed and full of knowledge entities and common sense. However, existing methods are unaware of such external knowledge and cannot fully discover latent knowledge-level connections among news. The recommended results for a user are consequently limited to simple patterns and cannot be reasonably extended. Moreover, news recommendation also faces the challenges of the high time-sensitivity of news and the dynamic diversity of users' interests. To solve the above problems, in this paper we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation. DKN is a content-based deep recommendation framework for click-through rate prediction. The key component of DKN is a multi-channel, word-entity-aligned knowledge-aware convolutional neural network (KCNN) that fuses semantic-level and knowledge-level representations of news. KCNN treats words and entities as multiple channels, and explicitly keeps their alignment relationship during convolution. In addition, to address users' diverse interests, we also design an attention module in DKN to dynamically aggregate a user's history with respect to the current candidate news. Through extensive experiments on a real online news platform, we demonstrate that DKN achieves substantial gains over state-of-the-art deep recommendation models. We also validate the efficacy of the usage of knowledge in DKN.
Jan 25 2018 cs.CR
Efficient user revocation is a necessary but challenging problem in many multi-user cryptosystems. Among known approaches, server-aided revocation yields a promising solution, because it allows the major workloads of system users to be outsourced to a computationally powerful third party, called the server, whose only requirement is to carry out the computations correctly. Such a revocation mechanism was considered in the settings of identity-based encryption and attribute-based encryption by Qin et al. (ESORICS 2015) and Cui et al. (ESORICS 2016), respectively. In this work, we consider the server-aided revocation mechanism in the more elaborate setting of predicate encryption (PE). The latter, introduced by Katz, Sahai, and Waters (EUROCRYPT 2008), provides fine-grained and role-based access to encrypted data and can be viewed as a generalization of identity-based and attribute-based encryption. Our contribution is twofold. First, we formalize the model of server-aided revocable predicate encryption (SR-PE), with rigorous definitions and security notions. Our model can be seen as a non-trivial adaptation of Cui et al.'s work to the PE context. Second, we put forward a lattice-based instantiation of SR-PE. The scheme employs the PE scheme of Agrawal, Freeman and Vaikuntanathan (ASIACRYPT 2011) and the complete subtree method of Naor, Naor, and Lotspiech (CRYPTO 2001) as the two main ingredients, which work smoothly together thanks to a few additional techniques. Our scheme is proven secure in the standard model (in a selective manner), based on the hardness of the Learning With Errors (LWE) problem.
A class of specialized neurons, called lobula plate tangential cells (LPTCs), has been shown to respond strongly to wide-field motion. The classic elementary motion detector (EMD) model and its improved variant, the two-quadrant detector (TQD), have been proposed to simulate LPTCs. Although the EMD and TQD can perceive background motion, their outputs are so cluttered that it is difficult to discriminate the actual motion direction of the background. In this paper, we propose a max operation mechanism to model a newly found transmedullary neuron, Tm9, whose physiological properties do not map onto the EMD or TQD. The proposed max operation mechanism improves the detection performance of the TQD in cluttered backgrounds by filtering out irrelevant motion signals. We demonstrate the functionality of the proposed mechanism in wide-field motion perception.
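The filtering role of such a max operation can be illustrated with a minimal sketch. The windowed non-maximum suppression below is our own simplified rendering, not the paper's exact Tm9 model: motion responses that are not the strongest in their local neighborhood are zeroed out, suppressing weaker, irrelevant signals.

```python
import numpy as np

def max_gate(responses, win=3):
    """Suppress any motion response that is not the maximum within its
    local window, so only the dominant signal in each neighborhood survives."""
    responses = np.asarray(responses, dtype=float)
    out = np.zeros_like(responses)
    half = win // 2
    for i in range(len(responses)):
        lo, hi = max(0, i - half), min(len(responses), i + half + 1)
        if responses[i] == responses[lo:hi].max():
            out[i] = responses[i]
    return out

# Cluttered responses: only the locally dominant signals pass the gate
gated = max_gate([1, 3, 2, 0, 5, 4], win=3)
```

Weaker responses adjacent to a stronger one are removed, which is the qualitative effect the abstract attributes to the max operation.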
Discriminating targets moving against a cluttered background is a huge challenge, let alone detecting a target as small as one or a few pixels and tracking it in flight. In the fly's visual system, a class of specific neurons, called small target motion detectors (STMDs), has been identified as showing exquisite selectivity for small target motion. Some STMDs have also demonstrated directional selectivity, meaning they respond strongly only to motion in their preferred direction. Directional selectivity is an important property of these STMD neurons that could contribute to tracking small targets, such as mates, in flight. However, little work has been done on systematically modeling these direction-selective STMD neurons. In this paper, we propose a directionally selective STMD-based neural network (DSTMD) for small target detection in a cluttered background. In the proposed neural network, a new correlation mechanism is introduced for direction selectivity by correlating signals relayed from two pixels. Then, a lateral inhibition mechanism is implemented on the spatial field for the size selectivity of STMD neurons. Extensive experiments showed that the proposed neural network not only accords with current biological findings, i.e., it shows directional preferences, but also works reliably in detecting small targets against cluttered backgrounds.
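The idea of obtaining direction selectivity by correlating signals from two pixels is the essence of the classic delay-and-correlate (Hassenstein-Reichardt) scheme; the sketch below uses that textbook form with illustrative signal shapes and delay, and is not the DSTMD's exact correlation mechanism.

```python
import numpy as np

def reichardt_correlator(s_a, s_b, delay=2):
    """Delay-and-correlate motion detector over two pixel signals.

    A delayed copy of one pixel's signal is multiplied with the other's;
    the mirrored term is subtracted so the output sign encodes direction."""
    d_a = np.concatenate([np.zeros(delay), s_a[:-delay]])  # delayed s_a
    d_b = np.concatenate([np.zeros(delay), s_b[:-delay]])  # delayed s_b
    return np.sum(d_a * s_b - d_b * s_a)

# A brightness pulse passing pixel A, then pixel B two steps later
t = np.arange(20)
pulse = np.exp(-0.5 * ((t - 8) / 1.5) ** 2)
s_a = pulse
s_b = np.concatenate([np.zeros(2), pulse[:-2]])  # same pulse, delayed

preferred = reichardt_correlator(s_a, s_b, delay=2)  # motion A -> B
null = reichardt_correlator(s_b, s_a, delay=2)       # motion B -> A
```

The detector gives a positive response for motion in its preferred direction and a negative one for the opposite direction, which is the "respond strongly only to the preferred motion direction" behavior described above.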
Consider a data publishing setting for a data set with public and private features. The objective of the publisher is to maximize the amount of information about the public features in a revealed data set, while keeping the information leaked about the private features bounded. The goal of this paper is to analyze the performance of privacy mechanisms that are constructed to match the distribution learned from the data set. Two distinct scenarios are considered: (i) mechanisms are designed to provide a privacy guarantee for the learned distribution; and (ii) mechanisms are designed to provide a privacy guarantee for every distribution in a given neighborhood of the learned distribution. For the first scenario, given any privacy mechanism, upper bounds on the difference between the privacy-utility guarantees for the learned and true distributions are presented. In the second scenario, upper bounds on the reduction in utility incurred by providing a uniform privacy guarantee are developed.
Jan 17 2018 cs.CL
To quickly obtain new labeled data, we can choose crowdsourcing as a faster, lower-cost alternative. In exchange, however, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to crowd annotation learning for Chinese Named Entity Recognition (NER) that makes full use of the noisy sequence labels from multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM to represent annotator-generic and annotator-specific information, respectively. The annotator-generic information is the common knowledge for entities easily mastered by the crowd. Finally, we build our Chinese NE tagger based on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems.
In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we present an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. We then aim to find a correction function that can be used to perturb the input distributions of each group in order to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that makes the output distributions statistically indistinguishable. We derive a closed-form expression for the correction function that can be used to compute it efficiently. We illustrate the use of the correction function in a recidivism prediction application derived from the ProPublica COMPAS dataset.
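As a minimal numerical illustration of the divergence view of disparate impact, the sketch below uses total-variation distance between the two groups' binary output distributions; the paper's actual divergence measure and its correction function are not reproduced here.

```python
import numpy as np

def output_disparity(y_pred, group):
    """Total-variation distance between a binary classifier's output
    distributions for two groups (0 means no disparate impact).

    For binary outputs, TV distance reduces to |P(Y=1|g=0) - P(Y=1|g=1)|."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    p0 = y_pred[group == 0].mean()   # P(Y_hat = 1 | group 0)
    p1 = y_pred[group == 1].mean()   # P(Y_hat = 1 | group 1)
    return abs(p0 - p1)

# Group 0 is predicted positive twice as often as group 1
d = output_disparity([1, 1, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1])
```

A correction function in the paper's sense would perturb the groups' input distributions until this divergence is driven to zero.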
Jan 17 2018 cs.CL
The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while at each translation prediction timestep it simultaneously applies two attention models to consider the source-side and the reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over conventional NMT of 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.
Jan 16 2018 cs.MM
Conventional reversible data hiding (RDH) algorithms often consider the host as a whole when embedding a payload. To achieve satisfactory rate-distortion performance, the secret bits are embedded into the noise-like component of the host, such as prediction errors. From the rate-distortion view, this may not be optimal, since all data embedding units use identical parameters. This motivates us to present a segmented data embedding strategy for RDH, in which the raw host is partitioned into multiple sub-hosts such that each one can freely optimize and use its own embedding parameters. Moreover, it enables us to apply different RDH algorithms within different sub-hosts, which we define as an ensemble. Notice that the ensemble defined here is different from that in machine learning, and that the conventional operation corresponds to a special case of our work. Since it is a general strategy, we combine some state-of-the-art algorithms into a new system using the proposed embedding strategy to evaluate the rate-distortion performance. Experimental results have shown that the ensemble RDH system outperforms the original versions, demonstrating its superiority and applicability.
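The per-segment selection step of such an ensemble can be sketched as follows. The two cost models below are toy stand-ins we invented for illustration, not real RDH algorithms; in practice each candidate would be an actual reversible embedder with its estimated distortion for the required payload.

```python
import numpy as np

def ensemble_select(segments, cost_models):
    """For each sub-host, evaluate every candidate embedder's estimated
    distortion and keep the index of the cheapest one."""
    return [int(np.argmin([cost(seg) for cost in cost_models]))
            for seg in segments]

# Toy cost models standing in for real RDH algorithms' distortion estimates:
smooth_cost = lambda seg: np.var(np.diff(seg))                 # cheap on smooth hosts
texture_cost = lambda seg: 1.0 / (1.0 + np.var(np.diff(seg)))  # cheap on textured hosts

segments = [np.array([10.0, 10, 10, 11]),   # smooth sub-host
            np.array([5.0, 90, 3, 200])]    # textured sub-host
picks = ensemble_select(segments, [smooth_cost, texture_cost])
```

Each sub-host gets the embedder that suits its local statistics, which is the freedom the segmented strategy provides over a single global choice.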
Jan 16 2018 cs.MM
In reversible data embedding, to avoid the overflow and underflow problems, boundary pixels are recorded as side information before data embedding, and this side information may be losslessly compressed. Existing algorithms often assume that a natural image has few boundary pixels, so that the size of the side information is small and a relatively high pure payload can be achieved. However, a natural image may actually contain a lot of boundary pixels, implying that the size of the side information can be very large. Therefore, when the existing algorithms are used directly, the pure embedding capacity may be insufficient. To address this problem, in this paper we present a new and efficient framework for reversible data embedding in images that have many boundary pixels. The core idea is to losslessly preprocess the boundary pixels so as to significantly reduce the side information. Experimental results have shown the superiority and applicability of our work.
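The boundary-pixel issue can be sketched with the classic shift-and-record preprocessing (a minimal version of our own; the paper's actual preprocessing and the lossless compression of the map are not reproduced): values 0 and 255 are shifted inward so that a later ±1 embedding cannot overflow, and a location map is kept so the shift is invertible.

```python
import numpy as np

def preprocess_boundary(img):
    """Shift boundary pixel values inward (0 -> 1, 255 -> 254) and return a
    binary location map as side information so the shift can be undone."""
    img = np.asarray(img, dtype=np.uint8)
    loc = (img == 0) | (img == 255)       # which pixels were shifted
    out = img.copy()
    out[img == 0] = 1
    out[img == 255] = 254
    return out, loc

def restore_boundary(out, loc):
    """Invert the preprocessing using the recorded location map."""
    img = out.copy()
    img[loc & (out == 1)] = 0
    img[loc & (out == 254)] = 255
    return img

img = np.array([[0, 255, 100], [1, 254, 0]], dtype=np.uint8)
shifted, side_info = preprocess_boundary(img)
recovered = restore_boundary(shifted, side_info)
```

The location map distinguishes a shifted 0 from an original 1 (and a shifted 255 from an original 254), so recovery is exact; the size of this map is precisely the side information the paper aims to reduce.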
Transmitter-side channel state information (CSI) of the legitimate destination plays a critical role in physical-layer secure transmissions. However, the channel training procedure is vulnerable to pilot spoofing attacks (PSA) or pilot jamming attacks (PJA) by an active eavesdropper (Eve), which inevitably results in severe leakage of private information. In this paper, we propose a random channel training (RCT) based secure downlink transmission framework for a time division duplex (TDD) multi-antenna base station (BS). In the proposed RCT scheme, multiple orthogonal pilot sequences (PSs) are simultaneously allocated to the legitimate user (LU), and the LU randomly selects one PS from the assigned PS set to transmit. Under either the PSA or PJA, we provide the detailed steps for the BS to identify the PS transmitted by the LU and to simultaneously estimate the channels of the LU and Eve. The probability that the BS makes an incorrect decision on the PS of the LU is analytically investigated. Finally, closed-form secure beamforming (SB) vectors are designed and optimized to enhance the secrecy rate during downlink transmissions. Numerical results show that the secrecy performance is greatly improved compared to the conventional channel training scheme, wherein only one PS is assigned to the LU.
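The pilot-identification step at the BS can be sketched as a correlation against every pilot in the assigned set. The setup below (no attacker, single antenna, flat channel, orthonormal DFT pilots) is a simplifying assumption of ours, not the paper's system model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small set of orthonormal pilot sequences (rows of a scaled DFT matrix)
L = 8                                          # pilot length
ps_set = np.fft.fft(np.eye(L)) / np.sqrt(L)    # orthonormal complex rows

chosen = 3                                     # index the LU picked at random
h = 0.8 + 0.3j                                 # illustrative flat channel gain
noise = 0.05 * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
received = h * ps_set[chosen] + noise

# BS correlates the received training signal with every pilot in the set
scores = np.abs(ps_set.conj() @ received)
detected = int(np.argmax(scores))
```

Because the pilots are orthonormal, only the correlation with the transmitted pilot retains the channel gain; all other scores are noise-level, so the argmax recovers the LU's random choice.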
Large-scale rumor spreading can cause severe social and economic damage. The emergence of online social networks, along with new media, can make rumor spreading even more severe. Effective control of rumor spreading is therefore of theoretical and practical significance. This paper takes a first step toward understanding how blockchain technology can help limit the spread of rumors. Specifically, we develop a new paradigm for social networks embedded with blockchain technology, which employs decentralized contracts to motivate trust networks as well as secure information exchange. We design a blockchain-based sequential algorithm that utilizes virtual information credits for each peer-to-peer information exchange. Simulation results validate the effectiveness of the blockchain-enabled social network in limiting rumor spreading, confirm our algorithm design in avoiding rapid and intense rumor spreading, and motivate better mechanism design for trusted social networks.
Jan 09 2018 cs.CV
This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., people and vehicles) in large-scale home surveillance videos. The traditional method usually consists of two separate steps: detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events (e.g., trees, clouds, shadows, rain/snow, flags) with deep-learning-based object detection and tracking running in the cloud. This method is extremely slow and therefore not cost-effective, and a pre-trained off-the-shelf object detector does not fully leverage the spatial-temporal redundancies in the video. To dramatically speed up relevant motion event detection and improve its performance, we propose ReMotENet, a unified, end-to-end data-driven network that uses spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects of interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. Meanwhile, it exploits the properties of home surveillance videos (e.g., relevant motion is sparse both spatially and temporally) and enhances 3D ConvNets with a spatial-temporal attention model and reference-frame subtraction to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method achieves comparable or even better performance than the object-detection-based method, but with three to four orders of magnitude speedup (up to 20k times) on GPU devices. Our network is efficient, compact and lightweight: it can detect relevant motion in a 15s surveillance video clip within 4-8 milliseconds on a GPU and a fraction of a second (0.17-0.39s) on a CPU, with a model size of less than 1MB.
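Reference-frame subtraction, one of the two enhancements mentioned, can be sketched in its simplest form below; using the clip's first frame as the reference is our own illustrative choice, and the paper's exact reference may differ.

```python
import numpy as np

def reference_subtract(clip):
    """Subtract a reference frame (here, the first frame) from every frame,
    cancelling static background so only moving regions carry signal."""
    clip = np.asarray(clip, dtype=float)
    return clip - clip[0]

# Tiny 3-frame clip: static background of 50s with one moving bright pixel
clip = np.full((3, 4, 4), 50.0)
clip[1, 1, 1] = 200.0          # object at (1, 1) in frame 1
clip[2, 2, 2] = 200.0          # object moved to (2, 2) in frame 2

diff = reference_subtract(clip)
```

After subtraction, the static background is exactly zero and only the moving object's locations are nonzero, which is the spatial/temporal sparsity the attention model can then exploit.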
The use of low-resolution analog-to-digital converters (ADCs) can significantly reduce power consumption and hardware cost. However, the severe nonlinear distortion caused by low-resolution ADCs makes reliable data transmission challenging. In particular, for orthogonal frequency division multiplexing (OFDM) transmission, the orthogonality among subcarriers is destroyed, which invalidates conventional linear receivers that rely heavily on this orthogonality. In this study, we develop an efficient system architecture for OFDM transmission with ultra-low-resolution ADCs (e.g., 1-2 bits). A novel channel estimator is proposed to estimate the desired channel parameters without knowledge of their prior distributions. In particular, we integrate the linear minimum mean squared error channel estimator into the generalized Turbo (GTurbo) framework and derive its corresponding extrinsic information to guarantee the convergence of the GTurbo-based algorithm. We also propose feasible schemes for automatic gain control, noise power estimation, and synchronization. Furthermore, we construct a proof-of-concept prototyping system and conduct over-the-air (OTA) experiments to examine the feasibility and reliability of the entire system. To the best of our knowledge, this is the first work worldwide to focus on the system design and implementation of communications with low-resolution ADCs. Numerical simulation and OTA experiment results demonstrate that the proposed system supports reliable OFDM data transmission with ultra-low-resolution ADCs.
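A minimal numerical illustration (our own toy setup: noise-free, no channel, N = 64 QPSK subcarriers) of how 1-bit quantization destroys subcarrier orthogonality: an ideal receiver recovers the symbols exactly, while a sign-only ADC leaves large residual distortion on the demodulated subcarriers.

```python
import numpy as np

N = 64                                  # number of subcarriers
rng = np.random.default_rng(1)

# QPSK symbols on all subcarriers, OFDM-modulated via unitary IFFT
syms = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
tx = np.fft.ifft(syms) * np.sqrt(N)

# Ideal (infinite-resolution) receiver recovers the symbols exactly
rx_ideal = np.fft.fft(tx) / np.sqrt(N)

# 1-bit ADC: keep only the sign of the I and Q components
tx_1bit = (np.sign(tx.real) + 1j * np.sign(tx.imag)) / np.sqrt(2)
rx_1bit = np.fft.fft(tx_1bit) / np.sqrt(N)

err_ideal = np.max(np.abs(rx_ideal - syms))   # ~ machine precision
err_1bit = np.max(np.abs(rx_1bit - syms))     # large residual distortion
```

The residual error under 1-bit quantization is what invalidates a per-subcarrier linear equalizer and motivates the iterative GTurbo-style receiver described above.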
This paper describes a procedure for creating large-scale video datasets for action classification and localization from unconstrained, realistic web data. The scalability of the proposed procedure is demonstrated by building a novel video benchmark, named SLAC (Sparsely Labeled ACtions), consisting of over 520K untrimmed videos and 1.75M clip annotations spanning 200 action categories. Using our proposed framework, annotating a clip takes merely 8.8 seconds on average, a saving in labeling time of over 95% compared to the traditional procedure of manually trimming and localizing actions. Our approach dramatically reduces the amount of human labeling by automatically identifying hard clips, i.e., clips that contain coherent actions but lead to prediction disagreement between action classifiers. A human annotator can disambiguate whether such a clip truly contains the hypothesized action in a handful of seconds, thus generating labels for highly informative samples at little cost. We show that our large-scale dataset can be used to effectively pre-train action recognition models, significantly improving final metrics on smaller-scale benchmarks after fine-tuning. On Kinetics, UCF-101 and HMDB-51, models pre-trained on SLAC outperform baselines trained from scratch by 2.0%, 20.1% and 35.4% in top-1 accuracy, respectively, when RGB input is used. Furthermore, we introduce a simple procedure that leverages the sparse labels in SLAC to pre-train action localization models. On THUMOS14 and ActivityNet-v1.3, our localization model improves the mAP of the baseline model by 8.6% and 2.5%, respectively.
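The hard-clip mining criterion can be sketched as follows. This is our simplified rendering of "prediction disagreement between action classifiers": flag clips where two classifiers pick different actions, or agree only with low confidence; the confidence margin is an assumed parameter, not from the paper.

```python
import numpy as np

def hard_clips(probs_a, probs_b, margin=0.5):
    """Return indices of clips where two action classifiers disagree on the
    predicted class, or agree only with low confidence -- the informative
    candidates to route to a human annotator."""
    probs_a, probs_b = np.asarray(probs_a), np.asarray(probs_b)
    pred_a, pred_b = probs_a.argmax(1), probs_b.argmax(1)
    conf = np.minimum(probs_a.max(1), probs_b.max(1))
    return np.where((pred_a != pred_b) | (conf < margin))[0]

# Three clips scored by two classifiers; only the second clip is "hard"
idx = hard_clips([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]],
                 [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]])
```

Only the flagged clips are sent to annotators, which is how the procedure concentrates the limited human labeling budget on the most informative samples.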
Dec 27 2017 cs.CV
Despite recent progress, computational visual aesthetics remains challenging. Image cropping, which refers to the removal of unwanted scene areas, is an important step in improving the aesthetic quality of an image. However, it is challenging to evaluate whether cropping leads to aesthetically pleasing results, because the assessment is typically subjective. In this paper, we propose a novel cascaded cropping regression (CCR) method that performs image cropping by learning from professional photographers. The proposed CCR method improves the convergence speed of the cascaded method, which directly uses random-ferns regressors. In addition, a two-step learning strategy is proposed and used in the CCR method to address the lack of labelled cropping data. Specifically, a deep convolutional neural network (CNN) classifier is first trained on large-scale visual aesthetics datasets. The deep CNN model is then used to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method. Experimental results on public image cropping datasets demonstrate that the proposed method significantly outperforms several state-of-the-art image cropping methods.