Feb 19 2018 cs.AI
Recently, the interest in reinforcement learning in game playing has been renewed. This is evidenced by the groundbreaking results achieved by AlphaGo. General Game Playing (GGP) provides a good testbed for reinforcement learning, currently one of the hottest fields of AI. In GGP, a specification of games rules is given. The description specifies a reinforcement learning problem, leaving programs to find strategies for playing well. Q-learning is one of the canonical reinforcement learning methods, which is used as baseline on some previous work (Banerjee & Stone, IJCAI 2007). We implement Q-learning in GGP for three small board games (Tic-Tac-Toe, Connect-Four, Hex). We find that Q-learning converges, and thus that this general reinforcement learning method is indeed applicable to General Game Playing. However, convergence is slow, in comparison to MCTS (a reinforcement learning method reported to achieve good results). We enhance Q-learning with Monte Carlo Search. This enhancement improves performance of pure Q-learning, although it does not yet out-perform MCTS. Future work is needed into the relation between MCTS and Q-learning, and on larger problem instances.
Feb 15 2018 cs.CR
Passwords are ubiquitous and most commonly used to authenticate users when logging into online services. Using high entropy passwords is critical to prevent unauthorized access and password policies emerged to enforce this requirement on passwords. However, with current methods of password storage, poor practices and server breaches have leaked many passwords to the public. To protect one's sensitive information in case of such events, passwords should be hidden from servers. Verifier-based password authenticated key exchange, proposed by Bellovin and Merrit (IEEE S\&P, 1992), allows authenticated secure channels to be established with a hash of a password (verifier). Unfortunately, this restricts password policies as passwords cannot be checked from their verifier. To address this issue, Kiefer and Manulis (ESORICS 2014) proposed zero-knowledge password policy check (ZKPPC). A ZKPPC protocol allows users to prove in zero knowledge that a hash of the user's password satisfies the password policy required by the server. Unfortunately, their proposal is not quantum resistant with the use of discrete logarithm-based cryptographic tools and there are currently no other viable alternatives. In this work, we construct the first post-quantum ZKPPC using lattice-based tools. To this end, we introduce a new randomised password hashing scheme for ASCII-based passwords and design an accompanying zero-knowledge protocol for policy compliance. Interestingly, our proposal does not follow the framework established by Kiefer and Manulis and offers an alternate construction without homomorphic commitments. Although our protocol is not ready to be used in practice, we think it is an important first step towards a quantum-resistant privacy-preserving password-based authentication and key exchange system.
Feb 13 2018 cs.CY
The personalized health care service utilizes the relational patient data and big data analytics to tailor the medication recommendations. However, most of the health care data are in unstructured form and it consumes a lot of time and effort to pull them into relational form. This study proposes a novel data lake architecture to reduce the data ingestion time and improve the precision of healthcare analytics. It also removes the data silos and enhances the analytics by allowing the connectivity to the third-party data providers (such as clinical lab results, chemist, insurance company,etc.). The data lake architecture uses the Hadoop Distributed File System (HDFS) to provide the storage for both structured and unstructured data. This study uses K-means clustering algorithm to find the patient clusters with similar health conditions. Subsequently, it employs a support vector machine to find the most successful healthcare recommendations for the each cluster. Our experiment results demonstrate the ability of data lake to reduce the time for ingesting data from various data vendors regardless of its format. Moreover, it is evident that the data lake poses the potential to generate clusters of patients more precisely than the existing approaches. It is obvious that the data lake provides a unified storage location for the data in its native format. It can also improve the personalized healthcare medication recommendations by removing the data silos.
Feb 07 2018 cs.CV
In this paper, we propose a geometry-contrastive generative adversarial network GC-GAN for generating facial expression images conditioned on geometry information. Specifically, given an input face and a target expression designated by a set of facial landmarks, an identity-preserving face can be generated guided by the target expression. In order to embed facial geometry onto a semantic manifold, we incorporate contrastive learning into conditional GANs. Experiment results demonstrate that the manifold is sensitive to the changes of facial geometry both globally and locally. Benefited from the semantic manifold, dynamic smooth transitions between different facial expressions are exhibited via geometry interpolation. Furthermore, our method can also be applied in facial expression transfer even there exist big differences in face shape between target faces and driving faces.
Feb 06 2018 cs.CV
In this paper, we propose a simple and effective geometric model fitting method to fit and segment multi-structure data even in the presence of severe outliers. We cast the task of geometric model fitting as a representative mode-seeking problem on hypergraphs. Specifically, a hypergraph is firstly constructed, where the vertices represent model hypotheses and the hyperedges denote data points. The hypergraph involves higher-order similarities (instead of pairwise similarities used on a simple graph), and it can characterize complex relationships between model hypotheses and data points. In addition, we develop a hypergraph reduction technique to remove "insignificant" vertices while retaining as many "significant" vertices as possible in the hypergraph. Based on the simplified hypergraph, we then propose a novel mode-seeking algorithm to search for representative modes within reasonable time. Finally, the proposed mode-seeking algorithm detects modes according to two key elements, i.e., the weighting scores of vertices and the similarity analysis between vertices. Overall, the proposed fitting method is able to efficiently and effectively estimate the number and the parameters of model instances in the data simultaneously. Experimental results demonstrate that the proposed method achieves significant superiority over several state-of-the-art model fitting methods on both synthetic data and real images.
Phase retrieval problem has been studied in various applications. It is an inverse problem without the standard uniqueness guarantee. To make complete theoretical analyses and devise efficient algorithms to recover the signal is sophisticated. In this paper, we come up with a model called \textitphase retrieval with background information which recovers the signal with the known background information from the intensity of their combinational Fourier transform spectrum. We prove that the uniqueness of phase retrieval can be guaranteed even considering those trivial solutions when the background information is sufficient. Under this condition, we construct a loss function and utilize the projected gradient descent method to search for the ground truth. We prove that the stationary point is the global optimum with probability 1. Numerical simulations demonstrate the projected gradient descent method performs well both for 1-D and 2-D signals. Furthermore, this method is quite robust to the Gaussian noise and the bias of the background information.
Multi-person articulated pose tracking in complex unconstrained videos is an important and challenging problem. In this paper, going along the road of top-down approaches, we propose a decent and efficient pose tracker based on pose flows. First, we design an online optimization framework to build association of cross-frame poses and form pose flows. Second, a novel pose flow non maximum suppression (NMS) is designed to robustly reduce redundant pose flows and re-link temporal disjoint pose flows. Extensive experiments show our method significantly outperforms best reported results on two standard Pose Tracking datasets (PoseTrack dataset and PoseTrack Challenge dataset) by 13 mAP 25 MOTA and 6 mAP 3 MOTA respectively. Moreover, in the case of working on detected poses in individual frames, the extra computation of proposed pose tracker is very minor, requiring 0.01 second per frame only.
Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Feb 05 2018 cs.CY
As a consequence of the huge advancement of the Electronic Health Record (EHR) in healthcare settings, the My Health Record (MHR) is introduced in Australia. However security and privacy of the MHR system have been encumbering the development of the system. Even though the MHR system is claimed as patient-cenred and patient-controlled, there are several instances where healthcare providers (other than the usual provider) and system operators who maintain the system can easily access the system and these unauthorised accesses can lead to a breach of the privacy of the patients. This is one of the main concerns of the consumers that affect the uptake of the system. In this paper, we propose a patient centred MHR framework which requests authorisation from the patient to access their sensitive health information. The proposed model increases the involvement and satisfaction of the patients in their healthcare and also suggests mobile security system to give an online permission to access the MHR system.
An Electronic Health Record (EHR) system must enable efficient availability of meaningful, accurate and complete data to assist improved clinical administration through the development, implementation and optimisation of clinical pathways. Therefore data integrity is the driving force in EHR systems and is an essential aspect of service delivery at all levels. However, preserving data integrity in EHR systems has become a major problem because of its consequences in promoting high standards of patient care. In this paper, we review and address the impact of data integrity of the use of EHR system and its associated issues. We determine and analyse three phases of data integrity of an EHR system. Finally, we also present an appropriate method to preserve the integrity in EHR systems. To analyse and evaluate the data integrity, one of the major clinical systems in Australia is considered. This will demonstrate the impact on quality and safety of patient care.
Though the big data benchmark suites like BigDataBench and CloudSuite have been used in architecture and system researches, we have not yet answered the fundamental issue-- what are abstractions of frequently-appearing units of computation in big data analytics, which we call big data dwarfs. For the first time, we identify eight big data dwarfs, each of which captures the common requirements of each class of unit of computation while being reasonably divorced from individual implementations among a wide variety of big data analytics workloads. We implement the eight dwarfs on different software stacks as the dwarf components. We present the application of the big data dwarfs to construct big data proxy benchmarks using the directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks shorten the execution time by 100s times on the real systems while they are qualified for both earlier architecture design and later system evaluation across different architectures.
The artificial neural network shows powerful ability of inference, but it is still criticized for lack of interpretability and prerequisite needs of big dataset. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome the shortages. ReNN first makes local-based inferences to detect local patterns, and then uses rules based on domain knowledge about the local patterns to generate rule-modulated map. After that, ReNN makes global-based inferences that synthesizes the local patterns and the rule-modulated map. To solve the optimization problem caused by rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies which are difficult to learn with limited empirical dataset, thus improving inference accuracy. The complexity of neural networks can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize the neural networks can be reduced. Besides, inferences from ReNN can be analyzed with both local patterns and rules, and thus have better interpretability. In this paper, ReNN has been validated with a time-series detection problem.
Jan 31 2018 cs.MA
In this paper we present a novel crowd simulation method by modeling the generation and contagion of panic emotion under multi-hazard circumstances. Specifically, we first classify hazards into different types (transient and persistent, concurrent and non-concurrent, static and dynamic ) based on their inherent characteristics. Then, we introduce the concept of perilous field for each hazard and further transform the critical level of the field to its invoked-panic emotion. After that, we propose an emotional contagion model to simulate the evolving process of panic emotion caused by multiple hazards in these situations. Finally, we introduce an Emotional Reciprocal Velocity Obstacles (ERVO) model to simulate the crowd behaviors by augmenting the traditional RVO model with emotional contagion, which combines the emotional impact and local avoidance together for the first time. Our experimental results show that this method can soundly generate realistic group behaviors as well as panic emotion dynamics in a crowd in multi-hazard environments.
Millimeter wave offers a sensible solution to the capacity crunch faced by 5G wireless communications. This paper comprehensively studies physical layer security in a multi-input single-output (MISO) millimeter wave system where multiple single-antenna eavesdroppers are randomly located. Concerning the specific propagation characteristics of millimeter wave, we investigate two secure transmission schemes, namely maximum ratio transmitting (MRT) beamforming and artificial noise (AN) beamforming. Specifically, we first derive closed-form expressions of the connection probability for both schemes. We then analyze the secrecy outage probability (SOP) in both non-colluding eavesdroppers and colluding eavesdroppers scenarios. Also, we maximize the secrecy throughput under a SOP constraint, and obtain optimal transmission parameters, especially the power allocation between AN and the information signal for AN beamforming. Numerical results are provided to verify our theoretical analysis. We observe that the density of eavesdroppers, the spatially resolvable paths of the destination and eavesdroppers all contribute to the secrecy performance and the parameter design of millimeter wave systems.
In the fifth generation era, the pervasive applications of Internet of Things and massive machine-type communications have initiated increasing research interests on the backscatter wireless powered communication (B-WPC) technique due to its ultra high energy efficiency and low cost. The ubiquitous B-WPC network is characterized by nodes with dynamic spatial positions and sporadic short packets, of which the performance has not been fully investigated. In this paper, we give a comprehensive analysis of a multi-antenna B-WPC network with sporadic short packets under a stochastic geometry framework. By exploiting a time-space Poisson point process model, the behavior of the network is well captured in a decentralized and asynchronous transmission way. We then analyze the energy and information outage performance in the energy harvest and backscatter modulation phases of the backscatter network, respectively. The optimal transmission slot length and division are obtained by maximizing the network-wide spatial throughput. Moreover, we find an interesting result that there exists the optimal tradeoff between the durations of the energy harvest and backscatter modulation phases for spatial throughput maximization. Numerical results are demonstrated to verify our analytical findings and show that this tradeoff region gets shrunk when the outage constraints become more stringent.
Jan 30 2018 cs.CV
Face recognition has achieved revolutionary advancement owing to the advancement of the deep convolutional neural network (CNN). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, traditional softmax loss of deep CNN usually lacks the power of discrimination. To address this problem, recently several loss functions such as central loss \citecenterloss, large margin softmax loss \citelsoftmax, and angular softmax loss \citesphereface have been proposed. All these improvement algorithms share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we design a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as cosine loss by L2 normalizing both features and weight vectors to remove radial variation, based on which a cosine margin term \emph$m$ is introduced to further maximize decision margin in angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. To test our approach, extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, Youtube Faces (YTF) and Labeled Face in the Wild (LFW). We achieve the state-of-the-art performance on these benchmark experiments, which confirms the effectiveness of our approach.
Jan 29 2018 cs.CR
In this work, we provide the first lattice-based group signature that offers full dynamicity (i.e., users have the flexibility in joining and leaving the group), and thus, resolve a prominent open problem posed by previous works. Moreover, we achieve this non-trivial feat in a relatively simple manner. Starting with Libert et al.'s fully static construction (Eurocrypt 2016) - which is arguably the most efficient lattice-based group signature to date, we introduce simple-but-insightful tweaks that allow to upgrade it directly into the fully dynamic setting. More startlingly, our scheme even produces slightly shorter signatures than the former, thanks to an adaptation of a technique proposed by Ling et al. (PKC 2013), allowing to prove inequalities in zero-knowledge. Our design approach consists of upgrading Libert et al.'s static construction (EUROCRYPT 2016) - which is arguably the most efficient lattice-based group signature to date - into the fully dynamic setting. Somewhat surprisingly, our scheme produces slightly shorter signatures than the former, thanks to a new technique for proving inequality in zero-knowledge without relying on any inequality check. The scheme satisfies the strong security requirements of Bootle et al.'s model (ACNS 2016), under the Short Integer Solution (SIS) and the Learning With Errors (LWE) assumptions. Furthermore, we demonstrate how to equip the obtained group signature scheme with the deniability functionality in a simple way. This attractive functionality, put forward by Ishida et al. (CANS 2016), enables the tracing authority to provide an evidence that a given user is not the owner of a signature in question. In the process, we design a zero-knowledge protocol for proving that a given LWE ciphertext does not decrypt to a particular message.
Jan 26 2018 cs.CR
Group signature is a fundamental cryptographic primitive, aiming to protect anonymity and ensure accountability of users. It allows group members to anonymously sign messages on behalf of the whole group, while incorporating a tracing mechanism to identify the signer of any suspected signature. Most of the existing group signature schemes, however, do not guarantee security once users' secret keys are exposed. To reduce potential damages caused by key exposure attacks, Song (CCS 2001) put forward the concept of forward-secure group signatures (FSGS). For the time being, all known secure FSGS schemes are based on number-theoretic assumptions, and are vulnerable against quantum computers. In this work, we construct the first lattice-based FSGS scheme. In Nakanishi et al.'s model, our scheme achieves forward-secure traceability under the Short Integer Solution (SIS) assumption, and offers full anonymity under the Learning With Errors (LWE) assumption. At the heart of our construction is a scalable lattice-based key-evolving mechanism, allowing users to periodically update their secret keys and to efficiently prove in zero-knowledge that the key-evolution process is done correctly. To realize this essential building block, we first employ the Bonsai-tree structure by Cash et al. (EUROCRYPT 2010) to handle the key evolution process, and then develop Langlois et al.'s construction (PKC 2014) to design its supporting zero-knowledge protocol. In comparison to all known lattice-based group signatures (that are \emphnot forward-secure), our scheme only incurs a very reasonable overhead: the bit-sizes of keys and signatures are at most O(log N), where N is the number of group users; and at most O(log^3 T), where T is the number of time periods.
Online news recommender systems aim to address the information explosion of news and make personalized recommendation for users. In general, news language is highly condensed, full of knowledge entities and common sense. However, existing methods are unaware of such external knowledge and cannot fully discover latent knowledge-level connections among news. The recommended results for a user are consequently limited to simple patterns and cannot be extended reasonably. Moreover, news recommendation also faces the challenges of high time-sensitivity of news and dynamic diversity of users' interests. To solve the above problems, in this paper, we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation. DKN is a content-based deep recommendation framework for click-through rate prediction. The key component of DKN is a multi-channel and word-entity-aligned knowledge-aware convolutional neural network (KCNN) that fuses semantic-level and knowledge-level representations of news. KCNN treats words and entities as multiple channels, and explicitly keeps their alignment relationship during convolution. In addition, to address users' diverse interests, we also design an attention module in DKN to dynamically aggregate a user's history with respect to current candidate news. Through extensive experiments on a real online news platform, we demonstrate that DKN achieves substantial gains over state-of-the-art deep recommendation models. We also validate the efficacy of the usage of knowledge in DKN.
Jan 25 2018 cs.CR
Efficient user revocation is a necessary but challenging problem in many multi-user cryptosystems. Among known approaches, server-aided revocation yields a promising solution, because it allows to outsource the major workloads of system users to a computationally powerful third party, called the server, whose only requirement is to carry out the computations correctly. Such a revocation mechanism was considered in the settings of identity-based encryption and attribute-based encryption by Qin et al. (ESORICS 2015) and Cui et al. (ESORICS 2016), respectively. In this work, we consider the server-aided revocation mechanism in the more elaborate setting of predicate encryption (PE). The latter, introduced by Katz, Sahai, and Waters (EUROCRYPT 2008), provides fine-grained and role-based access to encrypted data and can be viewed as a generalization of identity-based and attribute-based encryption. Our contribution is two-fold. First, we formalize the model of server-aided revocable predicate encryption (SR-PE), with rigorous definitions and security notions. Our model can be seen as a non-trivial adaptation of Cui et al.'s work into the PE context. Second, we put forward a lattice-based instantiation of SR-PE. The scheme employs the PE scheme of Agrawal, Freeman and Vaikuntanathan (ASIACRYPT 2011) and the complete subtree method of Naor, Naor, and Lotspiech (CRYPTO 2001) as the two main ingredients, which work smoothly together thanks to a few additional techniques. Our scheme is proven secure in the standard model (in a selective manner), based on the hardness of the Learning With Errors (LWE) problem.
A class of specialized neurons, called lobula plate tangential cells (LPTCs) has been shown to respond strongly to wide-field motion. The classic model, elementary motion detector (EMD) and its improved model, two-quadrant detector (TQD) have been proposed to simulate LPTCs. Although EMD and TQD can percept background motion, their outputs are so cluttered that it is difficult to discriminate actual motion direction of the background. In this paper, we propose a max operation mechanism to model a newly-found transmedullary neuron Tm9 whose physiological properties do not map onto EMD and TQD. This proposed max operation mechanism is able to improve the detection performance of TQD in cluttered background by filtering out irrelevant motion signals. We will demonstrate the functionality of this proposed mechanism in wide-field motion perception.
Discriminating targets moving against a cluttered background is a huge challenge, let alone detecting a target as small as one or a few pixels and tracking it in flight. In the fly's visual system, a class of specific neurons, called small target motion detectors (STMDs), have been identified as showing exquisite selectivity for small target motion. Some of the STMDs have also demonstrated directional selectivity which means these STMDs respond strongly only to their preferred motion direction. Directional selectivity is an important property of these STMD neurons which could contribute to tracking small targets such as mates in flight. However, little has been done on systematically modeling these directional selective STMD neurons. In this paper, we propose a directional selective STMD-based neural network (DSTMD) for small target detection in a cluttered background. In the proposed neural network, a new correlation mechanism is introduced for direction selectivity via correlating signals relayed from two pixels. Then, a lateral inhibition mechanism is implemented on the spatial field for size selectivity of STMD neurons. Extensive experiments showed that the proposed neural network not only is in accord with current biological findings, i.e. showing directional preferences, but also worked reliably in detecting small targets against cluttered backgrounds.
Consider a data publishing setting for a data set with public and private features. The objective of the publisher is to maximize the amount of information about the public features in a revealed data set, while keeping the information leaked about the private features bounded. The goal of this paper is to analyze the performance of privacy mechanisms that are constructed to match the distribution learned from the data set. Two distinct scenarios are considered: (i) mechanisms are designed to provide a privacy guarantee for the learned distribution; and (ii) mechanisms are designed to provide a privacy guarantee for every distribution in a given neighborhood of the learned distribution. For the first scenario, given any privacy mechanism, upper bounds on the difference between the privacy-utility guarantees for the learned and true distributions are presented. In the second scenario, upper bounds on the reduction in utility incurred by providing a uniform privacy guarantee are developed.
Jan 17 2018 cs.CL
To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotator-generic and -specific information. The annotator-generic information is the common knowledge for entities easily mastered by the crowd. Finally, we build our Chinese NE tagger based on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems.
In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we present an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. We then aim to find a \textitcorrection function that can be used to perturb the input distributions of each group in order to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expression for the correction function that can be used to compute it efficiently. We illustrate the use of the correction function for a recidivism prediction application derived from the ProPublica COMPAS dataset.
Jan 17 2018 cs.CL
The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-toright manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality altogether. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.
Jan 16 2018 cs.MM
The conventional reversible data hiding (RDH) algorithms often consider the host as a whole to embed a payload. In order to achieve satisfactory rate-distortion performance, the secret bits are embedded into the noise-like component of the host such as prediction errors. From the rate-distortion view, it may be not optimal since the data embedding units use the identical parameters. This motivates us to present a segmented data embedding strategy for RDH in this paper, in which the raw host could be partitioned into multiple sub-hosts such that each one can freely optimize and use the embedding parameters. Moreover, it enables us to apply different RDH algorithms within different sub-hosts, which is defined as ensemble. Notice that, the ensemble defined here is different from that in machine learning. Accordingly, the conventional operation corresponds to a special case of our work. Since it is a general strategy, we combine some state-of-the-art algorithms to construct a new system using the proposed embedding strategy to evaluate the rate-distortion performance. Experimental results have shown that, the ensemble RDH system outperforms the original versions, which has shown the superiority and applicability.
Jan 16 2018 cs.MM
In reversible data embedding, to avoid overflow and underflow problem, before data embedding, boundary pixels are recorded as side information, which may be losslessly compressed. The existing algorithms often assume that a natural image has little boundary pixels so that the size of side information is small. Accordingly, a relatively high pure payload could be achieved. However, there actually may exist a lot of boundary pixels in a natural image, implying that, the size of side information could be very large. Therefore, when to directly use the existing algorithms, the pure embedding capacity may be not sufficient. In order to address this problem, in this paper, we present a new and efficient framework to reversible data embedding in images that have lots of boundary pixels. The core idea is to losslessly preprocess boundary pixels so that it can significantly reduce the side information. Experimental results have shown the superiority and applicability of our work.
Transmitter-side channel state information (CSI) of the legitimate destination plays a critical role in physical layer secure transmissions. However, channel training procedure is vulnerable to the pilot spoofing attack (PSA) or pilot jamming attack (PJA) by an active eavesdropper (Eve), which inevitably results in severe private information leakage. In this paper, we propose a random channel training (RCT) based secure downlink transmission framework for a time division duplex (TDD) multiple antennas base station (BS). In the proposed RCT scheme, multiple orthogonal pilot sequences (PSs) are simultaneously allocated to the legitimate user (LU), and the LU randomly selects one PS from the assigned PS set to transmit. Under either the PSA or PJA, we provide the detailed steps for the BS to identify the PS transmitted by the LU, and to simultaneously estimate channels of the LU and Eve. The probability that the BS makes an incorrect decision on the PS of the LU is analytically investigated. Finally, closed-form secure beamforming (SB) vectors are designed and optimized to enhance the secrecy rates during the downlink transmissions. Numerical results show that the secrecy performance is greatly improved compared to the conventional channel training scheme wherein only one PS is assigned to the LU.
Large-scale rumor spreading could pose severe social and economic damages. The emergence of online social networks along with the new media can even make rumor spreading more severe. Effective control of rumor spreading is of theoretical and practical significance. This paper takes the first step to understand how the blockchain technology can help limit the spread of rumors. Specifically, we develop a new paradigm for social networks embedded with the blockchain technology, which employs decentralized contracts to motivate trust networks as well as secure information exchange contract. We design a blockchain-based sequential algorithm which utilizes virtual information credits for each peer-to-peer information exchange. We validate the effectiveness of the blockchain-enabled social network on limiting the rumor spreading. Simulation results validate our algorithm design in avoiding rapid and intense rumor spreading, and motivate better mechanism design for trusted social networks.
Jan 09 2018 cs.CV
This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., person and vehicles) in large scale home surveillance videos. The traditional method usually consists of two separate steps, i.e., detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events (e.g., trees, cloud, shadow, rain/snow, flag) with deep learning based object detection and tracking running on cloud. The method is extremely slow and therefore not cost effective, and does not fully leverage the spatial-temporal redundancies with a pre-trained off-the-shelf object detector. To dramatically speedup relevant motion event detection and improve its performance, we propose a novel network for relevant motion event detection, ReMotENet, which is a unified, end-to-end data-driven method using spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects-of-interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. Meanwhile, it exploits the properties of home surveillance videos, e.g., relevant motion is sparse both spatially and temporally, and enhances 3D ConvNets with a spatial-temporal attention model and reference-frame subtraction to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method can achieve comparable or event better performance than the object detection based method but with three to four orders of magnitude speedup (up to 20k times) on GPU devices. Our network is efficient, compact and light-weight. It can detect relevant motion on a 15s surveillance video clip within 4-8 milliseconds on a GPU and a fraction of second (0.17-0.39) on a CPU with a model size of less than 1MB.
The use of low-resolution analog-to-digital converters (ADCs) can significantly reduce power consumption and hardware cost. However, severe nonlinear distortion due to low-resolution ADCs makes achieving reliable data transmission challenging. Particularly, for orthogonal frequency division multiplexing (OFDM) transmission, the orthogonality among subcarriers is destroyed, which invalidates the conventional linear receivers that highly relies on this orthogonality. In this study, we develop an efficient system architecture for OFDM transmission with ultra-low resolution ADC (e.g., 1-2 bits). A novel channel estimator is proposed to estimate the desired channel parameters without their priori distributions. In particular, we integrate the linear minimum mean squared error channel estimator into the generalized Turbo (GTurbo) framework and derive its corresponding extrinsic information to guarantee the convergence of the GTurbo-based algorithm. We also propose feasible schemes for automatic gain control, noise power estimation, and synchronization. Furthermore, we construct a proof-of-concept prototyping system and conduct over-the-air (OTA) experiments to examine the feasibility/reliability of the entire system. To the best of our knowledge, this is the first work that focused on system design and implementation of communications with low-resolution ADCs worldwide. Numerical simulation and OTA experiment results demonstrate that the proposed system supports reliable OFDM data transmission with ultra-low resolution ADCs.
This paper describes a procedure for the creation of large-scale video datasets for action classification and localization from unconstrained, realistic web data. The scalability of the proposed procedure is demonstrated by building a novel video benchmark, named SLAC (Sparsely Labeled ACtions), consisting of over 520K untrimmed videos and 1.75M clip annotations spanning 200 action categories. Using our proposed framework, annotating a clip takes merely 8.8 seconds on average. This represents a saving in labeling time of over 95% compared to the traditional procedure of manual trimming and localization of actions. Our approach dramatically reduces the amount of human labeling by automatically identifying hard clips, i.e., clips that contain coherent actions but lead to prediction disagreement between action classifiers. A human annotator can disambiguate whether such a clip truly contains the hypothesized action in a handful of seconds, thus generating labels for highly informative samples at little cost. We show that our large-scale dataset can be used to effectively pre-train action recognition models, significantly improving final metrics on smaller-scale benchmarks after fine-tuning. On Kinetics, UCF-101 and HMDB-51, models pre-trained on SLAC outperform baselines trained from scratch, by 2.0%, 20.1% and 35.4% in top-1 accuracy, respectively when RGB input is used. Furthermore, we introduce a simple procedure that leverages the sparse labels in SLAC to pre-train action localization models. On THUMOS14 and ActivityNet-v1.3, our localization model improves the mAP of baseline model by 8.6% and 2.5%, respectively.
Dec 27 2017 cs.CV
Despite recent progress, computational visual aesthetic is still challenging. Image cropping, which refers to the removal of unwanted scene areas, is an important step to improve the aesthetic quality of an image. However, it is challenging to evaluate whether cropping leads to aesthetically pleasing results because the assessment is typically subjective. In this paper, we propose a novel cascaded cropping regression (CCR) method to perform image cropping by learning the knowledge from professional photographers. The proposed CCR method improves the convergence speed of the cascaded method, which directly uses random-ferns regressors. In addition, a two-step learning strategy is proposed and used in the CCR method to address the problem of lacking labelled cropping data. Specifically, a deep convolutional neural network (CNN) classifier is first trained on large-scale visual aesthetic datasets. The deep CNN model is then designed to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method. Experimental results on public image cropping datasets demonstrate that the proposed method significantly outperforms several state-of-the-art image cropping methods.
Social awareness and social ties are becoming increasingly fashionable with emerging mobile and handheld devices. Social trust degree describing the strength of the social ties has drawn lots of research interests in many fields including secure cooperative communications. Such trust degree reflects the users' willingness for cooperation, which impacts the selection of the cooperative users in the practical networks. In this paper, we propose a cooperative relay and jamming selection scheme to secure communication based on the social trust degree under a stochastic geometry framework. We aim to analyze the involved secrecy outage probability (SOP) of the system's performance. To achieve this target, we propose a double Gamma ratio (DGR) approach through Gamma approximation. Based on this, the SOP is tractably obtained in closed form. The simulation results verify our theoretical findings, and validate that the social trust degree has dramatic influences on the network's secrecy performance.
Many predicted structured objects (e.g., sequences, matchings, trees) are evaluated using the F-score, alignment error rate (AER), or other multivariate performance measures. Since inductively optimizing these measures using training data is typically computationally difficult, empirical risk minimization of surrogate losses is employed, using, e.g., the hinge loss for (structured) support vector machines. These approximations often introduce a mismatch between the learner's objective and the desired application performance, leading to inconsistency. We take a different approach: adversarially approximate training data while optimizing the exact F-score or AER. Structured predictions under this formulation result from solving zero-sum games between a predictor seeking the best performance and an adversary seeking the worst while required to (approximately) match certain structured properties of the training data. We explore this approach for word alignment (AER evaluation) and named entity recognition (F-score evaluation) with linear-chain constraints.
Dec 15 2017 cs.NI
By means of network densification, ultra dense networks (UDNs) can efficiently broaden the network coverage and enhance the system throughput. In parallel, unmanned aerial vehicles (UAVs) communications and networking have attracted increasing attention recently due to their high agility and numerous applications. In this article, we present a vision of UAV-supported UDNs. Firstly, we present four representative scenarios to show the broad applications of UAV-supported UDNs in communications, caching and energy transfer. Then, we highlight the efficient power control in UAV-supported UDNs by discussing the main design considerations and methods in a comprehensive manner. Furthermore, we demonstrate the performance superiority of UAV-supported UDNs via case study simulations, compared to traditional fixed infrastructure based networks. In addition, we discuss the dominating technical challenges and open issues ahead.
Mobile-phones have facilitated the creation of field-portable, cost-effective imaging and sensing technologies that approach laboratory-grade instrument performance. However, the optical imaging interfaces of mobile-phones are not designed for microscopy and produce spatial and spectral distortions in imaging microscopic specimens. Here, we report on the use of deep learning to correct such distortions introduced by mobile-phone-based microscopes, facilitating the production of high-resolution, denoised and colour-corrected images, matching the performance of benchtop microscopes with high-end objective lenses, also extending their limited depth-of-field. After training a convolutional neural network, we successfully imaged various samples, including blood smears, histopathology tissue sections, and parasites, where the recorded images were highly compressed to ease storage and transmission for telemedicine applications. This method is applicable to other low-cost, aberrated imaging systems, and could offer alternatives for costly and bulky microscopes, while also providing a framework for standardization of optical images for clinical and biomedical applications.
Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representation of potential energy and force field and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and the i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulted molecular dynamics model reproduces accurately the structural information contained in the original model.
Dec 12 2017 cs.MM
Steganography aims to conceal the very fact that the communication takes place, by embedding a message into a digit object such as image without introducing noticeable artifacts. A number of steganographic systems have been developed in past years, most of which, however, are confined to the laboratory conditions where the real-world use of steganography are rarely concerned. In this paper, we introduce an alternative perspective to steganography. A graph-theoretic model to steganography on social networks is presented to analyze real-world steganographic scenarios. In the graph, steganographic participants are corresponding to the vertices with meaningless unique identifiers. Each edge allows the two vertices to communicate with each other by any steganographic algorithm. Meanwhile, the edges are associated with weights to quantize the corresponding communication risk (or say cost). The optimization task is to minimize the overall risk, which is modeled as additive over the social network. We analyze different scenarios on a social network, and provide the suited solutions to the corresponding optimization tasks. We prove that a multiplicative probabilistic graph is equivalent to an additive weighted graph. From the viewpoint of an attacker, he may hope to detect suspicious communication channels, the data encoder(s) and the data decoder(s). We present limited detection analysis to steganographic communication on a network.
Dec 12 2017 cs.CV
Because of affected by weather conditions, camera pose and range, etc. Objects are usually small, blur, occluded and diverse pose in the images gathered from outdoor surveillance cameras or access control system. It is challenging and important to detect faces precisely for face recognition system in the field of public security. In this paper, we design a based on context modeling structure named Feature Hierarchy Encoder-Decoder Network for face detection(FHEDN), which can detect small, blur and occluded face with hierarchy by hierarchy from the end to the beginning likes encoder-decoder in a single network. The proposed network is consist of multiple context modeling and prediction modules, which are in order to detect small, blur, occluded and diverse pose faces. In addition, we analyse the influence of distribution of training set, scale of default box and receipt field size to detection performance in implement stage. Demonstrated by experiments, Our network achieves promising performance on WIDER FACE and FDDB benchmarks.
A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration and an uncertainty indicator from the neural network model playing the role of the reward function. Parameterization using neural networks makes it feasible to handle cases with a large set of collective variables. This has the potential advantage that selecting precisely the right set of collective variables has now become less critical for capturing the structural transformations of the system. The method is illustrated by studying the full-atom, explicit solvent models of alanine dipeptide and tripeptide, as well as the system of a polyalanine-10 molecule with 20 collective variables.
Dec 06 2017 cs.CV
While attributes have been widely used for person re-identification (Re-ID) that matches the same person images across disjoint camera views, they are used either as extra features or for performing multi-task learning to assist the image-image person matching task. However, how to find a set of person images according to a given attribute description, which is very practical in many surveillance applications, remains a rarely investigated cross-modal matching problem in Person Re-ID. In this work, we present this challenge and employ adversarial learning to formulate the attribute-image cross-modal person Re-ID model. By imposing the regularization on the semantic consistency constraint across modalities, the adversarial learning enables generating image-analogous concepts for query attributes and getting it matched with image in both global level and semantic ID level. We conducted extensive experiments on three attribute datasets and demonstrated that the adversarial modelling is so far the most effective for the attributeimage cross-modal person Re-ID problem.
Dec 06 2017 cs.CV
It is well known that metamodel or surrogate modeling techniques have been widely applied in engineering problems due to their higher efficiency. However, with the increase of the linearity and dimensions, it is difficult for the present popular metamodeling techniques to construct reliable metamodel and apply to more and more complicated high dimensional problems. Recently, neural networks (NNs), especially deep neural networks (DNNs) have been widely recognized as feasible and effective tools for multidiscipline. Actually, some popular NNs, such as back propagation neural networks (BPNNs) can be regarded as a kind of metamodeling techniques. However, for high dimensional problems, it seems difficult for a BPNN to construct a metamodel. In this study, to construct the high accurate metamodel efficiently, another powerful NN, convolutional neural networks (CNNs) are introduced to construct metamodels. Considering the distinctive characteristic of the CNNs, the CNNs are considered to be a potential modeling tool to handle highly nonlinear and dimensional problems with the limited training samples.
Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interest. Online voting imposes new challenges on recommendation, because the propagation of votings heavily depends on the structure of social networks as well as the content of votings. In this paper, we investigate how to utilize these two factors in a comprehensive manner when doing voting recommendation. First, due to the fact that existing text mining methods such as topic model and semantic model cannot well process the content of votings that is typically short and ambiguous, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to learn word and document representation by jointly considering their topics and semantics. Then we propose our Joint Topic-Semantic-aware social Matrix Factorization (JTS-MF) model for voting recommendation. JTS-MF model calculates similarity among users and votings by combining their TEWE representation and structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization. To evaluate the performance of TEWE representation and JTS-MF model, we conduct extensive experiments on real online voting dataset. The results prove the efficacy of our approach against several state-of-the-art baselines.
Dec 05 2017 cs.AR
Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's historical spatial and temporal memory access pattern. However, they are error prone and the mis-predictions lead to cache pollution and exert extra pressure on memory subsystem. In this paper, a novel scheme of data cache prefetching with perceptron learning is proposed. The key idea is a two-level prefetching mechanism. A primary decision is made by utilizing previous table-based prefetching mechanism, e.g. stride prefetching or Markov prefetching, and then, a neural network, perceptron is taken to detect and trace program memory access patterns, to help reject those unnecessary prefetching decisions. The perceptron can learn from both local and global history in time and space, and can be easily implemented by hardware. This mechanism boost execution performance by ideally mitigating cache pollution and eliminating redundant memory request issued by prefetcher. Detailed evaluation and analysis were conducted based on SPEC CPU 2006 benchmarks. The simulation results show that generally the proposed scheme yields a geometric mean of 60.64%-83.84% decrease in prefetching memory requests without loss in instruction per cycle(IPC)(floating between -2.22% and 2.55%) and cache hit rate(floating between -1.67% and 2.46%). Though it is possible that perceptron may refuse useful blocks and thus cause minor raise in cache miss rate, lower memory request count can decrease average memory access latency, which compensate for the loss, and in the meantime, enhance overall performance in multi-programmed workloads.
In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion analysis. Previous works mainly focus on textual sentiment classification, however, text information can only disclose the "tip of the iceberg" about users' true opinions, of which the most are unobserved but implied by other sources of information such as social relation and users' profile. To address this problem, in this paper we investigate how to predict possibly existing sentiment links in the presence of heterogeneous information. First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset which consists of users' sentiment relation, social relation and profile knowledge by entity-level sentiment extraction method. Then we propose a novel and flexible end-to-end Signed Heterogeneous Information Network Embedding (SHINE) framework to extract users' latent representations from heterogeneous networks and predict the sign of unobserved sentiment links. SHINE utilizes multiple deep autoencoders to map each user into a low-dimension feature space while preserving the network structure. We demonstrate the superiority of SHINE over state-of-the-art baselines on link prediction and node recommendation in two real-world datasets. The experimental results also prove the efficacy of SHINE in cold start scenario.
Dec 04 2017 cs.CL
In this paper, we propose a model using generative adversarial net (GAN) to generate realistic text. Instead of using standard GAN, we combine variational autoencoder (VAE) with generative adversarial net. The use of high-level latent random variables is helpful to learn the data distribution and solve the problem that generative adversarial net always emits the similar data. We propose the VGAN model where the generative model is composed of recurrent neural network and VAE. The discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural network based models, such as recurrent neural network language model and SeqGAN. We evaluate the performance of the model by calculating negative log-likelihood and the BLEU score. We conduct experiments on three benchmark datasets, and results show that our model outperforms other previous models.
Dec 01 2017 cs.CV
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significantly advantages in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block "R(2+1)D" which gives rise to CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101 and HMDB51.
This work proposes a novel location-based multi-group multicast framework which is termed as non-orthogonal multiple access (NOMA) assisted multi-region geocast. This novel spectrum sharing framework exploits the NOMA technology to realize the simultaneous delivery of different messages to different user groups characterized by different geographical locations. The essence of the proposed framework is that the geographical information of user groups unites NOMA and multi-group multicast to enhance the spectral efficiency (SE) and the energy efficiency (EE) of wireless transmissions. Specifically, we investigate the downlink beamforming design of the proposed framework in multiple-input single-output (MISO) settings. The decoding strategy for NOMA is designed and guaranteed by users' geographical information and required quality of service (QoS). The majorization and minimization (MM) algorithm is exploited to solve the non-convex and intractable problems therein. Comprehensive numerical experiments are further provided to show that NOMA holds tremendous promise but also limitations in terms of SE and EE, compared with spatial division multiple access (SDMA) and orthogonal multiple access (OMA).
The Alternating Direction Method of Multipliers (ADMM) decoding of Low Density Parity Check (LDPC) codes has received many attentions due to its excellent performance at the error floor region. In this paper, we develop a parameter-free decoder based on Linear Program (LP) decoding by replacing the binary constraint with the intersection of a box and an $\ell_p$ sphere. An efficient $\ell_2$-box ADMM is designed to handle this model in a distributed fashion. Numerical experiments demonstrates that our decoder attains better adaptability to different Signal-to-Noise Ratio and channels.
Recent systems on structured prediction focus on increasing the level of structural dependencies within the model. However, our study suggests that complex structures entail high overfitting risks. To control the structure-based overfitting, we propose to conduct structure regularization decoding (SR decoding). The decoding of the complex structure model is regularized by the additionally trained simple structure model. We theoretically analyze the quantitative relations between the structural complexity and the overfitting risk. The analysis shows that complex structure models are prone to the structure-based overfitting. Empirical evaluations show that the proposed method improves the performance of the complex structure models by reducing the structure-based overfitting. On the sequence labeling tasks, the proposed method substantially improves the performance of the complex neural network models. The maximum F1 error rate reduction is 36.4% for the third-order model. The proposed method also works for the parsing task. The maximum UAS improvement is 5.5% for the tri-sibling model. The results are competitive with or better than the state-of-the-art results.
Nov 28 2017 cs.CV
Most thermal infrared (TIR) tracking methods are discriminative, which treat the tracking problem as a classification task. However, the objective of the classifier (label prediction) is not coupled to the objective of the tracker (location estimation). The classification task focuses on the between-class difference of the arbitrary objects, while the tracking task mainly deals with the within-class difference of the same objects. In this paper, we cast the TIR tracking problem as a similarity verification task, which is well coupled to the objective of tracking task. We propose a TIR tracker via a hierarchical Siamese convolutional neural network (CNN), named HSNet. To obtain both spatial and semantic features of the TIR object, we design a Siamese CNN coalescing the multiple hierarchical convolutional layers. Then, we train this network end to end on a large visible video detection dataset to learn the similarity between paired objects before we transfer the network into the TIR domain. Next, this pre-trained Siamese network is used to evaluate the similarity between the target template and target candidates. Finally, we locate the most similar one as the tracked target. Extensive experimental results on the benchmarks: VOT-TIR 2015 and VOT-TIR 2016, show that our proposed method achieves favorable performance against the state-of-the-art methods.
The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces "fake" samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of generative model, we propose a novel graph softmax to overcome the limitations of traditional softmax function, which can be proven satisfying desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.
Nov 21 2017 cs.CL
In this paper, we propose a novel BTG-forest-based alignment method. Based on a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align hierarchically under the constraint of BTG. Our two-step method can achieve the same run-time and comparable translation performance as fast_align while it yields smaller phrase tables. Final SMT results show that our method even outperforms in the experiment of distantly related languages, e.g., English-Japanese.
We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that we could adaptively simplify the model which could often be reduced by around 9x, without any loss on accuracy or even with improved accuracy.
We present a robust and precise localization system that achieves centimeter-level localization accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtown, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve localization system accuracy and robustness. Our GNSS RTK module utilizes the help of the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate, in detail, the effectiveness of our approaches, achieving 5-10cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous in crowded city streets despite road construction that occurred from time to time. A dataset including more than 60 km real traffic driving in various urban roads is used to comprehensively test our system.
Nov 16 2017 cs.CL
In this paper, we introduce DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, aiming to tackle real-world MRC problems. In comparison to prior datasets, DuReader has the following characteristics: (a) the questions and the documents are all extracted from real application data, and the answers are human generated; (b) it provides rich annotations for question types, especially yes-no and opinion questions, which take a large proportion in real users' questions but have not been well studied before; (c) it provides multiple answers for each question. The first release of DuReader contains 200k questions, 1,000k documents, and 420k answers, which, to the best of our knowledge, is the largest Chinese MRC dataset so far. Experimental results show there exists big gap between the state-of-the-art baseline systems and human performance, which indicates DuReader is a challenging dataset that deserves future study. The dataset and the code of the baseline systems are publicly available now.
Nov 15 2017 cs.CV
Scene text detection is a challenging problem in computer vision. In this paper, we propose a novel text detection network based on prevalent object detection frameworks. In order to obtain stronger semantic feature, we adopt ResNet as feature extraction layers and exploit multi-level feature by combining hierarchical convolutional networks. A vertical proposal mechanism is utilized to avoid proposal classification, while regression layer remains working to improve localization accuracy. Our approach evaluated on ICDAR2013 dataset achieves F-measure of 0.91, which outperforms previous state-of-the-art results in scene text detection.
While linear mixed model (LMM) has shown a competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges are still to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We proposed the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM in the real-world genomic dataset on two different species from plants and humans. In Arabidopsis thaliana data, sGLMM behaves better than all other baseline models for 63.4% traits. We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.
In this paper, we derive a temporal arbitrage policy for storage via reinforcement learning. Real-time price arbitrage is an important source of revenue for storage units, but designing good strategies have proven to be difficult because of the highly uncertain nature of the prices. Instead of current model predictive or dynamic programming approaches, we use reinforcement learning to design a two-thresholds policy. This policy is learned through repeated charge and discharge actions performed by the storage unit through updating a value matrix. We design a reward function that does not only reflect the instant profit of charge/discharge decisions but also incorporate the history information. Simulation results demonstrate our designed reward function leads to significant performance improvement compared with existing algorithms.
Different from the traditional benchmarking methodology that creates a new benchmark or proxy for every possible workload, this paper presents a scalable big data benchmarking methodology. Among a wide variety of big data analytics workloads, we identify eight big data dwarfs, each of which captures the common requirements of each class of unit of computation while being reasonably divorced from individual implementations. We implement the eight dwarfs on different software stacks, e.g., OpenMP, MPI, Hadoop as the dwarf components. For the purpose of architecture simulation, we construct and tune big data proxy benchmarks using the directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks preserve the micro-architecture, memory, and I/O characteristics, and they shorten the simulation time by 100s times while maintain the average micro-architectural data accuracy above 90 percentage on both X86 64 and ARMv8 processors. We will open-source the big data dwarf components and proxy benchmarks soon.
Nov 07 2017 cs.DB
Currently data explosion poses great challenges to approximate aggregation on efficiency and accuracy. To address this problem, we propose a novel approach to calculate aggregation answers in high accuracy using only a small share of data. We introduce leverages to reflect individual differences of samples from the statistical perspective. Two kinds of estimators, the leverage-based estimator and the sketch estimator (a "rough picture" of the aggregation answer), are in constraint relations and iteratively improved according to the actual conditions until their difference is below a threshold. Due to the iteration mechanism and the leverages, our approach achieves high accuracy. Moreover, some features, including not requiring recording sampled data and easy to extend to various execution modes (such as, the online mode), make our approach well suited to deal with big data. Experiments show that our approach has extraordinary performance, and when compared with the uniform sampling, our approach can achieve high-quality answers with only 1/3 of the same sample size.
Large scale applications are increasingly built by composing sets of microservices. In this model the functionality for a single application might be split across 100s or 1000s of microservices. Resource provisioning for these applications is complex, requiring administrators to understand both the functioning of each microservice, and dependencies between microservices in an application. In this paper we present ThrottleBot, a system that automates the process of determining what resource when allocated to which microservice is likely to have the greatest impact on application performance. We demonstrate the efficacy of our approach by applying ThrottleBot to both synthetic and real world applications. We believe that ThrottleBot when combined with existing microservice orchestrators, e.g., Kubernetes, enables push-button deployment of web scale applications.
Nov 02 2017 cs.CR
Energy has been increasingly generated or collected by different entities on the power grid (e.g., universities, hospitals and householdes) via solar panels, wind turbines or local generators in the past decade. With local energy, such electricity consumers can be considered as "microgrids" which can simulataneously generate and consume energy. Some microgrids may have excessive energy that can be shared to other power consumers on the grid. To this end, all the entities have to share their local private information (e.g., their local demand, local supply and power quality data) to each other or a third-party to find and implement the optimal energy sharing solution. However, such process is constrained by privacy concerns raised by the microgrids. In this paper, we propose a privacy preserving scheme for all the microgrids which can securely implement their energy sharing against both semi-honest and colluding adversaries. The proposed approach includes two secure communication protocols that can ensure quantified privacy leakage and handle collusions.
Oct 31 2017 cs.MM
A large-scale video quality dataset called the VideoSet has been constructed recently to measure human subjective experience of H.264 coded video in terms of the just-noticeable-difference (JND). It measures the first three JND points of 5-second video of resolution 1080p, 720p, 540p and 360p. Based on the VideoSet, we propose a method to predict the satisfied-user-ratio (SUR) curves using a machine learning framework. First, we partition a video clip into local spatial-temporal segments and evaluate the quality of each segment using the VMAF quality index. Then, we aggregate these local VMAF measures to derive a global one. Finally, the masking effect is incorporated and the support vector regression (SVR) is used to predict the SUR curves, from which the JND points can be derived. Experimental results are given to demonstrate the performance of the proposed SUR prediction method.
Millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) with hybrid precoding is a promising technique for the future 5G wireless communications. Due to a large number of antennas but a much smaller number of radio frequency (RF) chains, estimating the high-dimensional mmWave massive MIMO channel will bring the large pilot overhead. To overcome this challenge, this paper proposes a super-resolution channel estimation scheme based on two-dimensional (2D) u- nitary ESPRIT algorithm. By exploiting the angular sparsity of mmWave channels, the continuously distributed angle of arrivals/departures (AoAs/AoDs) can be jointly estimated with high accuracy. Specifically, by designing the uplink training signals at both base station (BS) and mobile station (MS), we first use low pilot overhead to estimate a low-dimensional effective channel, which has the same shift-invariance of array response as the high-dimensional mmWave MIMO channel to be estimated. From the low-dimensional effective channel, the super- resolution estimates of AoAs and AoDs can be jointly obtained by exploiting the 2D unitary ESPRIT channel estimation algorithm. Furthermore, the associated path gains can be acquired based on the least squares (LS) criterion. Finally, we can reconstruct the high-dimensional mmWave MIMO channel according to the obtained AoAs, AoDs, and path gains. Simulation results have confirmed that the proposed scheme is superior to conventional schemes with a much lower pilot overhead.
A controllable crack propagation (CCP) strategy is suggested. It is well known that crack always leads the failure by crossing the critical domain in engineering structure. Therefore, the CCP method is proposed to control the crack to propagate along the specified path, which is away from the critical domain. To complete this strategy, two optimization methods are engaged. Firstly, a back propagation neural network (BPNN) assisted particle swarm optimization (PSO) is suggested. In this method, to improve the efficiency of CCP, the BPNN is used to build the metamodel instead of the forward evaluation. Secondly, the popular PSO is used. Considering the optimization iteration is a time consuming process, an efficient reanalysis based extended finite element methods (X-FEM) is used to substitute the complete X-FEM solver to calculate the crack propagation path. Moreover, an adaptive subdomain partition strategy is suggested to improve the fitting accuracy between real crack and specified paths. Several typical numerical examples demonstrate that both optimization methods can carry out the CCP. The selection of them should be determined by the tradeoff between efficiency and accuracy.
Oct 24 2017 cs.CL
Verbs are important in semantic understanding of natural language. Traditional verb representations, such as FrameNet, PropBank, VerbNet, focus on verbs' roles. These roles are too coarse to represent verbs' semantics. In this paper, we introduce verb patterns to represent verbs' semantics, such that each pattern corresponds to a single semantic of the verb. First we analyze the principles for verb patterns: generality and specificity. Then we propose a nonparametric model based on description length. Experimental results prove the high effectiveness of verb patterns. We further apply verb patterns to context-aware conceptualization, to show that verb patterns are helpful in semantic-related tasks.
An Electronic Health Record (EHR) is designed to store diverse data accurately from a range of health care providers and to capture the status of a patient by a range of health care providers across time. Realising the numerous benefits of the system, EHR adoption is growing globally and many countries invest heavily in electronic health systems. In Australia, the Government invested $467 million to build key components of the Personally Controlled Electronic Health Record (PCEHR) system in July 2012. However, in the last three years, the uptake from individuals and health care providers has not been satisfactory. Unauthorised access of the PCEHR was one of the major barriers. We propose an improved access control model for the PCEHR system to resolve the unauthorised access issue. We discuss the unauthorised access issue with real examples and present a potential solution to overcome the issue to make the PCEHR system a success in Australia.