May 16 2018 cs.CR
In this paper, we present an end-to-end view of IoT security and privacy and a case study. Our contribution is three-fold. First, we present our end-to-end view of an IoT system and this view can guide risk assessment and design of an IoT system. We identify 10 basic IoT functionalities that are related to security and privacy. Based on this view, we systematically present security and privacy requirements in terms of IoT system, software, networking and big data analytics in the cloud. Second, using the end-to-end view of IoT security and privacy, we present a vulnerability analysis of the Edimax IP camera system. We are the first to exploit this system and have identified various attacks that can fully control all the cameras from the manufacturer. Our real-world experiments demonstrate the effectiveness of the discovered attacks and raise the alarms again for the IoT manufacturers. Third, such vulnerabilities found in the exploit of Edimax cameras and our previous exploit of Edimax smartplugs can lead to another wave of Mirai attacks, which can be either botnets or worm attacks. To systematically understand the damage of the Mirai malware, we model propagation of the Mirai and use the simulations to validate the modeling. The work in this paper raises the alarm again for the IoT device manufacturers to better secure their products in order to prevent malware attacks like Mirai.
May 11 2018 cs.CL
This paper proposes hybrid semi-Markov conditional random fields (SCRFs) for neural sequence labeling in natural language processing. Based on conventional conditional random fields (CRFs), SCRFs have been designed for the tasks of assigning labels to segments by extracting features from and describing transitions between segments instead of words. In this paper, we improve the existing SCRF methods by employing word-level and segment-level information simultaneously. First, word-level labels are utilized to derive the segment scores in SCRFs. Second, a CRF output layer and an SCRF output layer are integrated into an unified neural network and trained jointly. Experimental results on CoNLL 2003 named entity recognition (NER) shared task show that our model achieves state-of-the-art performance when no external knowledge is used.
Voice conversion (VC) aims at conversion of speaker characteristic without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality evaluation by a human panel. As a time-consuming, expensive, and non-reproducible process, it hinders rapid prototyping of new VC technology. We address artifact assessment using an alternative, objective approach leveraging from prior work on spoofing countermeasures (CMs) for automatic speaker verification. Therein, CMs are used for rejecting `fake' inputs such as replayed, synthetic or converted speech but their potential for automatic speech artifact assessment remains unknown. This study serves to fill that gap. As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts. Equal error rate (EER) of the CM, a confusability index of VC samples with real human speech, serves as our artifact measure. Two clusters of VCC'18 entries are identified: low-quality ones with detectable artifacts (low EERs), and higher quality ones with less artifacts. None of the VCC'18 systems, however, is perfect: all EERs are < 30 % (the `ideal' value would be 50 %). Our preliminary findings suggest potential of CMs outside of their original application, as a supplemental optimization and benchmarking tool to enhance VC technology.
We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems. The objective of the challenge was to perform speaker conversion (i.e. transform the vocal identity) of a source speaker to a target speaker while maintaining linguistic information. As an update to the previous challenge, we considered both parallel and non-parallel data to form the Hub and Spoke tasks, respectively. A total of 23 teams from around the world submitted their systems, 11 of them additionally participated in the optional Spoke task. A large-scale crowdsourced perceptual evaluation was then carried out to rate the submitted converted speech in terms of naturalness and similarity to the target speaker identity. In this paper, we present a brief summary of the state-of-the-art techniques for VC, followed by a detailed explanation of the challenge tasks and the results that were obtained.
This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Different from conventional BWE methods which predict spectral parameters for reconstructing wideband speech waveforms, this BWE method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN which is an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples using a neural network composed of long short-term memory (LSTM) layers and feed-forward (FF) layers. The LSTM layers form a hierarchical structure and each layer operates at a specific temporal resolution to efficiently capture long-span dependencies between temporal sequences. Furthermore, additional conditions, such as the bottleneck (BN) features derived from narrowband speech using a deep neural network (DNN)-based state classifier, are employed as auxiliary input to further improve the quality of generated wideband speech. The experimental results of comparing several waveform modeling methods show that the HRNN-based method can achieve better speech quality and run-time efficiency than the dilated convolutional neural network (DCNN)-based method and the plain sample-level recurrent neural network (SRNN)-based method. Our proposed method also outperforms the conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
Jan 17 2018 cs.CV
The location of broken insulators in aerial images is a challenging task. This paper, focusing on the self-blast glass insulator, proposes a deep learning solution. We address the broken insulators location problem as a low signal-noise-ratio image location framework with two modules: 1) object detection based on Fast R-CNN, and 2) classification of pixels based on U-net. A diverse aerial image set of some grid in China is tested to validated the proposed approach. Furthermore, a comparison is made among different methods and the result shows that our approach is accurate and real-time.
Nov 16 2017 cs.CL
In this paper, we propose a sequential neural encoder with latent structured description (SNELSD) for modeling sentences. This model introduces latent chunk-level representations into conventional sequential neural encoders, i.e., recurrent neural networks (RNNs) with long short-term memory (LSTM) units, to consider the compositionality of languages in semantic modeling. An SNELSD model has a hierarchical structure that includes a detection layer and a description layer. The detection layer predicts the boundaries of latent word chunks in an input sentence and derives a chunk-level vector for each word. The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs. These output vectors are further concatenated with word vectors or the outputs of a chain LSTM encoder to obtain the final sentence representation. All the model parameters are learned in an end-to-end manner without a dependency on additional text chunking or syntax parsing. A natural language inference (NLI) task and a sentiment analysis (SA) task are adopted to evaluate the performance of our proposed model. The experimental results demonstrate the effectiveness of the proposed SNELSD model on exploring task-dependent chunking patterns during the semantic modeling of sentences. Furthermore, the proposed method achieves better performance than conventional chain LSTMs and tree-structured LSTMs on both tasks.
Nov 15 2017 cs.CL
Modeling informal inference in natural language is very challenging. With the recent availability of large annotated data, it has become feasible to train complex models such as neural networks to perform natural language inference (NLI), which have achieved state-of-the-art performance. Although there exist relatively large annotated data, can machines learn all knowledge needed to perform NLI from the data? If not, how can NLI models benefit from external knowledge and how to build NLI models to leverage it? In this paper, we aim to answer these questions by enriching the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models with external knowledge further improve the state of the art on the Stanford Natural Language Inference (SNLI) dataset.
Aug 07 2017 cs.CL
The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha) that is ranked among the top in the Shared Task, on both the in-domain test set (obtaining a 74.9% accuracy) and on the cross-domain test set (also attaining a 74.9% accuracy), demonstrating that the model generalizes well to the cross-domain data. Our model is equipped with intra-sentence gated-attention composition which helps achieve a better performance. In addition to submitting our model to the Shared Task, we have also tested it on the Stanford Natural Language Inference (SNLI) dataset. We obtain an accuracy of 85.5%, which is the best reported result on SNLI when cross-sentence attention is not allowed, the same condition enforced in RepEval 2017.
Jul 13 2017 cs.CV
Recently, there has been a growing interest in monitoring brain activity for individual recognition system. So far these works are mainly focussing on single channel data or fragment data collected by some advanced brain monitoring modalities. In this study we propose new individual recognition schemes based on spatio-temporal resting state Electroencephalography (EEG) data. Besides, instead of using features derived from artificially-designed procedures, modified deep learning architectures which aim to automatically extract an individual's unique features are developed to conduct classification. Our designed deep learning frameworks are proved of a small but consistent advantage of replacing the $softmax$ layer with Random Forest. Additionally, a voting layer is added at the top of designed neural networks in order to tackle the classification problem arisen from EEG streams. Lastly, various experiments are implemented to evaluate the performance of the designed deep learning architectures; Results indicate that the proposed EEG-based individual recognition scheme yields a high degree of classification accuracy: $81.6\%$ for characteristics in high risk (CHR) individuals, $96.7\%$ for clinically stable first episode patients with schizophrenia (FES) and $99.2\%$ for healthy controls (HC).
Nov 15 2016 cs.AI
In this paper, we propose commonsense knowledge enhanced embeddings (KEE) for solving the Pronoun Disambiguation Problems (PDP). The PDP task we investigate in this paper is a complex coreference resolution task which requires the utilization of commonsense knowledge. This task is a standard first round test set in the 2016 Winograd Schema Challenge. In this task, traditional linguistic features that are useful for coreference resolution, e.g. context and gender information, are no longer effective anymore. Therefore, the KEE models are proposed to provide a general framework to make use of commonsense knowledge for solving the PDP problems. Since the PDP task doesn't have training data, the KEE models would be used during the unsupervised feature extraction process. To evaluate the effectiveness of the KEE models, we propose to incorporate various commonsense knowledge bases, including ConceptNet, WordNet, and CauseCom, into the KEE training process. We achieved the best performance by applying the proposed methods to the 2016 Winograd Schema Challenge. In addition, experiments conducted on the standard PDP task indicate that, the proposed KEE models could solve the PDP problems by achieving 66.7% accuracy, which is a new state-of-the-art performance.
Oct 27 2016 cs.CL
Distributed representation learned with neural networks has recently shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve the state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.
Sep 21 2016 cs.CL
Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have shown to be very effective. In this paper, we present a new state-of-the-art result, achieving the accuracy of 88.6% on the Stanford Natural Language Inference Dataset. Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. Particularly, incorporating syntactic parsing information contributes to our best result---it further improves the performance even when added to the already very strong model.
Mar 25 2016 cs.CL
This paper proposes a model to learn word embeddings with weighted contexts based on part-of-speech (POS) relevance weights. POS is a fundamental element in natural language. However, state-of-the-art word embedding models fail to consider it. This paper proposes to use position-dependent POS relevance weighting matrices to model the inherent syntactic relationship among words within a context window. We utilize the POS relevance weights to model each word-context pairs during the word embedding training process. The model proposed in this paper paper jointly optimizes word vectors and the POS relevance matrices. Experiments conducted on popular word analogy and word similarity tasks all demonstrated the effectiveness of the proposed method.
In this paper, we propose a new deep learning approach, called neural association model (NAM), for probabilistic reasoning in artificial intelligence. We propose to use neural networks to model association between any two events in a domain. Neural networks take one event as input and compute a conditional probability of the other event to model how likely these two events are to be associated. The actual meaning of the conditional probabilities varies between applications and depends on how the models are trained. In this work, as two case studies, we have investigated two NAM structures, namely deep neural networks (DNN) and relation-modulated neural nets (RMNN), on several probabilistic reasoning tasks in AI, including recognizing textual entailment, triple classification in multi-relational knowledge bases and commonsense reasoning. Experimental results on several popular datasets derived from WordNet, FreeBase and ConceptNet have all demonstrated that both DNNs and RMNNs perform equally well and they can significantly outperform the conventional methods available for these reasoning tasks. Moreover, compared with DNNs, RMNNs are superior in knowledge transfer, where a pre-trained model can be quickly extended to an unseen relation after observing only a few training samples. To further prove the effectiveness of the proposed models, in this work, we have applied NAMs to solving challenging Winograd Schema (WS) problems. Experiments conducted on a set of WS problems prove that the proposed models have the potential for commonsense reasoning.
Sep 08 2015 cs.CL
This paper proposes an algorithm to improve the calculation of confidence measure for spoken term detection (STD). Given an input query term, the algorithm first calculates a measurement named document ranking weight for each document in the speech database to reflect its relevance with the query term by summing all the confidence measures of the hypothesized term occurrences in this document. The confidence measure of each term occurrence is then re-estimated through linear interpolation with the calculated document ranking weight to improve its reliability by integrating document-level information. Experiments are conducted on three standard STD tasks for Tamil, Vietnamese and English respectively. The experimental results all demonstrate that the proposed algorithm achieves consistent improvements over the state-of-the-art method for confidence measure calculation. Furthermore, this algorithm is still effective even if a high accuracy speech recognizer is not available, which makes it applicable for the languages with limited speech resources.
Oct 29 2014 cs.SY
In this paper, we extend the popular integral control technique to systems evolving on Lie groups. More explicitly, we provide an alternative definition of "integral action" for proportional(-derivative)-controlled systems whose configuration evolves on a nonlinear space, where configuration errors cannot be simply added up to compute a definite integral. We then prove that the proposed integral control allows to cancel the drift induced by a constant bias in both first order (velocity) and second order (torque) control inputs for fully actuated systems evolving on abstract Lie groups. We illustrate the approach by 3-dimensional motion control applications.
Mar 20 2014 cs.CR
In this paper, we introduce a novel computer vision based attack that discloses inputs on a touch enabled device, while the attacker cannot see any text or popups from a video of the victim tapping on the touch screen. In the attack, we use the optical flow algorithm to identify touching frames where the finger touches the screen surface. We innovatively use intersections of detected edges of the touch screen to derive the homography matrix mapping the touch screen surface in video frames to a reference image of the virtual keyboard. We analyze the shadow formation around the fingertip and use the k-means clustering algorithm to identify touched points. Homography can then map these touched points to keys of the virtual keyboard. Our work is substantially different from existing work. We target password input and are able to achieve a high success rate. We target scenarios like classrooms, conferences and similar gathering places and use a webcam or smartphone camera. In these scenes, single-lens reflex (SLR) cameras and high-end camcorders used in related work will appear suspicious. To defeat such computer vision based attacks, we design, implement and evaluate the Privacy Enhancing Keyboard (PEK) where a randomized virtual keyboard is used to input sensitive information.
Mar 18 2010 cs.NI
Multicast is the ability of a communication network to accept a single message from an application and to deliver copies of the message to multiple recipients at different location. With the development of Internet, Multicast is widely applied in all kinds of multimedia real-time application: distributed multimedia systems, collaborative computing, video-conferencing, distance education, etc. In order to construct a delay-constrained multicast routing tree, average distance heuristic (ADH) algorithm is analyzed firstly. Then a delay-constrained algorithm called DCADH (delay-constrained average distance heuristic) is presented. By using ADH a least cost multicast routing tree can be constructed; if the path delay can't meet the delay upper bound, a shortest delay path which is computed by Dijkstra algorithm will be merged into the existing multicast routing tree to meet the delay upper bound. Simulation experiments show that DCADH has a good performance in achieving a low-cost multicast routing tree.