May 16 2018 cs.CR
In this paper, we present an end-to-end view of IoT security and privacy and a case study. Our contribution is three-fold. First, we present our end-to-end view of an IoT system, which can guide risk assessment and design of an IoT system. We identify 10 basic IoT functionalities that are related to security and privacy. Based on this view, we systematically present security and privacy requirements in terms of IoT system, software, networking and big data analytics in the cloud. Second, using the end-to-end view of IoT security and privacy, we present a vulnerability analysis of the Edimax IP camera system. We are the first to exploit this system and have identified various attacks that can fully control all the cameras from the manufacturer. Our real-world experiments demonstrate the effectiveness of the discovered attacks and raise the alarm again for IoT manufacturers. Third, the vulnerabilities found in the exploit of Edimax cameras and our previous exploit of Edimax smartplugs can lead to another wave of Mirai attacks, which can be either botnet or worm attacks. To systematically understand the damage of the Mirai malware, we model the propagation of Mirai and use simulations to validate the model. The work in this paper raises the alarm again for IoT device manufacturers to better secure their products in order to prevent malware attacks like Mirai.
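The abstract does not say which propagation model is used. As a rough illustration only, the sketch below assumes a generic discrete-time susceptible-infected (SI) epidemic model, a common starting point for worm propagation. The function name `simulate_si` and all parameter values are hypothetical and are not taken from the paper.

```python
def simulate_si(total_devices, initially_infected, contact_rate, steps):
    """Discrete-time SI model (an assumed stand-in, not the paper's model).

    Each step, new infections are proportional to the number of infected
    devices times the fraction of devices that are still susceptible.
    Returns the infected count at every step, starting with step 0.
    """
    infected = float(initially_infected)
    history = [infected]
    for _ in range(steps):
        susceptible = total_devices - infected
        new_infections = contact_rate * infected * susceptible / total_devices
        # Cap at the population size to guard against overshoot.
        infected = min(float(total_devices), infected + new_infections)
        history.append(infected)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: 1000 vulnerable devices, one initial infection.
    curve = simulate_si(total_devices=1000, initially_infected=1,
                        contact_rate=0.5, steps=60)
    for step in range(0, 61, 10):
        print(step, round(curve[step], 1))
```

The resulting curve has the familiar S shape: near-exponential growth while most devices are still susceptible, then saturation as the vulnerable population is exhausted.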
Apr 24 2018 cs.CV
The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints, and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs), which consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-calibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of skeleton data. These two modules (TARM and STCM) seamlessly form a single network architecture that can be trained in an end-to-end fashion. MANs significantly boost the performance of skeleton-based action recognition and achieve the best results on four challenging benchmark datasets: NTU RGB+D, HDM05, SYSU-3D and UT-Kinect.
We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts.
Jan 26 2018 cs.CR
Smartphone carrier companies rely on mobile networks for keeping an accurate record of customer data usage for billing purposes. In this paper, we present a vulnerability that allows an attacker to force the victim's smartphone to consume data through the cellular network by starting the data download on the victim's cell phone without the victim's knowledge. The attack is based on switching the victim's smartphone from the Wi-Fi network to the cellular network while downloading a large data file. This attack has been implemented in real-life scenarios, where the test outcomes demonstrate that the attack is feasible and that mobile networks do not record customer data usage accurately.
Oct 27 2017 cs.CV
Inferring the location, shape, and class of each object in a single image is an important task in computer vision. In this paper, we aim to predict the full 3D parse of both visible and occluded portions of the scene from one RGBD image. We parse the scene by modeling objects as detailed CAD models with class labels and layouts as 3D planes. Such an interpretation is useful for visual reasoning and robotics, but difficult to produce due to the high degree of occlusion and the diversity of object classes. We follow the recent approaches that retrieve shape candidates for each RGBD region proposal, transfer and align associated 3D models to compose a scene that is consistent with observations. We propose to use support inference to aid interpretation and propose a retrieval scheme that uses convolutional neural networks (CNNs) to classify regions and retrieve objects with similar shapes. We demonstrate the performance of our method compared with the state-of-the-art on our new NYUd v2 dataset annotations which are semi-automatically labelled with detailed 3D shapes for all the objects.
The success of various applications including robotics, digital content creation, and visualization demands a structured and abstract representation of the 3D world from limited sensor data. Inspired by the nature of human perception of 3D shapes as a collection of simple parts, we explore such an abstract shape representation based on primitives. Given a single depth image of an object, we present 3D-PRNN, a generative recurrent neural network that synthesizes multiple plausible shapes composed of a set of primitives. Our generative model encodes symmetry characteristics of common man-made objects, preserves long-range structural coherence, and describes objects of varying complexity with a compact representation. We also propose a method based on Gaussian Fields to generate a large-scale dataset of primitive-based shape representations to train our network. We evaluate our approach on a wide range of examples and show that it outperforms nearest-neighbor based shape retrieval methods and is on par with voxel-based generative models while using a significantly reduced parameter space.
The classical uncertainty principle of harmonic analysis states that a nontrivial function and its Fourier transform cannot both be sharply localized. It plays an important role in signal processing and physics. This paper generalizes the uncertainty principle for measurable sets from complex domain to hypercomplex domain using quaternion algebras, associated with the Quaternion Fourier transform. The performance is then evaluated in signal recovery problems where there is an interplay of missing and time-limiting data.
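For intuition about the classical statement, the sketch below checks the discrete analogue for finite sequences (the Donoho-Stark form): a nonzero sequence of length N with N_t nonzero samples, whose discrete Fourier transform has N_w nonzero coefficients, satisfies N_t * N_w >= N. This is the ordinary complex-valued case, not the quaternion generalization described in the abstract. The helper names and the tolerance `tol` are illustrative choices.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def support_size(values, tol=1e-9):
    """Count entries whose magnitude exceeds a small tolerance."""
    return sum(1 for v in values if abs(v) > tol)


def support_product(x):
    """N_t * N_w: time-domain support size times frequency-domain support size."""
    return support_size(x) * support_size(dft(x))


if __name__ == "__main__":
    n = 16
    # A spike train with period 4: its DFT is again a spike train with period 4.
    comb = [1.0 if t % 4 == 0 else 0.0 for t in range(n)]
    # A single spike: its DFT is nonzero everywhere.
    spike = [1.0] + [0.0] * (n - 1)
    print(support_product(comb), support_product(spike))
```

Both test signals meet the bound with equality (the product equals N = 16): a signal can be made sparse in time or in frequency only at the cost of spreading out in the other domain.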
Jul 06 2016 cs.CV
Rotation moment invariants have been of great interest in image processing and pattern recognition. This paper presents a novel kind of rotation moment invariants based on the Slepian functions, which were originally introduced in the method of separation of variables for Helmholtz equations. They were first proposed for time series by Slepian and his coworkers in the 1960s. Recent studies have shown that these functions perform well in local approximation compared to other approximation bases. Motivated by this good approximation performance, we construct the Slepian-based moments and derive the rotation invariants. We not only theoretically prove the invariance, but also discuss experiments on real data. The proposed rotation invariants are robust to noise and yield decent performance in facial expression classification.
One initial goal for the DRMF is to seed our digital compendium with fundamental orthogonal polynomial formulae. We used the data from the NIST Digital Library of Mathematical Functions (DLMF) as the initial seed for our DRMF project. The DLMF input LaTeX source already contains some semantic information encoded using a highly customized set of semantic LaTeX macros. Those macros could be converted to content MathML using LaTeXML. During that conversion the semantics were translated to an implicit DLMF content dictionary. This year, we have developed a semantic enrichment process whose goal is to infer semantic information from generic LaTeX sources. The generated context-free semantic information is used to build DRMF formula home pages for each individual formula. We demonstrate this process using selected chapters from the book "Hypergeometric Orthogonal Polynomials and their $q$-Analogues" (2010) by Koekoek, Lesky and Swarttouw (KLS) as well as an actively maintained addendum to this book by Koornwinder (KLSadd). The generic input KLS and KLSadd LaTeX sources describe the printed representation of the formulae, but do not contain explicit semantic information. See http://drmf.wmflabs.org.
Apr 10 2015 cs.CV
One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors. In this paper, we aim to interpret indoor scenes from one RGBD image. Our representation encodes the layout of walls, which must conform to a Manhattan structure but is otherwise flexible, and the layout and extent of objects, modeled with CAD-like 3D shapes. We represent both the visible and occluded portions of the scene, producing a complete 3D parse. Such a scene interpretation is useful for robotics and visual reasoning, but difficult to produce due to the well-known challenge of segmentation, the high degree of occlusion, and the diversity of objects in indoor scenes. We take a data-driven approach, generating sets of potential object regions, matching to regions in training images, and transferring and aligning associated 3D models while encouraging fit to observations and overall consistency. We demonstrate encouraging results on the NYU v2 dataset and highlight a variety of interesting directions for future work.
Feb 17 2015 cs.GR
We present a multi-scale approach to sketch-based shape retrieval. It is based on a novel multi-scale shape descriptor called Pyramid-of-Parts, which encodes the features and spatial relationship of the semantic parts of query sketches. The same descriptor can also be used to represent 2D projected views of 3D shapes, allowing effective matching of query sketches with 3D shapes across multiple scales. Experimental results show that the proposed method outperforms the state-of-the-art method, whether the sketch segmentation information is obtained manually or automatically by considering each stroke as a semantic part.
Dec 08 2009 cs.FL
A language L is prefix-closed if, whenever a word w is in L, then every prefix of w is also in L. We define suffix-, factor-, and subword-closed languages in the same way, where by subword we mean subsequence. We study the quotient complexity (usually called state complexity) of operations on prefix-, suffix-, factor-, and subword-closed languages. We find tight upper bounds on the complexity of the prefix-, suffix-, factor-, and subword-closure of arbitrary languages, and on the complexity of boolean operations, concatenation, star and reversal in each of the four classes of closed languages. We show that repeated application of positive closure and complement to a closed language results in at most four distinct languages, while Kleene closure and complement gives at most eight languages.
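The four closure notions defined above can be made concrete for finite languages. The sketch below is only an illustration of the definitions: the paper works with regular languages and their quotient complexity, which this code does not compute. All function names are illustrative.

```python
from itertools import combinations


def prefixes(w):
    """All prefixes of w, including the empty word and w itself."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w):
    """All suffixes of w, including the empty word and w itself."""
    return {w[i:] for i in range(len(w) + 1)}


def factors(w):
    """All contiguous substrings of w."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def subwords(w):
    """All subsequences of w (subword means subsequence here)."""
    return {
        "".join(w[i] for i in idx)
        for r in range(len(w) + 1)
        for idx in combinations(range(len(w)), r)
    }


def closure(language, parts):
    """Close a finite language under the given 'parts' relation."""
    result = set()
    for w in language:
        result |= parts(w)
    return result


def is_closed(language, parts):
    """A language is closed iff taking the closure adds no new words."""
    return closure(language, parts) == set(language)


if __name__ == "__main__":
    lang = {"", "a", "ab"}
    print(is_closed(lang, prefixes))   # prefix-closed
    print(is_closed(lang, suffixes))   # not suffix-closed: "b" is missing
    print(sorted(closure({"abc"}, subwords)))
```

Because every word is a prefix, suffix, factor, and subword of itself, the closure always contains the original language, so comparing the two sets is enough to decide closedness.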