Nov 09 2017 cs.CR
A recent report indicates that a new malicious app is introduced every 4 seconds. This rapid malware distribution rate causes existing detection systems to fall far behind, allowing malicious apps to escape vetting efforts and be distributed even by legitimate app stores. When trusted download sites distribute malware, several negative consequences ensue. First, the popularity of these sites allows such malicious apps to infect devices quickly and widely. Second, analysts and researchers who rely on machine learning based detection techniques may also download these apps and mistakenly label them as benign, since they have not yet been disclosed as malware. These apps are then used as part of the benign dataset during model training and testing. The presence of contaminants in the benign dataset can compromise the effectiveness and accuracy of the resulting detection and classification techniques. To address this issue, we introduce PUDROID (Positive and Unlabeled learning-based malware detection for Android) to automatically and effectively remove contaminants from training datasets, allowing machine learning based malware classifiers and detectors to be more effective and accurate. To further improve the performance of such detectors, we apply a feature selection strategy to select pertinent features from a variety of candidates. We then compare the detection rates and accuracy of detection systems on two datasets: one cleaned of contaminants by PUDROID and the other left uncleaned. The results indicate that once contaminants are removed from the datasets, both the malware detection rate and detection accuracy improve significantly.
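The abstract does not spell out PUDROID's algorithm, so the following is a minimal two-step positive-unlabeled (PU) learning sketch of the general idea, using synthetic data and an assumed cut-off quantile: known malware forms the positive set, the nominally benign pool is treated as unlabeled, and unlabeled samples that score like positives are flagged as contaminants.

```python
# Minimal two-step PU-learning sketch for dataset cleaning, loosely in the
# spirit of PUDROID (whose exact algorithm the abstract does not specify).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(200, 10))               # known malware
X_unl = np.vstack([rng.normal(loc=-1.0, size=(380, 10)),  # true benign
                   rng.normal(loc=1.0, size=(20, 10))])   # hidden contaminants

# Step 1: train positive-vs-unlabeled and score the unlabeled pool.
X = np.vstack([X_pos, X_unl])
s = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, s)
scores = clf.predict_proba(X_unl)[:, 1]

# Step 2: unlabeled samples scoring close to the positives are likely
# contaminants; drop them before training the final malware detector.
threshold = np.quantile(scores, 0.95)   # assumed cut-off, tune in practice
clean_benign = X_unl[scores < threshold]
print(f"kept {len(clean_benign)} of {len(X_unl)} 'benign' samples")
```

In a real pipeline the threshold would be chosen by validation, and the cleaned benign set would then train the downstream malware classifier.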
Oct 03 2017 cs.CV
For intelligent robotics applications, extending 3D mapping to 3D semantic mapping enables robots not only to localize themselves with respect to the scene's geometrical features, but also to simultaneously understand the higher-level meaning of the scene context. Most previous methods treat geometric 3D reconstruction and scene understanding independently, notwithstanding the fact that joint estimation can boost the accuracy of semantic mapping. In this paper, a dense RGB-D semantic mapping system with a Pixel-Voxel network is proposed, which can perform dense 3D mapping while simultaneously recognizing and semantically labelling each point in the 3D map. The proposed Pixel-Voxel network obtains global context information by using PixelNet to exploit the RGB image and, meanwhile, preserves accurate local shape information by using VoxelNet to exploit the corresponding 3D point cloud. Unlike existing architectures that fuse score maps from different models with equal weights, we propose a Softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet, and fuses the score maps of the two models according to their respective confidence levels. The proposed Pixel-Voxel network achieves state-of-the-art semantic segmentation performance on the SUN RGB-D benchmark dataset. The runtime of the proposed system can be boosted to 11-12 Hz, enabling near real-time performance on an 8-core i7 PC with a Titan X GPU.
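As an illustration of the fusion step, here is a minimal PyTorch sketch of softmax-weighted score-map fusion. One learnable logit per model is an assumption; the paper's fusion stack may weight at a finer granularity, and the tensor shapes are placeholders.

```python
# Minimal sketch of softmax-weighted fusion of two score maps (B, C, H, W).
import torch
import torch.nn as nn

class SoftmaxWeightedFusion(nn.Module):
    def __init__(self, num_models=2):
        super().__init__()
        # one learnable logit per model; softmax turns them into confidences
        self.logits = nn.Parameter(torch.zeros(num_models))

    def forward(self, score_maps):             # list of (B, C, H, W) maps
        w = torch.softmax(self.logits, dim=0)  # per-model confidence weights
        return sum(wi * m for wi, m in zip(w, score_maps))

fusion = SoftmaxWeightedFusion()
pixel_scores = torch.randn(1, 13, 64, 64)      # stand-in PixelNet output
voxel_scores = torch.randn(1, 13, 64, 64)      # stand-in VoxelNet output
print(fusion([pixel_scores, voxel_scores]).shape)   # (1, 13, 64, 64)
```

Because the weights come from a softmax, the fused map stays a convex combination of the branch outputs, so the two branches' confidences trade off smoothly during training.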
This paper presents a novel 3DOF pedestrian trajectory prediction approach for autonomous mobile service robots. While most previously reported methods are based on learning 2D positions in monocular camera images, our approach uses range-finder sensors to learn and predict 3DOF pose trajectories (i.e., 2D position plus 1D rotation within the world coordinate system). Our approach, T-Pose-LSTM (Temporal 3DOF-Pose Long Short-Term Memory), is trained using long-term data from real-world robot deployments and aims to learn context-dependent (environment- and time-specific) human activities. It incorporates long-term temporal information (i.e., date and time) together with short-term pose observations as input. A sequence-to-sequence LSTM encoder-decoder is trained, which encodes observations into the LSTM state and then decodes predictions from it. Once deployed, it can perform on-the-fly prediction in real time. Instead of using manually annotated data, we rely on a robust human detection, tracking and SLAM system, providing us with examples in a global coordinate system. We validate the approach using more than 15K pedestrian trajectories recorded in a care home environment over a period of three months. The experiments show that the proposed T-Pose-LSTM model outperforms state-of-the-art 2D-based methods for human trajectory prediction in long-term mobile robot deployments.
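To make the encoder-decoder structure concrete, here is a minimal PyTorch sketch of a sequence-to-sequence pose predictor. The layer sizes, the prediction horizon, and the two-dimensional date/time context encoding are illustrative assumptions, not the paper's configuration.

```python
# Minimal seq2seq LSTM sketch: encode observed 3DOF poses (x, y, theta) plus
# a coarse date/time context, then decode future poses autoregressively.
import torch
import torch.nn as nn

class Seq2SeqPose(nn.Module):
    def __init__(self, pose_dim=3, ctx_dim=2, hidden=64, horizon=8):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(pose_dim + ctx_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, obs, ctx):
        # obs: (B, T, 3) observed poses; ctx: (B, T, 2) encoded date/time
        _, state = self.encoder(torch.cat([obs, ctx], dim=-1))
        pose, preds = obs[:, -1:, :], []
        for _ in range(self.horizon):          # autoregressive decoding
            out, state = self.decoder(pose, state)
            pose = self.head(out)
            preds.append(pose)
        return torch.cat(preds, dim=1)         # (B, horizon, 3)

model = Seq2SeqPose()
print(model(torch.randn(4, 12, 3), torch.randn(4, 12, 2)).shape)
```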
Sep 19 2017 cs.NI
Most end devices are now equipped with multiple network interfaces. Applications can exploit all available interfaces and benefit from multipath transmission. Recently, Multipath TCP (MPTCP) was proposed to implement multipath transmission at the transport layer and has attracted much attention from academia and industry. However, MPTCP only supports TCP-based applications, and its multipath routing flexibility is limited. In this paper, we investigate the possibility of orchestrating multipath transmission from the network layer of end devices, and develop a Multipath IP (MPIP) design consisting of signaling, session and path management, multipath routing, and NAT traversal. We implement MPIP in the Linux and Android kernels. Through controlled lab experiments and Internet experiments, we demonstrate that MPIP can effectively achieve multipath gains at the network layer. It not only supports the legacy TCP and UDP protocols, but also works seamlessly with MPTCP. By facilitating user-defined customized routing, MPIP can route traffic from competing applications in a coordinated fashion to maximize the aggregate user Quality-of-Experience.
Sep 06 2017 cs.SY
In this paper, we present a novel decentralized controller to drive multiple unmanned aerial vehicles (UAVs) into a symmetric formation of regular polygon shape surrounding a mobile target. The proposed controller works for time-varying information exchange topologies among agents and preserves network connectivity while steering the UAVs into formation. The proposed nonlinear controller is highly generalized and offers flexibility in achieving the control objective, owing to the freedom of choosing controller parameters from a range of values. By virtue of additional tuning parameters, i.e., fractional powers on the proportional and derivative difference terms, the nonlinear controller produces a family of UAV trajectories satisfying the same control objective. An appropriate adjustment of the parameters facilitates generating smooth UAV trajectories without abrupt position jumps. The convergence of the closed-loop system is analyzed and established using the Lyapunov approach. Simulation results validate the effectiveness of the proposed controller, which outperforms an existing formation controller by driving a team of UAVs elegantly into a target-centric formation. We also present a nonlinear observer to estimate vehicle velocities when only position coordinates and heading angles are available. Simulation results show that the proposed nonlinear observer yields quick convergence of the estimates to their true values.
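Since the control law is described only qualitatively here, the following toy sketch shows what fractional powers on the proportional and derivative terms look like for a single double-integrator vehicle; gains, exponents and dynamics are assumptions for illustration, and varying the exponents produces the family of trajectories the abstract refers to.

```python
# Toy PD-type law with fractional powers on the error terms.
# sig(e, a) = |e|^a * sign(e) keeps the law well defined for negative errors.
import numpy as np

def sig(e, a):
    return np.sign(e) * np.abs(e) ** a

def simulate(kp=2.0, kd=1.5, alpha=0.7, beta=0.9, dt=0.01, steps=2000):
    x = np.array([5.0, -3.0])        # position error (2D)
    v = np.zeros(2)                  # velocity error
    for _ in range(steps):
        u = -kp * sig(x, alpha) - kd * sig(v, beta)
        v += dt * u                  # double-integrator vehicle model
        x += dt * v
    return x

print(simulate())                    # position error driven near zero
```

Exponents below one strengthen the correction near the origin and weaken it far away, which is why exponent tuning can smooth trajectories and avoid abrupt position jumps.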
Sep 04 2017 cs.NI
Data sponsoring is a widely-used incentive method in today's cellular networks, where video content providers (CPs) cover part or all of the cellular data cost for mobile users so as to attract more video users and increase data traffic. In the forthcoming 5G cellular networks, edge caching is emerging as a promising technique to deliver videos with lower cost and higher quality. The key idea is to cache video contents on edge networks (e.g., femtocells and WiFi access points) in advance and deliver the cached contents to local video users directly (without involving cellular data cost for users). In this work, we aim to study how edge caching will affect the CP's data sponsoring strategy as well as the users' behaviors and the data market. Specifically, we consider a single CP who offers both the edge caching service and the data sponsoring service to a set of heterogeneous mobile video users (with different mobility and video request patterns). We formulate the interactions of the CP and the users as a two-stage Stackelberg game, where the CP (leader) determines the budgets (efforts) for both services in Stage I, and the users (followers) decide whether and which service(s) they would like to subscribe to. We systematically analyze the subgame perfect equilibrium (SPE) of the proposed game. Our analysis and experimental results show that by introducing edge caching, the CP can increase its revenue by 105%.
Aug 15 2017 cs.CV
Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (LSTM), are able to learn temporal motion dynamics. However, naively applying RNNs to video sequences in a convolutional manner implicitly assumes that motions in videos are stationary across different spatial locations. This assumption is valid for short-term motions but invalid when the duration of the motion is long. In this work, we propose Lattice-LSTM (L2STM), which extends LSTM by learning independent hidden state transitions of memory cells for individual spatial locations. This method effectively enhances the ability to model dynamics across time and addresses the non-stationary issue of long-term motion dynamics without significantly increasing the model complexity. Additionally, we introduce a novel multi-modal training procedure for training our network. Unlike traditional two-stream architectures which use RGB and optical flow information as input, our two-stream model leverages both modalities to jointly train both input gates and both forget gates in the network rather than treating the two streams as separate entities with no information about the other. We apply this end-to-end system to benchmark datasets (UCF-101 and HMDB-51) of human action recognition. Experiments show that on both datasets, our proposed method outperforms all existing ones that are based on LSTM and/or CNNs of similar model complexities.
This paper proposes a single-shot approach for recognising clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances) for this task. The local BSP features are encoded by LLC (Locality-constrained Linear Coding) and fused with three different global features. Our visual feature is robust to deformable shapes and our approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we build a high-resolution RGBD clothing dataset of 50 clothing items of 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach is able to reach 83.2% accuracy while classifying clothing items which were previously unseen during training. This advances beyond the previous state-of-the-art by 36.2%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves reasonable sorting success rates with single-shot perception.
For safe and efficient planning and control in autonomous driving, we need a driving policy that can achieve desirable driving quality over a long-term horizon with guaranteed safety and feasibility. Optimization-based approaches, such as Model Predictive Control (MPC), can provide such optimal policies, but their computational complexity is generally unacceptable for real-time implementation. To address this problem, we propose a fast integrated planning and control framework that combines learning- and optimization-based approaches in a two-layer hierarchical structure. The first layer, defined as the "policy layer", is established by a neural network which learns the long-term optimal driving policy generated by MPC. The second layer, called the "execution layer", is a short-term optimization-based controller that tracks the reference trajectories given by the "policy layer" with guaranteed short-term safety and feasibility. Moreover, with efficient and highly representative features, a small neural network suffices in the "policy layer" to handle many complicated driving scenarios. This enables online imitation learning with Dataset Aggregation (DAgger), so that the performance of the "policy layer" can be improved rapidly and continuously online. Several example driving scenarios are demonstrated to verify the effectiveness and efficiency of the proposed framework.
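To illustrate the online imitation step, here is a toy DAgger loop on a one-dimensional system; the linear "expert" stands in for the long-term MPC policy and the ridge regressor for the policy-layer network, all of which are assumptions for illustration only.

```python
# Toy DAgger loop: the learner is retrained on states it visits itself,
# labeled by the expert, so its distribution mismatch shrinks over rounds.
import numpy as np
from sklearn.linear_model import Ridge

def expert(x):                       # stand-in for the long-term MPC policy
    return -0.8 * x

def rollout(policy, x0=4.0, steps=30):
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = x + 0.1 * policy(x)      # simple 1D dynamics
    return np.array(xs)

X = rollout(expert).reshape(-1, 1)   # initial expert demonstrations
U = expert(X).ravel()
learner = Ridge().fit(X, U)
for _ in range(5):                   # DAgger iterations
    visited = rollout(lambda x: learner.predict([[x]])[0]).reshape(-1, 1)
    X = np.vstack([X, visited])      # aggregate learner-visited states...
    U = np.r_[U, expert(visited).ravel()]   # ...with expert labels
    learner = Ridge().fit(X, U)
print(learner.coef_)                 # approaches the expert gain -0.8
```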
Jun 23 2017 cs.CV
Fine-grained categorization can benefit from part-based features which reveal subtle visual differences between object categories. Handcrafted features have been widely used for part detection and classification. Although a recent trend seeks to learn such features automatically using powerful deep learning models such as convolutional neural networks (CNN), their training and possibly also testing require manually provided annotations which are costly to obtain. To relax these requirements, we assume in this study a general problem setting in which the raw images are only provided with object-level class labels for model training, with no other side information needed. Specifically, by extracting and interpreting the hierarchical hidden layer features learned by a CNN, we propose an elaborate CNN-based system for fine-grained categorization. When evaluated on the Caltech-UCSD Birds-200-2011, FGVC-Aircraft, Cars and Stanford Dogs datasets under the setting that only object-level class labels are used for training and no other annotations are available for either training or testing, our method achieves impressive performance that is superior or comparable to the state of the art. Moreover, it sheds light on the ingenious use of the hierarchical features learned by a CNN, which has wide applicability well beyond the current fine-grained categorization task.
May 18 2017 cs.CY
The city has proven to be the most successful form of human agglomeration and provides wide employment opportunities for its dwellers. As advances in robotics and artificial intelligence revive concerns about the impact of automation on jobs, a question looms: How will automation affect employment in cities? Here, we provide a comparative picture of the impact of automation across U.S. urban areas. Small cities will undertake greater adjustments, such as worker displacement and job content substitutions. We demonstrate that large cities exhibit increased occupational and skill specialization due to the increased abundance of managerial and technical professions. These occupations are not easily automatable and thus reduce the potential impact of automation in large cities. Our results pass several robustness checks, including potential errors in the estimation of occupational automation and sub-sampling of occupations. Our study provides the first empirical law connecting two societal forces: urban agglomeration and automation's impact on employment.
Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements while incurring significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit record write within a transaction. We do so by ensuring that the status of all committed transactions can be determined during recovery, by storing the necessary metadata statically with the blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction IDs and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads.
Apr 11 2017 cs.LG
The Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label Classification (MLC). There exist a few Java libraries for MLC, but no MATLAB/OCTAVE library that covers a variety of methods. This toolbox offers an environment for evaluation, comparison and visualization of MLC results. One attraction of the toolbox is that it enables us to try many combinations of feature-space dimension reduction, sample clustering, label-space dimension reduction, ensembles, etc.
Mar 21 2017 cs.NI
Crowdsourced mobile video streaming enables nearby mobile video users to aggregate network resources to improve their video streaming performance. However, users are often selfish and may not be willing to cooperate without proper incentives. Designing an incentive mechanism for such a scenario is challenging due to the users' asynchronous downloading behaviors and their private valuations for multi-bitrate coded videos. In this work, we propose both single-object and multi-object multi-dimensional auction mechanisms, through which users sell the opportunities for downloading single and multiple video segments with multiple bitrates, respectively. Both auction mechanisms achieve truthfulness (i.e., truthful private information revelation) and efficiency (i.e., social welfare maximization). Simulations with real traces show that crowdsourced mobile streaming facilitated by the auction mechanisms outperforms noncooperative streaming by 48.6% (on average) in terms of social welfare. To evaluate the real-world performance, we also construct a demo system for crowdsourced mobile streaming and implement our proposed auction mechanism. Experiments over the demo system further show that, via cooperation, users who provide resources to others and users who receive help can increase their welfare by 15.5% and 35.4% (on average), respectively.
Mar 21 2017 cs.CV
This paper addresses the problem of RGBD object recognition in real-world applications, where large amounts of annotated training data are typically unavailable. To overcome this problem, we propose a novel, weakly-supervised learning architecture (DCNN-GPC) which combines parametric models (a pair of Deep Convolutional Neural Networks (DCNN) for RGB and D modalities) with non-parametric models (Gaussian Process Classification). Our system is initially trained using a small amount of labeled data, and then automatically propagates labels to large-scale unlabeled data. We first run 3D-based objectness detection on RGBD videos to acquire many unlabeled object proposals, and then employ DCNN-GPC to label them. As a result, our multi-modal DCNN can be trained end-to-end using only a small amount of human annotation. Finally, our 3D-based objectness detection and multi-modal DCNN are integrated into a real-time detection and recognition pipeline. In our approach, bounding-box annotations are not required and boundary-aware detection is achieved. We also propose a novel way to pretrain a DCNN for the depth modality, by training on virtual depth images projected from CAD models. We pretrain our multi-modal DCNN on public 3D datasets, achieving performance comparable to state-of-the-art methods on the Washington RGB-D Dataset. We then finetune the network by further training on a small amount of annotated data from our novel dataset of industrial objects (nuclear waste simulants). Our weakly supervised approach has proven to be highly effective in solving a novel RGBD object recognition application which lacks human annotations.
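As a minimal sketch of the label-propagation idea (not the paper's exact DCNN-GPC pipeline), a Gaussian Process classifier trained on a few labeled proposals can assign pseudo-labels to unlabeled proposals, keeping only the confident ones for retraining; the 4-D features and the 0.9 cut-off below are illustrative assumptions.

```python
# GP-based pseudo-labeling of unlabeled object proposals (synthetic data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(0, 1, (15, 4)), rng.normal(3, 1, (15, 4))])
y_lab = np.r_[np.zeros(15), np.ones(15)]
X_unl = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])

gpc = GaussianProcessClassifier(random_state=0).fit(X_lab, y_lab)
proba = gpc.predict_proba(X_unl)
confident = proba.max(axis=1) > 0.9      # assumed confidence threshold
pseudo = proba.argmax(axis=1)[confident] # labels kept for retraining
print(f"propagated labels to {confident.sum()} of {len(X_unl)} proposals")
```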
Mar 16 2017 cs.CV
This paper addresses the problem of simultaneous 3D reconstruction and material recognition and segmentation. Enabling robots to recognise different materials (concrete, metal, etc.) in a scene is important for many tasks, e.g. robotic interventions in nuclear decommissioning. Previous work on 3D semantic reconstruction has predominantly focused on recognition of everyday domestic objects (tables, chairs, etc.), whereas previous work on material recognition has largely been confined to single 2D images without any 3D reconstruction. Meanwhile, most 3D semantic reconstruction methods rely on computationally expensive post-processing, using Fully-Connected Conditional Random Fields (CRFs), to achieve consistent segmentations. In contrast, we propose a deep learning method which performs 3D reconstruction while simultaneously recognising different types of materials and labelling them at the pixel level. Unlike previous methods, we propose a fully end-to-end approach, which does not require hand-crafted features or CRF post-processing. Instead, we use only learned features, and the CRF segmentation constraints are incorporated inside the fully end-to-end learned system. We present the results of experiments, in which we trained our system to perform real-time 3D semantic reconstruction for 23 different materials in a real-world application. The run-time performance of the system can be boosted to around 10 Hz, using a conventional GPU, which is enough to achieve real-time semantic reconstruction using a 30 fps RGB-D camera. To the best of our knowledge, this work is the first real-time end-to-end system for simultaneous 3D reconstruction and material recognition.
Mar 13 2017 cs.MM
The emergence of smart Wi-Fi APs (Access Points), which are equipped with large storage space, opens a new research area on how to utilize these resources at the edge network to improve users' quality of experience (QoE) (e.g., a short startup delay and smooth playback). One important research interest in this area is content prefetching, which predicts and accurately fetches contents ahead of users' requests to shift traffic away from peak periods. In practice, however, the different video watching patterns among users and the varying network connection status lead to time-varying server load, which makes the content prefetching problem challenging. To understand this challenge, this paper first performs a large-scale measurement study of users' AP connection and TV series watching patterns using real traces. Then, based on the obtained insights, we formulate the content prefetching problem as a Markov Decision Process (MDP). The objective is to strike a balance between the increased prefetching and storage cost incurred by incorrect predictions and the reduced content download delay resulting from successful predictions. A learning-based approach is proposed to solve this problem, and three other algorithms are adopted as baselines. In particular, we first investigate the performance lower bound using a random algorithm, and the upper bound using an ideal offline approach. Then, we present a heuristic algorithm as another baseline. Finally, we design a reinforcement learning algorithm that is more practical to run in an online manner. Through extensive trace-based experiments, we demonstrate the performance gain of our design. Remarkably, our learning-based algorithm achieves a better precision and hit ratio (e.g., 80%) with about 70% (resp. 50%) cost savings compared to the random (resp. heuristic) algorithm.
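As a concrete illustration of the learning-based approach, here is a tabular Q-learning sketch of the prefetch-or-wait decision; the two-state user model and the reward numbers are toy assumptions, not taken from the paper's trace-driven setup.

```python
# Tabular Q-learning over a toy MDP: state = user watching (1) or idle (0),
# action = wait (0) or prefetch (1); rewards trade storage cost for delay.
import numpy as np

rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
alpha, gamma, eps = 0.1, 0.9, 0.1
state = 0
for _ in range(20000):
    action = rng.integers(2) if rng.random() < eps else Q[state].argmax()
    next_state = int(rng.random() < (0.8 if state == 1 else 0.3))  # toy model
    if action == 1:     # prefetched: pay storage cost, save delay on a hit
        reward = 1.0 if next_state == 1 else -0.5
    else:               # waited: download delay if the user actually requests
        reward = -1.0 if next_state == 1 else 0.0
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    state = next_state
print(Q)   # prefetching wins in the 'watching' state, waiting when idle
```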
Jan 27 2017 cs.CV
Hallucinating high frequency image details in single image super-resolution is a challenging task. Traditional super-resolution methods tend to produce oversmoothed output images due to the ambiguity in mapping between low and high resolution patches. We build on recent success in deep learning based texture synthesis and show that this rich feature space can facilitate successful transfer and synthesis of high frequency image details to improve the visual quality of super-resolution results on a wide variety of natural textures and images.
Jan 02 2017 cs.CV
Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one of such successive representations. However, an alternative that can achieve the same goal is a feedback-based approach, in which the representation is formed iteratively, based on feedback received from the previous iteration's output. We establish that a feedback-based approach has several fundamental advantages over feedforward: it enables making early predictions at query time, its output naturally conforms to a hierarchical structure in the label space (e.g. a taxonomy), and it provides a new basis for Curriculum Learning. We observe that feedback networks develop a considerably different representation compared to feedforward counterparts, in line with the aforementioned advantages. We put forth a general feedback-based learning architecture, with endpoint results on par with or better than existing feedforward networks, together with the above advantages. We also investigate several mechanisms in feedback architectures (e.g. skip connections in time) and design choices (e.g. feedback length). We hope this study offers new perspectives in the quest for more natural and practical learning models.
Dec 28 2016 cs.CV
Nowadays, CNNs are widely used in practical applications for image classification. However, designing a CNN model is highly specialized work that is difficult for ordinary users. Moreover, even for CNN experts, selecting an optimal model for a specific task may still require a lot of time (to train many different models). To solve this problem, we propose an automated CNN recommendation system for image classification tasks. Our system precisely evaluates the complexity of the classification task and the classification ability of candidate CNN models. Using these evaluation results, the system recommends the optimal CNN model that matches the task. The recommendation process is very fast since no model training is required. Experimental results show that the evaluation methods are accurate and reliable.
Nov 29 2016 cs.CL
Unsupervised models of dependency parsing typically require large amounts of clean, unlabeled data plus gold-standard part-of-speech tags. Adding indirect supervision (e.g. language universals and rules) can help, but we show that obtaining small amounts of direct supervision - here, partial dependency annotations - provides a strong balance between zero and full supervision. We adapt the unsupervised ConvexMST dependency parser to learn from partial dependencies expressed in the Graph Fragment Language. With less than 24 hours of total annotation, we obtain 7% and 17% absolute improvement in unlabeled dependency scores for English and Spanish, respectively, compared to the same parser using only universal grammar constraints.
Nov 02 2016 cs.NI
In this work, we study the joint optimization of edge caching and data sponsoring for a video content provider (CP), aiming at reducing the content delivery cost and increasing the CP's revenue. Specifically, we formulate the joint optimization problem as a two-stage decision problem for the CP. In Stage I, the CP determines the edge caching policy (for a relatively long time period). In Stage II, the CP decides the real-time data sponsoring strategy for each content request within the period. We first propose a Lyapunov-based online sponsoring strategy for Stage II, which reaches 90% of the offline maximum performance (benchmark). We then solve the edge caching problem in Stage I based on the online sponsoring strategy proposed for Stage II, and show that the optimal caching policy depends on the aggregate user requests for each content in each location. Simulations show that this joint optimization can increase the CP's revenue by 30%-100%, compared with pure data sponsoring (i.e., without edge caching).
This paper presents a novel robot vision architecture for perceiving generic 3D clothes configurations. Our architecture is hierarchically structured, starting from low-level curvatures, through mid-level geometric shape and topology descriptions, to high-level semantic surface structure descriptions. We demonstrate our robot vision architecture on a customised dual-arm industrial robot with our self-designed, off-the-shelf stereo vision system, carrying out autonomous grasping and dual-arm flattening. Notably, the proposed dual-arm flattening approach is unique among state-of-the-art autonomous robot systems, and is the major contribution of this paper. The experimental results show that the proposed dual-arm flattening using the stereo vision system markedly outperforms single-arm flattening and widely-cited Kinect-based sensing systems for dexterous manipulation tasks. In addition, the proposed grasping approach achieves satisfactory performance on various kinds of garments, verifying that the proposed visual perception architecture can be adapted to more than one clothing manipulation task.
Oct 04 2016 cs.DS
We present a polynomial-sized linear program (LP) for the n-city TSP drawing upon "complex flow" modeling ideas by the first two authors, who used an O(n^9) x O(n^8) model*. Here we have only O(n^5) variables and O(n^4) constraints. We do not model explicit cycles of the cities, and our modeling does not involve the city-to-city variables-based, traditional TSP polytope referred to in the literature as "The TSP Polytope." Optimal TSP objective value and tours are achieved by solving our proposed LP. In the case of a unique optimum, the integral solution representing the optimal tour is obtained using any LP solver (solution algorithm). In the case of alternate optima, an LP solver (e.g., an interior-point solver) may stop with a fractional (interior-point) solution, which (we prove) is a convex combination of alternate optimal TSP tours. In such cases, one of the optimal tours can be trivially retrieved from the solution using a simple iterative elimination procedure we propose. We have solved over a million problems with up to 27 cities using the barrier methods of CPLEX, consistently obtaining all-integer solutions. Since LP is solvable in polynomial time and we have a model which is of polynomial size in n, the paper is thus offering (although, incidentally) a proof of the equality of the computational complexity classes "P" and "NP". The non-applicability and numerical refutations of existing extended-formulation results (such as Braun et al. (2015) or Fiorini et al. (2015) in particular) are briefly discussed in an appendix. [*: Advances in Combinatorial Optimization: Linear Programming Formulation of the Traveling Salesman and Other Hard Combinatorial Optimization Problems (World Scientific, January 2016).]
Sep 22 2016 cs.CR
Anomalous user behavior detection is the core component of many information security systems, such as intrusion detection, insider threat detection and authentication systems. Anomalous behavior will raise an alarm to the system administrator and can be further combined with other information to determine whether it constitutes an unauthorised or malicious use of a resource. This paper presents an anomalous user behavior detection framework that applies an extended version of the Isolation Forest algorithm. Our method is fast and scalable and does not require example anomalies in the training data set. We apply our method to an enterprise dataset. The experimental results show that the system is able to isolate anomalous instances from the baseline user model using a single feature or combined features.
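The paper extends Isolation Forest; the stock scikit-learn version already demonstrates the core idea of isolating anomalous behavior without labeled anomalies, as in this minimal sketch on synthetic session features (the feature choices are assumptions).

```python
# Isolation-Forest anomaly scoring on synthetic user-session features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = np.column_stack([rng.normal(9, 1, 500),       # logon hour
                          rng.normal(50, 10, 500)])    # MB transferred
anomalies = np.array([[3.0, 400.0], [23.0, 350.0]])    # off-hours bulk copies

model = IsolationForest(n_estimators=100, contamination="auto",
                        random_state=0).fit(normal)
print(model.decision_function(anomalies))   # negative = isolated = anomalous
print(model.decision_function(normal[:3]))  # near zero or positive = normal
```

No anomalous examples are needed at training time, matching the framework's requirement of learning a baseline user model only.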
Jul 21 2016 cs.AI
Uncertain data streams are widely generated in many Web applications. The uncertainty in data streams makes anomaly detection from sensor data streams far more challenging. In this paper, we present a novel framework that supports anomaly detection in uncertain data streams. The proposed framework adopts an efficient uncertainty pre-processing procedure to identify and eliminate uncertainties in data streams. Based on the corrected data streams, we develop effective period pattern recognition and feature extraction techniques to improve computational efficiency. We use classification methods for anomaly detection in the corrected data streams. We also empirically show that the proposed approach achieves high anomaly detection accuracy on a number of real datasets.
Popularly used to distribute a variety of multimedia content items in today's Internet, HTTP-based web content delivery still suffers from various content delivery failures. Hindered by expensive deployment costs, conventional CDNs cannot deploy enough edge servers to successfully deliver content items to all users under these delivery failures. In this paper, we propose a joint CDN and peer-assisted web content delivery framework to address the delivery failure problem. Different from conventional peer-assisted approaches for web content delivery, which mainly focus on alleviating the CDN servers' bandwidth load, we study how to use a browser-based peer-assisted scheme, namely WebRTC, to resolve content delivery failures. To this end, we carry out large-scale measurement studies on how users access and view webpages. Our measurement results demonstrate the challenges (e.g., peers stay on a webpage for an extremely short time) that cannot be directly solved by conventional P2P strategies, as well as some important webpage viewing patterns. Due to these unique characteristics, WebRTC peers open up new possibilities for helping web content delivery, along with the problem of how to utilize the dynamic resources efficiently. We formulate the peer selection, which is the critical strategy in our framework, as an optimization problem, and design a heuristic algorithm based on the measurement insights to solve it. Our simulation experiments, driven by traces from Tencent QZone, demonstrate the effectiveness of our design: compared with a non-peer-assisted strategy and a random peer selection strategy, our design significantly improves the successful relay ratio of web content items under network failures; e.g., it improves the content download ratio by up to 60% even for users located in a region (e.g., a city) where none can connect to the regional CDN server.
Dynamic Adaptive Streaming over HTTP (DASH) has emerged as an increasingly popular paradigm for video streaming, in which a video is segmented into many chunks delivered to users by HTTP request/response over Transmission Control Protocol (TCP) connections. It is therefore intriguing to study the performance of strategies implemented in conventional TCPs, which are not dedicated to video streaming, e.g., whether chunks are efficiently delivered when users perform interactions with the video players. In this paper, we conduct measurement studies on users' chunk request traces in DASH from a representative video streaming provider, to investigate users' behaviors in DASH, and on TCP-connection-level traces from CDN servers, to investigate the performance of TCP for DASH. By studying how video chunks are delivered in both the slow start and congestion avoidance phases, our observations reveal the following performance characteristics of TCP for DASH: (1) Request patterns in DASH have a great impact on the performance of TCP variants, including CUBIC; (2) Strategies in conventional TCPs may cause user-perceived quality degradation in DASH streaming; (3) Potential improvements to TCP strategies for better delivery in DASH can be further explored.
Jul 06 2016 cs.MM
Multihoming for a video Content Delivery Network (CDN) allows edge peering servers to deliver video chunks through different Internet Service Providers (ISPs), to achieve improved quality of service (QoS) for video streaming users. However, since traditional strategies for a multihoming video CDN are designed according to static rules, e.g., simply sending traffic via the ISP that matches the client's ISP, they fail to dynamically allocate resources among different ISPs over time. In this paper, we perform measurement studies to demonstrate that such a static allocation mechanism is inefficient at fully utilizing multiple ISPs' resources. To address this problem, we propose a dynamic flow scheduling strategy for multihoming video CDNs. The challenge is to find the control parameters that can guide the ISP selection when performing flow scheduling. Using a data-driven approach, we find factors that have a major impact on the performance improvement of dynamic flow scheduling. We further utilize an information gain approach to generate parameter combinations that can be used to guide the flow scheduling, i.e., to determine which ISP should serve each request. Our evaluation results demonstrate that our design effectively performs the flow scheduling. In particular, it yields near-optimal performance in a simulation of a real-world multihoming setup.
Mobile online social network services have seen a rapid increase, in which the huge amount of user-generated social media content propagating between users via social connections has significantly challenged the traditional content delivery paradigm. First, replicating all of the content generated by users to edge servers that well "fit" the receivers becomes difficult due to limited bandwidth and storage capacities. Motivated by device-to-device (D2D) communication, which allows users with smart devices to transfer content directly, we propose replicating bandwidth-intensive social content in a device-to-device manner. Based on large-scale measurement studies of social content propagation and user mobility patterns in edge-network regions, we observe that (1) device-to-device replication can significantly help users download social content from nearby neighboring peers; (2) both social propagation and mobility patterns affect how content should be replicated; (3) the replication strategies depend on regional characteristics (e.g., how users move across regions). Using these measurement insights, we propose a joint propagation- and mobility-aware content replication strategy for edge-network regions, in which social content is assigned to users in edge-network regions according to a joint consideration of the social graph, content propagation and user mobility. We formulate the replication scheduling as an optimization problem and design a distributed algorithm that uses only historical, local and partial information to solve it. Trace-driven experiments further verify the superiority of our proposal: compared with conventional pure movement-based and popularity-based approaches, our design can significantly (2-4 times) improve the amount of social content successfully delivered by device-to-device replication.
Recent years have witnessed a new video delivery paradigm: the smartrouter-based peer video content delivery network, enabled by smartrouters deployed at users' homes. ChinaCache (one of the largest CDN providers in China) and Youku (a video provider using smartrouters to assist video delivery) announced their cooperation in 2015, to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because millions of dedicated smartrouters are operated by the centralized video service providers in a coordinated manner. It is thus intriguing to study the content placement strategies used in a smartrouter-based content delivery system, as well as its potential impact on the content delivery ecosystem. In this paper, we carry out measurement studies of Youku's peer video CDN, which has deployed over 300K smartrouter devices for its video delivery. In our measurement studies, 104K videos were investigated and 4TB of traffic was analyzed, over controlled smartrouter nodes and players. Our measurement insights are as follows. First, a global content replication strategy is essential for peer CDN systems. Second, such a peer CDN deployment can itself form an effective sub-system for end-to-end QoS monitoring, which can be used for fine-grained (e.g., user-level) request redirection and content replication. We also present our analysis of the performance limitations and propose potential improvements to peer CDN systems.
Recent years have witnessed a new video delivery paradigm: the smartrouter-based video delivery network, enabled by smartrouters deployed at users' homes together with the conventional video servers deployed in datacenters. Recently, ChinaCache, a large content delivery network (CDN) provider, and Youku, a video service provider using smartrouters to assist video delivery, announced their cooperation to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because such dedicated smartrouters are inherently operated by the centralized video service providers in a coordinated manner. It is intriguing to study the strategies, performance, and potential impact of such peer CDN systems on the content delivery ecosystem. In this paper, we study the Youku peer CDN, which has deployed over 300K smartrouter devices for its video streaming. In our measurement, 78K videos were investigated and 3TB of traffic was analyzed, over controlled routers and players. Our contributions are the following measurement insights. First, a global replication and caching strategy is essential for peer CDN systems, and proactively scheduling replication and caching on a daily basis can guarantee their performance. Second, such a peer CDN deployment can itself form an effective Quality of Service (QoS) monitoring sub-system, which can be used for fine-grained user request redirection. We also provide our analysis of the performance issues and potential improvements to peer CDN systems.
We provide a numerical refutation of the developments of Fiorini et al. (2015)* for models with disjoint sets of descriptive variables. We also provide an insight into the meaning of the existence of a one-to-one linear map between solutions of such models. *: Fiorini, S., S. Massar, S. Pokutta, H.R. Tiwary, and R. de Wolf (2015). Exponential Lower Bounds for Polytopes in Combinatorial Optimization. Journal of the ACM 62:2, Article No. 17.
Mar 28 2016 cs.CV
Face sketch synthesis has wide applications ranging from digital entertainment to law enforcement. Objective image quality assessment scores and face recognition accuracy are the two most commonly used tools to evaluate synthesis performance. In this paper, we propose a synthesized face sketch recognition framework based on full-reference image quality assessment metrics. Synthesized sketches generated by four state-of-the-art methods are utilized to test the performance of the proposed recognition framework. For the image quality assessment metrics, we employ the classical structural similarity index metric and three other prevalent metrics: visual information fidelity, the feature similarity index metric and gradient magnitude similarity deviation. Extensive experiments compared with baseline methods illustrate the effectiveness of the proposed synthesized face sketch recognition framework. Data and implementation code for this paper are available online at www.ihitworld.com/WNN/IQA_Sketch.zip.
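To make the recognition-by-quality-assessment idea concrete, here is a minimal sketch in which a synthesized sketch is matched to the gallery entry with the highest SSIM score; the random images merely stand in for real face sketches, and the noise level is an assumption.

```python
# Full-reference IQA-based matching: pick the gallery image maximizing SSIM.
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
gallery = [rng.random((64, 64)) for _ in range(5)]   # gallery sketches
probe = gallery[3] + 0.05 * rng.random((64, 64))     # noisy synthesized probe

scores = [ssim(probe, g, data_range=probe.max() - probe.min())
          for g in gallery]
print(int(np.argmax(scores)))    # identifies gallery identity 3
```

Swapping in VIF, FSIM or GMSD only changes the scoring function; the nearest-gallery decision rule stays the same.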
Feb 05 2016 cs.NI
With smart devices, particularly smartphones, becoming our everyday companions, ubiquitous mobile Internet and computing applications pervade people's daily lives. With surging demand for high-quality mobile services anywhere and anytime, how to address the ubiquitous user demand and accommodate the explosive growth of mobile traffic is the key issue of next generation mobile networks. Fog computing is a promising solution towards this goal. Fog computing extends cloud computing by providing virtualized resources and engaged location-based services at the edge of mobile networks, so as to better serve mobile traffic. Fog computing is therefore a lubricant for the combination of cloud computing and mobile applications. In this article, we outline the main features of Fog computing and describe its concept, architecture and design goals. Lastly, we discuss some future research issues from the networking perspective.
Oct 05 2015 cs.CV
Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has been reported. This has motivated us to investigate a new deep architecture that can handle 3D signals more effectively. Specifically, we propose factorized spatio-temporal convolutional networks (FstCN) that factorize the original 3D convolution kernel learning as a sequential process of learning 2D spatial kernels in the lower layers (called spatial convolutional layers), followed by learning 1D temporal kernels in the upper layers (called temporal convolutional layers). We introduce a novel transformation and permutation operator to make factorization in FstCN possible. Moreover, to address the issue of sequence alignment, we propose an effective training and inference strategy based on sampling multiple video clips from a given action video sequence. We have tested FstCN on two commonly used benchmark datasets (UCF-101 and HMDB-51). Without using auxiliary training videos to boost the performance, FstCN outperforms existing CNN based methods and achieves comparable performance with a recent method that benefits from using auxiliary training videos.
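The factorization itself is easy to sketch: a (1, k, k) spatial convolution followed by a (k, 1, 1) temporal convolution replaces a full k x k x k 3D kernel. The PyTorch sketch below uses illustrative channel sizes and omits FstCN's transformation and permutation operator.

```python
# Factorized spatio-temporal convolution: 2D spatial then 1D temporal kernel.
import torch
import torch.nn as nn

class FactorizedSTConv(nn.Module):
    def __init__(self, c_in, c_mid, c_out, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_mid, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(c_mid, c_out, kernel_size=(k, 1, 1),
                                  padding=(k // 2, 0, 0))

    def forward(self, x):            # x: (batch, channels, time, H, W)
        return self.temporal(torch.relu(self.spatial(x)))

clip = torch.randn(2, 3, 16, 112, 112)            # 16-frame RGB clip
print(FactorizedSTConv(3, 32, 64)(clip).shape)    # (2, 64, 16, 112, 112)
```

The factorization cuts the kernel parameter count roughly from k^3 to k^2 + k per channel pair, which is what makes training tractable without huge video corpora.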
Custom optics is a necessity for many imaging applications. Unfortunately, custom lens design is costly (thousands to tens of thousands of dollars), time consuming (10-12 weeks typical lead time), and requires specialized optics design expertise. By using only inexpensive, off-the-shelf lens components, the Lens Factory automatic design system greatly reduces cost and time. Design, ordering of parts, delivery, and assembly can be completed in a few days, at a cost in the low hundreds of dollars. Lens design constraints, such as focal length and field of view, are specified in terms familiar to the graphics community, so no optics expertise is necessary. Unlike conventional lens design systems, which only use continuous optimization methods, Lens Factory adds a discrete optimization stage. This stage searches the combinatorial space of possible combinations of lens elements to find novel designs, evolving simple canonical lens designs into more complex, better designs. Intelligent pruning rules make the combinatorial search feasible. We have designed and built several high performance optical systems which demonstrate the practicality of the system.
Online microblogging services, which people increasingly use to share and exchange information, have emerged as a promising way to profile multimedia content, in the sense of providing users a socialized abstraction and understanding of that content. In this paper, we propose a microblogging profiling framework that provides a social demonstration of TV shows. The challenges of this study are twofold: first, TV shows are generally offline, i.e., most of them do not originate from the Internet, and we need to create a connection between these TV shows and online microblogging services; second, content in a microblogging service is extremely noisy for video profiling, and we need to strategically retrieve the information most relevant to the TV shows being profiled. To address these challenges, we propose MAP, a microblogging-assisted profiling framework, with the following contributions: i) we propose a joint user and content retrieval scheme, which uses information about both actors and topics of a TV show to retrieve related microblogs; ii) we propose a social-aware profiling strategy, which profiles a video according not only to its content, but also to the social relationships of its microblogging users and its propagation in the social network; iii) we present some interesting analysis, based on our framework, of profiling real-world TV shows.
Feb 10 2015 cs.NI
Instant social video sharing, which combines online social networks and user-generated short video streaming services, has become popular in today's Internet. Cloud-based hosting of such instant social video content has become the norm to serve the increasing number of users with user-generated content. A fundamental problem of cloud-based social video sharing services is that users are located globally and cannot be served with good service quality by a single cloud provider. In this paper, we investigate the feasibility of dispersing instant social video content across multiple cloud providers. The challenge is that inter-cloud social propagation is indispensable under such multi-cloud social video hosting, yet such inter-cloud traffic incurs substantial operational cost. We analyze and formulate the multi-cloud hosting of an instant social video system as an optimization problem. We conduct large-scale measurement studies to show the characteristics of instant social video deployment, and demonstrate the trade-off between satisfying users with their ideal cloud providers and reducing the inter-cloud data propagation. Our measurement insights into the social propagation allow us to propose a heuristic algorithm with acceptable complexity to solve the optimization problem, by partitioning a propagation-weighted social graph in two phases: a preference-aware initial cloud provider selection and a propagation-aware re-hosting. Our simulation experiments driven by real-world social network traces show the superiority of our design.
Feb 09 2015 cs.NI
With smart devices, particularly smartphones, becoming our everyday companions, ubiquitous mobile Internet and computing applications pervade people's daily lives. With surging demand for high-quality mobile services anywhere and anytime, how to address the ubiquitous user demand and accommodate the explosive growth of mobile traffic is the key issue of next generation mobile networks. Fog computing is a promising solution towards this goal. Fog computing extends cloud computing by providing virtualized resources and engaged location-based services at the edge of mobile networks, so as to better serve mobile traffic. Fog computing is therefore a lubricant for the combination of cloud computing and mobile applications. In this article, we outline the main features of Fog computing and describe its concept, architecture and design goals. Lastly, we discuss some future research issues from the networking perspective.
This paper considers the implementation of Tomlinson-Harashima (TH) precoding for multiuser MIMO systems based on quantized channel state information (CSI) at the transmitter side. Compared with previously reported results, our scheme applies to a more general system setting where the number of users in the system can be less than or equal to the number of transmit antennas. We also study the achievable average sum rate of the proposed quantized CSI-based TH precoding scheme. Expressions for the upper bounds on both the average sum rate of the systems with quantized CSI and the mean loss in average sum rate due to CSI quantization are derived. We also present some numerical results. The results show that nonlinear TH precoding can achieve much better performance than linear zero-forcing precoding in both the perfect CSI and quantized CSI cases. In addition, our derived upper bound on the mean rate loss for TH precoding converges to the true rate loss faster than the bound previously obtained for zero-forcing precoding as the number of feedback bits becomes large. Both the analytical and numerical results show that nonlinear precoding suffers more from imperfect CSI than linear precoding does.
This paper studies the sum rate performance of a low-complexity quantized CSI-based Tomlinson-Harashima (TH) precoding scheme for downlink multiuser MIMO transmission, employing greedy user selection. The asymptotic distribution of the output signal-to-interference-plus-noise ratio of each selected user and the asymptotic sum rate as the number of users K grows large are derived using extreme value theory. For fixed finite signal-to-noise ratios and a finite number of transmit antennas n_T, we prove that as K grows large, the proposed approach achieves the optimal sum rate scaling of the MIMO broadcast channel. We also prove that, if we ignore the precoding loss, the average sum rate of this approach converges to the average sum capacity of the MIMO broadcast channel. Our results provide insights into the effect of multiuser interference caused by quantized CSI on the multiuser diversity gain.
Understanding the long-term impact that changes in a city's transportation infrastructure have on its spatial interactions remains a challenge. The difficulty arises from the fact that the real impact may not be revealed in static or aggregated mobility measures, as these are remarkably robust to perturbations. More generally, the lack of longitudinal, cross-sectional data demonstrating the evolution of spatial interactions at a meaningful urban scale also hinders us from evaluating the sensitivity of movement indicators, limiting our capacity to understand the evolution of urban mobility in depth. Using very large mobility records distributed over three years, we quantify the impact of the completion of a metro line extension: the Circle Line (CCL) in Singapore. We find that the commonly used movement indicators are almost identical before and after the project was completed. However, in comparing the temporal community structure across years, we do observe significant differences in the spatial reorganization of the affected geographical areas. The completion of the CCL enables travelers to collectively re-identify their desired destinations at lower transport cost, making the community structure more consistent. These changes in locality are dynamic and characterized over short time-scales, offering us a different approach to identify and analyze the long-term impact of new infrastructures on cities and their evolution dynamics.
Mar 04 2014 cs.CV
Human beings process stereoscopic correspondence across multiple scales. However, this bio-inspiration is ignored by state-of-the-art cost aggregation methods for dense stereo correspondence. In this paper, a generic cross-scale cost aggregation framework is proposed to allow multi-scale interaction in cost aggregation. We first reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in the choices of similarity kernels. Then, an inter-scale regularizer is introduced into the optimization, and solving this new optimization problem leads to the proposed framework. Since the regularization term is independent of the similarity kernel, various cost aggregation methods can be integrated into the proposed general framework. We show that the cross-scale framework is important as it effectively and efficiently expands state-of-the-art cost aggregation methods and leads to significant improvements, when evaluated on the Middlebury, KITTI and New Tsukuba datasets.
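A sketch of that unified view in formulas (the symbols here are assumptions, not the paper's exact notation): per-scale aggregation is a kernel-weighted least-squares fit of the cost at pixel i and disparity d, and the inter-scale regularizer couples the solutions of adjacent scales.

```latex
% Single-scale aggregation as weighted least squares over neighborhood N(i),
% with similarity kernel K(i,j):
\[
\tilde{C}(i,d) \;=\; \arg\min_{z} \sum_{j \in N(i)} K(i,j)\,\bigl(z - C(j,d)\bigr)^{2}
\]
% Cross-scale version: couple the per-scale solutions z^s with an L2
% inter-scale regularizer of strength lambda:
\[
\{z^{s}\} \;=\; \arg\min \sum_{s=0}^{S} \sum_{j \in N(i^{s})} K(i^{s},j)\,\bigl(z^{s} - C^{s}(j,d^{s})\bigr)^{2}
\;+\; \lambda \sum_{s=1}^{S} \bigl(z^{s} - z^{s-1}\bigr)^{2}
\]
```

Because the regularizer does not depend on K(i,j), any aggregation method defined by a similarity kernel plugs into the cross-scale problem unchanged.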
Feb 11 2014 cs.CV
In this paper, we propose a novel binary-based cost computation and aggregation approach for the stereo matching problem. The cost volume is constructed through bitwise operations on a series of binary strings. This approach is then combined with the traditional winner-take-all strategy, resulting in a new local stereo matching algorithm called binary stereo matching (BSM). Since the core algorithm of BSM is based on binary and integer computations, it has higher computational efficiency than previous methods. Experimental results on the Middlebury benchmark show that BSM has comparable performance with state-of-the-art local stereo methods in terms of both quality and speed. Furthermore, experiments on images with radiometric differences demonstrate that BSM is more robust than previous methods to such changes, which are common under real illumination.
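The abstract does not specify BSM's binary descriptor, so the sketch below uses a census transform as a stand-in to show the bitwise machinery: per-pixel binary codes are compared with XOR and a vectorized popcount, and the Hamming cost volume is minimized per pixel (winner-take-all).

```python
# Census-transform stand-in for BSM's binary strings.
import numpy as np

def census(img, w=3):
    # Binary code per pixel: compare each pixel to its w x w neighborhood.
    r = w // 2
    codes = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << 1) | (shifted < img).astype(np.uint32)
    return codes

def popcount32(x):
    # Classic bit-twiddling popcount, vectorized over uint32 arrays.
    x = x - ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F
    return (x * 0x01010101) >> 24

rng = np.random.default_rng(0)
left = rng.random((48, 64))
right = np.roll(left, 2, axis=1)       # simulate a uniform disparity of 2
cl, cr = census(left), census(right)

costs = np.stack([popcount32(cl ^ np.roll(cr, -d, axis=1)) for d in range(5)])
disparity = costs.argmin(axis=0)       # winner-take-all
print(np.median(disparity))            # ~2.0, the simulated shift
```

Because the census codes depend only on intensity ordering, this kind of binary cost is naturally robust to the radiometric differences mentioned above.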
Physical contact remains difficult to trace in large metropolitan networks, though it is a key vehicle for the transmission of contagious outbreaks. Co-presence encounters during daily transit use provide us with a city-scale, time-resolved physical contact network consisting of 1 billion contacts among 3 million transit users. Here, we study the advantage that knowledge of such co-presence structures may provide for the early detection of contagious outbreaks. We first examine the "friend sensor" scheme, a simple but universal strategy requiring only local information, and demonstrate that it provides significantly earlier detection of simulated outbreaks. Taking advantage of the full network structure, we then identify advanced "global sensor sets", obtaining substantial early-warning-time savings over the friend sensor scheme. Individuals with the highest number of encounters are the most efficient sensors, with performance comparable to individuals with the highest travel frequency, exploratory behavior and structural centrality. An efficiency balance emerges when testing the dependency on sensor set size and evaluating sensor reliability; we find that a substantial and reliable lead time can be attained by monitoring only the 0.01% of the population with the highest degree.
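The friend sensor scheme is simple enough to sketch directly; on a synthetic heavy-tailed contact network (a stand-in for the real transit encounter network), monitoring a random contact of each randomly chosen individual yields a higher-degree sensor set via the friendship paradox, which is what buys the earlier detection.

```python
# Minimal "friend sensor" sketch: sensors chosen as random contacts of
# random individuals have systematically higher degree than the controls.
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(10000, 3)    # heavy-tailed encounter network
controls = random.sample(list(G.nodes), 200)
sensors = [random.choice(list(G.neighbors(u))) for u in controls]

def mean_degree(nodes):
    return sum(G.degree(n) for n in nodes) / len(nodes)

print(mean_degree(controls), mean_degree(sensors))  # sensors sit higher
```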
Electron tomography usually suffers from so-called missing-wedge artifacts caused by the limited tilt angle range. An equally sloped tomography (EST) acquisition scheme (which should be called the linogram sampling scheme) was recently applied to achieve 2.4-angstrom resolution. On the other hand, a compressive-sensing-inspired reconstruction algorithm, known as adaptive dictionary based statistical iterative reconstruction (ADSIR), has been reported for x-ray computed tomography. In this paper, we evaluate EST, ADSIR and an ordered-subset simultaneous algebraic reconstruction technique (OS-SART), and compare the equally sloped (ES) and equally angled (EA) data acquisition modes. Our results show that OS-SART is comparable to EST, and that ADSIR outperforms both EST and OS-SART. Furthermore, the equally sloped projection data acquisition mode has no advantage over the conventional equally angled mode in this context.
Our understanding of the mechanisms driving daily face-to-face encounters is still limited; the field lacks large-scale datasets describing both individual behaviors and their collective interactions. Here, however, with the help of travel smart card data, we uncover such encounter mechanisms and structures by constructing a time-resolved in-vehicle social encounter network on public buses in a city of about 5 million residents. This is the first time that such a large network of encounters has been identified and analyzed. Using this population-scale dataset, we find that physical encounters display reproducible temporal patterns, indicating that repeated encounters are regular and identical. On an individual scale, we find that collective regularities dominate the bounded nature of distinct encounters. An individual's encounter capability is rooted in his/her daily behavioral regularity, explaining the emergence of "familiar strangers" in daily life. Strikingly, we find that individuals with repeated encounters are not grouped into small communities, but become strongly connected over time, resulting in a large but imperceptible small-world contact network, or "structure of co-presence", across the whole metropolitan area. Revealing the encounter pattern and identifying this large-scale contact network are crucial to understanding the dynamics of social acquaintance patterns and collective human behaviors, and particularly to disclosing the impact of human behavior on various diffusion/spreading processes.
This paper studies the sum rate performance of two low-complexity eigenmode-based transmission techniques for the MIMO broadcast channel, employing greedy semi-orthogonal user selection (SUS). The first approach, termed ZFDPC-SUS, is based on zero-forcing dirty paper coding; the second approach, termed ZFBF-SUS, is based on zero-forcing beamforming. We first employ new analytical methods to prove that as the number of users K grows large, the ZFDPC-SUS approach achieves the optimal sum rate scaling of the MIMO broadcast channel. We also prove that the average sum rates of both techniques converge to the average sum capacity of the MIMO broadcast channel for large K. In addition to the asymptotic analysis, we investigate the sum rates achieved by ZFDPC-SUS and ZFBF-SUS for finite K, and show that ZFDPC-SUS has significant performance advantages. Our results also provide key insights into the benefit of multiple receive antennas and the effect of the SUS algorithm. In particular, we show that whilst multiple receive antennas improve the asymptotic sum rate scaling only via the second-order behavior of the multi-user diversity gain, for finite K the benefit can be very significant. We also show the interesting result that the semi-orthogonality constraint imposed by SUS, whilst facilitating a very low-complexity user selection procedure, asymptotically does not reduce the multi-user diversity gain in either the first-order (log K) or second-order (log log K) terms.
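For reference, a sketch of the standard extreme-value results behind these scaling statements (the notation is assumed here, not taken from the paper): the largest of K i.i.d. chi-square effective channel gains with 2m degrees of freedom grows like log K plus a log log K correction, which is where the first- and second-order multi-user diversity terms come from, and at fixed SNR this yields the familiar first-order sum rate scaling.

```latex
% Largest of K i.i.d. chi-square(2m) channel gains (extreme value theory):
\[
\max_{1 \le k \le K} \gamma_k \;=\; \log K + (m-1)\log\log K + O(1) .
\]
% Resulting first-order sum rate scaling of the MIMO broadcast channel with
% n_T transmit antennas at fixed SNR:
\[
R_{\mathrm{sum}} \;\sim\; n_T \,\log\log K \qquad (K \to \infty).
\]
```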