results for au:Wang_L in:cs

- Jun 27 2017 cs.CV arXiv:1706.07966v1 Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNNs). In this paper, we equip convolutional kernels with shape attributes to generate deep Irregular Convolutional Neural Networks (ICNNs). Compared to traditional CNNs that apply regular convolutional kernels such as ${3\times3}$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments on semantic segmentation validate the effectiveness of the proposed ICNN.
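The shape-as-parameter idea can be sketched with bilinear sampling at fractional, learnable kernel-tap offsets. This is an illustrative reconstruction, not the authors' code; `irregular_conv_response` and its offset parameterization are hypothetical:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional coordinates (y, x) via bilinear interpolation,
    so kernel-tap positions can be real-valued (and hence differentiable)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

def irregular_conv_response(img, cy, cx, weights, offsets):
    """Filter response at (cy, cx): each tap has a weight and a (dy, dx)
    offset; in an ICNN-style model the offsets would be trained alongside
    the weights."""
    return sum(w * bilinear_sample(img, cy + dy, cx + dx)
               for w, (dy, dx) in zip(weights, offsets))
```

With the offsets frozen to the integer 3×3 grid and unit weights, the response reduces to an ordinary box-sum, which is the regular-kernel special case.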
- Jun 21 2017 cs.CV arXiv:1706.06266v1 Multi-frame image super-resolution (SR) aims to fuse the information in a low-resolution (LR) image sequence to compose a high-resolution (HR) one, and is applied in many areas. As an ill-posed problem, the SR process suffers from annoying artifacts, and it remains difficult to balance smoothness against edge preservation. The Markov random field (MRF), a superior method for depicting the local characteristics of an image, is applied to the SR process in this paper. The paper first proposes an end-to-end SR strategy, meaning that the corrections in HR space can be derived directly from the reconstruction errors in LR space with no need to project into HR space, which evidently decreases the amount of computation. Second, a newly designed MRF-based regularization is proposed to achieve smoothness and edge preservation simultaneously. Extensive experimental results demonstrate the superior performance of the proposed method.
- With the rapid growth of social media, massive misinformation is also spreading widely on social media such as microblogs, bringing negative effects to human life. Nowadays, automatic misinformation identification has drawn attention from academic and industrial communities. Since an event on social media usually consists of multiple microblogs, current methods are mainly based on global statistical features. However, information on social media is full of noise and outliers, which should be alleviated. Moreover, most microblogs about an event contribute little to the identification of misinformation, so useful information can easily be overwhelmed by useless information. Thus, it is important to mine significant microblogs for a reliable misinformation identification method. In this paper, we propose an Attention-based approach for Identification of Misinformation (AIM). Based on the attention mechanism, AIM can select the microblogs with the largest attention values for misinformation identification. The attention mechanism in AIM contains two parts: content attention and dynamic attention. Content attention is calculated based on the textual features of each microblog. Dynamic attention is related to the time interval between the posting time of a microblog and the beginning of the event. To evaluate AIM, we conduct a series of experiments on the Weibo and Twitter datasets, and the experimental results show that the proposed AIM model outperforms state-of-the-art methods.
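The two-part attention can be sketched as follows, assuming (hypothetically) that the content and dynamic terms combine additively before a softmax; the paper's exact parameterization may differ:

```python
import math

def aim_attention(content_scores, post_times, event_start, tau=3600.0):
    """Softmax attention over an event's microblogs: a content score per
    microblog plus a dynamic term that decays with the interval since the
    event began. Illustrative combination; tau is a hypothetical decay
    constant in seconds."""
    raw = [c - (t - event_start) / tau
           for c, t in zip(content_scores, post_times)]
    m = max(raw)                        # subtract max for numerical stability
    exps = [math.exp(r - m) for r in raw]
    z = sum(exps)
    return [e / z for e in exps]
```

A microblog with strong textual evidence posted soon after the event begins then dominates the weighted aggregation, while late, low-content posts are suppressed.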
- A hybrid observer is described for estimating the state of an $m>0$ channel, $n$-dimensional, continuous-time, distributed linear system of the form $\dot{x} = Ax,\;y_i = C_ix,\;i\in\{1,2,\ldots, m\}$. The system's state $x$ is simultaneously estimated by $m$ agents assuming each agent $i$ senses $y_i$ and receives appropriately defined data from each of its current neighbors. Neighbor relations are characterized by a time-varying directed graph $\mathbb{N}(t)$ whose vertices correspond to agents and whose arcs depict neighbor relations. Agent $i$ updates its estimate $x_i$ of $x$ at "event times" $t_1,t_2,\ldots$ using a local observer and a local parameter estimator. The local observer is a continuous-time linear system whose input is $y_i$ and whose output $w_i$ is an asymptotically correct estimate of $L_ix$, where $L_i$ is a matrix whose kernel equals the unobservable space of $(C_i,A)$. The local parameter estimator is a recursive algorithm designed to estimate, prior to each event time $t_j$, a constant parameter $p_j$ which satisfies the linear equations $w_k(t_j-\tau) = L_kp_j+\mu_k(t_j-\tau),\;k\in\{1,2,\ldots,m\}$, where $\tau$ is a small positive constant and $\mu_k$ is the state estimation error of local observer $k$. Agent $i$ accomplishes this by iterating its parameter estimator state $z_i$, $q$ times within the interval $[t_j-\tau, t_j)$, and by making use of the state of each of its neighbors' parameter estimators at each iteration. The updated value of $x_i$ at event time $t_j$ is then $x_i(t_j) = e^{A\tau}z_i(q)$. Subject to the assumptions that (i) the neighbor graph $\mathbb{N}(t)$ is strongly connected for all time, (ii) the system whose state is to be estimated is jointly observable, and (iii) $q$ is sufficiently large, it is shown that each estimate $x_i$ converges to $x$ exponentially fast as $t\rightarrow \infty$, at a rate which can be controlled.
- Jun 15 2017 cs.CV arXiv:1706.04303v2 Early detection of pulmonary cancer is the most promising way to enhance a patient's chance of survival. Accurate pulmonary nodule detection in computed tomography (CT) images is a crucial step in diagnosing pulmonary cancer. In this paper, inspired by the successful use of deep convolutional neural networks (DCNNs) in natural image recognition, we propose a novel pulmonary nodule detection approach based on DCNNs. We first introduce a deconvolutional structure to the Faster Region-based Convolutional Neural Network (Faster R-CNN) for candidate detection on axial slices. Then, a three-dimensional DCNN is presented for the subsequent false positive reduction. Experimental results on the LUng Nodule Analysis 2016 (LUNA16) Challenge demonstrate the superior detection performance of the proposed approach (average FROC-score of 0.891, ranking first among all submitted results).
- Non-interactive Local Differential Privacy (LDP) requires data analysts to collect data from users through a noisy channel in a single round. In this paper, we extend the frontiers of non-interactive LDP learning and estimation in several directions. For learning with smooth generalized linear losses, we propose an approximate stochastic gradient oracle estimated from the non-interactive LDP channel using Chebyshev expansion. Combined with inexact gradient methods, we obtain an efficient algorithm with a quasi-polynomial sample complexity bound. In the high-dimensional setting, we discover that under an $\ell_2$-norm assumption on the data points, high-dimensional sparse linear regression and mean estimation can be achieved with logarithmic dependence on the dimension, using random projection and approximate recovery. We also extend our methods to Kernel Ridge Regression. Our work is the first to make learning and estimation possible for a broad range of learning tasks under the non-interactive LDP model.
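The Chebyshev-expansion ingredient can be illustrated in a few lines: fit a low-degree Chebyshev series to the sigmoid that appears in logistic-type losses, giving a polynomial surrogate of the gradient that a non-interactive channel can support. This is illustrative only; the paper's construction and degrees differ in detail:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Sample the sigmoid at Chebyshev nodes on [-1, 1] and fit a degree-8 series.
z = np.cos(np.pi * (np.arange(64) + 0.5) / 64)
sigma = 1.0 / (1.0 + np.exp(-z))
coef = C.chebfit(z, sigma, deg=8)

# The polynomial surrogate tracks the sigmoid closely on the whole interval.
grid = np.linspace(-1.0, 1.0, 201)
max_err = float(np.max(np.abs(C.chebval(grid, coef)
                              - 1.0 / (1.0 + np.exp(-grid)))))
```

Because the sigmoid is analytic on [-1, 1], the truncation error shrinks exponentially with the degree, which is what keeps the resulting sample complexity quasi-polynomial rather than exponential.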
- Jun 07 2017 cs.SI physics.soc-ph arXiv:1706.01779v1 To explore large-scale population indoor interactions, we analyze 18,715 users' WiFi access logs recorded on a Chinese university campus over 3 months, and define two categories of human interactions: the event interaction (EI) and the temporal interaction (TI). The EI helps construct a transmission graph, and the TI helps build an interval graph. The dynamics of EIs show that their active durations are truncated power-law distributed, independent of the number of involved individuals. The transmission duration presents truncated power-law behavior at the daily timescale with weekly periodicity. Besides, the 'leaf' individuals in the aggregated contact network may participate in the 'super-connecting cliques' of the aggregated transmission graph. Analyzing the dynamics of the interval graph, we find that the probability distribution of TIs' inter-event durations also displays a truncated power-law pattern at the daily timescale with weekly periodicity; pairwise individuals with bursty interactions are prone to select their interactive locations at random, while individuals with periodic interactions have preferred interactive locations.
- Jun 07 2017 cs.CV arXiv:1706.01644v1 Volume calculated from Computed Tomography (CT) lung lesion data is a significant parameter for clinical diagnosis. The volume is widely used to assess the severity of lung nodules and track their progression; however, previous studies do not achieve the accuracy and efficiency required for clinical use. Volume calculation remains a challenging task due to the lesions' tight attachment to the lung wall, inhomogeneous background noise, and large variations in size and shape. In this paper, we employ Halton low-discrepancy sequences to calculate the volume of lung lesions. The proposed method directly computes the volume without three-dimensional (3D) model reconstruction and surface triangulation, which significantly improves efficiency and reduces complexity. The main steps of the proposed method are: (1) generate a number of quasi-random points in each slice using Halton low-discrepancy sequences and calculate the lesion area of each slice from the hit proportion; (2) obtain the volume by integrating the areas along the sagittal direction. To evaluate the proposed method, experiments were conducted on data sets with lung lesions of different sizes. With the uniform distribution of the points, our method achieves more accurate results than other methods, demonstrating its robustness and accuracy for the volume calculation of CT lung lesions. In addition, the method is easy to follow and can be applied extensively to other applications, e.g., volume calculation of liver tumors, atrial wall aneurysms, etc.
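Steps (1) and (2) can be sketched in a few lines of plain Python. This is a toy reconstruction: the `inside` predicate stands in for the per-slice lesion mask, which in practice would come from segmentation:

```python
def halton(i, base):
    """i-th element (1-indexed) of the van der Corput sequence in `base`;
    pairing bases 2 and 3 gives a 2-D Halton low-discrepancy sequence."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def slice_area(inside, n=4096, lo=-1.0, hi=1.0):
    """Step (1): estimate a slice's lesion area as the Halton-point hit
    fraction times the bounding-box area."""
    box = (hi - lo) ** 2
    hits = sum(inside(lo + (hi - lo) * halton(i, 2),
                      lo + (hi - lo) * halton(i, 3))
               for i in range(1, n + 1))
    return box * hits / n

def volume(areas, slice_thickness):
    """Step (2): integrate the slice areas along the sagittal direction."""
    return sum(areas) * slice_thickness
```

On a unit disk the estimate lands close to the true area, and the low-discrepancy points converge faster than pseudo-random sampling would for the same point budget.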
- May 30 2017 cs.CV arXiv:1705.09888v1 Sketch-based image retrieval (SBIR) is challenging due to the inherent domain gap between sketch and photo. Compared with pixel-perfect depictions in photos, sketches are highly abstract, iconic renderings of the real world. Therefore, matching sketch and photo directly using low-level visual cues is insufficient, since a common low-level subspace that traverses the two modalities semantically is non-trivial to establish. Most existing SBIR studies do not directly tackle this cross-modal problem. This naturally motivates us to explore the effectiveness of cross-modal retrieval methods in SBIR, which have been applied successfully in image-text matching. In this paper, we introduce and compare a series of state-of-the-art cross-modal subspace learning methods and benchmark them on two recently released fine-grained SBIR datasets. Through a thorough examination of the experimental results, we demonstrate that subspace learning can effectively model the sketch-photo domain gap. In addition, we draw a few key insights to drive future research.
- May 29 2017 cs.NI arXiv:1705.09647v1 Heterogeneous ultra-dense networks enable ultra-high data rates and ultra-low latency through the use of dense sub-6 GHz and millimeter wave (mmWave) small cells with different antenna configurations. Existing work has widely studied spectral and energy efficiency in such networks and shown that high spectral and energy efficiency can be achieved. This article investigates the benefits of the heterogeneous ultra-dense network architecture from the perspectives of three promising technologies, i.e., physical layer security, caching, and wireless energy harvesting, and provides an optimistic outlook on the application of these technologies in heterogeneous ultra-dense networks. Based on the rationale of each technology, opportunities and challenges are identified to advance research in this emerging network.
- Ridesourcing platforms like Uber and Didi are becoming more and more popular around the world. However, unauthorized ridesourcing activities taking advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As a first step to regulating on-demand ride services and eliminating the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine the RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi, and bus traces show that our transfer learning framework, with no need for a pre-labeled ridesourcing dataset, can achieve accuracy similar to that of supervised learning methods.
- May 17 2017 cs.CV arXiv:1705.05640v1 We present the 2017 WebVision Challenge, a public image recognition challenge designed for deep learning based on web images without instance-level human annotation. Following the spirit of previous vision challenges such as ILSVRC, Places2, and PASCAL VOC, which have played critical roles in the development of computer vision by contributing large-scale annotated data for model design and standardized benchmarking, we contribute with this challenge a large-scale web image dataset and a public competition with a workshop co-located with CVPR 2017. The WebVision dataset contains more than $2.4$ million web images crawled from the Internet using queries generated from the $1,000$ semantic concepts of the benchmark ILSVRC 2012 dataset. Meta information is also included. A validation set and a test set containing human-annotated images are provided to facilitate algorithmic development. The 2017 WebVision challenge consists of two tracks: an image classification task on the WebVision test set, and a transfer learning task on the PASCAL VOC 2012 dataset. In this paper, we describe the details of data collection and annotation, highlight the characteristics of the dataset, and introduce the evaluation metrics.
- May 16 2017 cs.CL arXiv:1705.05039v1 We present a joint modeling approach to identify salient discussion points in spoken meetings as well as to label the discourse relations between speaker turns. A variation of our model is also discussed in which discourse relations are treated as latent variables. Experimental results on two popular meeting corpora show that our joint model can outperform state-of-the-art approaches for both phrase-based content selection and discourse relation prediction. We also evaluate our model on predicting the consistency among team members' understanding of their group decisions. Classifiers trained with features constructed from our model achieve significantly better predictive performance than the state-of-the-art.
- May 16 2017 cs.CL arXiv:1705.05040v1 Debate and deliberation play essential roles in politics and government, but most models presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model's combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.
- May 09 2017 cs.CV arXiv:1705.02953v1 Deep convolutional networks have achieved great success in image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework for learning action models in videos. This method, called the temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation module. This unique design enables TSN to efficiently learn action models from whole action videos. The learned models can be easily adapted for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for instantiating the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on four challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%), THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB difference for motion modeling, our method can still achieve competitive accuracy on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on temporal segment networks, we won the video classification track of the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
- May 05 2017 cs.SY arXiv:1705.01711v1 In this paper, a discrete-time multi-agent system formulated in terms of the delta operator is presented. The proposed multi-agent system can unify discrete-time and continuous-time multi-agent systems. In practice, the communication among agents in a multi-agent network is affected by various factors. The communication network among faulty agents may suffer link failures, which are modeled by randomly switching graphs. First, we show that the delta representation of the discrete-time multi-agent system reaches consensus in mean (in probability and almost surely) if the expected graph is strongly connected. These results imply that the continuous-time multi-agent system with random networks can also reach consensus in the same sense. Second, the influence of faulty agents on the consensus value is quantified under the original network. Using matrix perturbation theory, an error bound is also presented. Finally, a simulation example is provided to demonstrate the effectiveness of our theoretical results.
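For readers unfamiliar with it, the delta operator is the scaled forward difference, whose limit as the sampling period $T \to 0$ is the time derivative (this is the standard definition, not a result specific to this paper):

```latex
\delta x(t_k) \;=\; \frac{x(t_k + T) - x(t_k)}{T},
\qquad
\lim_{T \to 0} \delta x(t_k) \;=\; \dot{x}(t_k),
```

so a consensus result proved for the $\delta$-domain system covers the discrete-time case (fixed $T > 0$) and the continuous-time case ($T \to 0$) at once.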
- May 02 2017 cs.CL arXiv:1705.00045v2 We investigate the problem of sentence-level supporting argument detection from relevant documents for user-specified claims. A dataset containing claims and associated citation articles is collected from the online debate website idebate.org. We then manually label sentence-level supporting arguments in the documents along with their types: study, factual, opinion, or reasoning. We further characterize arguments of different types and explore whether leveraging type information can facilitate the supporting argument detection task. Experimental results show that a LambdaMART (Burges, 2010) ranker that uses features informed by argument types yields better performance than the same ranker trained without type information.
- Apr 24 2017 cs.MM arXiv:1704.06444v1 Panoramic video is widely used to build virtual reality (VR) and is expected to be one of the next-generation killer apps. Transmitting panoramic VR videos is a challenging task because of two problems: 1) panoramic VR videos are typically much larger than normal videos, yet they must be transmitted with the limited bandwidth of mobile networks; 2) high-resolution, fluent views should be provided to guarantee a superior user experience and avoid side effects such as dizziness and nausea. To address these two problems, we propose a novel interactive streaming technology, namely the Focus-based Interactive Streaming Framework (FISF). FISF consists of three parts: 1) we use the classic clustering algorithm DBSCAN to analyze real user data for Video Focus Detection (VFD); 2) we propose a Focus-based Interactive Streaming Technology (FIST), including a static version and a dynamic version; 3) we propose two optimization methods: focus merging and a prefetch strategy. Experimental results show that FISF significantly outperforms the state of the art. The paper was submitted to Sigcomm 2017, VR/AR Network, on 31 Mar 2017 at 10:44:04am EDT.
- Apr 21 2017 cs.CV arXiv:1704.06228v1 Detecting activities in untrimmed videos is an important yet challenging task. In this paper, we tackle the difficulties of effectively locating the start and the end of a long complex action, which are often encountered by existing methods. Our key contribution is the structured segment network, a novel framework for temporal action detection, which models the temporal structure of each activity instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, for classifying activities and determining completeness respectively. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. We also propose a simple yet effective temporal action proposal scheme that can generate proposals of considerably higher quality. On two challenging benchmarks, THUMOS14 and ActivityNet, our method outperforms existing state-of-the-art methods by over $10\%$ absolute average mAP, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
- We consider the phase retrieval problem of recovering an unknown signal from magnitude-only measurements, where the measurements can be contaminated by both sparse arbitrary corruption and bounded random noise. We propose a new nonconvex algorithm for robust phase retrieval, namely Robust Wirtinger Flow, to jointly estimate the unknown signal and the sparse corruption. We show that our proposed algorithm is guaranteed to converge linearly to the unknown true signal, up to a minimax optimal statistical precision, in this challenging setting. Compared with existing robust phase retrieval methods, we improve the statistical error rate by a factor of $\sqrt{n/m}$, where $n$ is the dimension of the signal and $m$ is the sample size, provide a refined characterization of the corruption fraction requirement, and relax the lower bound condition on the number of corruptions. In the noise-free case, our algorithm converges to the unknown signal at a linear rate and achieves optimal sample complexity up to a logarithmic factor. Thorough experiments on both synthetic and real datasets corroborate our theory.
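The non-robust core of a Wirtinger-flow-style iteration is easy to sketch: gradient descent on the quartic least-squares objective $\frac{1}{m}\sum_i(|\langle a_i, x\rangle|^2 - y_i)^2$. This toy version is real-valued, starts near the truth instead of using a spectral initialization, and omits the sparse-corruption estimate that distinguishes Robust Wirtinger Flow:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 60
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2                       # magnitude-only measurements

def loss(x):
    """Empirical quartic least-squares objective."""
    return np.mean(((A @ x) ** 2 - y) ** 2)

# Gradient descent; init near x_true stands in for spectral initialization.
x = x_true + 0.1 * rng.standard_normal(n)
l0 = loss(x)
for _ in range(500):
    ax = A @ x
    grad = (4.0 / m) * A.T @ ((ax ** 2 - y) * ax)
    x -= 0.01 * grad
```

Within the local basin the iteration contracts geometrically, which is the "converge linearly" behavior the abstract refers to; the robust variant interleaves an update of the sparse corruption term with each such step.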
- Apr 17 2017 cs.CL arXiv:1704.04347v2 In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT via two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.
- Apr 14 2017 cs.CV arXiv:1704.04141v1 Procedural textures are normally generated from mathematical models with parameters carefully selected by experienced users. For naive users, however, the intuitive way to obtain a desired texture is to provide semantic descriptions such as "regular," "lacelike," and "repetitive," whereupon a procedural model with proper parameters is automatically suggested to generate the corresponding textures. By contrast, it is less practical for users to learn mathematical models and tune parameters based on multiple examinations of large numbers of generated textures. In this study, we propose a novel framework that generates procedural textures according to user-defined semantic descriptions, and we establish a mapping between procedural models and semantic texture descriptions. First, based on a vocabulary of semantic attributes collected from psychophysical experiments, a multi-label learning method is employed to annotate a large number of textures with semantic attributes, forming a semantic procedural texture dataset. Then, we derive a low-dimensional semantic space in which the semantic descriptions can be separated from one another. Finally, given a set of semantic descriptions, the diverse properties of the samples in the semantic space lead the framework to an appropriate generation model that uses appropriate parameters to produce a desired texture. The experimental results show that the proposed framework is effective and that the generated textures closely correlate with the input semantic descriptions.
- Apr 13 2017 cs.CV arXiv:1704.03470v2 This paper investigates two-branch neural networks for image-text matching tasks. We propose two different network structures that produce different output representations. The first, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. The second, referred to as a similarity network, fuses the two branches via element-wise product and is trained with a regression loss to directly predict a similarity score. Extensive experiments show that our two-branch networks achieve high accuracy for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on the Flickr30K and MSCOCO datasets.
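The embedding branch's objective can be sketched as a standard max-margin ranking loss over matched and mismatched image-sentence scores. This is a minimal form; the paper's neighborhood constraints are omitted and the margin value here is an arbitrary illustrative choice:

```python
import numpy as np

def margin_ranking_loss(s_pos, s_neg, margin=0.2):
    """Hinge on the score gap: penalize whenever a mismatched pair's score
    comes within `margin` of the matched pair's score."""
    return float(np.maximum(0.0, margin - s_pos + s_neg).mean())
```

When every matched pair already outscores its mismatches by the margin, the loss is zero, so gradients concentrate on the hard negatives that still violate the ranking.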
- Apr 13 2017 cs.LG arXiv:1704.03718v1 Extreme multi-label learning or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space, which involves $2^L$ possible label sets when the label dimension $L$ is very large, e.g., in the millions for Wikipedia labels. This paper is motivated to better explore the label space by building and modeling an explicit label graph. Meanwhile, deep learning has been widely studied and used in various classification problems, including multi-label classification; however, it has not been sufficiently studied in this extreme but practical case, where the label space can be as large as millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification. Our method harvests the ideas of non-linear embedding and of modeling the label space with graph priors at the same time. Extensive experiments on public datasets for XML show that our method performs competitively against state-of-the-art results.
- Apr 11 2017 cs.CV arXiv:1704.02581v2 Recently, skeleton-based action recognition has gained popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in representing the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to handle raw skeletons focus only on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both the temporal dynamics and the spatial configurations for skeleton-based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. The hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve the generalization of our model, we further exploit 3D-transformation-based data augmentation techniques, including rotation and scaling transformations, to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings considerable improvement for a variety of actions, i.e., generic actions, interaction activities, and gestures.
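The rotation-and-scaling augmentation can be sketched directly (a generic version; the angle and scale sampling ranges used in the paper are not reproduced here):

```python
import numpy as np

def augment_skeleton(joints, angle, scale):
    """Rotate (n_joints, 3) coordinates about the vertical y-axis by `angle`
    radians, then scale uniformly -- one 3D transformation of the kind used
    to augment skeletons during training."""
    c, s = np.cos(angle), np.sin(angle)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
    return scale * joints @ rot_y.T
```

Because the rotation is rigid, joint-to-origin distances are preserved, so the augmented skeletons remain kinematically valid while the network sees new viewpoints and body sizes.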
- Triplet networks are widely used models characterized by good performance in classification and retrieval tasks. In this work we propose to train a triplet network by using it as the discriminator in Generative Adversarial Nets (GANs). We make use of the discriminator's strong capability for representation learning to increase the predictive quality of the model. We evaluated our approach on the CIFAR-10 and MNIST datasets and observed significant improvement in classification performance using the simple k-NN method.
- Apr 06 2017 cs.SI arXiv:1704.01508v1 Recent research has shown a substantial active presence of bots in online social networks (OSNs). In this paper we utilise our past work on studying bots (Stweeler) to comparatively analyse the usage and impact of bots and humans on Twitter, one of the largest OSNs in the world. We collect a large-scale Twitter dataset and define various metrics based on tweet metadata. We divide and filter the dataset into four popularity groups in terms of number of followers. Using a human annotation task we assign 'bot' and 'human' ground-truth labels to the dataset, and compare the annotations against an online bot detection tool for evaluation. We then ask a series of questions to discern important behavioural characteristics of bots and humans using metrics within and among the four popularity groups. From the comparative analysis we draw important differences as well as surprising similarities between the two entities, thus paving the way for reliable classification of automated political infiltration, advertisement campaigns, and general bot detection.
- Apr 04 2017 cs.NI arXiv:1704.00336v2 This paper focuses on wirelessly powered 5G dense cellular networks, where a base station (BS) delivers energy to user equipment (UE) via microwave radiation at sub-6 GHz or millimeter wave (mmWave) frequencies, and the UE uses the harvested energy for uplink information transmission. By addressing the impact of employing different numbers of antennas and bandwidths at the lower and higher frequencies, we evaluate the amount of harvested energy and the throughput in such networks. Based on the derived results, we obtain the small cell density required to achieve an expected level of harvested energy or throughput. We also show that when the ratio of the number of sub-6 GHz BSs to that of mmWave BSs is lower than a given threshold, a UE harvests more energy from a mmWave BS than from a sub-6 GHz BS. We find how many mmWave small cells are needed to outperform the sub-6 GHz small cells in terms of harvested energy and throughput. Our results reveal that the amount of energy harvested from the mmWave tier can be comparable to the sub-6 GHz counterpart in dense scenarios, and that for the same tier scale, the mmWave tier can achieve higher throughput. Furthermore, the throughput gap between different mmWave frequencies increases with the mmWave BS density.
- Apr 03 2017 cs.CV arXiv:1703.10898v1 Deep ConvNets have been shown to be effective for the task of human pose estimation from single images. However, several challenging issues arise in the video-based case, such as self-occlusion, motion blur, and uncommon poses with few or no examples in the training data sets. Temporal information can provide additional cues about the location of body joints and help to alleviate these issues. In this paper, we propose a deep structured model to estimate a sequence of human poses in unconstrained videos. This model can be efficiently trained in an end-to-end manner and is capable of representing the appearance of body joints and their spatio-temporal relationships simultaneously. Domain knowledge about the human body is explicitly incorporated into the network, providing effective priors to regularize the skeletal structure and to enforce temporal consistency. The proposed end-to-end architecture is evaluated on two widely used benchmarks (the Penn Action dataset and the JHMDB dataset) for video-based pose estimation. Our approach significantly outperforms the existing state-of-the-art methods.
- Apr 03 2017 cs.NI arXiv:1703.10750v1 With the explosive growth of smart devices and the advent of many new applications, traffic volume has been growing exponentially. The traditional centralized network architecture cannot accommodate such user demands due to the heavy burden on the backhaul links and long latency. Therefore, new architectures that bring network functions and contents to the network edge have been proposed, i.e., mobile edge computing and caching. Mobile edge networks provide cloud computing and caching capabilities at the edge of cellular networks. In this survey, we provide an exhaustive review of the state-of-the-art research efforts on mobile edge networks. We first give an overview of mobile edge networks, including their definition, architecture, and advantages. Next, computing, caching, and communication techniques at the network edge are surveyed in turn. The applications and use cases of mobile edge networks are discussed. Subsequently, the key enablers of mobile edge networks, such as cloud technology, SDN/NFV, and smart devices, are discussed. Finally, open research challenges and future directions are presented.
- Apr 03 2017 cs.NI arXiv:1703.10794v1 Caching popular contents at the edge of cellular networks has been proposed to reduce the load, and hence the cost, of backhaul links. A key question is which files to cache and where to cache them. In this paper, we propose a distributed caching scheme that considers the tradeoff between diversity and redundancy in the base stations' cached contents. Is it better to cache the same or different contents at different base stations? To answer this, we formulate an optimal redundancy caching problem. Our goal is to minimize the total transmission cost of the network, including the cost within the radio access network (RAN) and the cost incurred by transmission to the core network via backhaul links. The optimal redundancy ratio under a given system configuration is obtained with an adapted particle swarm optimization (PSO) algorithm. We analyze the impact of important system parameters through Monte-Carlo simulation. Results show that the optimal redundancy ratio is mainly influenced by two parameters: the backhaul-to-RAN unit cost ratio and the steepness of the file popularity distribution. The total cost can be reduced by up to 54% at a given backhaul-to-RAN unit cost ratio when the optimal redundancy ratio is selected. Under a typical file request pattern, the reduction can be up to 57%.
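The redundancy-ratio search described above can be illustrated with a generic particle swarm optimizer. The sketch below is a minimal textbook PSO over a scalar in [0, 1]; the cost function is a stand-in, not the paper's network cost model, and the paper's adapted PSO may differ:

```python
import numpy as np

def pso_minimize(f, lo, hi, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimize a scalar function f on [lo, hi] with a basic PSO."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(lo, hi, n_particles)        # particle positions
    v = np.zeros(n_particles)                   # particle velocities
    pbest = x.copy()                            # per-particle best positions
    pbest_val = np.array([f(xi) for xi in x])
    g = pbest[np.argmin(pbest_val)]             # global best position
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # inertia + cognitive pull toward pbest + social pull toward g
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(xi) for xi in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)]
    return g
```

For example, `pso_minimize(total_cost, 0.0, 1.0)` would return the redundancy ratio minimizing a supplied `total_cost` model.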
- In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with a privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high-probability form. Experiments demonstrate that our algorithm consistently outperforms existing methods in both utility and running time.
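The output-perturbation recipe mentioned above (run non-private gradient descent to completion, then add noise calibrated to the solution's sensitivity) can be sketched as follows. The sensitivity constant and Laplace calibration here are illustrative, not the paper's exact bounds:

```python
import numpy as np

def dp_gd_output_perturbation(X, y, lam=0.1, lr=0.05, steps=200,
                              epsilon=1.0, rng=None):
    """Gradient descent on L2-regularized least squares, then output
    perturbation for epsilon-DP (illustrative noise calibration)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w
        w -= lr * grad
    # For a 1-Lipschitz loss, the regularized ERM minimizer has
    # L2 sensitivity O(1/(n * lam)); the constant 2 is an assumption.
    sensitivity = 2.0 / (n * lam)
    noise = rng.laplace(scale=sensitivity / epsilon, size=d)
    return w + noise
```

The noisy output can then be released; utility degrades gracefully as `epsilon` shrinks.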
- Mar 10 2017 cs.CV arXiv:1703.03329v2 Current action recognition methods rely heavily on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without requiring temporal annotations of action instances. Our UntrimmedNet couples two important components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. These two components are implemented with feed-forward networks, and UntrimmedNet is therefore an end-to-end trainable architecture. We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs weak supervision, our method achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
- Based on the distinguishing features of multi-tier millimeter wave (mmWave) networks, such as different transmit powers, different directivity gains from directional beamforming alignment, and different path loss laws for line-of-sight (LOS) and non-line-of-sight (NLOS) links, we introduce a normalization model to simplify the analysis of multi-tier mmWave cellular networks. The highlight of the model is that we convert a multi-tier mmWave cellular network into a single-tier mmWave network, where all the base stations (BSs) have the same normalized transmit power 1 and the BS densities, scaled by LOS or NLOS scaling factors respectively, follow a piecewise-constant function with multiple demarcation points. On this basis, expressions for computing the coverage probability are obtained in the general case with beamforming alignment errors and in the special case with perfect beamforming alignment. Numerical exploration shows that the normalization model for multi-tier mmWave cellular networks fully meets the requirements of network performance analysis, and is simpler and clearer than the untransformed model. Besides, an unexpected but sensible finding is that there is an optimal beam width that maximizes the coverage probability in the case with beamforming alignment errors.
- Mar 09 2017 cs.CV arXiv:1703.02716v1 Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory; e.g., they often have difficulty locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between the relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
- Mar 08 2017 cs.CV arXiv:1703.02271v1 The study of point sources in astronomical images is of special importance, since most energetic celestial objects in the Universe exhibit a point-like appearance. An approach to recognize point sources (PS) in X-ray astronomical images using our newly designed granular binary-tree support vector machine (GBT-SVM) classifier is proposed. First, all potential point sources are located by peak detection on the image. The image and spectral features of these potential point sources are then extracted. Finally, a classifier to recognize the true point sources is built from the extracted features. Experiments and applications of our approach on real X-ray astronomical images are demonstrated. Comparisons between our approach and other SVM-based classifiers are also carried out by evaluating precision and recall rates, which show that our approach performs better, achieving an accuracy of around 89%.
- Mar 07 2017 cs.CV arXiv:1703.01702v1 This paper studies the problem of how to choose good viewpoints for taking photographs of architecture. We achieve this by learning from professional photographs of world-famous landmarks that are available on the Internet. Unlike previous efforts on photo quality assessment, which mainly rely on 2D image features, we show that combining 2D image features extracted from images with 3D geometric features computed on 3D models results in more reliable evaluation of viewpoint quality. Specifically, we collect a set of photographs for each of 15 world-famous architectures, as well as their 3D models, from the Internet. Viewpoint recovery for images is carried out through an image-model registration process, after which a newly proposed viewpoint clustering strategy is exploited to validate users' viewpoint preferences when photographing landmarks. Finally, we extract a number of 2D and 3D features for each image based on multiple visual and geometric cues and perform viewpoint recommendation by learning from both 2D and 3D features using a specifically designed SVM-2K multi-view learner, achieving superior performance over using solely 2D or 3D features. We show the effectiveness of the proposed approach through extensive experiments. The experiments also demonstrate that our system can be used to recommend viewpoints for rendering textured 3D models of buildings in architectural design, in addition to evaluating viewpoints of photographs and recommending viewpoints for photographing architecture in practice.
- Mar 06 2017 cs.CV arXiv:1703.01086v2 This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which are designed to generate inclined proposals with text orientation angle information. The angle information is then used in bounding box regression to make the proposals fit the text region more accurately in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals onto the feature map for a text region classifier. The whole framework is built upon a region proposal based architecture, which ensures the computational efficiency of arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
- Mar 03 2017 cs.NI physics.soc-ph arXiv:1703.00520v1The key requirement to routing in any telecommunication network, and especially in Internet-of-Things (IoT) networks, is scalability. Routing must route packets between any source and destination in the network without incurring unmanageable routing overhead that grows quickly with increasing network size and dynamics. Here we present an addressing scheme and a coupled network topology design scheme that guarantee essentially optimal routing scalability. The FIB sizes are as small as they can be, equal to the number of adjacencies a node has, while the routing control overhead is minimized as nearly zero routing control messages are exchanged even upon catastrophic failures in the network. The key new ingredient is the addressing scheme, which is purely local, based only on geographic coordinates of nodes and a centrality measure, and does not require any sophisticated non-local computations or global network topology knowledge for network embedding. The price paid for these benefits is that network topology cannot be arbitrary but should follow a specific design, resulting in Internet-like topologies. The proposed schemes can be most easily deployed in overlay networks, and also in other network deployments, where geolocation information is available, and where network topology can grow following the design specifications.
- Iterative Hard Thresholding (IHT) is a class of projected gradient descent methods for optimizing sparsity-constrained minimization models, with the best known efficiency and scalability in practice. As far as we know, the existing IHT-style methods are designed for sparse minimization in primal form. It remains open to explore duality theory and algorithms in such a non-convex and NP-hard problem setting. In this paper, we bridge this gap by establishing a duality theory for sparsity-constrained minimization with $\ell_2$-regularized loss function and proposing an IHT-style algorithm for dual maximization. Our sparse duality theory provides a set of sufficient and necessary conditions under which the original NP-hard/non-convex problem can be equivalently solved in a dual formulation. The proposed dual IHT algorithm is a super-gradient method for maximizing the non-smooth dual objective. An interesting finding is that the sparse recovery performance of dual IHT is invariant to the Restricted Isometry Property (RIP), which is required by virtually all the existing primal IHT algorithms without sparsity relaxation. Moreover, a stochastic variant of dual IHT is proposed for large-scale stochastic optimization. Numerical results demonstrate the superiority of dual IHT algorithms to the state-of-the-art primal IHT-style algorithms in model estimation accuracy and computational efficiency.
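For context, the primal IHT iteration that the dual algorithm above is compared against, $w \leftarrow H_k(w - \eta \nabla f(w))$ with $H_k$ the top-$k$ hard-thresholding operator, can be sketched as follows (an illustrative least-squares implementation, not the paper's dual IHT):

```python
import numpy as np

def hard_threshold(w, k):
    """H_k: keep the k largest-magnitude entries of w, zero the rest."""
    out = np.zeros_like(w)
    idx = np.argpartition(np.abs(w), -k)[-k:]
    out[idx] = w[idx]
    return out

def iht(X, y, k, lr=0.5, steps=300):
    """Primal IHT for min ||Xw - y||^2 / (2n)  s.t.  ||w||_0 <= k."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = hard_threshold(w - lr * grad, k)
    return w
```

On well-conditioned designs this recovers the support of a sparse ground truth; the paper's point is that the dual variant does so without an RIP assumption.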
- Mar 02 2017 cs.LG arXiv:1703.00380v2 Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is moved to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users' current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also demonstrate the robustness of our approach against adversarial attacks, as well as its feasibility by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).
- Boltzmann machines are physics-informed generative models with wide applications in machine learning. They can learn the probability distribution of an input dataset and generate new samples accordingly. Applied back to physics, Boltzmann machines are ideal recommender systems for accelerating Monte Carlo simulation of physical systems, due to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of Boltzmann machines can even discover unknown cluster Monte Carlo algorithms. The creative power comes from the latent representation of the Boltzmann machines, which learns to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four-spin plaquette interactions. Our results endorse a fresh research paradigm where intelligent machines are designed to create or inspire human discovery of innovative algorithms.
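As background, the generative sampling referred to above is, for a restricted Boltzmann machine with binary units, block Gibbs sampling: alternate sampling of hidden units given visible units and vice versa. A minimal sketch (shapes and parameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep of a binary RBM with visible bias b,
    hidden bias c, and coupling matrix W (visible x hidden)."""
    h_prob = sigmoid(v @ W + c)                           # P(h=1 | v)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)
    v_prob = sigmoid(h @ W.T + b)                         # P(v=1 | h)
    v = (rng.random(v_prob.shape) < v_prob).astype(float)
    return v, h
```

Iterating this step yields samples from the RBM's learned distribution, which is what lets the machine propose Monte Carlo updates.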
- Traditionally, multi-layer neural networks use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of the dot product is unbounded, which increases the risk of large variance. Large neuron variance makes the model sensitive to changes in the input distribution, resulting in poor generalization, and aggravates the internal covariate shift that slows down training. To bound the dot product and decrease the variance, we propose to use cosine similarity instead of the dot product in neural networks, which we call cosine normalization. Our experiments show that cosine normalization in fully-connected neural networks notably reduces the test error with lower divergence, compared to other normalization techniques. Applied to convolutional networks, cosine normalization also significantly enhances classification accuracy and accelerates training.
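The substitution of cosine similarity for the dot product can be sketched for a single fully-connected layer as follows (an illustrative NumPy version; the epsilon placement is an assumption, and the paper's exact layer formulation may differ):

```python
import numpy as np

def cosine_norm_layer(x, W, eps=1e-8):
    """Pre-activations as cosine similarity w.x / (|w||x|) instead of
    the raw dot product w.x, bounding each value in [-1, 1]."""
    x_norm = np.linalg.norm(x) + eps           # input vector norm
    w_norms = np.linalg.norm(W, axis=1) + eps  # per-neuron weight norms
    return (W @ x) / (w_norms * x_norm)
```

Because every pre-activation is bounded, the variance of each neuron is bounded as well, which is the mechanism the abstract appeals to.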
- In this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. The RTD of a concept class $\mathcal C \subseteq \{0, 1\}^n$, introduced by Zilles et al. (2011), is a combinatorial complexity measure characterized by the worst-case number of examples necessary to identify a concept in $\mathcal C$ according to the recursive teaching model. For any finite concept class $\mathcal C \subseteq \{0,1\}^n$ with $\mathrm{VCD}(\mathcal C)=d$, Simon & Zilles (2015) posed an open problem $\mathrm{RTD}(\mathcal C) = O(d)$, i.e., is RTD linearly upper bounded by VCD? Previously, the best known result is an exponential upper bound $\mathrm{RTD}(\mathcal C) = O(d \cdot 2^d)$, due to Chen et al. (2016). In this paper, we show a quadratic upper bound: $\mathrm{RTD}(\mathcal C) = O(d^2)$, much closer to an answer to the open problem. We also discuss the challenges in fully solving the problem.
- In this paper we define the overflow problem of a network coding storage system in which the encoding parameter and the storage parameter are mismatched. Through analyses and experiments, we first show the impact of the overflow problem on a network coding scheme: it not only wastes storage space, but also degrades coding efficiency. To avoid the overflow problem, we then develop the network coding based secure storage (NCSS) scheme. By considering both security and storage requirements in its encoding procedures and distributed architecture, NCSS can improve the performance of a cloud storage system in terms of both storage cost and coding processing time. We analyze the maximum allowable amount of stored encoded data under the perfect secrecy criterion, and provide design guidelines for secure cloud storage systems to enhance coding efficiency and achieve minimal storage cost.
- For units $\delta$ and $\alpha$ in $\mathbb{F}_{p^m}$, the structure of $(\delta+\alpha u^2)$-constacyclic codes of length $p^k$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}+u^2\mathbb{F}_{p^m}$ is studied and self-dual $(\delta+\alpha u^2)$-constacyclic codes are analyzed.
- Feb 07 2017 cs.CC arXiv:1702.01423v1 Feedback shift registers (FSRs) are a fundamental component in electronics and secure communication. An FSR $f$ is said to be reducible if all the output sequences of another FSR $g$ can also be generated by $f$ and the FSR $g$ has less memory than $f$. An FSR is said to be decomposable if it has the same set of output sequences as a cascade connection of two FSRs. It is proved that deciding whether FSRs are irreducible/indecomposable is NP-hard.
- Feb 06 2017 cs.RO arXiv:1702.01075v1 Safety Barrier Certificates that ensure collision-free maneuvers for teams of differential flatness-based quadrotors are presented in this paper. Synthesized with control barrier functions, the certificates are used to modify the nominal trajectory in a minimally invasive way to avoid collisions. The proposed collision avoidance strategy complements existing flight control and planning algorithms by providing trajectory modifications with provable safety guarantees. The effectiveness of this strategy is supported both by the theoretical results and experimental validation on a team of five quadrotors.
- Feb 02 2017 cs.CV arXiv:1702.00254v3 We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
- Breaking the fronthaul capacity limitations is vital to make cloud radio access network (C-RAN) scalable and practical. One promising way is aggregating several remote radio units (RRUs) as a cluster to share a fronthaul link, so as to enjoy the statistical multiplexing gain brought by the spatial randomness of the traffic. In this letter, a tractable model is proposed to analyze the fronthaul statistical multiplexing gain. We first derive the user blocking probability caused by the limited fronthaul capacity, including its upper and lower bounds. We then obtain the limits of fronthaul statistical multiplexing gain when the cluster size approaches infinity. Analytical results reveal that the user blocking probability decreases exponentially with the average fronthaul capacity per RRU, and the exponent is proportional to the cluster size. Numerical results further show considerable fronthaul statistical multiplexing gain even at a small to medium cluster size.
- Taking into account both the huge computing power of intruders and the untrustworthiness of cloud servers, we develop an enhanced secure pseudonym scheme to protect the privacy of mobile cloud data. To face the huge computing power challenge, we develop an unconditionally secure lightweight network coding pseudonym scheme. For the privacy issue of untrusted cloud servers, we further design a two-tier network coding to decouple the stored mobile cloud data from the owner pseudonyms. Therefore, our proposed network coding based pseudonym scheme can simultaneously defend against attackers from both outside and inside. We implement our proposed two-tier lightweight network coding mechanism in a group location based service (LBS) using an untrusted cloud database. Compared to computationally secure Hash-based pseudonyms, our proposed scheme is not only unconditionally secure, but also reduces processing time by more than 90 percent and energy consumption by 10 percent.
- One of the key 5G scenarios is the coexistence of device-to-device (D2D) communication and massive multiple-input multiple-output (MIMO). However, interference in uplink D2D underlaid massive MIMO cellular networks needs to be coordinated, due to the vast number of cellular and D2D transmissions. To this end, this paper introduces a spatially dynamic power control solution for mitigating the cellular-to-D2D and D2D-to-cellular interference. In particular, the proposed D2D power control policy is rather flexible, including the special cases of no D2D links or using maximum transmit power. Under the considered power control, an analytical approach is developed to evaluate the spectral efficiency (SE) and energy efficiency (EE) in such networks. Exact expressions of SE for a cellular user or D2D transmitter are derived, which quantify the impacts of key system parameters such as the number of massive MIMO antennas and the D2D density. Moreover, D2D scaling properties are obtained, which provide sufficient conditions for achieving the anticipated SE. Numerical results corroborate our analysis and show that the proposed power control solution can efficiently mitigate interference between the cellular and D2D tiers. The results demonstrate that there exists an optimal D2D density that maximizes the area SE of the D2D tier. In addition, the achievable EE of a cellular user can be comparable to that of a D2D user.
- A low-complexity coding scheme is developed to achieve the rate region of maximum likelihood decoding for interference channels. As in the classical rate-splitting multiple access scheme by Grant, Rimoldi, Urbanke, and Whiting, the proposed coding scheme uses superposition of multiple codewords with successive cancellation decoding, which can be implemented using standard point-to-point encoders and decoders. Unlike rate-splitting multiple access, which is not rate-optimal for multiple receivers, the proposed coding scheme transmits codewords over multiple blocks in a staggered manner and recovers them successively over sliding decoding windows, achieving the single-stream optimal rate region as well as the more general Han--Kobayashi inner bound for the two-user interference channel. The feasibility of this scheme in practice is verified by implementing it using commercial channel codes over the two-user Gaussian interference channel.
- Jan 11 2017 cs.CR arXiv:1701.02711v1 There are many occasions on which the security community is interested in discovering the authorship of malware binaries, either for digital forensic analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software written by human programmers. Existing studies of authorship attribution of general-purpose software mainly focus on source code, typically based on the style of the program and its environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such program binaries often do not retain many semantic or stylistic features due to the compilation process. Therefore, authorship attribution for malware binaries, based on features and styles that survive the compilation process, is challenging. This paper surveys the state of the art in this literature. Further, we analyze the features involved in those techniques. Through a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries.
- Jan 06 2017 cs.DC arXiv:1701.01170v1 For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
- In this paper, we focus on one of the representative 5G network scenarios, namely multi-tier heterogeneous cellular networks. User association is investigated in order to reduce the down-link co-channel interference. Firstly, in order to analyze the multi-tier heterogeneous cellular networks where the base stations in different tiers usually adopt different transmission powers, we propose a Transmission Power Normalization Model (TPNM), which is able to convert a multi-tier cellular network into a single-tier network, such that all base stations have the same normalized transmission power. Then using TPNM, the signal and interference received at any point in the complex multi-tier environment can be analyzed by considering the same point in the equivalent single-tier cellular network model, thus significantly simplifying the analysis. On this basis, we propose a new user association scheme in heterogeneous cellular networks, where the base station that leads to the smallest interference to other co-channel mobile stations is chosen from a set of candidate base stations that satisfy the quality-of-service (QoS) constraint for an intended mobile station. Numerical results show that the proposed user association scheme is able to significantly reduce the down-link interference compared with existing schemes while maintaining a reasonably good QoS.
- We introduce the notion of information ratio $\text{Ir}(H/G)$ between two (simple, undirected) graphs $G$ and $H$, defined as the supremum of ratios $k/n$ such that there exists a mapping between the strong products $G^k$ to $H^n$ that preserves non-adjacency. Operationally speaking, the information ratio is the maximal number of source symbols per channel use that can be reliably sent over a channel with a confusion graph $H$, where reliability is measured w.r.t. a source confusion graph $G$. Various results are provided, including in particular lower and upper bounds on $\text{Ir}(H/G)$ in terms of different graph properties, inequalities and identities for behavior under strong product and disjoint union, relations to graph cores, and notions of graph criticality. Informally speaking, $\text{Ir}(H/G)$ can be interpreted as a measure of similarity between $G$ and $H$. We make this notion precise by introducing the concept of information equivalence between graphs, a more quantitative version of homomorphic equivalence. We then describe a natural partial ordering over the space of information equivalence classes, and endow it with a suitable metric structure that is contractive under the strong product. Various examples and open problems are discussed.
- Dec 16 2016 cs.SY arXiv:1612.04913v1 In this paper, a class of convex feasibility problems (CFPs) is studied for multi-agent systems through local interactions. The objective is to search for a feasible solution to the convex inequalities with some set constraints in a distributed manner. Distributed control algorithms, involving subgradient and projection steps, are proposed for both continuous- and discrete-time systems. Conditions associated with the connectivity of the directed communication graph are given to ensure convergence of the algorithms. It is shown that under mild conditions, the states of all agents reach consensus asymptotically and the consensus state is located in the solution set of the CFP. Simulation examples are presented to demonstrate the effectiveness of the theoretical results.
- Dec 15 2016 cs.NI arXiv:1612.04457v1 The orbital angular momentum (OAM) technique provides a new degree of freedom for information transmission in millimeter wave communications. Considering the spatial distribution characteristics of OAM beams, a new OAM spatial modulation (OAM-SM) millimeter wave communication system is first proposed for future mobile networks. Furthermore, the capacity, average bit error probability, and energy efficiency of OAM-SM millimeter wave communication systems are analytically derived for performance analysis. Compared with conventional multi-input multi-output (MIMO) millimeter wave communication systems, the maximum capacity and energy efficiency of OAM-SM systems are improved by 36% and 472.3%, respectively. Moreover, numerical results indicate that the proposed OAM-SM systems are more robust to path-loss attenuation than conventional MIMO systems, which makes them suitable for long-range transmissions. Therefore, OAM-SM millimeter wave communication systems offer substantial room for growth in future mobile networks.
- It is believed that the hippocampus functions as a memory allocator in the brain, though the underlying mechanism remains unknown. In Valiant's neuroidal model, the hippocampus is described as a randomly connected graph whose computation maps an input to a set of activated neuroids of stable size. Valiant proposed three requirements for the hippocampal circuit to act as a stable memory allocator (SMA): stability, continuity and orthogonality. The functionality of an SMA in the hippocampus is essential for further computation within the cortex, according to Valiant's model. In this paper, we cast these requirements for memorization functions into a rigorous mathematical formulation and introduce the concept of capacity, based on the probability of erroneous allocation. We prove fundamental limits on the capacity and error probability of an SMA, in both data-independent and data-dependent settings. We also construct an example of a stable memory allocator that can be implemented via neuroidal circuits. Both theoretical bounds and simulation results show that the neural SMA functions well.
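The stability requirement can be illustrated with a toy simulation (our own construction, not Valiant's exact circuit; all parameter values are made up for illustration): each neuroid connects to each active input independently, and fires when enough connections land on it.

```python
import random

# N neuroids; each connects to each of the r active input neurons
# independently with probability p, and fires iff at least `theta` connect.
def allocate(r, N, p, theta, rng):
    """Return the set of neuroids that fire for a fresh random circuit."""
    return {j for j in range(N)
            if sum(rng.random() < p for _ in range(r)) >= theta}

def trial_sizes(n_trials=10, r=100, N=2000, p=0.1, theta=16, seed=1):
    """Allocated-set sizes over independent trials."""
    rng = random.Random(seed)
    return [len(allocate(r, N, p, theta, rng)) for _ in range(n_trials)]
```

With these parameters roughly $N \cdot \Pr[\mathrm{Bin}(100, 0.1) \ge 16]$ (a few percent) of the neuroids fire, so the allocated-set size concentrates across trials — an empirical counterpart of the stability requirement.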
- Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve the $a$-approximate $r$-near neighbor problem, and has been proven optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analysis of the running time of approximate near neighbor queries via LSH. We characterize the dispersion of the data using $N_b$, the number of $br$-near pairs among the data points. Combined with an optimal data-oblivious LSH scheme, we obtain a new query time bound depending on $N_b$ and the doubling dimension. For many natural scenarios where points are well-dispersed or lie in a low-doubling-dimension space, our result leads to sharper performance than existing worst-case analyses. This paper not only presents the first rigorous proof of how LSH exploits the structure of the data points, but also provides important insights into parameter setting in the practice of LSH beyond the worst case. Moreover, our analysis involves a generalized version of the sphere packing problem, which may be of independent interest.
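For readers unfamiliar with LSH for Euclidean space, here is a sketch of the classic $p$-stable hash family (a standard construction, not this paper's refined scheme; parameter values are illustrative): near points fall into the same bucket far more often than distant ones.

```python
import math, random

def make_hash(dim, w, rng):
    """One hash from the p-stable LSH family for Euclidean space:
    h(x) = floor((a . x + b) / w), with Gaussian a and uniform shift b."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

def collision_count(p, q, n_hashes=50, w=4.0, seed=0):
    """How many of n_hashes functions put p and q in the same bucket."""
    rng = random.Random(seed)
    hashes = [make_hash(len(p), w, rng) for _ in range(n_hashes)]
    return sum(h(p) == h(q) for h in hashes)
```

A query point collides with a close point under almost every hash function, and with a distant point under almost none — this gap is what query-time analyses of LSH quantify.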
- With the fifth generation (5G) of mobile broadband systems, Radio Resource Management (RRM) will reach unprecedented levels of complexity. To cope with the higher complexity of RRM functionalities, while retaining the fast execution required in 5G, this manuscript presents a lean 5G RRM architecture that capitalizes on the most recent advances in machine learning in combination with the large amount of data readily available in the network from measurements and system observations. The result is a general-purpose learning framework capable of generating algorithms specialized to RRM functionalities directly from data gathered in the network. The potential of this approach is verified in three case studies, and future directions for applications of machine learning to RRM are discussed.
- Over the past century, many representation learning approaches have been proposed to learn the intrinsic structure of data, including both linear and nonlinear, supervised and unsupervised methods. In particular, deep architectures have been widely applied for representation learning in recent years, and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, and available resources (e.g., online courses, tutorials and books) and toolboxes are provided. Finally, we conclude the paper with remarks and some interesting research directions on data representation learning.
- Nov 22 2016 cs.LG arXiv:1611.06306v1 A novel data representation method based on convolutional neural networks (CNN) is proposed in this paper to represent data of different modalities. We learn a CNN model for the data of each modality to map the data of different modalities to a common space, and regularize the new representations in the common space by a cross-modal relevance matrix. We further impose that the class label of data points can also be predicted from the CNN representations in the common space. The learning problem is modeled as a minimization problem, which is solved by an augmented Lagrange method (ALM) with the updating rules of the Alternating Direction Method of Multipliers (ADMM). Experiments on benchmarks of multimodal sequence data show its advantage.
- Nov 22 2016 physics.soc-ph cs.SI arXiv:1611.06660v1 While urban systems demonstrate high spatial heterogeneity, many urban planning, economic and political decisions heavily rely on a deep understanding of local neighborhood contexts. We show that the structure of 311 Service Requests provides one possible way of building a unique signature of the local urban context, and can thus serve as a low-cost decision support tool for urban stakeholders. Considering examples of New York City, Boston and Chicago, we demonstrate how 311 Service Requests recorded and categorized by type in each neighborhood can be utilized to generate a meaningful classification of locations across the city, based on distinctive socioeconomic profiles. Moreover, the 311-based classification of urban neighborhoods can provide sufficient information to model various socioeconomic features. Finally, we show that these characteristics are capable of predicting future trends in comparative local real estate prices. We demonstrate that 311 Service Request data can be used to monitor and predict the socioeconomic performance of urban neighborhoods, allowing urban stakeholders to quantify the impacts of their interventions.
- Nov 22 2016 cs.IR arXiv:1611.06668v1 Sequential prediction is a fundamental task for Web applications. Due to the insufficiency of user feedback, sequential prediction usually suffers from the cold start problem. Two kinds of approaches, based on matrix factorization (MF) and Markov chains (MC), are popular for item prediction. MF methods factorize the user-item matrix to learn the general tastes of users. MC methods predict the next behavior based on recent behaviors. However, both have limitations: MF methods can merge additional information to address cold start but cannot capture the dynamics of users' interests, while MC-based sequential methods have difficulty addressing cold start and rely on a strong Markov assumption that the next state depends only on the last state. In this work, to deal with the cold start problem of sequential prediction, we propose an RNN model that incorporates the visual and textual content of items, named the $\mathbf{V}$isual and $\mathbf{T}$extual $\mathbf{R}$ecurrent $\mathbf{N}$eural $\mathbf{N}$etwork ($\mathbf{VT}$-$\mathbf{RNN}$). We simultaneously learn sequential latent vectors that dynamically capture users' interests, as well as content-based representations that help address the cold start. Experiments on two real-world datasets show that the proposed VT-RNN model can effectively generate personalized ranking lists and significantly alleviate the cold start problem.
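The core idea — feeding content features through a recurrent state so that cold-start items still get meaningful scores — can be sketched as follows (a heavily simplified illustration of a content-driven RNN step; all dimensions, weight names and the scoring rule are our own assumptions, not the VT-RNN architecture):

```python
import math, random

def rnn_step(h, x, W_h, W_x):
    """h' = tanh(W_h h + W_x x): update the hidden state from the previous
    state h and the concatenated visual + textual item embedding x."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    return [math.tanh(a + b)
            for a, b in zip(matvec(W_h, h), matvec(W_x, x))]

def score(h, item_embedding):
    """Rank candidates by inner product with the hidden state; a brand-new
    item is scorable as long as it has a content embedding."""
    return sum(a * b for a, b in zip(h, item_embedding))

# Toy usage: hidden size 3; item embedding = 2 visual + 2 textual features.
random.seed(0)
W_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
W_x = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
h_new = rnn_step([0.0, 0.0, 0.0], [0.2, 0.8, 0.1, 0.4], W_h, W_x)
```

Because the update consumes content embeddings rather than item IDs, an item never seen in training still contributes to the user's state and can be ranked.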
- Nov 18 2016 cs.CV arXiv:1611.05592v1 Video captioning, which automatically translates video clips into natural language sentences, is a very important task in computer vision. By virtue of recent deep learning technologies, e.g., convolutional neural networks (CNNs) and recurrent neural networks (RNNs), video captioning has made great progress. However, learning an effective mapping from the visual sequence space to the language space is still a challenging problem. In this paper, we propose a Multimodal Memory Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide global visual attention on described targets. Specifically, the proposed M3 attaches an external memory to store and retrieve both visual and textual contents by interacting with the video and sentence through multiple read and write operations. First, the text representation in the Long Short-Term Memory (LSTM) based text decoder is written into the memory, and the memory contents are read out to guide attention to select related visual targets. Then, the selected visual information is written into the memory, which is further read out to the text decoder. To evaluate the proposed model, we perform experiments on two public benchmark datasets: MSVD and MSR-VTT. The experimental results demonstrate that our method outperforms the state-of-the-art methods in terms of BLEU and METEOR.
- Nov 18 2016 cs.CV arXiv:1611.05588v1 Effective image and sentence matching depends on how well their global visual-semantic similarity is measured. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of the image (objects) and sentence (words), we propose a selective multimodal Long Short-Term Memory network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of instances of the image and sentence, by predicting pairwise instance-aware saliency maps for them. For the selected pairwise instances, their representations are obtained based on the predicted saliency maps, and then compared to measure their local similarity. By similarly measuring multiple local similarities within a few timesteps, the sm-LSTM sequentially aggregates them with hidden states to obtain a final matching score as the desired global similarity. Extensive experiments show that our model can well match images and sentences with complex content, and achieves state-of-the-art results on two public benchmark datasets.
- Nov 15 2016 cs.DC arXiv:1611.04255v2 We consider the problem of reducing the cost of communication required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, such as broadcasts of parameters or reductions for sub-gradient aggregation, which for large messages quickly dominate overall execution time and limit parallel scalability. To address this problem, we develop a new technique for collective operations, referred to as Linear Pipelining (LP). It is tuned to the message sizes that arise in BSP-SGD, and works effectively on multi-GPU systems. Theoretically, the cost of LP is invariant to $P$, where $P$ is the number of GPUs, while the cost of the more conventional Minimum Spanning Tree (MST) approach scales as $O(\log P)$. LP also achieves up to 2x higher bandwidth than the Bidirectional Exchange (BE) techniques widely adopted by current MPI implementations. We apply these collectives to BSP-SGD, showing that the proposed implementations reduce communication bottlenecks in practice while preserving the attractive convergence properties of BSP-SGD.
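The scaling claim can be illustrated with a back-of-the-envelope alpha-beta cost model (our own sketch, not the paper's exact analysis; the latency and bandwidth constants are arbitrary): an MST broadcast sends the full message over $\lceil \log_2 P \rceil$ sequential hops, while a pipelined chain splits the message into $m$ chunks so the last chunk arrives after $P - 2 + m$ small hops.

```python
import math

# Broadcasting an n-byte message among P GPUs: alpha = per-message latency,
# beta = per-byte transfer time.
def mst_broadcast_cost(P, n, alpha=1e-5, beta=1e-9):
    """MST: ceil(log2 P) sequential hops, each carrying the full message."""
    return math.ceil(math.log2(P)) * (alpha + n * beta)

def lp_broadcast_cost(P, n, m, alpha=1e-5, beta=1e-9):
    """Linear pipeline: m chunks of n/m bytes flow down a chain of P nodes;
    the last chunk completes after (P - 2 + m) hops."""
    return (P - 2 + m) * (alpha + (n / m) * beta)
```

For large messages and many chunks the per-hop size $n/m$ dominates, so the pipelined cost is nearly independent of $P$, whereas the MST cost grows with $\log P$.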
- Probabilistic Temporal Tensor Factorization (PTTF) is an effective algorithm for modeling temporal tensor data. It leverages a time constraint to capture the evolving properties of tensor data. Nowadays, exploding dataset sizes demand large-scale PTTF analysis, and a parallel solution is critical to accommodate this trend. However, the parallelization of PTTF remains unexplored. In this paper, we propose a simple yet efficient Parallel Probabilistic Temporal Tensor Factorization, referred to as P$^2$T$^2$F, to provide a scalable PTTF solution. P$^2$T$^2$F is fundamentally different from existing parallel tensor factorizations in that it considers the probabilistic decomposition and the temporal effects of tensor data. It adopts a new tensor data split strategy to subdivide a large tensor into independent sub-tensors, the computation of which is inherently parallel. We train P$^2$T$^2$F with an efficient stochastic Alternating Direction Method of Multipliers algorithm, and show that convergence is guaranteed. Experiments on several real-world tensor datasets demonstrate that P$^2$T$^2$F is a highly effective and efficiently scalable algorithm for large-scale probabilistic temporal tensor analysis.
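The general idea of splitting a tensor into independent sub-tensors can be illustrated as follows (a toy sketch of block partitioning by index ranges; the abstract does not specify P$^2$T$^2$F's actual split strategy, and the sparse dict representation is our own choice):

```python
# A 3-way tensor stored sparsely as {(i, j, k): value}; partition each mode
# into `blocks` contiguous chunks so each block of entries can be processed
# independently in parallel.
def split_tensor(tensor, dims, blocks):
    """Group entries by the block coordinates of their indices."""
    sizes = [(d + blocks - 1) // blocks for d in dims]  # ceil(dim / blocks)
    subs = {}
    for idx, val in tensor.items():
        key = tuple(i // s for i, s in zip(idx, sizes))
        subs.setdefault(key, {})[idx] = val
    return subs
```

Each sub-tensor touches a disjoint set of entries, so the sub-tensors together cover the original tensor exactly once — the property that makes per-block computation embarrassingly parallel.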