results for au:Wang_L in:cs

- Apr 24 2017 cs.MM arXiv:1704.06444v1The panoramic video is widely used to build virtual reality (VR) and is expected to be one of the next generation Killer-Apps. Transmitting panoramic VR videos is a challenging task because of two problems: 1) panoramic VR videos are typically much larger than normal videos but they need to be transmitted with limited bandwidth in mobile networks. 2) high-resolution and fluent views should be provided to guarantee a superior user experience and avoid side-effects such as dizziness and nausea. To address these two problems, we propose a novel interactive streaming technology, namely Focus-based Interactive Streaming Framework (FISF). FISF consists of three parts: 1) we use the classic clustering algorithm DBSCAN to analyze real user data for Video Focus Detection (VFD); 2) we propose a Focus-based Interactive Streaming Technology (FIST), including a static version and a dynamic version; 3) we propose two optimization methods: focus merging and prefetch strategy. Experimental results show that FISF significantly outperforms the state-of-the-art. The paper is submitted to Sigcomm 2017, VR/AR Network on 31 Mar 2017 at 10:44:04am EDT.
- Apr 21 2017 cs.CV arXiv:1704.06228v1Detecting activities in untrimmed videos is an important yet challenging task. In this paper, we tackle the difficulties of effectively locating the start and the end of a long complex action, which are often met by existing methods. Our key contribution is the structured segment network, a novel framework for temporal action detection, which models the temporal structure of each activity instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model, which comprises two classifiers, respectively for classifying activities and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. We also propose a simple yet effective temporal action proposal scheme that can generate proposals of considerably higher qualities. On two challenging benchmarks, THUMOS14 and ActivityNet, our method remarkably outperforms existing state-of-the-art methods by over $ 10\% $ absolute average mAP, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
- We consider the phase retrieval problem of recovering the unknown signal from the magnitude-only measurements, where the measurements can be contaminated by both sparse arbitrary corruption and bounded random noise. We propose a new nonconvex algorithm for robust phase retrieval, namely Robust Wirtinger Flow, to jointly estimate the unknown signal and the sparse corruption. We show that our proposed algorithm is guaranteed to converge linearly to the unknown true signal up to a minimax optimal statistical precision in such a challenging setting. Compared with existing robust phase retrieval methods, we improved the statistical error rate by a factor of $\sqrt{n/m}$ where $n$ is the dimension of the signal and $m$ is the sample size, provided a refined characterization of the corruption fraction requirement, and relaxed the lower bound condition on the number of corruption. In the noise-free case, our algorithm converges to the unknown signal at a linear rate and achieves optimal sample complexity up to a logarithm factor. Thorough experiments on both synthetic and real datasets corroborate our theory.
- Apr 17 2017 cs.CL arXiv:1704.04347v2In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.
- Apr 14 2017 cs.CV arXiv:1704.04141v1Procedural textures are normally generated from mathematical models with parameters carefully selected by experienced users. However, for naive users, the intuitive way to obtain a desired texture is to provide semantic descriptions such as "regular," "lacelike," and "repetitive" and then a procedural model with proper parameters will be automatically suggested to generate the corresponding textures. By contrast, it is less practical for users to learn mathematical models and tune parameters based on multiple examinations of large numbers of generated textures. In this study, we propose a novel framework that generates procedural textures according to user-defined semantic descriptions, and we establish a mapping between procedural models and semantic texture descriptions. First, based on a vocabulary of semantic attributes collected from psychophysical experiments, a multi-label learning method is employed to annotate a large number of textures with semantic attributes to form a semantic procedural texture dataset. Then, we derive a low dimensional semantic space in which the semantic descriptions can be separated from one other. Finally, given a set of semantic descriptions, the diverse properties of the samples in the semantic space can lead the framework to find an appropriate generation model that uses appropriate parameters to produce a desired texture. The experimental results show that the proposed framework is effective and that the generated textures closely correlate with the input semantic descriptions.
- Apr 13 2017 cs.CV arXiv:1704.03470v2This paper investigates two-branch neural networks for image-text matching tasks. We propose two different network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. The second one, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our two-branch networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
- Apr 13 2017 cs.LG arXiv:1704.03718v1Extreme multi-label learning or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves 2L possible label sets when the label dimension L is very large e.g. in millions for Wikipedia labels. This paper is motivated to better explore the label space by build- ing and modeling an explicit label graph. In the meanwhile, deep learning has been widely studied and used in various classification problems includ- ing multi-label classification, however it has not been sufficiently studied in this extreme but practi- cal case, where the label space can be as large as in millions. In this paper, we propose a practical deep embedding method for extreme multi-label classifi- cation. Our method harvests the ideas of non-linear embedding and modeling label space with graph priors at the same time. Extensive experiments on public datasets for XML show that our method per- form competitively against state-of-the-art result.
- Apr 11 2017 cs.CV arXiv:1704.02581v2Recently, skeleton based action recognition gains more popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNN) to handle raw skeletons only focus on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve generalization of our model, we further exploit 3D transformation based data augmentation techniques including rotation and scaling transformation to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, i.e., generic actions, interaction activities and gestures.
- Triplet networks are widely used models that are characterized by good performance in classification and retrieval tasks. In this work we propose to train a triplet network by putting it as the discriminator in Generative Adversarial Nets (GANs). We make use of the good capability of representation learning of the discriminator to increase the predictive quality of the model. We evaluated our approach on Cifar10 and MNIST datasets and observed significant improvement on the classification performance using the simple k-nn method.
- Apr 06 2017 cs.SI arXiv:1704.01508v1Recent research has shown a substantial active presence of bots in online social networks (OSNs). In this paper we utilise our past work on studying bots (Stweeler) to comparatively analyse the usage and impact of bots and humans on Twitter, one of the largest OSNs in the world. We collect a large-scale Twitter dataset and define various metrics based on tweet metadata. We divide and filter the dataset in four popularity groups in terms of number of followers. Using a human annotation task we assign 'bot' and 'human' ground-truth labels to the dataset, and compare the annotations against an online bot detection tool for evaluation. We then ask a series of questions to discern important behavioural bot and human characteristics using metrics within and among four popularity groups. From the comparative analysis we draw important differences as well as surprising similarities between the two entities, thus paving the way for reliable classification of automated political infiltration, advertisement campaigns, and general bot detection.
- Apr 04 2017 cs.NI arXiv:1704.00336v2This paper focuses on wireless powered 5G dense cellular networks, where base station (BS) delivers energy to user equipment (UE) via the microwave radiation in sub-6 GHz or millimeter wave (mmWave) frequency, and UE uses the harvested energy for uplink information transmission. By addressing the impacts of employing different number of antennas and bandwidths at lower and higher frequencies, we evaluate the amount of harvested energy and throughput in such networks. Based on the derived results, we obtain the required small cell density to achieve an expected level of harvested energy or throughput. Also, we obtain that when the ratio of the number of sub-6 GHz BSs to that of the mmWave BSs is lower than a given threshold, UE harvests more energy from a mmWave BS than a sub-6 GHz BS. We find how many mmWave small cells are needed to perform better than the sub-6 GHz small cells from the perspectives of harvested energy and throughput. Our results reveal that the amount of harvested energy from the mmWave tier can be comparable to the sub-6 GHz counterpart in the dense scenarios. For the same tier scale, mmWave tier can achieve higher throughput. Furthermore, the throughput gap between different mmWave frequencies increases with the mmWave BS density.
- Apr 03 2017 cs.CV arXiv:1703.10898v1Deep ConvNets have been shown to be effective for the task of human pose estimation from single images. However, several challenging issues arise in the video-based case such as self-occlusion, motion blur, and uncommon poses with few or no examples in training data sets. Temporal information can provide additional cues about the location of body joints and help to alleviate these issues. In this paper, we propose a deep structured model to estimate a sequence of human poses in unconstrained videos. This model can be efficiently trained in an end-to-end manner and is capable of representing appearance of body joints and their spatio-temporal relationships simultaneously. Domain knowledge about the human body is explicitly incorporated into the network providing effective priors to regularize the skeletal structure and to enforce temporal consistency. The proposed end-to-end architecture is evaluated on two widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based pose estimation. Our approach significantly outperforms the existing state-of-the-art methods.
- Apr 03 2017 cs.NI arXiv:1703.10750v1As the explosive growth of smart devices and the advent of many new applications, traffic volume has been growing exponentially. The traditional centralized network architecture cannot accommodate such user demands due to heavy burden on the backhaul links and long latency. Therefore, new architectures which bring network functions and contents to the network edge are proposed, i.e., mobile edge computing and caching. Mobile edge networks provide cloud computing and caching capabilities at the edge of cellular networks. In this survey, we make an exhaustive review on the state-of-the-art research efforts on mobile edge networks. We first give an overview of mobile edge networks including definition, architecture and advantages. Next, a comprehensive survey of issues on computing, caching and communication techniques at the network edge is presented respectively. The applications and use cases of mobile edge networks are discussed. Subsequently, the key enablers of mobile edge networks such as cloud technology, SDN/NFV and smart devices are discussed. Finally, open research challenges and future directions are presented as well.
- Apr 03 2017 cs.NI arXiv:1703.10794v1Caching popular contents at the edge of cellular networks has been proposed to reduce the load, and hence the cost of backhaul links. It is significant to decide which files should be cached and where to cache them. In this paper, we propose a distributed caching scheme considering the tradeoff between the diversity and redundancy of base stations' cached contents. Whether it is better to cache the same or different contents in different base stations? To find out this, we formulate an optimal redundancy caching problem. Our goal is to minimize the total transmission cost of the network, including cost within the radio access network (RAN) and cost incurred by transmission to the core network via backhaul links. The optimal redundancy ratio under given system configuration is obtained with adapted particle swarm optimization (PSO) algorithm. We analyze the impact of important system parameters through Monte-Carlo simulation. Results show that the optimal redundancy ratio is mainly influenced by two parameters, which are the backhaul to RAN unit cost ratio and the steepness of file popularity distribution. The total cost can be reduced by up to 54% at given unit cost ratio of backhaul to RAN when the optimal redundancy ratio is selected. Under typical file request pattern, the reduction amount can be up to 57%.
- In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high probability form. Experiments demonstrate that our algorithm consistently outperforms existing method in both utility and running time.
- Mar 10 2017 cs.CV arXiv:1703.03329v1Current action recognition methods heavily rely on trimmed videos for model training. However, it is very expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn from untrimmed videos without the need of temporal annotations of action instances. Our UntrimmedNet couples two important components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. These two modules are implemented with feed-forward networks. UntrimmedNet is essentially an end-to-end trainable architecture, which allows for the joint optimization of model parameters of both components. We exploit the learned models for the problems of action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs weak supervision, our method achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
- Based on the distinguishing features of multi-tier millimeter wave (mmWave) networks such as different transmit powers, different directivity gains from directional beamforming alignment and path loss laws for line-of-sight (LOS) and non-line-of-sight (NLOS) links, we introduce a normalization model to simplify the analysis of multi-tier mmWave cellular networks. The highlight of the model is that we convert a multi-tier mmWave cellular network into a single-tier mmWave network, where all the base stations (BSs) have the same normalized transmit power 1 and the densities of BSs scaled by LOS or NLOS scaling factors respectively follow piecewise constant function which has multiple demarcation points. On this basis, expressions for computing the coverage probability are obtained in general case with beamforming alignment errors and in the special case with perfect beamforming alignment in the communication. According to corresponding numerical exploration, we conclude that the normalization model for multi-tier mmWave cellular networks fully meets requirements of network performance analysis, and it is simpler and clearer than the untransformed model. Besides, an unexpected but sensible finding is that there is an optimal beam width that maximizes coverage probability in the case with beamforming alignment errors.
- Mar 09 2017 cs.CV arXiv:1703.02716v1Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g., they often meet difficulties in locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
- Mar 08 2017 cs.CV arXiv:1703.02271v1The study on point sources in astronomical images is of special importance, since most energetic celestial objects in the Universe exhibit a point-like appearance. An approach to recognize the point sources (PS) in the X-ray astronomical images using our newly designed granular binary-tree support vector machine (GBT-SVM) classifier is proposed. First, all potential point sources are located by peak detection on the image. The image and spectral features of these potential point sources are then extracted. Finally, a classifier to recognize the true point sources is build through the extracted features. Experiments and applications of our approach on real X-ray astronomical images are demonstrated. comparisons between our approach and other SVM-based classifiers are also carried out by evaluating the precision and recall rates, which prove that our approach is better and achieves a higher accuracy of around 89%.
- Mar 07 2017 cs.CV arXiv:1703.01702v1This paper studies the problem of how to choose good viewpoints for taking photographs of architectures. We achieve this by learning from professional photographs of world famous landmarks that are available on the Internet. Unlike previous efforts devoted to photo quality assessment which mainly rely on 2D image features, we show in this paper combining 2D image features extracted from images with 3D geometric features computed on the 3D models can result in more reliable evaluation of viewpoint quality. Specifically, we collect a set of photographs for each of 15 world famous architectures as well as their 3D models from the Internet. Viewpoint recovery for images is carried out through an image-model registration process, after which a newly proposed viewpoint clustering strategy is exploited to validate users' viewpoint preferences when photographing landmarks. Finally, we extract a number of 2D and 3D features for each image based on multiple visual and geometric cues and perform viewpoint recommendation by learning from both 2D and 3D features using a specifically designed SVM-2K multi-view learner, achieving superior performance over using solely 2D or 3D features. We show the effectiveness of the proposed approach through extensive experiments. The experiments also demonstrate that our system can be used to recommend viewpoints for rendering textured 3D models of buildings for the use of architectural design, in addition to viewpoint evaluation of photographs and recommendation of viewpoints for photographing architectures in practice.
- Mar 06 2017 cs.CV arXiv:1703.01086v2This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which is designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit into the text region in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals to the feature map for a text region classifier. The whole framework is built upon region proposal based architecture, which ensures the computational efficiency of the arbitrary-oriented text detection comparing with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets, and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
- Mar 03 2017 cs.NI physics.soc-ph arXiv:1703.00520v1The key requirement to routing in any telecommunication network, and especially in Internet-of-Things (IoT) networks, is scalability. Routing must route packets between any source and destination in the network without incurring unmanageable routing overhead that grows quickly with increasing network size and dynamics. Here we present an addressing scheme and a coupled network topology design scheme that guarantee essentially optimal routing scalability. The FIB sizes are as small as they can be, equal to the number of adjacencies a node has, while the routing control overhead is minimized as nearly zero routing control messages are exchanged even upon catastrophic failures in the network. The key new ingredient is the addressing scheme, which is purely local, based only on geographic coordinates of nodes and a centrality measure, and does not require any sophisticated non-local computations or global network topology knowledge for network embedding. The price paid for these benefits is that network topology cannot be arbitrary but should follow a specific design, resulting in Internet-like topologies. The proposed schemes can be most easily deployed in overlay networks, and also in other network deployments, where geolocation information is available, and where network topology can grow following the design specifications.
- Iterative Hard Thresholding (IHT) is a class of projected gradient descent methods for optimizing sparsity-constrained minimization models, with the best known efficiency and scalability in practice. As far as we know, the existing IHT-style methods are designed for sparse minimization in primal form. It remains open to explore duality theory and algorithms in such a non-convex and NP-hard setting. In this article, we bridge the gap by establishing a duality theory for sparsity-constrained minimization with $\ell_2$-regularized objective and proposing an IHT-style algorithm for dual maximization. Our sparse duality theory provides a set of sufficient and necessary conditions under which the original NP-hard/non-convex problem can be equivalently solved in a dual space. The proposed dual IHT algorithm is a super-gradient method for maximizing the non-smooth dual objective. An interesting finding is that the sparse recovery performance of dual IHT is invariant to the Restricted Isometry Property (RIP), which is required by all the existing primal IHT without sparsity relaxation. Moreover, a stochastic variant of dual IHT is proposed for large-scale stochastic optimization. Numerical results demonstrate that dual IHT algorithms can achieve more accurate model estimation given small number of training data and have higher computational efficiency than the state-of-the-art primal IHT-style algorithms.
- Mar 02 2017 cs.LG arXiv:1703.00380v1Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is moved to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users' current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also demonstrate the feasibility of our approach by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).
- Boltzmann machines are physics informed generative models with wide applications in machine learning. They can learn the probability distribution from an input dataset and generate new samples accordingly. Applying them back to physics, the Boltzmann machines are ideal recommender systems to accelerate Monte Carlo simulation of physical systems due to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of the Boltzmann Machines can even discover unknown cluster Monte Carlo algorithms. The creative power comes from the latent representation of the Boltzmann machines, which learn to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four spin plaquette interactions. Our results endorse a fresh research paradigm where intelligent machines are designed to create or inspire human discovery of innovative algorithms.
- In this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. The RTD of a concept class $\mathcal C \subseteq \{0, 1\}^n$, introduced by Zilles et al. (2011), is a combinatorial complexity measure characterized by the worst-case number of examples necessary to identify a concept in $\mathcal C$ according to the recursive teaching model. For any finite concept class $\mathcal C \subseteq \{0,1\}^n$ with $\mathrm{VCD}(\mathcal C)=d$, Simon & Zilles (2015) posed an open problem $\mathrm{RTD}(\mathcal C) = O(d)$, i.e., is RTD linearly upper bounded by VCD? Previously, the best known result is an exponential upper bound $\mathrm{RTD}(\mathcal C) = O(d \cdot 2^d)$, due to Chen et al. (2016). In this paper, we show a quadratic upper bound: $\mathrm{RTD}(\mathcal C) = O(d^2)$, much closer to an answer to the open problem. We also discuss the challenges in fully solving the problem.
- In this paper we define the overflow problem of a network coding storage system in which the encoding parameter and the storage parameter are mismatched. Through analyses and experiments, we first show the impacts of the overflow problem in a network coding scheme, which not only waste storage spaces, but also degrade coding efficiency. To avoid the overflow problem, we then develop the network coding based secure storage (NCSS) scheme. Thanks to considering both security and storage requirements in encoding procedures and distributed architectures, the NCSS can improve the performance of a cloud storage system from both the aspects of storage cost and coding processing time. We analyze the maximum allowable stored encoded data under the perfect secrecy criterion, and provide the design guidelines for the secure cloud storage system to enhance coding efficiency and achieve the minimal storage cost.
- For units $\delta$ and $\alpha$ in $\F_{p^m}$, the structure of $(\delta+\alpha u^2)$-constacyclic codes of length $p^k$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}+u^2\mathbb{F}_{p^m}$ is studied and self-dual $(\delta+\alpha u^2)$-constacyclic codes are analyzed.
- Feb 07 2017 cs.CC arXiv:1702.01423v1Feedback shift registers(FSRs) are a fundamental component in electronics and secure communication. An FSR $f$ is said to be reducible if all the output sequences of another FSR $g$ can also be generated by $f$ and the FSR $g$ has less memory than $f$. An FSR is said to be decomposable if it has the same set of output sequences as a cascade connection of two FSRs. It is proved that deciding whether FSRs are irreducible/indecomposable is NP-hard.
- Feb 06 2017 cs.RO arXiv:1702.01075v1Safety Barrier Certificates that ensure collision-free maneuvers for teams of differential flatness-based quadrotors are presented in this paper. Synthesized with control barrier functions, the certificates are used to modify the nominal trajectory in a minimally invasive way to avoid collisions. The proposed collision avoidance strategy complements existing flight control and planning algorithms by providing trajectory modifications with provable safety guarantees. The effectiveness of this strategy is supported both by the theoretical results and experimental validation on a team of five quadrotors.
- Feb 02 2017 cs.CV arXiv:1702.00254v3We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to early discard unlikely regions; a fine-turning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
- Breaking the fronthaul capacity limitations is vital to make cloud radio access network (C-RAN) scalable and practical. One promising way is aggregating several remote radio units (RRUs) as a cluster to share a fronthaul link, so as to enjoy the statistical multiplexing gain brought by the spatial randomness of the traffic. In this letter, a tractable model is proposed to analyze the fronthaul statistical multiplexing gain. We first derive the user blocking probability caused by the limited fronthaul capacity, including its upper and lower bounds. We then obtain the limits of fronthaul statistical multiplexing gain when the cluster size approaches infinity. Analytical results reveal that the user blocking probability decreases exponentially with the average fronthaul capacity per RRU, and the exponent is proportional to the cluster size. Numerical results further show considerable fronthaul statistical multiplexing gain even at a small to medium cluster size.
- Taking into account of both the huge computing power of intruders and untrusted cloud servers, we develop an enhanced secure pseudonym scheme to protect the privacy of mobile cloud data. To face the huge computing power challenge, we develop an unconditionally secure lightweight network coding pseudonym scheme. For the privacy issue of untrusted cloud server, we further design a two tier network coding to decouple the stored mobile cloud data from the owner pseudonyms. Therefore, our proposed network coding based pseudonym scheme can simultaneously defend against attackers from both outside and inside. We implement our proposed two-tier light-weight network coding mechanism in a group location based service (LBS) using untrusted cloud database. Compared to computationally secure Hash-based pseudonym, our proposed scheme is not only unconditionally secure, but also can reduce more than 90 percent of processing time as well as 10 percent of energy consumption.
- One of key 5G scenarios is that device-to-device (D2D) and massive multiple-input multiple-output (MIMO) will be co-existed. However, interference in the uplink D2D underlaid massive MIMO cellular networks needs to be coordinated, due to the vast cellular and D2D transmissions. To this end, this paper introduces a spatially dynamic power control solution for mitigating the cellular-to-D2D and D2D-to-cellular interference. In particular, the proposed D2D power control policy is rather flexible including the special cases of no D2D links or using maximum transmit power. Under the considered power control, an analytical approach is developed to evaluate the spectral efficiency (SE) and energy efficiency (EE) in such networks. Thus, the exact expressions of SE for a cellular user or D2D transmitter are derived, which quantify the impacts of key system parameters such as massive MIMO antennas and D2D density. Moreover, the D2D scale properties are obtained, which provide the sufficient conditions for achieving the anticipated SE. Numerical results corroborate our analysis and show that the proposed power control solution can efficiently mitigate interference between the cellular and D2D tier. The results demonstrate that there exists the optimal D2D density for maximizing the area SE of D2D tier. In addition, the achievable EE of a cellular user can be comparable to that of a D2D user.
- A low-complexity coding scheme is developed to achieve the rate region of maximum likelihood decoding for interference channels. As in the classical rate-splitting multiple access scheme by Grant, Rimoldi, Urbanke, and Whiting, the proposed coding scheme uses superposition of multiple codewords with successive cancellation decoding, which can be implemented using standard point-to-point encoders and decoders. Unlike rate-splitting multiple access, which is not rate-optimal for multiple receivers, the proposed coding scheme transmits codewords over multiple blocks in a staggered manner and recovers them successively over sliding decoding windows, achieving the single-stream optimal rate region as well as the more general Han--Kobayashi inner bound for the two-user interference channel. The feasibility of this scheme in practice is verified by implementing it using commercial channel codes over the two-user Gaussian interference channel.
- Jan 11 2017 cs.CR arXiv:1701.02711v1There are many occasions in which the security community is interested to discover the authorship of malware binaries, either for digital forensics analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software codes written by human programmers. Existing studies of authorship attribution of general purpose software mainly focus on source code, which is typically based on the style of programs and environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such program binaries often do not retain many semantic or stylistic features due to the compilation process. Therefore, authorship attribution in the domain of malware binaries based on features and styles that will survive the compilation process is challenging. This paper provides the state of the art in this literature. Further, we analyze the features involved in those techniques. By using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries.
- Jan 06 2017 cs.DC arXiv:1701.01170v1For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
- In this paper, we focus on one of the representative 5G network scenarios, namely multi-tier heterogeneous cellular networks. User association is investigated in order to reduce the down-link co-channel interference. Firstly, in order to analyze the multi-tier heterogeneous cellular networks where the base stations in different tiers usually adopt different transmission powers, we propose a Transmission Power Normalization Model (TPNM), which is able to convert a multi-tier cellular network into a single-tier network, such that all base stations have the same normalized transmission power. Then using TPNM, the signal and interference received at any point in the complex multi-tier environment can be analyzed by considering the same point in the equivalent single-tier cellular network model, thus significantly simplifying the analysis. On this basis, we propose a new user association scheme in heterogeneous cellular networks, where the base station that leads to the smallest interference to other co-channel mobile stations is chosen from a set of candidate base stations that satisfy the quality-of-service (QoS) constraint for an intended mobile station. Numerical results show that the proposed user association scheme is able to significantly reduce the down-link interference compared with existing schemes while maintaining a reasonably good QoS.
- We introduce the notion of information ratio $\text{Ir}(H/G)$ between two (simple, undirected) graphs $G$ and $H$, defined as the supremum of ratios $k/n$ such that there exists a mapping between the strong products $G^k$ to $H^n$ that preserves non-adjacency. Operationally speaking, the information ratio is the maximal number of source symbols per channel use that can be reliably sent over a channel with a confusion graph $H$, where reliability is measured w.r.t. a source confusion graph $G$. Various results are provided, including in particular lower and upper bounds on $\text{Ir}(H/G)$ in terms of different graph properties, inequalities and identities for behavior under strong product and disjoint union, relations to graph cores, and notions of graph criticality. Informally speaking, $\text{Ir}(H/G)$ can be interpreted as a measure of similarity between $G$ and $H$. We make this notion precise by introducing the concept of information equivalence between graphs, a more quantitative version of homomorphic equivalence. We then describe a natural partial ordering over the space of information equivalence classes, and endow it with a suitable metric structure that is contractive under the strong product. Various examples and open problems are discussed.
- Dec 16 2016 cs.SY arXiv:1612.04913v1In this paper, a class of convex feasibility problems (CFPs) are studied for multi-agent systems through local interactions. The objective is to search a feasible solution to the convex inequalities with some set constraints in a distributed manner. The distributed control algorithms, involving subgradient and projection, are proposed for both continuous- and discrete-time systems, respectively. Conditions associated with connectivity of the directed communication graph are given to ensure convergence of the algorithms. It is shown that under mild conditions, the states of all agents reach consensus asymptotically and the consensus state is located in the solution set of the CFP. Simulation examples are presented to demonstrate the effectiveness of the theoretical results.
- Dec 15 2016 cs.NI arXiv:1612.04457v1The orbital angular momentum (OAM) technique provides a new degree of freedom for information transmissions in millimeter wave communications. Considering the spatial distribution characteristics of OAM beams, a new OAM spatial modulation (OAM-SM) millimeter wave communication system is first proposed for future mobile networks. Furthermore, the capacity, average bit error probability and energy efficiency of OAM-SM millimeter wave communication systems are analytically derived for performance analysis. Compared with the conventional multi-input multi-output (MIMO) millimeter wave communication systems, the maximum capacity and energy efficiency of OAM-SM millimeter wave communication systems are improved by 36% and 472.3%, respectively. Moreover, numerical results indicate that the proposed OAM-SM millimeter wave communication systems are more robust to path-loss attenuations than the conventional MIMO millimeter wave communication systems, which makes it suitable for long-range transmissions. Therefore, OAM-SM millimeter wave communication systems provide a great growth space for future mobile networks.
- It is believed that hippocampus functions as a memory allocator in brain, the mechanism of which remains unrevealed. In Valiant's neuroidal model, the hippocampus was described as a randomly connected graph, the computation on which maps input to a set of activated neuroids with stable size. Valiant proposed three requirements for the hippocampal circuit to become a stable memory allocator (SMA): stability, continuity and orthogonality. The functionality of SMA in hippocampus is essential in further computation within cortex, according to Valiant's model. In this paper, we put these requirements for memorization functions into rigorous mathematical formulation and introduce the concept of capacity, based on the probability of erroneous allocation. We prove fundamental limits for the capacity and error probability of SMA, in both data-independent and data-dependent settings. We also establish an example of stable memory allocator that can be implemented via neuroidal circuits. Both theoretical bounds and simulation results show that the neural SMA functions well.
- Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve a-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analyses for running time of approximating near neighbors queries via LSH. We characterize dispersion of data using N_b, the number of b*r-near pairs among the data points. Combined with optimal data-oblivious LSH scheme, we get a new query time bound depending on N_b and doubling dimension. For many natural scenarios where points are well-dispersed or lying in a low-doubling-dimension space, our result leads to sharper performance than existing worst-case analysis. This paper not only present first rigorous proof on how LSHs make use of the structure of data points, but also provide important insights into parameter setting in the practice of LSH beyond worst case. Besides, the techniques in our analysis involve a generalized version of sphere packing problem, which might be of some independent interest.
- With the fifth generation (5G) of mobile broadband systems, Radio Resources Management (RRM) will reach unprecedented levels of complexity. To cope with the higher complexity of RRM functionalities, while retaining the fast execution required in 5G, this manuscript presents a lean 5G RRM architecture that capitalizes on the most recent advances in the field of machine learning in combination with the large amount of data readily available in the network from measurements and system observations. The result is a general-purpose learning framework capable of generating algorithms specialized to RRM functionalities directly from data gathered in the network. The potential of this approach is verified in three study cases and future directions on applications of machine learning to RRM are discussed.
- Since about 100 years ago, to learn the intrinsic structure of data, many representation learning approaches have been proposed, including both linear ones and nonlinear ones, supervised ones and unsupervised ones. Particularly, deep architectures are widely applied for representation learning in recent years, and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, while available resources (e.g. online course, tutorial and book information) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.
- Nov 22 2016 cs.LG arXiv:1611.06306v1A novel data representation method of convolutional neural net- work (CNN) is proposed in this paper to represent data of different modalities. We learn a CNN model for the data of each modality to map the data of differ- ent modalities to a common space, and regularize the new representations in the common space by a cross-model relevance matrix. We further impose that the class label of data points can also be predicted from the CNN representa- tions in the common space. The learning problem is modeled as a minimiza- tion problem, which is solved by an augmented Lagrange method (ALM) with updating rules of Alternating direction method of multipliers (ADMM). The experiments over benchmark of sequence data of multiple modalities show its advantage.
- Nov 22 2016 physics.soc-ph cs.SI arXiv:1611.06660v1While urban systems demonstrate high spatial heterogeneity, many urban planning, economic and political decisions heavily rely on a deep understanding of local neighborhood contexts. We show that the structure of 311 Service Requests enables one possible way of building a unique signature of the local urban context, thus being able to serve as a low-cost decision support tool for urban stakeholders. Considering examples of New York City, Boston and Chicago, we demonstrate how 311 Service Requests recorded and categorized by type in each neighborhood can be utilized to generate a meaningful classification of locations across the city, based on distinctive socioeconomic profiles. Moreover, the 311-based classification of urban neighborhoods can present sufficient information to model various socioeconomic features. Finally, we show that these characteristics are capable of predicting future trends in comparative local real estate prices. We demonstrate 311 Service Requests data can be used to monitor and predict socioeconomic performance of urban neighborhoods, allowing urban stakeholders to quantify the impacts of their interventions.
- Nov 22 2016 cs.IR arXiv:1611.06668v1Sequential prediction is a fundamental task for Web applications. Due to the insufficiency of user feedbacks, sequential prediction usually suffers from the cold start problem. There are two kinds of popular approaches based on matrix factorization (MF) and Markov chains (MC) for item prediction. MF methods factorize the user-item matrix to learn general tastes of users. MC methods predict the next behavior based on recent behaviors. However, they have limitations. MF methods can merge additional information to address cold start but could not capture dynamic properties of user's interest, and MC based sequential methods have difficulty in addressing cold start and has a strong Markov assumption that the next state only depends on the last state. In this work, to deal with the cold start problem of sequential prediction, we propose a RNN model adopting visual and textual content of items, which is named as $\mathbf{V}$isual and $\mathbf{T}$extual $\mathbf{R}$ecurrent $\mathbf{N}$eural $\mathbf{N}$etwork ($\mathbf{VT}$-$\mathbf{RNN}$). We can simultaneously learn the sequential latent vectors that dynamically capture the user's interest, as well as content-based representations that contribute to address the cold start. Experiments on two real-world datasets show that our proposed VT-RNN model can effectively generate the personalized ranking list and significantly alleviate the cold start problem.
- Nov 18 2016 cs.CV arXiv:1611.05592v1Video captioning which automatically translates video clips into natural language sentences is a very important task in computer vision. By virtue of recent deep learning technologies, e.g., convolutional neural networks (CNNs) and recurrent neural networks (RNNs), video captioning has made great progress. However, learning an effective mapping from visual sequence space to language space is still a challenging problem. In this paper, we propose a Multimodal Memory Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide global visual attention on described targets. Specifically, the proposed M3 attaches an external memory to store and retrieve both visual and textual contents by interacting with video and sentence with multiple read and write operations. First, text representation in the Long Short-Term Memory (LSTM) based text decoder is written into the memory, and the memory contents will be read out to guide an attention to select related visual targets. Then, the selected visual information is written into the memory, which will be further read out to the text decoder. To evaluate the proposed model, we perform experiments on two publicly benchmark datasets: MSVD and MSR-VTT. The experimental results demonstrate that our method outperforms the state-of-theart methods in terms of BLEU and METEOR.
- Nov 18 2016 cs.CV arXiv:1611.05588v1Effective image and sentence matching depends on how to well measure their global visual-semantic similarity. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of image (objects) and sentence (words), we propose a selective multimodal Long Short-Term Memory network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of instances of image and sentence, by predicting pairwise instance-aware saliency maps for image and sentence. For selected pairwise instances, their representations are obtained based on the predicted saliency maps, and then compared to measure their local similarity. By similarly measuring multiple local similarities within a few timesteps, the sm-LSTM sequentially aggregates them with hidden states to obtain a final matching score as the desired global similarity. Extensive experiments show that our model can well match image and sentence with complex content, and achieve the state-of-the-art results on two public benchmark datasets.
- Nov 15 2016 cs.DC arXiv:1611.04255v2We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for sub-gradient aggregations, which for large messages quickly dominates overall execution time and limits parallel scalability. To address this problem, we develop a new technique for collective operations, referred to as Linear Pipelining (LP). It is tuned to the message sizes that arise in BSP-SGD, and works effectively on multi-GPU systems. Theoretically, the cost of LP is invariant to $P$, where $P$ is the number of GPUs, while the cost of more conventional Minimum Spanning Tree (MST) scales like $O(\log P)$. LP also demonstrate up to 2x faster bandwidth than Bidirectional Exchange (BE) techniques that are widely adopted by current MPI implementations. We apply these collectives to BSP-SGD, showing that the proposed implementations reduce communication bottlenecks in practice while preserving the attractive convergence properties of BSP-SGD.
- Probabilistic Temporal Tensor Factorization (PTTF) is an effective algorithm to model the temporal tensor data. It leverages a time constraint to capture the evolving properties of tensor data. Nowadays the exploding dataset demands a large scale PTTF analysis, and a parallel solution is critical to accommodate the trend. Whereas, the parallelization of PTTF still remains unexplored. In this paper, we propose a simple yet efficient Parallel Probabilistic Temporal Tensor Factorization, referred to as P$^2$T$^2$F, to provide a scalable PTTF solution. P$^2$T$^2$F is fundamentally disparate from existing parallel tensor factorizations by considering the probabilistic decomposition and the temporal effects of tensor data. It adopts a new tensor data split strategy to subdivide a large tensor into independent sub-tensors, the computation of which is inherently parallel. We train P$^2$T$^2$F with an efficient algorithm of stochastic Alternating Direction Method of Multipliers, and show that the convergence is guaranteed. Experiments on several real-word tensor datasets demonstrate that P$^2$T$^2$F is a highly effective and efficiently scalable algorithm dedicated for large scale probabilistic temporal tensor analysis.
- Nov 11 2016 cs.CV arXiv:1611.03341v1This paper presents an efficient algorithm of high-resolution microwave imaging based on the concept of generalized reflectivity. The contribution made in this paper is two-fold. We introduce the concept of non-parametric generalized reflectivity (GR, for short) as a function of operational frequencies and view angles, etc. The GR extends the conventional Born-based imaging model, i.e., single-scattering model, into that accounting for more realistic interaction between the electromagnetic wavefield and imaged scene. Afterwards, the GR-based microwave imaging is formulated in the convex of sparsity-regularized optimization. Typically, the sparsity-regularized optimization requires the implementation of iterative strategy, which is computationally expensive, especially for large-scale problems. To break this bottleneck, we convert the imaging problem into the problem of physics-driven image processing by introducing a dual transformation. Moreover, this image processing is performed over overlapping patches, which can be efficiently solved in the parallel or distributed manner. In this way, the proposed high-resolution imaging methodology could be applicable to large-scale microwave imaging problems. Selected simulation results are provided to demonstrate the state-of-art performance of proposed methodology.
- Routing in NDN networks must scale in terms of forwarding table size and routing protocol overhead. Hyperbolic routing (HR) presents a potential solution to address the routing scalability problem, because it does not use traditional forwarding tables or exchange routing updates upon changes in network topologies. Although HR has the drawbacks of producing sub-optimal routes or local minima for some destinations, these issues can be mitigated by NDN's intelligent data forwarding plane. However, HR's viability still depends on both the quality of the routes HR provides and the overhead incurred at the forwarding plane due to HR's sub-optimal behavior. We designed a new forwarding strategy called Adaptive Smoothed RTT-based Forwarding (ASF) to mitigate HR's sub-optimal path selection. This paper describes our experimental investigation into the packet delivery delay and overhead under HR as compared with Named-Data Link State Routing (NLSR), which calculates shortest paths. We run emulation experiments using various topologies with different failure scenarios, probing intervals, and maximum number of next hops for a name prefix. Our results show that HR's delay stretch has a median close to 1 and a 95th-percentile around or below 2, which does not grow with the network size. HR's message overhead in dynamic topologies is nearly independent of the network size, while NLSR's overhead grows polynomially at least. These results suggest that HR offers a more scalable routing solution with little impact on the optimality of routing paths.
- Nov 02 2016 cs.CL arXiv:1611.00179v1While neural machine translation (NMT) is making good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even if without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation \emphdual-NMT. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; especially, by learning from monolingual data (with 10% bilingual data for warm start), it achieves a comparable accuracy to NMT trained from the full bilingual data for the French-to-English translation task.
- Oct 28 2016 cs.CV arXiv:1610.08619v2The past few years have witnessed increasing research interest on covariance-based feature representation. A variety of methods have been proposed to boost its efficacy, with some recent ones resorting to nonlinear kernel technique. Noting that the essence of this feature representation is to characterise the underlying structure of visual features, this paper argues that an equally, if not more, important approach to boosting its efficacy shall be to improve the quality of this characterisation. Following this idea, we propose to exploit the structure sparsity of visual features in skeletal human action recognition, and compute sparse inverse covariance estimate (SICE) as feature representation. We discuss the advantage of this new representation on dealing with small sample, high dimensionality, and modelling capability. Furthermore, utilising the monotonicity property of SICE, we efficiently generate a hierarchy of SICE matrices to characterise the structure of visual features at different sparsity levels, and two discriminative learning algorithms are then developed to adaptively integrate them to perform recognition. As demonstrated by extensive experiments, the proposed representation leads to significantly improved recognition performance over the state-of-the-art comparable methods. In particular, as a method fully based on linear technique, it is comparable or even better than those employing nonlinear kernel technique. This result well demonstrates the value of exploiting structure sparsity for covariance-based feature representation.
- The k-means algorithm is one of the most common clustering algorithms and widely used in data mining and pattern recognition. The increasing computational requirement of big data applications makes hardware acceleration for the k-means algorithm necessary. In this paper, a simplified Map-Reduce architecture is proposed to implement the k-means algorithm on an FPGA. Algorithmic segmentation, data path elaboration and automatic control are applied to optimize the architecture for high performance. In addition, high level synthesis technique is utilized to reduce development cycles and complexity. For a single iteration in the k-means algorithm, a throughput of 28.74 Gbps is achieved. The performance shows at least 3.93x speedup compared with four representative existing FPGA-based implementations and can satisfy the demand of big data applications.
- We propose a novel probabilistic dimensionality reduction framework that can naturally integrate the generative model and the locality information of data. Based on this framework, we present a new model, which is able to learn a smooth skeleton of embedding points in a low-dimensional space from high-dimensional noisy data. The formulation of the new model can be equivalently interpreted as two coupled learning problem, i.e., structure learning and the learning of projection matrix. This interpretation motivates the learning of the embedding points that can directly form an explicit graph structure. We develop a new method to learn the embedding points that form a spanning tree, which is further extended to obtain a discriminative and compact feature representation for clustering problems. Unlike traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. This can greatly facilitate data visualization and scientific discovery in downstream analysis. Extensive experiments are performed that demonstrate that the proposed framework is able to obtain discriminative feature representations, and correctly recover the intrinsic structures of various real-world datasets.
- Oct 06 2016 cs.PF arXiv:1610.01267v3Despite computation becomes much complex on data with an unprecedented scale, we argue computers or smart devices should and will consistently provide information and knowledge to human being in the order of a few tens milliseconds. We coin a new term 10-millisecond computing to call attention to this class of workloads. 10-millisecond computing raises many challenges for both software and hardware stacks. In this paper, using a typical workload-memcached on a 40-core server (a main-stream server in near future), we quantitatively measure 10-ms computing's challenges to conventional operating systems. For better communication, we propose a simple metric-outlier proportion to measure quality of service: for N completed requests or jobs, if M jobs or requests' latencies exceed the outlier threshold t, the outlier proportion is M/N . For a 1K-scale system running Linux (version 2.6.32), LXC (version 0.7.5) or XEN (version 4.0.0), respectively, we surprisingly find that so as to reduce the service outlier proportion to 10% (10% users will feel QoS degradation), the outlier proportion of a single server has to be reduced by 871X, 2372X, 2372X accordingly. Also, we discuss the possible design spaces of 10-ms computing systems from perspectives of datacenter architectures, networking, OS and scheduling, and benchmarking.
- Oct 05 2016 cs.CV arXiv:1610.01119v2Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to these recent large-scale scene datasets, such as the Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels. The multi-resolution CNNs are composed of coarse resolution CNNs and fine resolution CNNs, which are complementary to each other. Second, we design two knowledge guided disambiguation techniques to deal with the problem of label ambiguity. (i) We exploit the knowledge from the confusion matrix computed on validation data to merge ambiguous classes into a super category. (ii) We utilize the knowledge of extra networks to produce a soft label for each image. Then the super categories or soft labels are employed to guide CNN training on the Places2. We conduct extensive experiments on three large-scale image datasets (ImageNet, Places, and Places2), demonstrating the effectiveness of our approach. Furthermore, our method takes part in two major scene recognition challenges, and achieves the second place at the Places2 challenge in ILSVRC 2015, and the first place at the LSUN challenge in CVPR 2016. Finally, we directly test the learned representations on other scene benchmarks, and obtain the new state-of-the-art results on the MIT Indoor67 (86.7\%) and SUN397 (72.0\%). We release the code and models at~\urlhttps://github.com/wanglimin/MRCNN-Scene-Recognition.
- In this paper, we study one-Lee weight and two-Lee weight codes over $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$, where $u^{2}=0$. Some properties of one-Lee weight $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-additive codes are given, and a complete classification of one-Lee weight $\mathbb{Z}_2\mathbb{Z}_2[u]$-additive formally self-dual codes is obtained. The structure of two-Lee weight projective $\mathbb{Z}_2\mathbb{Z}_2[u]$ codes are determined. Some optimal binary linear codes are obtained directly from one-Lee weight and two-Lee weight $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-additive codes via the extended Gray map.
- With the rapid growth of social media, rumors are also spreading widely on social media and bring harm to people's daily life. Nowadays, information credibility evaluation has drawn attention from academic and industrial communities. Current methods mainly focus on feature engineering and achieve some success. However, feature engineering based methods require a lot of labor and cannot fully reveal the underlying relations among data. In our viewpoint, the key elements of user behaviors for evaluating credibility are concluded as "who", "what", "when", and "how". These existing methods cannot model the correlation among different key elements during the spreading of microblogs. In this paper, we propose a novel representation learning method, Information Credibility Evaluation (ICE), to learn representations of information credibility on social media. In ICE, latent representations are learnt for modeling user credibility, behavior types, temporal properties, and comment attitudes. The aggregation of these factors in the microblog spreading process yields the representation of a user's behavior, and the aggregation of these dynamic representations generates the credibility representation of an event spreading on social media. Moreover, a pairwise learning method is applied to maximize the credibility difference between rumors and non-rumors. To evaluate the performance of ICE, we conduct experiments on a Sina Weibo data set, and the experimental results show that our ICE model outperforms the state-of-the-art methods.
- To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution namely OPML in this paper. Typically, the proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of whole original triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for new coming samples, which leads to a low space (i.e., $O(d)$) and time (i.e., $O(d^2)$) complexity, where $d$ is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance the robustness when in real case the first several samples come from the same class (i.e., cold start problem). In the experiments, we have systematically evaluated our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, which aims to fully evaluate the proposed methods on different sample number, different feature dimensionalities and different feature extraction ways (i.e., hand-crafted and deeply-learned). The results show that OPML and COPML can obtain the promising performance with a very low computational cost. Also, the effectiveness of COPML under the cold start setting is experimentally verified.
- Ubiquitous information service converged by different types of heterogeneous networks is one of fundamental functions for smart cities. Considering the deployment of 5G ultra-dense wireless networks, the 5G converged cell-less communication networks are proposed to support mobile terminals in smart cities. To break obstacles of heterogeneous wireless networks, the 5G converged cell-less communication network is vertically converged in different tiers of heterogeneous wireless networks and horizontally converged in celled architectures of base stations/access points. Moreover, the software defined network controllers are configured to manage the traffic scheduling and resource allocation in 5G converged cell-less communication networks. Simulation results indicate the coverage probability and the energy saving at both base stations and mobile terminals are improved by the cooperative grouping scheme in 5G converged cell-less communication networks.
- Sep 20 2016 cs.SY arXiv:1609.05800v2A time-invariant, linear, distributed observer is described for estimating the state of an $m>0$ channel, $n$-dimensional continuous-time linear system of the form $ \dot{x} = Ax,\ y_i = C_i x,\ i \in \{1,2,\cdots, m\}$. The state $x$ is simultaneously estimated by $m$ agents assuming each agent $i$ senses $y_i$ and receives the state $z_j$ of each of its neighbors' estimators. Neighbor relations are characterized by a constant directed graph $\mathbb{N}$ whose vertices correspond to agents and whose arcs depict neighbor relations. The overall distributed observer consists of $m$ linear estimators, one for each agent; $m-1$ of the estimators are of dimension $n$ and one estimator is of dimension $n+m-1$. Using results from classical decentralized control theory, it is shown that subject to the assumptions that (i) none of the $C_i$ are zero, (ii) the neighbor graph $\mathbb{N}$ is strongly connected, (iii) the system whose state is to be estimated is jointly observable, and nothing more, it is possible to freely assign the spectrum of the overall distributed observer.
- Since sequential information plays an important role in modeling user behaviors, various sequential recommendation methods have been proposed. Methods based on Markov assumption are widely-used, but independently combine several most recent components. Recently, Recurrent Neural Networks (RNN) based methods have been successfully applied in several sequential modeling tasks. However, for real-world applications, these methods have difficulty in modeling the contextual information, which has been proved to be very important for behavior modeling. In this paper, we propose a novel model, named Context-Aware Recurrent Neural Networks (CA-RNN). Instead of using the constant input matrix and transition matrix in conventional RNN models, CA-RNN employs adaptive context-specific input matrices and adaptive context-specific transition matrices. The adaptive context-specific input matrices capture external situations where user behaviors happen, such as time, location, weather and so on. And the adaptive context-specific transition matrices capture how lengths of time intervals between adjacent behaviors in historical sequences affect the transition of global sequential features. Experimental results show that the proposed CA-RNN model yields significant improvements over state-of-the-art sequential recommendation methods and context-aware recommendation methods on two public datasets, i.e., the Taobao dataset and the Movielens-1M dataset.
- Sep 16 2016 cs.RO arXiv:1609.04730v1This paper describes the Robotarium -- a remotely accessible, multi-robot research facility. The impetus behind the Robotarium is that multi-robot testbeds constitute an integral and essential part of the multi-robot research cycle, yet they are expensive, complex, and time-consuming to develop, operate, and maintain. These resource constraints, in turn, limit access for large groups of researchers and students, which is what the Robotarium is remedying by providing users with remote access to a state-of-the-art multi-robot test facility. This paper details the design and operation of the Robotarium and discusses the considerations one must take when making complex hardware remotely accessible. In particular, safety must be built into the system already at the design phase without overly constraining what coordinated control programs users can upload and execute, which calls for minimally invasive safety routines with provable performance guarantees.
- The third edition of the "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) took place in Aalborg, the 4th largest city in Denmark situated beautifully in the northern part of the country, from the 24th to 26th of August 2016. The workshop venue was at the Aalborg University campus. One implicit objective of this biennial workshop is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For this third edition, iTWIST'16 gathered about 50 international participants and features 8 invited talks, 12 oral presentations, and 12 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing (e.g., optics, computer vision, genomics, biomedical, digital communication, channel estimation, astronomy); Application of sparse models in non-convex/non-linear inverse problems (e.g., phase retrieval, blind deconvolution, self calibration); Approximate probabilistic inference for sparse problems; Sparse machine learning and inference; "Blind" inverse problems and dictionary learning; Optimization for sparse modelling; Information theory, geometry and randomness; Sparsity? What's next? (Discrete-valued signals; Union of low-dimensional spaces, Cosparsity, mixed/group norm, model-based, low-complexity models, ...); Matrix/manifold sensing/processing (graph, low-rank approximation, ...); Complexity/accuracy tradeoffs in numerical methods/optimization; Electronic/optical compressive sensors (hardware).
- Sep 15 2016 cs.DS arXiv:1609.04029v4Due to hybridization events in evolution, studying two different genes of a set of species may yield two related but different phylogenetic trees for the set of species. In this case, we want to measure the dissimilarity of the two trees. The rooted subtree prune and regraft (rSPR) distance of the two trees has been used for this purpose. The problem of computing the rSPR distance of two given trees has many applications but is unfortunately NP-hard. The previously best approximation algorithm for rSPR distance achieves a ratio of 2.5 and it was open whether a better approximation algorithm for rSPR distance exists. In this paper, we answer this question in the affirmative by presenting a cubic-time approximation algorithm for rSPR distance that achieves a ratio of 2. Our algorithm is based on the new notion of key and a number of new structural lemmas. The algorithm is fairly simple and the proof of its correctness is intuitively understandable albeit complicated.
- Sep 13 2016 cs.SY arXiv:1609.03161v1In this paper, a distributed subgradient-based algorithm is proposed for continuous-time multi-agent systems to search a feasible solution to convex inequalities. The algorithm involves each agent achieving a state constrained by its own inequalities while exchanging local information with other agents under a time-varying directed communication graph. With the validity of a mild connectivity condition associated with the communication graph, it is shown that all agents will reach agreement asymptotically and the consensus state is in the solution set of the inequalities. Furthermore, the method is also extended to solving the distributed optimization problem of minimizing the sum of local objective functions subject to convex inequalities. A simulation example is presented to demonstrate the effectiveness of the theoretical results.