results for au:Zhou_Y in:cs

- Aug 22 2017 cs.CV arXiv:1708.06023v1 The de facto algorithm for facial landmark estimation involves running a face detector followed by deformable model fitting on the bounding box. This encompasses two basic problems: i) the detection and deformable fitting steps are performed independently, while the detector might not provide the best-suited initialisation for the fitting step; ii) face appearance varies hugely across poses, which makes deformable face fitting very challenging, so distinct models have to be used (e.g., one for profile and one for frontal faces). In this work, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and elegantly bridge the face detection and facial landmark localisation tasks. Existing joint face detection and landmark localisation methods focus only on a very small set of landmarks. By contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets, including standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. Significant improvement over state-of-the-art methods on deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on the FDDB and MALF datasets.
- This paper presents a hybrid approach called frequent pattern based search that combines data mining and optimization. The proposed method uses a data mining procedure to mine frequent patterns from a set of high-quality solutions collected during earlier search, and the mined frequent patterns are then employed to build starting solutions that are improved by an optimization procedure. After presenting the general approach and its composing ingredients, we illustrate its application to the well-known and challenging quadratic assignment problem. Computational results on the 21 hardest benchmark instances show that the proposed approach competes favorably with state-of-the-art algorithms in terms of both solution quality and computing time.
- Thompson sampling has impressive empirical performance for many multi-armed bandit problems, but current algorithms for Thompson sampling only work in the case of conjugate priors, since they require inferring the posterior, which is often computationally intractable when the prior is not conjugate. In this paper, we propose a novel algorithm for Thompson sampling that only requires drawing samples from a tractable distribution, so our algorithm is efficient even when the prior is non-conjugate. To do this, we reformulate Thompson sampling as an optimization problem via the Gumbel-Max trick. We then construct a set of random variables with the goal of identifying the one with the highest mean. Finally, we solve this with techniques from best arm identification.
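The Gumbel-Max trick mentioned in the abstract above can be sketched in a few lines: adding i.i.d. Gumbel(0, 1) noise to the log-probabilities and taking the argmax yields an exact categorical sample, turning sampling into an optimization. A minimal illustration (not the paper's full algorithm; the distribution p below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])  # made-up categorical distribution

# Gumbel-Max trick: argmax_i (log p_i + G_i), with G_i ~ Gumbel(0, 1) i.i.d.,
# is distributed exactly as Categorical(p) -- sampling reduced to an argmax.
n = 100_000
gumbels = rng.gumbel(size=(n, len(p)))
draws = np.argmax(np.log(p) + gumbels, axis=1)

freq = np.bincount(draws, minlength=len(p)) / n
# freq should closely match p
```

The argmax here plays the role of the optimization problem the paper solves with best arm identification techniques.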
- On-demand single photon emitters (SPEs) play a key role across a broad range of quantum technologies, including quantum computation, quantum simulation, quantum metrology and quantum communications. In quantum networks and quantum key distribution protocols, where photons serve as flying qubits, telecom-wavelength operation is preferred due to the reduced fibre loss. However, despite tremendous efforts to develop various triggered SPE platforms, a robust source of triggered SPEs operating at room temperature and telecom wavelengths is still missing. Here we report a triggered, optically stable, room-temperature solid-state SPE operating at telecom wavelengths. The emitters exhibit high photon purity (~5% multiphoton events) and a record-high brightness of ~1.5 MHz. The emission is attributed to localized defects in a gallium nitride (GaN) crystal. These high-performance SPEs, embedded in a technologically mature semiconductor, are promising for on-chip quantum simulators and practical quantum communication technologies.
- Aug 07 2017 cs.DC arXiv:1708.01401v1 Background: Spot pricing is considered a significant supplement for building a full-fledged market economy for the Cloud ecosystem. However, both providers and consumers still seem hesitant to enter the Cloud spot market. The relevant academic community also holds conflicting opinions about Cloud spot pricing in terms of revenue generation. Aim: This work aims to systematically identify, assess, synthesize and report the published evidence in favor of or against the spot-price scheme compared with the fixed-price scheme of Cloud computing, so as to help resolve the aforementioned conflict. Method: We employed the systematic literature review (SLR) method to collect and investigate the empirical studies of Cloud spot pricing indexed by major electronic libraries. Results: This SLR identified 61 primary studies that either delivered discussions or conducted experiments comparing spot pricing with fixed pricing in the Cloud domain. The reported benefits and limitations are summarized to facilitate cost-benefit analysis of being a Cloud spot pricing player, and four types of theories are distinguished to help both researchers and practitioners better understand the Cloud spot market. Conclusions: This SLR shows that the academic community strongly advocates the emerging Cloud spot market. Although practical and easily deployable market-driven mechanisms are still lacking, the overall findings of our work indicate that spot pricing plays a promising role in the sustainability of Cloud resource exploitation.
- Aug 03 2017 cs.SI arXiv:1708.00759v1 The difficulty of getting medical treatment is one of the major livelihood issues in China. Since patients lack prior knowledge about the spatial distribution and capacity of hospitals, some hospitals have abnormally high or sporadic population densities. This paper presents a new model for estimating the spatiotemporal population density of each hospital based on location-based service (LBS) big data, which would be beneficial for guiding and dispersing outpatients. To improve the estimation accuracy, several approaches are proposed to denoise the LBS data and classify people by detecting their various behaviors. In addition, a long short-term memory (LSTM) based deep learning model is presented to predict the trend of population density. Using Baidu's large-scale LBS log database, we applied the proposed model to 113 hospitals in Beijing, P. R. China, and constructed an online hospital recommendation system that provides users with a ranked list of hospitals based on real-time population density information and basic hospital information such as hospital level and distance. We also mined several interesting patterns from these LBS logs using our proposed system.
- Compared to basic fork-join queues, a job in an (n, k) fork-join queue only needs k of its n sub-tasks to finish. Since (n, k) fork-join queues are prevalent in popular modern distributed systems such as Cassandra, MapReduce and Spark, estimating the sojourn time of such queues is critical for the performance measurement and resource planning of these systems. However, this estimation has remained a well-known open challenge for years, and only rough bounds for a partial range of load factors have been given. In this paper, we develop a closed-form linear transformation technique for identically distributed random variables: an order statistic can be represented by a linear combination of maxima. This new technique is then used to transform the sojourn time of non-purging (n, k) fork-join queues into a linear combination of the sojourn times of basic (k, k)~(n, n) fork-join queues. Consequently, existing approximations for basic fork-join queues can be carried over to non-purging (n, k) fork-join queues. The resulting approximations are then used to improve the upper bounds for purging (n, k) fork-join queues. Simulation experiments show that this linear transformation approach works well for moderate n and relatively large k.
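The identity underlying the abstract above, expressing an order statistic as a linear combination of maxima, can be checked numerically in its simplest textbook case. For three i.i.d. variables, the CDF of the second-largest is 3F² − 2F³, a combination of the CDFs of the maxima of two and of three copies, so E[2nd-max of 3] = 3·E[max of 2] − 2·E[max of 3]. A Monte Carlo sanity check with Exp(1) variables (these coefficients are the textbook special case, not the paper's general construction):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.exponential(size=(N, 3))  # three i.i.d. Exp(1) "sub-task times"

# Direct estimate of the second-largest of three (the middle order statistic).
direct = np.sort(x, axis=1)[:, 1].mean()

# Linear-combination estimate via maxima:
#   E[2nd-max of 3] = 3 E[max of 2] - 2 E[max of 3].
max2 = x[:, :2].max(axis=1).mean()
max3 = x.max(axis=1).mean()
combo = 3 * max2 - 2 * max3
# Exact value for Exp(1) is 1/2 + 1/3 = 5/6 ~ 0.8333
```

Both estimators agree with the exact value 5/6, illustrating how maxima-only approximations (the (k, k)~(n, n) queues) can recover an order statistic (the (n, k) queue).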
- Jul 19 2017 cs.LG arXiv:1707.05363v2 We present a real-time method for synthesizing highly complex human motions using a novel LSTM network training regime we call the auto-conditioned LSTM (acLSTM). Recently, researchers have attempted to synthesize new motion using autoregressive techniques, but existing methods tend to freeze or diverge after a couple of seconds due to an accumulation of errors fed back into the network. Furthermore, such methods have only been shown to be reliable for relatively simple human motions, such as walking or running. In contrast, our approach can synthesize arbitrary motions with highly complex styles, including dances and martial arts in addition to locomotion. The acLSTM accomplishes this by explicitly accommodating autoregressive noise accumulation during training. Furthermore, the structure of the acLSTM is modular, compatible with any other recurrent network architecture, and usable for tasks other than motion. To our knowledge, our work is the first to demonstrate the ability to generate over 18,000 continuous frames (300 seconds) of new complex human motion in different styles.
- Jul 17 2017 cs.SI arXiv:1707.04284v1 Ridesharing can save energy, reduce pollution, and alleviate congestion. Ridesharing systems allow peer-to-peer ride matching through mobile-phone applications, yet in many systems people have little information about the driver and co-riders. Trust heavily influences the usage of a ridesharing system; however, work in this area is limited. In this paper, we investigate the role of social media in shaping trust and delineate factors that influence the trust placed in a driver or co-rider. We design a ridesharing system where a rider can see the Instagram profile and 10 recent pictures of a co-rider, and ask riders to make judgments on six trust-related questions. We discovered three important factors that shape trust formation: social proof, social approval, and self-disclosure. These three factors account for 40%, 33%, and 13% of the variance, respectively, and subsequently extracted factors accounted for substantially less variance. Using these three factors, we build a trust prediction model with an F-measure of 87.4%, an improvement over the 78.6% obtained using all collected factors.
- Device-to-Device (D2D) discovery is a key enabler of D2D communications, allowing the direct exchange of local-area traffic between proximate users (UEs) to improve spectral efficiency. Direct D2D discovery relies on the capability of D2D UEs to autonomously indicate their presence to nearby D2D UEs. Despite its potential to reduce energy consumption and signalling burden, direct D2D discovery has not drawn adequate attention. In this paper, we propose a direct D2D discovery scheme based on a random backoff procedure, where D2D UEs randomly choose a backoff interval before retransmitting a beacon. Compared with existing schemes, the proposed scheme significantly improves both the discovery probability and the discovery delay. Several useful design guidelines are derived from our analysis. Finally, numerical results provide valuable insights into the performance tradeoff inherent to the proposed D2D discovery scheme.
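A toy simulation conveys the backoff intuition from the abstract above, under a deliberately simplified assumption: a single slotted contention round in which a UE's beacon is discovered iff no other UE picks the same slot. This collision model and all parameters are made up for illustration and are not the paper's actual analysis:

```python
import numpy as np

def discovery_prob(n_ues, window, trials, rng):
    """Fraction of UEs whose beacon lands in a slot no other UE chose
    (hypothetical slotted-collision model, one contention round)."""
    ok = 0
    for _ in range(trials):
        slots = rng.integers(0, window, size=n_ues)   # random backoff choices
        counts = np.bincount(slots, minlength=window)  # slot occupancy
        ok += int((counts[slots] == 1).sum())          # UEs alone in their slot
    return ok / (trials * n_ues)

rng = np.random.default_rng(2)
# A larger backoff window reduces collisions, raising the discovery probability
# (at the cost of longer discovery delay) -- the tradeoff the paper analyzes.
p_small = discovery_prob(n_ues=20, window=32, trials=2000, rng=rng)
p_large = discovery_prob(n_ues=20, window=256, trials=2000, rng=rng)
```

In this toy model the per-UE success probability is (1 − 1/W)^(n−1), which the simulation reproduces.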
- Jul 13 2017 cs.SE arXiv:1707.03750v1 Deep learning applications are computation-intensive and often employ GPUs as the underlying computing devices. Deep learning frameworks provide powerful programming interfaces, but the gap between source code and practical GPU operations makes it difficult to analyze the performance of deep learning applications. In this paper, by examining the features of GPU traces and deep learning applications, we use the suffix tree structure to extract the repeated patterns in GPU traces. Performance analysis graphs can then be generated from the preprocessed GPU traces. We further present DeepProf, a novel tool that automatically processes GPU traces and generates performance analysis reports for deep learning applications. An empirical study verifies the effectiveness of DeepProf in performance analysis and diagnosis. We also uncover some interesting properties of TensorFlow that can be used to guide deep learning system setup.
- Virtual reality simulation is becoming popular as a training platform in surgical education. However, one important aspect of simulation-based surgical training that has not received much attention is the provision of automated real-time performance feedback to support the learning process. Performance feedback is actionable advice that improves novice behaviour. In simulation, automated feedback is typically extracted from prediction models trained using data mining techniques. Existing techniques suffer from either low effectiveness or low efficiency, resulting in their inability to be used in real time. In this paper, we propose a random forest based method that strikes a balance between effectiveness and efficiency. Experimental results in a temporal bone surgery simulation show that the proposed method is able to extract highly effective feedback at a high level of efficiency.
- Jun 30 2017 cs.NI arXiv:1706.09541v1 In order to better accommodate the dramatically increasing demand for data caching and computing services, storage and computation capabilities should be endowed to some of the intermediate nodes within the network. In this paper, we design a novel virtualized heterogeneous network framework aimed at enabling content caching and computing. With the virtualization of the whole system, the communication, computing and caching resources can be shared among all users associated with different virtual service providers. We formulate the virtual resource allocation strategy as a joint optimization problem, in which the gains of virtualization as well as caching and computing are taken into consideration in the proposed architecture. In addition, a distributed algorithm based on the alternating direction method of multipliers is adopted to solve the formulated problem, in order to reduce computational complexity and signaling overhead. Finally, extensive simulations are presented to show the effectiveness of the proposed scheme under different system parameters.
- Jun 23 2017 cs.CV arXiv:1706.07346v1 Automatic segmentation of an organ and its cystic region is a prerequisite for computer-aided diagnosis. In this paper, we focus on pancreatic cyst segmentation in abdominal CT scans. This task is important and very useful in clinical practice, yet challenging due to the low boundary contrast, the variability in location and shape, and the different stages of pancreatic cancer. Inspired by the high relevance between the location of a pancreas and that of its cystic region, we introduce extra deep supervision into the segmentation network, so that cyst segmentation can be improved with the help of the relatively easier pancreas segmentation. Under a reasonable transformation function, our approach can be factorized into two stages, each of which can be efficiently optimized via gradient back-propagation throughout the deep networks. We collect a new dataset with 131 pathological samples, which, to the best of our knowledge, is the largest set for pancreatic cyst segmentation. Without human assistance, our approach reports a 63.44% average accuracy, measured by the Dice-Sørensen coefficient (DSC), which is higher than the 60.46% obtained without deep supervision.
- Jun 20 2017 cs.DL arXiv:1706.06087v1 Precision medicine and health require the characterization and phenotyping of biological systems and patient datasets using a variety of data formats. This scenario mandates the centralization of various tools and resources in a unified platform to render them Findable, Accessible, Interoperable, and Reusable (the FAIR Principles). Leveraging these principles, Aztec provides the scientific community with a new platform that promotes a long-term, sustainable ecosystem of biomedical research software. Aztec is available at https://aztec.bio and its source code is hosted at https://github.com/BD2K-Aztec.
- The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
- Jun 06 2017 cs.LG arXiv:1706.01026v1 We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game. This problem has a wide range of applications, e.g., A/B testing, crowdsourcing, and simulation optimization. Our goal is to develop a PAC algorithm which, with probability at least $1-\delta$, identifies a set of $K$ arms with aggregate regret at most $\epsilon$. The notion of aggregate regret for multiple-arm identification was first introduced in Zhou et al. [2014]; it is defined as the difference between the averaged expected rewards of the selected set of arms and of the best $K$ arms. In contrast to Zhou et al. [2014], which only provides instance-independent sample complexity, we introduce a new hardness parameter for characterizing the difficulty of any given instance. We further develop two algorithms and establish their sample complexity in terms of this hardness parameter. The derived sample complexity can be significantly smaller than the state-of-the-art results for a large class of instances, and matches the instance-independent lower bound up to a $\log(\epsilon^{-1})$ factor in the worst case. We also prove a lower bound showing that the extra $\log(\epsilon^{-1})$ factor is necessary for instance-dependent algorithms using the introduced hardness parameter.
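The aggregate regret defined in the abstract above is just the gap between the average mean of the best $K$ arms and that of the selected arms. A toy numeric illustration (the arm means and the selected set are made up for the example):

```python
# Hypothetical expected rewards of n = 5 arms, and a hypothetical selection.
mus = [0.9, 0.8, 0.7, 0.2, 0.1]
K = 2
best_k = sorted(mus, reverse=True)[:K]   # the two best arms: means 0.9 and 0.8
selected = [0.9, 0.7]                    # a selection that misses one best arm

# Aggregate regret: difference of the averaged expected rewards.
regret = sum(best_k) / K - sum(selected) / K
# regret = (1.7 - 1.6) / 2 = 0.05
```

A PAC algorithm in this setting would guarantee regret at most epsilon with probability at least 1 − delta.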
- May 26 2017 cs.CL arXiv:1705.08947v1 We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1 but constructed with higher-performance building blocks, and demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets. We show that a single neural TTS system can learn hundreds of unique voices from less than half an hour of data per speaker, while achieving high-quality audio synthesis and preserving speaker identities almost perfectly.
- May 23 2017 cs.AI arXiv:1705.07339v1 The Maximum Balanced Biclique Problem is a well-known graph model with relevant applications in diverse domains. This paper introduces a novel algorithm, which combines an effective constraint-based tabu search procedure and two dedicated graph reduction techniques. We verify the effectiveness of the algorithm on 30 classical random benchmark graphs and 25 very large real-life sparse graphs from the popular Koblenz Network Collection (KONECT). The results show that the algorithm improves the best-known results (new lower bounds) for 10 classical benchmarks and obtains the optimal solutions for 14 KONECT instances.
- May 23 2017 cs.DM arXiv:1705.07338v1 The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP. Firstly, we introduce an Upper Bound Propagation procedure to pre-compute an upper bound involving each vertex. Then we extend an existing branch-and-bound algorithm by integrating the pre-computed upper bounds. We also present a set of new valid inequalities induced from the upper bounds to tighten an existing mathematical formulation for MBBP. Lastly, we investigate another exact algorithm scheme which enumerates a subset of balanced bicliques based on our upper bounds. Experiments show that compared to existing approaches, the proposed algorithms and formulations are more efficient in solving a set of random graphs and large real-life instances.
- May 23 2017 cs.CL arXiv:1705.07371v1 In this paper, we reformulate the spell correction problem as a machine translation task under the encoder-decoder framework. This reformulation enables us to use a single model to solve a problem that is traditionally formulated as learning a language model and an error model. The model employs multi-layer recurrent neural networks as the encoder and decoder. We demonstrate the effectiveness of this model on an internal dataset, where the training data is automatically obtained from user logs. The model offers performance competitive with state-of-the-art methods, but requires neither feature engineering nor hand tuning between models.
- May 16 2017 cs.LG arXiv:1705.04925v1 In many modern machine learning applications, the structure of the underlying mathematical model often yields nonconvex optimization problems. Due to the intractability of nonconvexity, there is a rising need for efficient methods that solve general nonconvex problems with certain performance guarantees. In this work, we investigate the accelerated proximal gradient method for nonconvex programming (APGnc). The method compares a usual proximal gradient step with a linear extrapolation step, and accepts the one with the lower function value to achieve a monotonic decrease. Specifically, under a general nonsmooth and nonconvex setting, we provide a rigorous argument showing that the limit points of the sequence generated by APGnc are critical points of the objective function. Then, by exploiting the Kurdyka-Łojasiewicz (KŁ) property for a broad class of functions, we establish the linear and sub-linear convergence rates of the function value sequence generated by APGnc. We further propose a stochastic variance-reduced APGnc (SVRG-APGnc) and establish its linear convergence under a special case of the KŁ property. We also extend the analysis to inexact versions of these methods and develop an adaptive momentum strategy that improves the numerical performance.
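The compare-and-accept idea described in the abstract above can be sketched in a few lines. This is a minimal smooth-case sketch, not the paper's algorithm: the nonsmooth regularizer is taken to be zero (so the proximal operator is the identity), and the test function, step size and momentum parameter are made up:

```python
import numpy as np

def apgnc_sketch(grad_f, f, x0, step, iters=200, beta=0.9):
    """Monotone accelerated gradient sketch in the spirit of APGnc:
    at each iteration take both a plain (proximal) gradient step and an
    extrapolated step, and keep whichever yields the lower objective,
    guaranteeing a monotonic decrease of f."""
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = x + beta * (x - x_prev)      # extrapolation (momentum) point
        z_plain = x - step * grad_f(x)   # plain gradient step
        z_accel = y - step * grad_f(y)   # accelerated step
        x_prev = x
        x = z_accel if f(z_accel) < f(z_plain) else z_plain
    return x

# A simple nonconvex test function: f(x) = x^2 + 3 sin^2(x),
# whose only critical point is the global minimizer x = 0.
f = lambda x: x**2 + 3 * np.sin(x)**2
grad = lambda x: 2 * x + 3 * np.sin(2 * x)
x_star = apgnc_sketch(grad, f, x0=2.5, step=0.05)
```

Because the plain gradient step already decreases f for a small enough step, accepting the better of the two steps preserves monotonicity while letting momentum accelerate progress when it helps.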
- May 12 2017 cs.AI arXiv:1705.04119v1 Critical node problems involve identifying a subset of critical nodes in an undirected graph whose removal optimizes a pre-defined measure over the residual graph. While useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic critical node problem (CNP) and introduce an effective memetic algorithm for solving it. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima) and a rank-based pool updating strategy (to guarantee a healthy population). Specifically, the component-based neighborhood search integrates two key techniques: a two-phase node exchange strategy and a node weighting scheme. The double backbone-based crossover extends the idea of general backbone-based crossovers. Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 21 new upper bounds and matches 18 previous best-known upper bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained critical node problem. Finally, we investigate the usefulness of each key algorithmic component.
- Apr 27 2017 cs.AI arXiv:1704.07950v1 In this extended abstract, we propose Structured Production Systems (SPS), which extend traditional production systems with well-formed syntactic structures. Due to the richness of these structures, structured production systems significantly enhance the expressive power as well as the flexibility of production systems, for instance, in handling uncertainty. We show that different rule application strategies can be reduced to the basic one by utilizing structures. Moreover, many fundamental approaches in computer science, including automata, grammars and logic, can be captured by structured production systems.
- Persistent spread measurement counts the number of distinct elements that persist in each network flow over predefined time periods. It has many practical applications, including detecting long-term stealthy network activities hidden in the background of normal-user activity, such as stealthy DDoS attacks, stealthy network scans, or faked network trends, which cannot be detected by traditional flow cardinality measurement. With big network data, one challenge is to measure the persistent spreads of a massive number of flows without incurring too much memory overhead, as such measurement may be performed at line speed by network processors with fast but small on-chip memory. We propose a highly compact Virtual Intersection HyperLogLog (VI-HLL) architecture for this purpose. It achieves far better memory efficiency than the best prior work, V-Bitmap, and at the same time drastically extends the measurement range. Theoretical analysis and extensive experiments demonstrate that VI-HLL provides good measurement accuracy even in very tight memory of less than 1 bit per flow.
- Apr 11 2017 cs.IR arXiv:1704.02552v1 Using only implicit data, many recommender systems generally fail to provide precise recommendations to users with limited interaction history. This issue is known as the "Cold Start" problem and is typically resolved by switching to content-based approaches, where extra, costly information is required. In this paper, we apply a dimensionality reduction algorithm, Word2Vec (W2V), originally developed for Natural Language Processing problems, within the framework of Collaborative Filtering (CF) to tackle the "Cold Start" problem using only implicit data. The combined method is named Embedded Collaborative Filtering (ECF). An experiment is conducted to determine the performance of ECF on two different implicit datasets. We show that the ECF approach outperforms other popular and state-of-the-art approaches in "Cold Start" scenarios.
- Mar 28 2017 cs.CV arXiv:1703.08603v3 It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, generally exist for deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples, and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.
- In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and a convergence rate of $O\left(\frac{1}{\varepsilon^2}\right)$, improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012], which has a convergence rate of $O\left(\frac{1}{\varepsilon^4}\right)$.
- Data deduplication is able to effectively identify and eliminate redundant data and only maintain a single copy of files and chunks. Hence, it is widely used in cloud storage systems to save storage space and network bandwidth. However, the occurrence of deduplication can be easily identified by monitoring and analyzing network traffic, which leads to the risk of user privacy leakage. The attacker can carry out a very dangerous side channel attack, i.e., learn-the-remaining-information (LRI) attack, to reveal users' privacy information by exploiting the side channel of network traffic in deduplication. Existing work addresses the LRI attack at the cost of the high bandwidth efficiency of deduplication. In order to address this problem, we propose a simple yet effective scheme, called randomized redundant chunk scheme (RRCS), to significantly mitigate the risk of the LRI attack while maintaining the high bandwidth efficiency of deduplication. The basic idea behind RRCS is to add randomized redundant chunks to mix up the real deduplication states of files used for the LRI attack, which effectively obfuscates the view of the attacker, who attempts to exploit the side channel of network traffic for the LRI attack. Our security analysis shows that RRCS could significantly mitigate the risk of the LRI attack. We implement the RRCS prototype and evaluate it by using three large-scale real-world datasets. Experimental results demonstrate the efficiency and efficacy of RRCS.
- Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance is provided automatically in real time during training. The aim of this paper is to develop an efficient and effective feedback generation method for the provision of real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, resulting in their inability to be used in real time. In this paper, we propose a neural network based method that generates feedback using an adversarial technique. The proposed method utilizes a bounded adversarial update to minimize an L1-regularized loss via back-propagation. We empirically show that the proposed method can be used to generate simple yet effective feedback. Moreover, it shows higher effectiveness and efficiency than existing methods, making it a promising option for real-time feedback generation in SBT.
- Maximum rank-distance (MRD) codes are extremal codes in the space of $m\times n$ matrices over a finite field, equipped with the rank metric. Up to generalizations, the classical examples of such codes were constructed in the 1970s and are today known as Gabidulin codes. Motivated by several recent approaches to construct MRD codes that are inequivalent to Gabidulin codes, we study the equivalence issue for Gabidulin codes themselves. This shows in particular that the family of Gabidulin codes already contains a huge subset of MRD codes that are pairwise inequivalent, provided that $2\le m\le n-2$.
- Feb 09 2017 cs.CV arXiv:1702.02512v1 This paper presents a robust and efficient semi-dense visual odometry solution for RGB-D cameras. The core of our method is a 2D-3D ICP pipeline which estimates the pose of the sensor by registering the projection of a 3D semi-dense map of the reference frame with the 2D semi-dense region extracted in the current frame. The processing is sped up by efficiently implemented approximate nearest neighbour fields under the Euclidean distance criterion, which permit the use of compact Gauss-Newton updates in the optimization. The registration is formulated as a maximum a posteriori problem to deal with outliers and sensor noise, and the equivalent weighted least squares problem is consequently solved by the iteratively reweighted least squares method. A variety of robust weight functions are tested and the optimum is determined based on the characteristics of the sensor model. Extensive evaluation on publicly available RGB-D datasets shows that the proposed method predominantly outperforms existing state-of-the-art methods.
- Transportation mode detection with personal devices has been investigated for over ten years due to its importance in monitoring one's activities, understanding human mobility, and assisting traffic management. However, two main limitations still prevent it from large-scale deployment: high power consumption, and the lack of high-volume and diverse labeled data. To reduce power consumption, existing approaches sample using fewer sensors and at lower frequency, which however leads to lower accuracy. A common way to obtain labeled data is to record the ground truth while collecting data, but such a method cannot be applied to large-scale deployment due to its inefficiency. To address these issues, we adopt a new low-frequency sampling manner with a hierarchical transportation mode identification algorithm and propose an offline data labeling approach with manual and automatic implementations. Through a real-world large-scale experiment and comparison with related work, our sampling manner and algorithm are shown to consume much less energy while achieving a competitive accuracy of around 85%. The new offline data labeling approach is also validated to be efficient and effective in providing ground truth for model training and testing.
- We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Considering that communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks which can significantly reduce the number of inter-node communications. We first propose a decentralized primal-dual method which can find an $\epsilon$-solution both in terms of functional optimality gap and feasibility residual in $O(1/\epsilon)$ inter-node communication rounds when the objective functions are convex and the local primal subproblems are solved exactly. Our major contribution is to present a new class of decentralized primal-dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can still find an $\epsilon$-solution in $O(1/\epsilon)$ (resp., $O(1/\sqrt{\epsilon})$) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the $O(1/\epsilon^2)$ (resp., $O(1/\epsilon)$) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart for these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we can reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. The bounds on the subgradient evaluations are actually comparable to those required for centralized nonsmooth and stochastic optimization.
- Jan 13 2017 cs.AI arXiv:1701.03322v2 First-Order Logic (FOL) is widely regarded as one of the most important foundations for knowledge representation. Nevertheless, in this paper, we argue that FOL has several critical issues for this purpose. Instead, we propose an alternative called assertional logic, in which all syntactic objects are categorized as set theoretic constructs including individuals, concepts and operators, and all kinds of knowledge are formalized by equality assertions. We first present a primitive form of assertional logic that uses minimal assumed knowledge and constructs. Then, we show how to extend it by definitions, which are special kinds of knowledge, i.e., assertions. We argue that assertional logic, although simpler, is more expressive and extensible than FOL. As a case study, we show how assertional logic can be used to unify logic and probability, and more building blocks in AI.
- Jan 10 2017 cs.CV arXiv:1701.01924v1 Image blur and image noise are common distortions during image acquisition. In this paper, we systematically study the effect of image distortions on deep neural network (DNN) image classifiers. First, we examine DNN classifier performance under four types of distortion. Second, we propose two approaches to alleviate the effect of image distortion: re-training and fine-tuning with noisy images. Our results suggest that, under certain conditions, fine-tuning with noisy images can substantially alleviate the effect of distorted inputs, and is more practical than re-training.
- Jan 10 2017 cs.CV arXiv:1701.01833v2 Deep Convolutional Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability to handle significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and its multiple unmaterialised rotated versions. During back-propagation, an ARF is collectively updated using errors from all its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks. The oriented response produced by ORNs can also be used for image and object orientation estimation tasks. Over multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to a significant reduction in the number of network parameters and an improvement in classification performance. We report the best results on several commonly used benchmarks.
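The orientation-max idea behind a rotating filter can be illustrated in a toy numpy version (illustrative only; real ARFs use finer rotation steps, learned filters, and collective gradient updates across all rotated copies):

```python
import numpy as np

def oriented_response(image, filt):
    """Toy rotating-filter response: correlate the image with four
    90-degree rotations of the same square filter and keep, per output
    location, the maximum response and the orientation achieving it."""
    H, W = image.shape
    k = filt.shape[0]
    out = np.full((4, H - k + 1, W - k + 1), -np.inf)
    for r in range(4):
        f = np.rot90(filt, r)  # r quarter-turns of the base filter
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[r, i, j] = (image[i:i + k, j:j + k] * f).sum()
    return out.max(axis=0), out.argmax(axis=0)
```

A vertical-edge image answered with a horizontal-gradient filter fires strongest on the unrotated copy, so the orientation map encodes which rotation matched.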
- Dec 28 2016 cs.CV arXiv:1612.08230v4 Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy on some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions which occupy a large fraction of the input volume. In this paper, we formulate this problem as a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset, and outperform the state-of-the-art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.
- Dec 22 2016 cs.CV physics.optics arXiv:1612.07120v1 We have designed a single-pixel camera that can image around corners, based on computational ghost imaging. It can obtain the image of an object even when the camera cannot look at the object directly. Our imaging system exploits the fact that a bucket detector in a ghost imaging setup has no spatial resolution capability. A series of experiments have been designed to confirm our predictions. This camera has potential applications for imaging around corners or in other environments where the object cannot be observed directly.
- Designing chaotic maps with complex dynamics is a challenging topic. This paper introduces the nonlinear chaotic processing (NCP) model, which contains six basic nonlinear operations. Each operation is a general framework that can use existing chaotic maps as seed maps to generate a huge number of new chaotic maps. The proposed NCP model can be easily extended by introducing new nonlinear operations or by arbitrarily combining existing ones. The properties and chaotic behaviors of the NCP model are investigated. To show its effectiveness and usability, we provide, as examples, four new chaotic maps generated by the NCP model and evaluate their chaotic performance using the Lyapunov exponent, Shannon entropy, correlation dimension, and initial-state sensitivity. The experimental results show that these chaotic maps have more complex chaotic behaviors than existing ones.
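The abstract does not spell out the six operations, but the seed-map idea and the Lyapunov-exponent evaluation can be illustrated with a hypothetical sine-composition operation (the operation itself is our assumption, not one taken from the paper):

```python
import math

def logistic(x, r=4.0):
    """Classic logistic seed map on [0, 1]."""
    return r * x * (1.0 - x)

def tent(x, mu=2.0):
    """Classic tent seed map on [0, 1]."""
    return mu * x if x < 0.5 else mu * (1.0 - x)

def ncp_sine_compose(f, g):
    """Hypothetical NCP-style operation: wrap the sum of two seed maps
    in a sine so the result is again a map on [0, 1]."""
    return lambda x: abs(math.sin(math.pi * (f(x) + g(x))))

def lyapunov(fmap, x0=0.3, n=2000, burn=200, eps=1e-8):
    """Numerical Lyapunov exponent: average log of the (finite-difference)
    derivative magnitude along the orbit after a burn-in."""
    x = x0
    for _ in range(burn):
        x = fmap(x)
    total = 0.0
    for _ in range(n):
        d = (fmap(x + eps) - fmap(x)) / eps
        total += math.log(abs(d) + 1e-300)
        x = fmap(x)
    return total / n

new_map = ncp_sine_compose(logistic, tent)
```

For the logistic map with r = 4 the true Lyapunov exponent is ln 2 ≈ 0.693, which the numerical estimate should approach.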
- Nov 30 2016 cs.CV arXiv:1611.09580v1 Visual surveillance systems have become one of the largest sources of Big Visual Data in the real world. However, existing systems for video analysis still lack the ability to handle the problems of scalability, expansibility and error-proneness, though great advances have been achieved in a number of visual recognition tasks and surveillance applications, e.g., pedestrian/vehicle detection and people/vehicle counting. Moreover, few algorithms explore the specific values/characteristics of large-scale surveillance videos. To address these problems in large-scale video analysis, we develop a scalable video parsing and evaluation platform by combining advanced techniques for Big Data processing, including Spark Streaming, Kafka and the Hadoop Distributed File System (HDFS). A Web user interface is also designed in the system to collect users' degrees of satisfaction with the recognition tasks, so as to evaluate the performance of the whole system. Furthermore, the highly extensible platform, running on long-term surveillance videos, makes it possible to develop more intelligent incremental algorithms that enhance the performance of various visual recognition tasks.
- Nov 28 2016 cs.SI arXiv:1611.08351v1 According to the NSDUH (National Survey on Drug Use and Health), 20 million Americans consumed drugs in the past 30 days. Combating illicit drug use is of great interest to public health and law enforcement agencies. Despite its importance, most existing studies on drug use rely on surveys. Surveys on sensitive topics such as drug use may not be answered truthfully by the people taking them. Selecting a representative sample to survey is another major challenge. In this paper, we explore the possibility of using big multimedia data, including both images and text, from social media in order to discover drug use patterns at fine granularity with respect to demographics. Instagram posts are searched and collected using drug-related terms by analyzing the hashtags supplied with each post. A large and dynamic dictionary of frequent drug-related slang is used to find these posts. User demographics are extracted using robust face image analysis algorithms. These posts are then mined to find common trends with regard to the time and location at which they are posted, and further in terms of the age and gender of the drug users. Furthermore, by studying the accounts followed by the users of drug-related posts, we extract common interests shared by drug users.
- Generalized twisted Gabidulin codes are one of the few known families of maximum rank matrix codes over finite fields. As a subset of m by n matrices, when m=n, the automorphism group of any generalized twisted Gabidulin code has been completely determined by the authors recently. In this paper, we consider the same problem for m<n. Under certain conditions on their parameters, we determine their middle nuclei and right nuclei, which are important invariants with respect to the equivalence for rank metric codes. Furthermore, we also use them to derive necessary conditions on the automorphisms of generalized twisted Gabidulin codes.
- Oct 11 2016 cs.CV arXiv:1610.02579v1 The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a more complex way. It is also shown that message passing is not always helpful but depends on individual samples. Gated functions are therefore needed to control message transmission, whose on-or-offs are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, Pascal VOC2007 and Microsoft COCO. This paper also shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD.
- Oct 11 2016 cs.CL arXiv:1610.02806v1 We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we propose a way to incorporate attention into the tree topology. Specifically, given a pair of sentences, our attentive encoder uses the representation of one sentence, generated via an RNN, to guide the structural encoding of the other sentence on its dependency parse tree. We evaluate the proposed attentive encoder on three tasks: semantic similarity, paraphrase identification and true-false question selection. Experimental results show that our encoder outperforms all baselines and achieves state-of-the-art results on two tasks.
- Mean-squared error (MSE) is one of the most widely used performance metrics for the design and analysis of multiple-input multiple-output (MIMO) communications. Weighted MSE minimization, a more general formulation of MSE minimization, plays an important role in MIMO transceiver optimization. While this topic has a long history and has been extensively studied, existing treatments of the methods for solving weighted MSE optimization are more or less sporadic and non-systematic. In this paper, we first review the two major methodologies, the Lagrange multiplier method and the majorization theory based method, and their common procedures for solving the weighted MSE minimization. Then we discuss some problems and limitations of these methods that were usually neglected or glossed over in the existing literature. These problems are fundamental and of critical importance for the corresponding MIMO transceiver optimizations. In addition, a new extended matrix-field weighted MSE model is proposed. Its solutions and applications are discussed in detail. Compared with existing models, this new model has wider applications, e.g., nonlinear MIMO transceiver designs and capacity-maximization transceiver designs for general MIMO networks.
- Trajectory analysis is essential in many applications. In this paper, we address the problem of representing motion trajectories in a highly informative way, and consequently utilize it for analyzing trajectories. Our approach first leverages the complete information from given trajectories to construct a thermal transfer field which provides a context-rich way to describe the global motion pattern in a scene. Then, a 3D tube is derived which depicts an input trajectory by integrating its surrounding motion patterns contained in the thermal transfer field. The 3D tube effectively: 1) maintains the movement information of a trajectory, 2) embeds the complete contextual motion pattern around a trajectory, 3) visualizes information about a trajectory in a clear and unified way. We further introduce a droplet-based process. It derives a droplet vector from a 3D tube, so as to characterize the high-dimensional 3D tube information in a simple but effective way. Finally, we apply our tube-and-droplet representation to trajectory analysis applications including trajectory clustering, trajectory classification & abnormality detection, and 3D action recognition. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our approach.
- Aug 31 2016 cs.NE arXiv:1608.08607v2 The balance between convergence and diversity is a key issue in evolutionary multi-objective optimization. The recently proposed stable matching-based selection provides a new perspective on handling this balance under the framework of decomposition-based multi-objective optimization. In particular, the stable matching between subproblems and solutions, which achieves an equilibrium between their mutual preferences, implicitly strikes a balance between convergence and diversity. Nevertheless, the original stable matching model has a high risk of matching a solution with an unfavorable subproblem, which finally leads to an imbalanced selection result. In this paper, we propose an adaptive two-level stable matching-based selection for decomposition multi-objective optimization. Specifically, borrowing the idea of stable matching with incomplete lists, we match each solution with one of its favorite subproblems by restricting the length of its preference list during the first-level stable matching. During the second-level stable matching, the remaining subproblems are matched with their favorite solutions according to the classic stable matching model. In particular, we develop an adaptive mechanism to automatically set the length of the preference list for each solution according to its local competitiveness. The performance of our proposed method is validated and compared with several state-of-the-art evolutionary multi-objective optimization algorithms on 62 benchmark problem instances. Empirical results fully demonstrate the competitive performance of our proposed method on problems with complicated Pareto sets and those with more than three objectives.
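The first-level ingredient, stable matching with truncated preference lists, can be sketched with a deferred-acceptance routine (a generic Gale-Shapley variant; the adaptive list-length mechanism and the actual subproblem/solution preference definitions are the paper's contribution and are not reproduced here):

```python
def stable_match_incomplete(subproblem_prefs, solution_prefs, list_len):
    """Deferred acceptance where each solution's preference list is
    truncated to its top `list_len` subproblems; a solution rejects any
    subproblem absent from its truncated list, so some subproblems can
    remain unmatched (to be handled by a second-level matching)."""
    truncated = {s: prefs[:list_len] for s, prefs in solution_prefs.items()}
    rank = {s: {p: i for i, p in enumerate(prefs)} for s, prefs in truncated.items()}
    match = {}                                 # solution -> subproblem
    next_prop = {p: 0 for p in subproblem_prefs}
    free = list(subproblem_prefs)
    while free:
        p = free.pop()
        prefs = subproblem_prefs[p]
        while next_prop[p] < len(prefs):
            s = prefs[next_prop[p]]
            next_prop[p] += 1
            if p not in rank[s]:
                continue                       # s's truncated list excludes p
            cur = match.get(s)
            if cur is None:
                match[s] = p
                break
            if rank[s][p] < rank[s][cur]:      # s prefers p: displace cur
                match[s] = p
                free.append(cur)
                break
    return match
```

With list length 1 each solution can only be claimed by its single favorite subproblem, which is exactly how truncation prevents a solution from being matched to an unfavorable subproblem.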
- Aug 30 2016 cs.CV arXiv:1608.07724v1 Based on life-long observations of physical, chemical, and biologic phenomena in the natural world, humans can often easily picture in their minds what an object will look like in the future. But, what about computers? In this paper, we learn computational models of object transformations from time-lapse videos. In particular, we explore the use of generative models to create depictions of objects at future times. These models explore several different prediction tasks: generating a future state given a single depiction of an object, generating a future state given two depictions of an object at different times, and generating future states recursively in a recurrent framework. We provide both qualitative and quantitative evaluations of the generated results, and also conduct a human evaluation to compare variations of our models.
- Magnetic skyrmions are promising candidates for next-generation information carriers, owing to their small size, topological stability, and ultralow depinning current density. A wide variety of skyrmionic device concepts and prototypes have been proposed, highlighting their potential applications. Here, we report on a bioinspired skyrmionic device with synaptic plasticity. The synaptic weight of the proposed device can be strengthened/weakened by positive/negative stimuli, mimicking the potentiation/depression process of a biological synapse. Both short-term plasticity (STP) and long-term potentiation (LTP) functionalities have been demonstrated for a spike-timing-dependent plasticity (STDP) scheme. This proposal suggests new possibilities for synaptic devices for use in spiking neuromorphic computing applications.
- Label ranking aims to learn a mapping from instances to rankings over a finite number of predefined labels. Random forest is a powerful method and one of the most successful general-purpose machine learning algorithms of modern times. In the literature, no research appears to have been done on applying random forest to label ranking. In this paper, we present a powerful random forest label ranking method which uses random decision trees to retrieve nearest neighbors that are similar not only in the feature space but also in the ranking space. We have developed a novel two-step rank aggregation strategy to effectively aggregate the neighboring rankings discovered by the random forest into a final predicted ranking. Compared with existing methods, the new random forest method has many advantages, including its intrinsically scalable tree data structure, highly parallelizable computational architecture and superior performance. We present extensive experimental results to demonstrate that our new method achieves the best predictive accuracy compared with state-of-the-art methods on datasets with complete rankings and datasets with only partial ranking information.
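As a stand-in for the paper's two-step aggregation strategy (which the abstract does not detail), a simple Borda count shows how rankings retrieved from neighboring leaves could be merged into one consensus prediction:

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate label rankings (each a list, best label first) via Borda
    counting: a label earns (n_labels - position) points from each ranking;
    labels are returned by descending total score, ties broken lexically."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, label in enumerate(ranking):
            scores[label] += n - pos
    return sorted(scores, key=lambda lab: (-scores[lab], lab))
```

Borda counting also degrades gracefully when some neighbors supply only partial rankings, since shorter lists simply contribute fewer points.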
- This paper investigates the compress-and-forward scheme for an uplink cloud radio access network (C-RAN) model, where multi-antenna base-stations (BSs) are connected to a cloud-computing based central processor (CP) via capacity-limited fronthaul links. The BSs compress the received signals with Wyner-Ziv coding and send the representation bits to the CP; the CP performs the decoding of all the users' messages. Under this setup, this paper makes progress toward the optimal structure of the fronthaul compression and CP decoding strategies for the compress-and-forward scheme in C-RAN. On the CP decoding strategy design, this paper shows that under a sum fronthaul capacity constraint, a generalized successive decoding strategy of the quantization and user message codewords that allows arbitrary interleaved order at the CP achieves the same rate region as the optimal joint decoding. Further, it is shown that a practical strategy of successively decoding the quantization codewords first, then the user messages, achieves the same maximum sum rate as joint decoding under individual fronthaul constraints. On the joint optimization of user transmission and BS quantization strategies, this paper shows that if the input distributions are assumed to be Gaussian, then under joint decoding, the optimal quantization scheme for maximizing the achievable rate region is Gaussian. Moreover, Gaussian input and Gaussian quantization with joint decoding achieve to within a constant gap of the capacity region of the Gaussian multiple-input multiple-output (MIMO) uplink C-RAN model. Finally, this paper addresses the computational aspect of optimizing uplink MIMO C-RAN by showing that under fixed Gaussian input, the sum rate maximization problem over the Gaussian quantization noise covariance matrices can be formulated as convex optimization problems, thereby facilitating its efficient solution.
- Aug 16 2016 cs.DM arXiv:1608.04217v1 Given a set of $n$ elements separated by a pairwise distance matrix, the minimum differential dispersion problem (Min-Diff DP) aims to identify a subset of $m$ elements ($m < n$) such that the difference between the maximum sum and the minimum sum of the inter-element distances between any two chosen elements is minimized. We propose an effective iterated local search (denoted by ILS_MinDiff) for Min-Diff DP. To ensure an effective exploration and exploitation of the search space, the proposed ILS_MinDiff algorithm iterates through three sequential search phases: a fast descent-based neighborhood search phase to find a local optimum from a given starting solution, a local-optima exploring phase to visit nearby high-quality solutions around a given local optimum, and a local-optima escaping phase to move away from the current search region. Experimental results on six data sets of 190 benchmark instances demonstrate that ILS_MinDiff competes favorably with the state-of-the-art algorithms by finding 130 improved best results (new upper bounds).
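The objective and the descent/perturbation skeleton of an iterated local search can be sketched as follows (a bare-bones ILS under our own simplifications: first-improvement swap descent and a single random-swap perturbation, whereas ILS_MinDiff adds a dedicated local-optima exploring phase between those two):

```python
import random

def diff_dispersion(subset, d):
    """Min-Diff DP objective: spread between the largest and smallest
    per-element sum of distances to the other chosen elements."""
    sums = [sum(d[i][j] for j in subset if j != i) for i in subset]
    return max(sums) - min(sums)

def ils_min_diff(d, m, iters=100, seed=0):
    """Bare-bones iterated local search over (drop u, add v) swap moves."""
    rng = random.Random(seed)
    n = len(d)
    cur = rng.sample(range(n), m)
    best, best_f = cur[:], diff_dispersion(cur, d)
    for _ in range(iters):
        improved = True
        while improved:                      # first-improvement descent
            improved = False
            f = diff_dispersion(cur, d)
            for i in range(m):
                for v in [x for x in range(n) if x not in cur]:
                    cand = cur[:i] + [v] + cur[i + 1:]
                    if diff_dispersion(cand, d) < f:
                        cur, improved = cand, True
                        break
                if improved:
                    break
        f = diff_dispersion(cur, d)
        if f < best_f:
            best, best_f = cur[:], f
        i = rng.randrange(m)                 # perturbation: one random swap
        v = rng.choice([x for x in range(n) if x not in cur])
        cur = cur[:i] + [v] + cur[i + 1:]
    return sorted(best), best_f
```

On an equidistant instance every subset is optimal with objective 0, which gives a quick sanity check.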
- Jul 18 2016 cs.CV arXiv:1607.04564v3 Vehicle detection and annotation for streaming video data with complex scenes is an interesting but challenging task for urban traffic surveillance. In this paper, we present a fast framework of Detection and Annotation for Vehicles (DAVE), which effectively combines vehicle detection and attributes annotation. DAVE consists of two convolutional neural networks (CNNs): a fast vehicle proposal network (FVPN) for vehicle-like objects extraction and an attributes learning network (ALN) aiming to verify each proposal and infer each vehicle's pose, color and type simultaneously. These two nets are jointly optimized so that abundant latent knowledge learned from the ALN can be exploited to guide FVPN training. Once the system is trained, it can achieve efficient vehicle detection and annotation for real-world traffic surveillance data. We evaluate DAVE on a new self-collected UTS dataset and the public PASCAL VOC2007 car and LISA 2010 datasets, with consistent improvements over existing algorithms.
- Jul 14 2016 cs.CY arXiv:1607.03575v1 Ads are an important revenue source for mobile app development, especially for free apps, whose expense can be compensated by ad revenue. These ad benefits, however, also come with costs. For example, too many ads can interfere with the user experience, leading to less user retention and ultimately reduced earnings. In this paper, we aim at understanding ad costs from the users' perspective. We utilize app reviews, which are widely recognized as expressions of user perceptions, to identify the ad costs that concern users. Four types of ad costs, i.e., number of ads, memory/CPU overhead, traffic usage, and battery consumption, have been discovered from user reviews. To verify whether different ad integration schemes generate different ad costs, we first obtain the commonly used ad schemes from 104 popular apps, and then design a framework named IntelliAd to automatically measure the ad costs of each scheme. To determine whether these costs indeed influence users' reactions, we finally examine the correlations between the measured ad costs and the user perceptions. We discover that the costs related to memory/CPU overhead and battery consumption concern users more, while traffic usage concerns users less, despite its obvious variations among different schemes in the experiments. Our experimental results provide developers with suggestions on better incorporating ads into apps while ensuring the user experience.
- For each rank metric code $\mathcal{C}\subseteq \mathbb{K}^{m\times n}$, we associate a translation structure, the kernel of which is shown to be invariant with respect to the equivalence on rank metric codes. When $\mathcal{C}$ is $\mathbb{K}$-linear, we also propose and investigate two other invariants called its middle nucleus and right nucleus. When $\mathbb{K}$ is a finite field $\mathbb{F}_q$ and $\mathcal{C}$ is a maximum rank distance code with minimum distance $d<\min\{m,n\}$ or $\gcd(m,n)=1$, the kernel of the associated translation structure is proved to be $\mathbb{F}_q$. Furthermore, we also show that the middle nucleus of a linear maximum rank distance code over $\mathbb{F}_q$ must be a finite field; its right nucleus also has to be a finite field under the condition $\max\{d,m-d+2\} \geqslant \left\lfloor \frac{n}{2} \right\rfloor +1$. Let $\mathcal{D}$ be the DHO-set associated with a bilinear dimensional dual hyperoval over $\mathbb{F}_2$. The set $\mathcal{D}$ gives rise to a linear rank metric code, and we show that its kernel and right nucleus are isomorphic to $\mathbb{F}_2$. Moreover, its middle nucleus must be a finite field containing $\mathbb{F}_q$. We also consider the kernel and the nuclei of $\mathcal{D}^k$, where $k$ is a Knuth operation.
- We present an eigenspectrum partitioning scheme without inversion for the recently described real-space electronic transport code, TRANSEC. The primary advantage of TRANSEC is its highly parallel algorithm, which enables studying conductance in large systems. The present scheme adds a new source of parallelization, significantly enhancing TRANSEC's parallel scalability, especially for systems with many electrons. In principle, partitioning could enable super-linear parallel speedup, as we demonstrate in calculations within TRANSEC. In practical cases, we report better than five-fold improvement in CPU time and similar improvements in wall time, compared to previously-published large calculations. Importantly, the suggested scheme is relatively simple to implement. It can be useful for general large Hermitian or weakly non-Hermitian eigenvalue problems, whenever relatively accurate inversion via direct or iterative linear solvers is impractical.
- We study the phase retrieval problem, which solves a quadratic system of equations, i.e., recovers a vector $\boldsymbol{x}\in \mathbb{R}^n$ from its magnitude measurements $y_i=|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|, i=1,..., m$. We develop a gradient-like algorithm (referred to as RWF, for reshaped Wirtinger flow) by minimizing a nonconvex nonsmooth loss function. In comparison with the existing nonconvex Wirtinger flow (WF) algorithm \cite{candes2015phase}, although the loss function becomes nonsmooth, it involves only the second power of the variable and hence reduces the complexity. We show that for random Gaussian measurements, RWF enjoys geometric convergence to a global optimal point as long as the number $m$ of measurements is on the order of $n$, the dimension of the unknown $\boldsymbol{x}$. This improves the sample complexity of WF and achieves the same sample complexity as truncated Wirtinger flow (TWF) \cite{chen2015solving}, but without truncation in the gradient loop. Furthermore, RWF costs less computationally than WF, and runs faster numerically than both WF and TWF. We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal. We further establish a performance guarantee for an existing Kaczmarz method for the phase retrieval problem based on its connection to IRWF. We also empirically demonstrate that IRWF outperforms the existing ITWF algorithm (the stochastic version of TWF) as well as other batch algorithms.
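A compact numpy rendition of the reshaped-flow idea (the spectral initialization below is a simplified stand-in for the paper's initialization, and the step size is our guess, so treat this as a sketch rather than the authors' exact algorithm):

```python
import numpy as np

def rwf(A, y, iters=2000, mu=0.8):
    """Gradient descent on the reshaped loss (1/2m) * sum_i (|a_i^T z| - y_i)^2;
    the loss is nonsmooth but only quadratic in z, which is the source of
    RWF's reduced per-iteration cost relative to WF."""
    m, n = A.shape
    # crude spectral init: top eigenvector of (1/m) * sum_i y_i a_i a_i^T,
    # rescaled by the mean measurement magnitude
    Y = (A * y[:, None]).T @ A / m
    _, V = np.linalg.eigh(Y)          # eigh returns ascending eigenvalues
    z = V[:, -1] * y.mean()
    for _ in range(iters):
        Az = A @ z
        z = z - mu * (A.T @ (Az - y * np.sign(Az))) / m
    return z
```

Since the measurements are signless, the iterate can only be recovered up to a global sign, so error is measured against both x and -x.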
- Auto-encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations learned are often found to be sparse and distributed. On the other hand, compressed sensing and sparse coding assume a data generating process, where the observed data is generated from some true latent signal source, and try to recover the corresponding signal from measurements. Looking at auto-encoders from this \textit{signal recovery} perspective enables us to have a more coherent view of these techniques. In this paper, in particular, we show that the \textit{true} hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $\ell^{2}$ row length and the bias vectors take values (approximately) equal to the negative of the data mean. The recovery also becomes more and more accurate as the sparsity of the hidden signals increases. Additionally, we empirically demonstrate that auto-encoders are capable of recovering the data generating dictionary when only data samples are given.
- Apr 26 2016 cs.SI arXiv:1604.07096v1 Drug use is on the rise and is of great interest to public health and law enforcement agencies. As found by the National Survey on Drug Use and Health, 20 million Americans aged 12 years or older consumed illicit drugs in the past 30 days. Given their ubiquity in everyday life, drug abuse related studies have received constant attention. However, most of the existing studies rely on surveys, which present a fair number of problems by their very nature: surveys on sensitive topics such as illicit drug use may not be answered truthfully by the people taking them, and selecting a representative sample to survey is another major challenge. In this paper, we explore the possibility of using big data from social media in order to understand illicit drug use behaviors. Instagram posts are collected using drug-related terms by analyzing the hashtags supplied with each post; a large and dynamic dictionary of frequent illicit drug related slang is used to find these posts. These posts are studied to find common drug consumption behaviors with regard to time of day and week. Furthermore, by studying the accounts followed by the users of drug-related posts, we hope to discover common interests shared by drug users.
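The hashtag-based collection and hour-of-day analysis can be illustrated with a toy filter; `DRUG_SLANG` is a tiny hypothetical stand-in for the paper's large, dynamic slang dictionary, and the post record format is assumed:

```python
from collections import Counter
from datetime import datetime

# Hypothetical mini slang dictionary; the paper maintains a large, dynamic one.
DRUG_SLANG = {"#blunt", "#dabs", "#kush"}

def drug_related(post):
    """A post is kept if any of its hashtags appears in the slang dictionary."""
    return bool(DRUG_SLANG & {t.lower() for t in post["hashtags"]})

def hourly_profile(posts):
    """Count drug-related posts per hour of day to study consumption timing."""
    return Counter(datetime.fromisoformat(p["time"]).hour
                   for p in posts if drug_related(p))
```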
- This paper considers the joint fronthaul compression and transmit beamforming design for the uplink cloud radio access network (C-RAN), in which multi-antenna user terminals communicate with a cloud-computing based centralized processor (CP) through multi-antenna base-stations (BSs) serving as relay nodes. A compress-and-forward relaying strategy, named the VMAC scheme, is employed, in which the BSs can either perform single-user compression or Wyner-Ziv coding to quantize the received signals and send the quantization bits to the CP via capacity-limited fronthaul links; the CP performs successive decoding with either a successive interference cancellation (SIC) receiver or a linear minimum-mean-square-error (MMSE) receiver. Under this setup, this paper investigates the joint optimization of the transmit beamformers at the users and the quantization noise covariance matrices at the BSs for maximizing the network utility. A novel weighted minimum-mean-square-error successive convex approximation (WMMSE-SCA) algorithm is first proposed for maximizing the weighted sum rate under the user transmit power and fronthaul capacity constraints with single-user compression. Assuming a heuristic decompression order, the proposed algorithm is then adapted for optimizing the transmit beamforming and fronthaul compression under Wyner-Ziv coding. This paper also proposes a low-complexity separate design consisting of optimizing transmit beamformers for the Gaussian vector multiple-access channel along with per-antenna quantizers with uniform quantization noise levels across the antennas at each BS. Numerical results show that with optimized beamforming and fronthaul compression, C-RAN can significantly outperform conventional cellular networks. Furthermore, the low-complexity separate design already performs very close to the optimized joint design in the regime of practical interest.
- Apr 07 2016 cs.NI arXiv:1604.01675v1 Unraveling quality of experience (QoE) of video streaming is very challenging in bandwidth-shared wireless networks. It is unclear how QoE metrics such as starvation probability and buffering time interact with the dynamics of streaming traffic load. In this paper, we collect view records from one of the largest streaming providers in China over two weeks and perform an in-depth measurement study on flow arrival and viewing time that sheds light on the real traffic pattern. Our most important observation is that the viewing time of streaming users fits a hyper-exponential distribution quite well. This implies that all the views can be categorized into two classes, short and long views with separated time scales. We then map the measured traffic pattern to bandwidth-shared cellular networks and propose an analytical framework to compute the closed-form starvation probability on the basis of ordinary differential equations (ODEs). Our framework can be naturally extended to investigate practical issues including progressive downloading and finite video duration. Extensive trace-driven simulations validate the accuracy of our models. Our study reveals that the starvation metrics of the short and long views possess different sensitivities to the scheduling priority at the base station. Hence, a better QoE tradeoff between the short and long views can potentially be achieved by offering them different scheduling weights. The flow differentiation involves tremendous technical and non-technical challenges, because video content is owned by content providers rather than the network operators, and the viewing time of each session is unknown beforehand. To overcome these difficulties, we propose an online Bayesian approach to infer the viewing time of each incoming flow with the "least" information from content providers.
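A hyper-exponential viewing time of the kind observed is simply a probabilistic mixture of two exponentials with separated time scales. The sampler below uses illustrative parameters (an 80/20 split of short and long views), not values fitted to the measured trace:

```python
import random

def sample_hyperexp(p_short, lam_short, lam_long, rng=random.Random(0)):
    """Viewing time drawn from a two-phase hyper-exponential distribution:
    with probability p_short a short view (rate lam_short), otherwise a long
    view (rate lam_long), yielding the two separated time scales."""
    lam = lam_short if rng.random() < p_short else lam_long
    return rng.expovariate(lam)

# Mixture mean: p_short/lam_short + (1 - p_short)/lam_long.
```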
- Apr 04 2016 cs.AI arXiv:1604.00377v1 Grouping problems aim to partition a set of items into multiple mutually disjoint subsets according to some specific criterion and constraints. Grouping problems cover a large class of important combinatorial optimization problems that are generally computationally difficult. In this paper, we propose a general solution approach for grouping problems, i.e., reinforcement learning based local search (RLS), which combines reinforcement learning techniques with descent-based local search. The viability of the proposed approach is verified on a well-known representative grouping problem (graph coloring) where a very simple descent-based coloring algorithm is applied. Experimental studies on popular DIMACS and COLOR02 benchmark graphs indicate that RLS achieves competitive performance compared to a number of well-known coloring algorithms.
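One plausible reading of the RLS scheme for graph coloring is sketched below; the reward rule and update rate are assumptions for illustration, not the paper's exact design. Each vertex keeps a probability vector over the k colors, colors are sampled, improved by a descent-based local search, and the probabilities of conflict-free choices are reinforced:

```python
import random

def rls_coloring(adj, k, iters=200, alpha=0.3, rng=random.Random(1)):
    """Reinforcement-learning based local search (sketch) for k-coloring.
    adj: adjacency lists; returns (conflict count, coloring) of the best run."""
    n = len(adj)
    probs = [[1.0 / k] * k for _ in range(n)]
    best = None
    for _ in range(iters):
        # sample a coloring from the current per-vertex probability vectors
        colors = [rng.choices(range(k), probs[v])[0] for v in range(n)]
        # descent local search: greedily recolour vertices to reduce conflicts
        improved = True
        while improved:
            improved = False
            for v in range(n):
                cost = [sum(colors[u] == c for u in adj[v]) for c in range(k)]
                c = min(range(k), key=cost.__getitem__)
                if cost[c] < cost[colors[v]]:
                    colors[v] = c
                    improved = True
        total = sum(colors[u] == colors[v] for v in range(n) for u in adj[v]) // 2
        if best is None or total < best[0]:
            best = (total, colors[:])
        # reinforcement: reward the final colour of conflict-free vertices
        for v in range(n):
            if all(colors[u] != colors[v] for u in adj[v]):
                c = colors[v]
                probs[v] = [(1 - alpha) * p for p in probs[v]]
                probs[v][c] += alpha
    return best
```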
- In this paper, we consider the robot motion (or task) planning problem under given time-bounded high-level specifications. We use metric interval temporal logic (MITL), a member of the temporal logic family, to represent the task specification. We then provide a constructive way to generate a timed automaton, together with methods to search for accepting runs on the automaton, in order to find a feasible motion (or path) sequence for the robot to complete the task.
- Mar 17 2016 cs.SE arXiv:1603.05069v1 Modern cyber-physical systems (CPS) exhibit a close inter-dependence between software and physical components. Automotive embedded systems are typical CPS: chips, sensors and actuators are the physical components, and the software embedded within them forms the cyber components. Current embedded systems are highly complex in the architecture design of both software and hardware. It is common in industrial practice that high-level control algorithm development and low-level code implementation on hardware platforms are carried out separately, with limited shared information. However, software code and hardware architecture become closely related as complexity increases, and correlated requirements and dependencies between hardware and software are emerging problems in industrial practice. We demonstrate in this paper a method to link model-based system design with real-time simulation and analysis of the architecture model. This enables hardware-software co-design and thus early selection of the hardware architecture.
- Mar 14 2016 cs.AI arXiv:1603.03511v1 In this paper, we propose a set theoretic approach for knowledge representation. While the syntax of an application domain is captured by set theoretic constructs including individuals, concepts and operators, knowledge is formalized by equality assertions. We first present a primitive form that uses minimal assumed knowledge and constructs. Then, assuming naive set theory, we extend it by definitions, which are special kinds of knowledge. Interestingly, we show that the primitive form is expressive enough to define logic operators, not only propositional connectives but also quantifiers.
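The closing claim, that set-theoretic constructs suffice to define logical operators, can be illustrated directly: with concepts as subsets of a domain of individuals, the propositional connectives become set operations and the quantifiers become containment and intersection tests. This is a toy sketch, not the paper's formalism:

```python
# Domain of individuals; concepts are subsets of the domain.
DOMAIN = frozenset(range(10))

def NOT(c):       return DOMAIN - c   # complement
def AND(c, d):    return c & d        # intersection
def OR(c, d):     return c | d        # union
def FORALL(c, d): return c <= d       # every member of c is also in d
def EXISTS(c, d): return bool(c & d)  # some member of c is in d

even  = frozenset(x for x in DOMAIN if x % 2 == 0)
small = frozenset(x for x in DOMAIN if x < 4)
```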
- Mar 08 2016 cs.SI arXiv:1603.02175v1 In this paper, we mine and learn to predict how similar a pair of users' interests towards videos are, based on demographic (age, gender and location) and social (friendship, interaction and group membership) information about these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and justify why it is used instead of video-based methods or latent topic models such as LDA and Collaborative Filtering approaches. We then show the effectiveness of the different demographic and social features, and their combinations and derivatives, in predicting user interest similarity, based on different machine-learning methods for combining multiple features. We propose a hybrid tree-encoded linear model for combining the features, and show that it outperforms other linear and tree-based models. Our methods can be used to predict user interest similarity when the ground truth is not available, e.g. for new users, or for inactive users whose interests may have changed since their old access data, and are useful for video recommendation. Our study is based on a rich dataset from Tencent, a popular provider of social networks, video services, and various other services in China.
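Tag-based user profiling and the resulting interest-similarity ground truth can be sketched as follows; the tag-frequency representation and the use of cosine similarity are assumptions for illustration, not necessarily the paper's exact formulation:

```python
import math
from collections import Counter

def tag_profile(views, video_tags):
    """Aggregate the tags of the videos a user watched into a tag-frequency
    profile (tag-based user profiling)."""
    prof = Counter()
    for vid in views:
        prof.update(video_tags[vid])
    return prof

def interest_similarity(p, q):
    """Cosine similarity between two tag profiles, serving as the measure of
    how similar a pair of users' video interests are."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0
```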
- While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, namely Internal Covariate Shift, the current solution has certain drawbacks. Specifically, BN depends on batch statistics for layerwise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate at validation time due to shifting parameter values (especially during the initial training epochs). Also, BN cannot be used with batch-size 1 during training. We address these drawbacks by proposing a non-adaptive normalization technique for removing internal covariate shift, which we call Normalization Propagation. Our approach does not depend on batch statistics; instead it uses a data-independent parametric estimate of the mean and standard deviation in every layer, and is thus computationally faster than BN. We exploit the observation that the pre-activations of Rectified Linear Units follow a Gaussian distribution in deep networks, and that once the first- and second-order statistics of any given dataset are normalized, this normalization can be forward-propagated without recalculating the approximate statistics for the hidden layers.
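For a standard normal pre-activation $z$, the post-ReLU statistics have a closed form: $\mathbb{E}[\max(0,z)] = 1/\sqrt{2\pi}$ and $\mathrm{Var}[\max(0,z)] = 1/2 - 1/(2\pi)$. Below is a single-layer sketch of normalizing with these data-independent constants instead of batch statistics; the weight normalization and multi-layer propagation of the full method are omitted:

```python
import math

# Closed-form post-ReLU statistics for a standard normal pre-activation.
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)          # E[max(0, z)]
RELU_STD  = math.sqrt(0.5 - 1.0 / (2.0 * math.pi))  # sqrt(Var[max(0, z)])

def norm_prop(activations):
    """Normalize post-ReLU activations with the data-independent constants
    above rather than batch statistics (a one-layer sketch of the idea)."""
    return [(a - RELU_MEAN) / RELU_STD for a in activations]
```

Because the constants never change, validation uses exactly the same normalization as training, avoiding the estimate drift described above.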
- Mar 01 2016 cond-mat.mes-hall cs.ET arXiv:1602.08799v1 Magnetic skyrmions hold promise as information carriers in next-generation memory and logic devices, owing to their topological stability, small size and the extremely low current needed to drive them. One of the most promising applications of skyrmions is racetrack memory (RM) built from skyrmions, named Sk-RM, instead of domain walls (DW). However, current designs face some key challenges, e.g., skyrmion manipulation, data representation and synchronization. To address these challenges, we propose here a complementary Sk-RM structure with voltage manipulation. The functionality and performance of the proposed design are investigated with micromagnetic simulations.
- With the rapid growth of Internet technologies, cloud computing and social networks have become ubiquitous. An increasing number of people participate in social networks, and massive online social data are generated. In order to exploit knowledge from the copious amounts of data obtained and to predict the social behavior of users, data mining in social networks is urgently needed. Almost all online websites use cloud services to effectively process the large scale of social data, which are gathered from distributed data centers. Because these data are large-scale, high-dimensional and widely distributed, we propose a distributed sparse online algorithm to handle them. Additionally, privacy protection is an important concern in social networks: we should not compromise the privacy of individuals while their social data are being mined. Thus we also consider the privacy problem in this article. Our simulations show that an appropriate sparsity of the data enhances the performance of our algorithm, and that the privacy-preserving method does not significantly hurt its performance.
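A sparse online algorithm of the kind described typically alternates a gradient step with an $\ell_1$ proximal (soft-thresholding) step that drives small weights to exactly zero. The sketch below shows one such local update for least squares; the distributed averaging across data centers and the privacy-preserving noise are omitted, and all names are illustrative:

```python
def soft_threshold(w, t):
    """Proximal step for the l1 penalty: shrink each weight toward zero,
    setting weights smaller than t exactly to zero (this yields sparsity)."""
    return [max(abs(x) - t, 0.0) * (1 if x > 0 else -1) for x in w]

def sparse_online_step(w, x, y, eta, lam):
    """One online update for least squares: a gradient step on
    (w.x - y)^2 / 2 followed by soft-thresholding with threshold eta*lam.
    (In a privacy-preserving variant, noise would be added to the gradient.)"""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    grad = [(pred - y) * xi for xi in x]
    w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return soft_threshold(w, eta * lam)
```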