results for au:Huang_Q in:cs

- Mar 20 2017 cs.SD arXiv:1703.06052v1 Audio tagging aims to perform multi-label classification on audio chunks and is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that only chunk-level labels are available, without frame-level labels. This paper presents a weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set, with the equal error rate (EER) reduced from 0.13 to 0.11 compared with the convolutional recurrent baseline system.
- 3D shape models are naturally parameterized using vertices and faces, i.e., composed of polygons forming a surface. However, current 3D learning paradigms for predictive and generative tasks using convolutional neural networks focus on a voxelized representation of the object. Lifting convolution operators from the traditional 2D to 3D results in high computational overhead with little additional benefit, as most of the geometry information is contained on the surface boundary. Here we study the problem of directly generating the 3D shape surface of rigid and non-rigid shapes using deep convolutional neural networks. We develop a procedure to create consistent 'geometry images' representing the shape surface of a category of 3D objects. We then use this consistent representation for category-specific shape surface generation from a parametric representation or an image by developing novel extensions of deep residual networks for the task of geometry image generation. Our experiments indicate that our network learns a meaningful representation of shape surfaces, allowing it to interpolate between shape orientations and poses, invent new shape surfaces and reconstruct 3D shape surfaces from previously unseen images.
- Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn from the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. End-to-end learning on raw waveforms also achieves comparable performance. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, whereas the best existing system obtains an EER of 0.15.
- This paper generalizes the piggybacking constructions for distributed storage systems by considering various combinations of protected and piggybacked instances. Analysis demonstrates that the proportion of protected instances determines the average repair bandwidth for a systematic node. By optimizing the proportion of protected instances, the repair ratio of generalized piggybacking codes approaches zero instead of 50% as the number of parity check nodes tends to infinity. Furthermore, the computational cost of repairing a single systematic node with generalized piggybacking codes is lower than that of existing piggybacking designs.
- Nov 23 2016 cs.CV arXiv:1611.07485v1 Spatial contextual dependencies are crucial for scene labeling problems. Recurrent neural networks (RNNs) are among the state-of-the-art methods for modeling contextual dependencies. However, RNNs are fundamentally designed for sequential data, not spatial data. This work shows that directly applying traditional RNN architectures, which unfold a 2D lattice grid into a sequence, is not sufficient to model structural dependencies in images due to the "impact vanishing" problem. A new RNN unit with Explicit Long-range Conditioning (RNN-ELC) is designed to overcome this problem. Based on this new RNN-ELC unit, a novel neural network architecture is built for scene labeling tasks. This architecture achieves state-of-the-art performance on several standard scene labeling datasets. Comprehensive experiments demonstrate that scene labeling tasks benefit substantially from the explicit long-range contextual dependencies encoded in our algorithm.
- In this paper, we consider the uplink of cell-free massive MIMO systems, where a large number of distributed single-antenna access points (APs) simultaneously serve a much smaller number of users over a limited backhaul. For the first time, we investigate the performance of compute-and-forward (C&F) in such an ultra-dense network with a realistic channel model (including fading, pathloss and shadowing). By exploiting the characteristics of pathloss, a low-complexity coefficient selection algorithm for C&F is proposed. We also give a greedy AP selection method for message recovery. Additionally, we compare the performance of C&F with other promising linear strategies for distributed massive MIMO, such as small cells (SC) and maximum ratio combining (MRC). Numerical results reveal that C&F not only reduces the backhaul load, but also significantly increases the system throughput in the symmetric scenario.
- In this paper, we study self-dual codes over $\mathbb{Z}_2 \times (\mathbb{Z}_2+u\mathbb{Z}_2) $, where $u^2=0$. Three types of self-dual codes are defined. For each type, we establish the possible values of $\alpha$ and $\beta$ for which a self-dual code $\mathcal{C}\subseteq \mathbb{Z}_{2}^\alpha\times (\mathbb{Z}_2+u\mathbb{Z}_2)^\beta$ of that type exists. We also present several approaches to construct self-dual codes over $\mathbb{Z}_2 \times (\mathbb{Z}_2+u\mathbb{Z}_2) $. Moreover, the structure of two-weight self-dual codes is completely determined for $\alpha \cdot\beta\neq 0$.
- We consider optimal channel shortener design for the reduced-state soft-output Viterbi equalizer (RS-SOVE) in single-carrier (SC) systems. To use RS-SOVE, three receiver filters need to be designed: a prefilter, a target response and a feedback filter. The collection of these three filters is commonly referred to as the "channel shortener". Conventionally, the channel shortener is designed to transform an intersymbol interference (ISI) channel into an equivalent minimum-phase form. In this paper, we design the channel shortener to maximize a mutual information lower bound (MILB) based on a mismatched detection model. By taking the decision-feedback quality in the RS-SOVE into consideration, the prefilter and feedback filter are found in closed form, while the target response is optimized via a gradient-ascent approach with the gradient explicitly derived. The information-theoretic properties of the proposed channel shortener are analyzed. Moreover, we show through numerical results that the proposed channel shortener design achieves superior detection performance compared with previous channel shortener designs at medium and high code rates.
- Aug 09 2016 cs.NI arXiv:1608.02427v3 Initial timing acquisition in narrow-band IoT (NB-IoT) devices is done by detecting a periodically transmitted known sequence. The detection has to be done with the lowest possible latency, because the RF transceiver, which dominates the downlink power consumption of an NB-IoT modem, has to be turned on throughout this time. Auto-correlation detectors have low computational complexity from a signal processing point of view, at the price of a higher detection latency. In contrast, as shown in this paper, a maximum-likelihood cross-correlation detector achieves low latency at a higher complexity. We present a hardware implementation of maximum-likelihood cross-correlation detection. The detector achieves an average detection latency half that of an auto-correlation method and reduces the required energy per timing acquisition by up to 34%.
- Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the acoustic scene of interest. In this paper, we contribute to audio tagging in two areas: acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For the acoustic modeling, a large set of contextual frames of the chunk are fed into the DNN to perform a multi-label classification for the expected tags, since only chunk-level (or utterance-level) rather than frame-level labels are available. Dropout and background noise aware training are also adopted to improve the generalization capability of the DNNs. For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features. The new features, which are smoothed against background noise and more compact with contextual information, further improve the performance of the DNN baseline. Compared with the standard Gaussian Mixture Model (GMM) baseline of the DCASE 2016 audio tagging challenge, our proposed method obtains a significant equal error rate (EER) reduction from 0.21 to 0.13 on the development set. The proposed aDAE system achieves a relative EER reduction of 6.7% compared with the strong DNN baseline on the development set. Finally, the results also show that our approach obtains state-of-the-art performance with an EER of 0.15 on the evaluation set of the DCASE 2016 audio tagging task, whereas the first-prize system of this challenge obtained an EER of 0.17.
- In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. First, the parameters of the DNN are initialized by the proposed hierarchical pre-training. A multi-level objective function is then adopted to add further constraints to the cross-entropy based loss function. A series of experiments were conducted on Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. The final DNN-based system achieved a 22.9% relative improvement in average scene classification error compared with the Gaussian Mixture Model (GMM)-based benchmark system across four standard folds.
- Acoustic event detection for content analysis in most cases relies on large amounts of labeled data. However, manually annotating data is time-consuming, so few annotated resources are available so far. Unlike audio event detection, automatic audio tagging, a multi-label acoustic event classification task, relies only on weakly labeled data. This is highly desirable for some practical applications of audio analysis. In this paper, we propose to use a fully deep neural network (DNN) framework to handle the multi-label classification task in a regression way. Since only chunk-level rather than frame-level labels are available, all or almost all frames of the chunk are fed into the DNN to perform a multi-label regression for the expected tags. The fully DNN, which is regarded as an encoding function, maps the audio feature sequence to a multi-tag vector well. A deep pyramid structure was also designed to extract more robust high-level features related to the target tags. Further improved methods, such as Dropout and background noise aware training, were adopted to enhance its generalization capability for new audio recordings in mismatched environments. Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method can better exploit long-term temporal information by taking the whole chunk as input. The results show that our approach obtained a 15% relative improvement compared with the official GMM-based method of the DCASE 2016 challenge.
- Apr 21 2016 cs.CV arXiv:1604.06079v2 Due to the abundance of 2D product images on the Internet, developing efficient and scalable algorithms to recover the missing depth information is central to many applications. Recent work has addressed the single-view depth estimation problem by utilizing convolutional neural networks. In this paper, we show that exploiting symmetry information, which is ubiquitous in man-made objects, can significantly boost the quality of such depth predictions. Specifically, we propose a new convolutional neural network architecture to first estimate dense symmetric correspondences in a product image, and then an optimization that explicitly uses this information to significantly improve the quality of single-view depth estimations. We have evaluated our approach extensively, and experimental results show that this approach outperforms state-of-the-art depth estimation techniques.
- Apr 20 2016 cs.CV arXiv:1604.05383v1 Discriminative deep learning approaches have shown impressive results for problems where human-labeled ground truth is plentiful, but what about tasks where labels are difficult or impossible to obtain? This paper tackles one such problem: establishing dense visual correspondence across different object instances. For this task, although we do not know what the ground truth is, we know it should be consistent across instances of that category. We exploit this consistency as a supervisory signal to train a convolutional neural network to predict cross-instance correspondences between pairs of images depicting objects of the same category. For each pair of training images, we find an appropriate 3D CAD model and render two synthetic views to link with the pair, establishing a correspondence flow 4-cycle. We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground truth. At test time, no CAD models are required. We demonstrate that our end-to-end trained ConvNet supervised by cycle-consistency outperforms state-of-the-art pairwise matching methods in correspondence-related tasks.
- Motivation: High-throughput experimental techniques have been producing more and more protein-protein interaction (PPI) data. PPI network alignment greatly benefits the understanding of evolutionary relationships among species, helps identify conserved sub-networks and provides extra information for functional annotations. Although a few methods have been developed for multiple PPI network alignment, the alignment quality is still far from perfect, and thus new network alignment methods are needed. Result: In this paper, we present a novel method, denoted ConvexAlign, for joint alignment of multiple PPI networks by convex optimization of a scoring function composed of sequence similarity, topological score and interaction conservation score. In contrast to existing methods that generate multiple alignments in a greedy or progressive manner, our convex method optimizes alignments globally and enforces consistency among all pairwise alignments, resulting in much better alignment quality. Tested on both synthetic and real data, our experimental results show that ConvexAlign outperforms several popular methods in producing functionally coherent alignments. ConvexAlign has an even larger advantage over the others in aligning real PPI networks. ConvexAlign also finds a few conserved complexes among five species that cannot be detected by the other methods.
- Apr 12 2016 cs.CV arXiv:1604.02801v1 We present an end-to-end system for reconstructing complete watertight and textured models of moving subjects such as clothed humans and animals, using only three or four handheld sensors. The heart of our framework is a new pairwise registration algorithm that minimizes, using a particle swarm strategy, an alignment error metric based on mutual visibility and occlusion. We show that this algorithm reliably registers partial scans with as little as 15% overlap without requiring any initial correspondences, and outperforms alternative global registration algorithms. This registration algorithm allows us to reconstruct moving subjects from free-viewpoint video produced by consumer-grade sensors, without extensive sensor calibration, constrained capture volume, expensive arrays of cameras, or templates of the subject geometry.
- Apr 01 2016 cs.CV arXiv:1603.09742v4 Semantic segmentation is critical to image content understanding and object localization. Recent developments in fully convolutional neural networks (FCNs) have enabled accurate pixel-level labeling. One issue with previous work is that FCN-based methods do not exploit object boundary information to delineate segmentation details, since object boundary labels are ignored during network training. To tackle this problem, we introduce a double-branch fully convolutional neural network, which separates the learning of the desired semantic class labeling from the learning of mask-level object proposals guided by relabeled boundaries. This network, called object boundary guided FCN (OBG-FCN), integrates the distinct properties of object shape and class features in a fully convolutional way with a designed masking architecture. We conduct experiments on the PASCAL VOC segmentation benchmark and show that the end-to-end trainable OBG-FCN system substantially improves semantic segmentation quality.
- Mar 21 2016 cs.CV arXiv:1603.05930v1 Graph-based representations are widely used in visual tracking to find correct correspondences between target parts in consecutive frames. However, most graph-based trackers consider only pairwise geometric relations between local parts. They do not make full use of the target's intrinsic structure, which makes the representation easily disturbed by errors in pairwise affinities when large deformation and occlusion occur. In this paper, we propose a geometric hypergraph learning based tracking method, which fully exploits high-order geometric relations among multiple correspondences of parts in consecutive frames. Visual tracking is then formulated as a mode-seeking problem on the hypergraph, in which vertices represent correspondence hypotheses and hyperedges describe high-order geometric relations. In addition, a confidence-aware sampling method is developed to select representative vertices and hyperedges for constructing the geometric hypergraph, improving robustness and scalability. Experiments on two challenging datasets (VOT2014 and Deform-SOT) demonstrate that the proposed method performs favorably against other existing trackers.
- Mar 01 2016 cs.SD arXiv:1602.08507v1 In the past few years, several case studies have illustrated that the use of occupancy information in buildings leads to energy-efficient and low-cost HVAC operation. Widely reported techniques for occupancy estimation rely on temperature, humidity, CO2 concentration, cameras, motion sensors and passive infrared (PIR) sensors. So far, few studies in the literature have used audio and speech processing for indoor occupancy prediction. With rapid advances in audio and speech processing technologies, it is now more feasible and attractive to integrate audio-based signal processing components into smart buildings. In this work, we propose to use audio processing techniques (i.e., speaker recognition and background audio energy estimation) to estimate room occupancy (i.e., the number of people inside a room). Theoretical analysis and simulation results demonstrate the accuracy and effectiveness of the proposed occupancy estimation technique. Based on the occupancy estimate, smart buildings can adjust thermostat settings and HVAC operation, thereby achieving higher quality of service and drastic cost savings.
- Feb 25 2016 cs.NI arXiv:1602.07399v1 In recent years, an increasing number of information technologies have been used in buildings to advance the idea of "smart buildings". Among various potential techniques, Wi-Fi-based indoor positioning makes it possible to locate and track smartphone users inside a building, so that location-aware intelligent solutions can be applied to control building operations. These location-aware indoor services (e.g., path finding, internet of things, location-based advertising) demand real-time accurate indoor localization, which is key to guaranteeing high quality of service in smart buildings. This paper presents a new Wi-Fi-based indoor localization technique that significantly improves indoor positioning accuracy with the help of Li-Fi-assisted coefficient calibration. The proposed technique leverages existing indoor Li-Fi lighting and Wi-Fi infrastructure, resulting in a cost-effective, user-convenient and accurate indoor localization framework. In this work, an experimental study and measurements are conducted to verify the performance of the proposed idea. The results substantiate the concept of refining Wi-Fi-based indoor localization with Li-Fi-assisted computation calibration.
- Feb 23 2016 cs.LG arXiv:1602.06586v4 We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient algorithm that accurately recovers the underlying M by M matrix using Theta(M) samples. This result easily translates to Theta(M) sample algorithms for learning topic models and learning hidden Markov models. These linear sample complexities are optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. We provide an even stronger lower bound where distinguishing whether a sequence of observations were drawn from the uniform distribution over M observations versus being generated by an HMM with two hidden states requires Omega(M) observations. This precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear-sample estimators for quantities such as the entropy rate of HMMs.
- Learning deeper convolutional neural networks has become a trend in recent years. However, much empirical evidence suggests that performance improvement cannot be gained by simply stacking more layers. In this paper, we consider the issue from an information-theoretic perspective and propose a novel method, Relay Backpropagation, that encourages the propagation of effective information through the network during training. Using this method, we achieved first place in the ILSVRC 2015 Scene Classification Challenge. Extensive experiments on two challenging large-scale datasets demonstrate that the effectiveness of our method is not restricted to a specific dataset or network architecture. Our models will be available to the research community later.
- We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model, such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 of which are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.
- Nov 25 2015 cs.CV arXiv:1511.07845v2 Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint. In this paper we repurpose powerful learning machinery, originally developed for object classification, to discover image cues relevant for recovering the 3D shape of potentially unfamiliar objects. We cast the problem as one of local prediction of surface normals and global detection of 3D reflection symmetry planes, which open the door for extrapolating occluded surfaces from visible ones. We demonstrate that our method is able to recover accurate 3D shape information for classes of objects it was not trained on, in both synthetic and real images.
- Second-generation (2G) cellular networks are the current workhorse for machine-to-machine (M2M) communications. Diversity in 2G devices can be present both in the form of multiple receive branches and blind repetitions. In the presence of diversity, intersymbol interference (ISI) equalization and co-channel interference (CCI) suppression are usually very complex. In this paper, we consider improvements for 2G devices with receive diversity. We derive a low-complexity receiver based on a channel shortening filter, which allows all diversity branches to be summed into a single stream after filtering while keeping the full diversity gain. The summed stream is subsequently processed by a single-stream Max-log-MAP (MLM) equalizer. The channel shortening filter is designed to maximize the mutual information lower bound (MILB) with the Ungerboeck detection model. Its filter coefficients can be obtained mainly by means of discrete Fourier transforms (DFTs). Compared with the state-of-the-art homomorphic (HOM) filtering based channel shortener, which is used together with a delayed-decision feedback MLM (DDF-MLM) equalizer, the proposed MILB channel shortener has superior performance. Moreover, the equalization complexity, in terms of real-valued multiplications, is decreased by a factor equal to the number of diversity branches.
- We propose a deep learning approach for finding dense correspondences between 3D scans of people. Our method requires only partial geometric information in the form of two depth maps or partial reconstructed surfaces, works for humans in arbitrary poses and wearing any clothing, does not require the two people to be scanned from similar viewpoints, and runs in real time. We use a deep convolutional neural network to train a feature descriptor on depth map pixels, but crucially, rather than training the network to solve the shape correspondence problem directly, we train it to solve a body region classification problem, modified to increase the smoothness of the learned descriptors near region boundaries. This approach ensures that nearby points on the human body are nearby in feature space, and vice versa, rendering the feature descriptor suitable for computing dense correspondences between the scans. We validate our method on real and synthetic data for both clothed and unclothed humans, and show that our correspondences are more robust than is possible with state-of-the-art unsupervised methods, and more accurate than those found using methods that require full watertight 3D geometry.
- We present a general framework for studying the multilevel structure of lattice network coding (LNC), which serves as the theoretical foundation for solving the ring-based LNC problem in practice with greatly reduced decoding complexity. Building on this framework, we propose a novel lattice-based network coding solution, termed layered integer forcing (LIF), which applies to any lattice with a multilevel structure. The theoretical foundations of the multilevel framework lead to a new general lattice construction approach, the elementary divisor construction (EDC), which improves the overall rate over multiple access channels (MAC) at low computational cost. We prove that EDC lattices subsume the traditional complex construction approaches. A soft detector is then developed for lattice network relaying, based on the multilevel structure of EDC. This makes it possible to employ iterative decoding in lattice network coding, and simulation results show the large potential of iterative multistage decoding to approach capacity.
- Sep 29 2015 cs.LG arXiv:1509.07943v1 Super-resolution is the problem of recovering a superposition of point sources from bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging problems, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest are estimation procedures that are robust to noise and have the following desirable statistical and computational properties: we seek to use coarse Fourier measurements (bounded by some cutoff frequency); we hope to take a (quantifiably) small number of measurements; and we desire our algorithm to run quickly. Suppose we have k point sources in d dimensions, where the points are separated by at least Δ from each other (in Euclidean distance). This work provides an algorithm with the following favorable guarantees: (i) The algorithm uses Fourier measurements whose frequencies are bounded by O(1/Δ) (up to log factors); previous algorithms require a cutoff frequency that may be as large as Ω(√d/Δ). (ii) The number of measurements taken by our algorithm and its computational complexity are both bounded by a polynomial in the number of points k and the dimension d, with no dependence on the separation Δ; in contrast, previous algorithms depended inverse polynomially on the minimal separation and exponentially on the dimension for both of these quantities. Our estimation procedure itself is simple: we take random bandlimited measurements (as opposed to taking an exponential number of measurements on the hyper-grid). Furthermore, our analysis and algorithm are elementary (based on concentration bounds for sampling and the singular value decomposition).
- Aug 07 2015 cs.CV arXiv:1508.01244v3 We study gaze estimation on tablets; our key design goal is uncalibrated gaze estimation using the front-facing camera during natural use of tablets, where the posture and the method of holding the tablet are not constrained. We collected the first large unconstrained gaze dataset of tablet users, named the Rice TabletGaze dataset. The dataset consists of 51 subjects, each with 4 different postures and 35 gaze locations. Subjects vary in race, gender and need for prescription glasses, all of which might impact gaze estimation accuracy. Driven by our observations on the collected data, we present a TabletGaze algorithm for automatic gaze estimation using multi-level HoG features and a Random Forests regressor. The TabletGaze algorithm achieves a mean error of 3.17 cm. We perform an extensive evaluation of the impact of various factors, such as dataset size, race, wearing glasses and user posture, on gaze estimation accuracy and make important observations about the impact of these factors.
- Mar 03 2015 cs.LG arXiv:1503.00424v2 Efficiently learning a mixture of Gaussians is a fundamental problem in statistics and learning theory. Given samples, each drawn from one of k Gaussian distributions in R^n chosen at random, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning a mixture of Gaussians is information-theoretically hard: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that, provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > Ω(k^2), we give an algorithm that learns the parameters in polynomial running time using a polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher-order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding the smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
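Isserlis' (Wick's) theorem, cited in the entry above, writes every fourth moment of a zero-mean Gaussian as a sum over the three ways of pairing the four indices. This pairing structure is the source of the moment tensor's symmetries. The following minimal NumPy sketch covers a single zero-mean component only. The function name is hypothetical, and the code is not the paper's tensor decomposition algorithm.

```python
import numpy as np


def gaussian_fourth_moment(cov):
    """Return M[i, j, k, l] = E[x_i x_j x_k x_l] for x ~ N(0, cov).

    Isserlis' theorem: sum over the three pairings {ij,kl}, {ik,jl}, {il,jk}.
    """
    cov = np.asarray(cov, dtype=float)
    return (np.einsum("ij,kl->ijkl", cov, cov)
            + np.einsum("ik,jl->ijkl", cov, cov)
            + np.einsum("il,jk->ijkl", cov, cov))


if __name__ == "__main__":
    # Monte Carlo sanity check of the closed form.
    rng = np.random.default_rng(0)
    cov = np.array([[2.0, 0.5], [0.5, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=200_000)
    empirical = np.einsum("ni,nj,nk,nl->ijkl", x, x, x, x) / len(x)
    print(np.abs(empirical - gaussian_fourth_moment(cov)).max())
```

For this covariance the closed form gives M[0, 0, 0, 0] = 3·2² = 12 and M[0, 0, 1, 1] = 2·1 + 2·0.5² = 2.5. The printed Monte Carlo deviation should be small compared with these values.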
- Feb 25 2015 cs.GR arXiv:1502.06686v1 Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
- Dec 30 2014 cs.DS arXiv:1412.8164v1 In this paper we investigate the top-$k$-selection problem, i.e., determining the largest, second largest, ..., and $k$-th largest elements, in the dynamic data model. In this model the order of elements evolves dynamically over time. In each time step the algorithm can only probe the changes of data by comparing a pair of elements. Previously only two special cases were studied [2]: (i) finding the largest element and the median, and (ii) sorting all elements. This paper systematically deals with $k\in [n]$ and solves the problem almost completely. Specifically, we identify a critical point $k^*$ such that the top-$k$-selection problem can be solved error-free with probability $1-o(1)$ if and only if $k=o(k^*)$. A lower bound on the error when $k=\Omega(k^*)$ is also determined, which is tight under some conditions. On the other hand, it is shown that the top-$k$-set problem, i.e., finding the largest $k$ elements without sorting them, can be solved error-free for all $k\in [n]$. Additionally, we extend the dynamic data model and show that most of these results still hold.
- Nov 14 2014 cs.LG arXiv:1411.3698v2 Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM). Given the joint probabilities of finite-length strings of the process, we are interested in finding a finite-state generative model that describes the entire process. In particular, we focus on two classes of models: HMMs and quasi-HMMs, a strictly larger class of models containing HMMs. In the main theorem, we show that if the random process is generated by an HMM of order less than or equal to k whose transition and observation probability matrices are in general position, namely almost everywhere in the parameter space, then both the minimal quasi-HMM realization and the minimal HMM realization can be efficiently computed from the joint probabilities of all length-N strings, for N > 4⌈log_d(k)⌉ + 1. In this paper, we also aim to compare and connect two lines of literature: the realization theory of HMMs, and the recent development of learning latent variable models with tensor decomposition techniques.
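The entry above takes joint probabilities of length-N strings as its input. For a known HMM, these can be computed with the observable-operator form P(x_1 … x_N) = 1ᵀ A_{x_N} ⋯ A_{x_1} π, where A_x = T diag(O[x, :]). The sketch below makes several assumptions:

- a column-stochastic transition matrix T, with T[i, j] = P(h_{t+1} = i | h_t = j);
- an emission matrix O[x, j] = P(x | h = j);
- a stationary initial distribution π.

The length condition N > 4⌈log_d k⌉ + 1 is evaluated with exact integer arithmetic. This code only produces the statistics that a realization algorithm consumes. It does not implement the paper's realization procedure, and all names are hypothetical.

```python
import itertools

import numpy as np


def ceil_log(k, d):
    """Smallest integer m with d**m >= k, i.e. ceil(log_d k), using exact integers."""
    m, power = 0, 1
    while power < k:
        power *= d
        m += 1
    return m


def min_string_length(k, d):
    """Smallest integer N satisfying N > 4 * ceil(log_d k) + 1."""
    return 4 * ceil_log(k, d) + 2


def string_probability(seq, T, O, pi):
    """P(x_1..x_N) = 1^T A_{x_N} ... A_{x_1} pi with A_x = T @ diag(O[x]).

    T[i, j] = P(h_{t+1} = i | h_t = j) (columns sum to 1),
    O[x, j] = P(x | h = j), pi = distribution of the first hidden state.
    """
    state = np.asarray(pi, dtype=float)
    for x in seq:
        state = T @ (O[x] * state)  # emit x, then advance the hidden state
    return state.sum()


def all_string_probabilities(N, T, O, pi):
    """Joint probabilities of every length-N string over the alphabet {0..d-1}."""
    d = O.shape[0]
    return {seq: string_probability(seq, T, O, pi)
            for seq in itertools.product(range(d), repeat=N)}
```

Consider a toy model with k = 2 hidden states, d = 3 symbols, T = [[0.9, 0.2], [0.1, 0.8]] and its stationary distribution π = (2/3, 1/3). For this model the condition asks for strings of length N ≥ 6, and the probabilities of all d^N strings of any fixed length sum to 1.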
- Nov 04 2014 cs.DB arXiv:1411.0064v1 Detecting dominant clusters is important in many analytic applications. The state-of-the-art methods find dense subgraphs on the affinity graph as the dominant clusters. However, the time and space complexity of those methods are dominated by the construction of the affinity graph, which is quadratic with respect to the number of data points, and thus impractical on large data sets. To tackle this challenge, we apply Evolutionary Game Theory (EGT) and develop a scalable algorithm, Approximate Localized Infection Immunization Dynamics (ALID). The main idea is to perform Localized Infection Immunization Dynamics (LID) to find dense subgraphs within a local range of the affinity graph. LID is further scaled up, with guaranteed high efficiency and detection quality, by an estimated Region of Interest (ROI) and a carefully designed Candidate Infective Vertex Search method (CIVS). ALID only constructs small local affinity graphs and has a time complexity of O(C(a^* + δ)n) and a space complexity of O(a^*(a^* + δ)), where a^* is the size of the largest dominant cluster and C ≪ n and δ ≪ n are small constants. We demonstrate by extensive experiments on both synthetic data and real-world data that ALID achieves state-of-the-art detection quality with much lower time and space cost on a single machine. We also demonstrate the encouraging parallelization performance of ALID by implementing Parallel ALID (PALID) on Apache Spark. PALID processes 50 million SIFT data points in 2.29 hours, achieving a speedup ratio of 7.51 with 8 executors.
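In the evolutionary-game view used above, a dense subgraph is the support of a local maximizer of xᵀAx over the probability simplex. The sketch below uses the classical replicator dynamics on a full affinity matrix, a standard dynamic for dominant-set clustering. It is swapped in for illustration and replaces the infection immunization dynamics the paper uses. Building the full n × n matrix A is exactly the quadratic cost that ALID is designed to avoid. The function names, toy affinities and support threshold are assumptions.

```python
import numpy as np


def replicator_dynamics(A, iters=1000, tol=1e-12):
    """Iterate x_i <- x_i (Ax)_i / (x^T A x) from the barycenter of the simplex."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        Ax = A @ x
        new_x = x * Ax / (x @ Ax)  # stays on the simplex: weights still sum to 1
        if np.abs(new_x - x).sum() < tol:
            return new_x
        x = new_x
    return x


def dominant_cluster(A, threshold=1e-6):
    """Return the support of the converged weights (the dense subgraph) and the weights."""
    x = replicator_dynamics(A)
    return np.flatnonzero(x > threshold), x
```

Consider a six-point toy graph where points 0-2 are strongly linked (affinity 1.0), points 3-5 are weakly linked (0.2), and the two groups are joined with affinity 0.05. On this graph the weights converge to 1/3 on points 0-2, and the reported cluster is [0, 1, 2].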
- Deeply rooted in classical social choice and voting theory, statistical ranking with paired comparison data has experienced a renaissance with the widespread use of crowdsourcing techniques. As data quality may be significantly damaged in an uncontrolled crowdsourcing environment, outlier detection and robust ranking have become a hot topic in such data analysis. In this paper, we propose a robust ranking framework based on the principle of Huber's robust statistics, which formulates outlier detection as a LASSO problem to find sparse approximations of the cyclic ranking projection in Hodge decomposition. Moreover, simple yet scalable algorithms are developed based on Linearized Bregman Iteration to achieve an even less biased estimator than LASSO. Statistical consistency of outlier detection is established in both cases: when the outliers are strong enough and the comparisons are sampled according to an Erdős-Rényi random graph, outliers can be faithfully detected. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides a promising tool for robust ranking with large-scale crowdsourcing data arising from computer vision, multimedia, machine learning, sociology, etc.
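In the Hodge-theoretic view of the entry above, the comparison vector y on the edges splits into two parts:

- a gradient part Ds, where s holds the global scores and D is the edge-node incidence operator;
- a cyclic residual.

Sparse outliers are modeled by a vector γ in the LASSO problem min ½‖y − Ds − γ‖² + λ‖γ‖₁. The sketch below solves this by plain alternating minimization, with least squares in s and soft-thresholding in γ. It does not use the Linearized Bregman Iteration the paper develops. The function names and the value of λ are assumptions.

```python
import itertools

import numpy as np


def incidence(edges, n):
    """Gradient operator D: (D s)_e = s_i - s_j for edge e = (i, j)."""
    D = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    return D


def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def robust_rank(edges, y, n, lam, iters=200):
    """Alternately minimize 0.5*||y - D s - gamma||^2 + lam*||gamma||_1 over s and gamma."""
    D = incidence(edges, n)
    D_pinv = np.linalg.pinv(D)  # minimum-norm least squares: scores sum to zero on a connected graph
    gamma = np.zeros(len(edges))
    for _ in range(iters):
        s = D_pinv @ (y - gamma)                # global ranking (gradient) part
        gamma = soft_threshold(y - D @ s, lam)  # sparse outlier part
    s = D_pinv @ (y - gamma)
    return s, gamma


if __name__ == "__main__":
    edges = list(itertools.combinations(range(4), 2))
    true_scores = np.array([3.0, 2.0, 1.0, 0.0])
    y = np.array([true_scores[i] - true_scores[j] for i, j in edges])
    y[edges.index((0, 3))] = -3.0  # one flipped comparison
    s, gamma = robust_rank(edges, y, 4, lam=0.5)
    print("scores:", np.round(s, 3))
    print("flagged edges:", [edges[e] for e in np.flatnonzero(np.abs(gamma) > 1e-6)])
```

In this toy run the flipped comparison (0, 3) is the only edge flagged, and the recovered scores keep the true order. They are shrunk toward each other, which is the LASSO bias that the Linearized Bregman Iteration in the entry above aims to reduce.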
- Maximum a posteriori (MAP) inference over discrete Markov random fields is a fundamental task spanning a wide spectrum of real-world applications, and is known to be NP-hard for general graphs. In this paper, we propose a novel semidefinite relaxation formulation (referred to as SDR) to estimate the MAP assignment. Algorithmically, we develop an accelerated variant of the alternating direction method of multipliers (referred to as SDPAD-LR) that can effectively exploit the special structure of the new relaxation. Encouragingly, the proposed procedure allows solving SDR for large-scale problems, e.g., problems on a grid graph comprising hundreds of thousands of variables with multiple states per node. Compared with prior SDP solvers, SDPAD-LR attains comparable accuracy while exhibiting remarkably improved scalability, contrary to the commonly held belief that semidefinite relaxation can only be applied to small-scale MRF problems. We have evaluated the performance of SDR on various benchmark datasets, including OPENGM2 and PIC, in terms of both the quality of the solutions and computation time. Experimental results demonstrate that for a broad class of problems, SDPAD-LR outperforms state-of-the-art algorithms, producing better MAP assignments efficiently.
- Apr 14 2014 cs.AR arXiv:1404.3162v1 In this paper, we present a novel signal processing unit built upon the theory of factor graphs, which is able to address a wide range of signal processing algorithms. More specifically, the demonstrated factor graph processor (FGP) is tailored to Gaussian message passing algorithms. We show how to use a highly configurable systolic array to solve the message update equations of nodes in a factor graph efficiently. A suitable instruction set and compilation procedure are presented. In a recursive least squares channel estimation example, we show that the FGP can compute a message update faster than a state-of-the-art DSP. The results demonstrate the usability of the FGP architecture as a flexible HW accelerator for signal-processing and communication systems.
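The recursive least squares (RLS) channel-estimation recursion used as the benchmark above fits in a few lines of NumPy. This is the textbook exponentially weighted RLS update. It is not the FGP's factor-graph formulation, which would express the estimator as Gaussian message updates. The forgetting factor `lam`, the initialization `delta`, and the real-valued BPSK training setup are assumptions, and the function name is hypothetical.

```python
import numpy as np


def rls_channel_estimate(x, y, taps, lam=0.99, delta=100.0):
    """Exponentially weighted RLS estimate of FIR taps h from y[t] = sum_i h[i] x[t-i]."""
    h = np.zeros(taps)
    P = delta * np.eye(taps)  # inverse correlation matrix, large initial value
    for t in range(len(x)):
        # Regressor: the most recent `taps` transmitted symbols (zeros before t = 0).
        u = np.array([x[t - i] if t - i >= 0 else 0.0 for i in range(taps)])
        Pu = P @ u
        k = Pu / (lam + u @ Pu)          # gain vector
        err = y[t] - h @ u               # a priori estimation error
        h = h + k * err
        P = (P - np.outer(k, Pu)) / lam  # Riccati update of P
    return h
```

In a noiseless toy setting, 200 random BPSK symbols are passed through a 3-tap channel. The estimate then agrees with the true taps to well within 10⁻³.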
- Joint matching over a collection of objects aims at aggregating information from a large collection of similar instances (e.g. images, graphs, shapes) to improve maps between pairs of them. Given multiple matches computed between a few object pairs in isolation, the goal is to recover an entire collection of maps that are (1) globally consistent, and (2) close to the provided maps --- and under certain conditions provably the ground-truth maps. Despite recent advances on this problem, the best-known recovery guarantees are limited to a small constant barrier --- none of the existing methods find theoretical support when more than $50\%$ of input correspondences are corrupted. Moreover, prior approaches focus mostly on fully similar objects, while it is practically more demanding to match instances that are only partially similar to each other. In this paper, we develop an algorithm to jointly match multiple objects that exhibit only partial similarities, given a few pairwise matches that are densely corrupted. Specifically, we propose to recover the ground-truth maps via a parameter-free convex program called MatchLift, following a spectral method that pre-estimates the total number of distinct elements to be matched. Encouragingly, MatchLift exhibits near-optimal error-correction ability, i.e. in the asymptotic regime it is guaranteed to work even when a dominant fraction $1-\Theta\left(\frac{\log^{2}n}{\sqrt{n}}\right)$ of the input maps behave like random outliers. Furthermore, MatchLift succeeds with minimal input complexity, namely, perfect matching can be achieved as soon as the provided maps form a connected map graph. We evaluate the proposed algorithm on various benchmark data sets including synthetic examples and real-world examples, all of which confirm the practical applicability of MatchLift.
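In the fully similar case, "globally consistent" in the entry above has a concrete meaning. Composing the map from object i to object j with the map from j to k must give the map from i to k. Equivalently, the block matrix of all pairwise permutation matrices factors as YYᵀ, with rank equal to the number of distinct elements. MatchLift's convex program is built around this kind of lifted block matrix. The sketch below only checks this structure for bijective maps stored as index arrays; it does not handle partial similarity or solve the convex program. All names are hypothetical.

```python
import itertools

import numpy as np


def perm_matrix(p):
    """Matrix P with P[a, p[a]] = 1 for a bijective point map p."""
    n = len(p)
    P = np.zeros((n, n))
    P[np.arange(n), p] = 1.0
    return P


def is_cycle_consistent(maps, num_objects):
    """maps[(i, j)][a] is the point of object j matched to point a of object i."""
    for i, j, k in itertools.permutations(range(num_objects), 3):
        # Going i -> j -> k must agree with the direct map i -> k.
        if not np.array_equal(maps[(j, k)][maps[(i, j)]], maps[(i, k)]):
            return False
    return True


def block_map_matrix(maps, num_objects, n):
    """Stack all pairwise permutation matrices, with identities on the diagonal."""
    return np.block([[np.eye(n) if i == j else perm_matrix(maps[(i, j)])
                      for j in range(num_objects)] for i in range(num_objects)])
```

Suppose the maps are generated from a shared labeling of n points per object, so that maps[(i, j)] = argsort(σ_j)[σ_i]. Then the check passes and the block matrix has rank n. Swapping the images of two points in a single map breaks cycle consistency.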
- The encoding complexity of a general (en,ek) quasi-cyclic code is O[(e^2)(n-k)k]. This paper presents a novel low-complexity encoding algorithm for quasi-cyclic (QC) codes based on matrix transformation. First, a message vector is encoded into a transformed codeword in the transform domain. Then, the transmitted codeword is obtained from the transformed codeword by the inverse Galois Fourier transform. For binary QC codes, a simple and fast mapping is required to post-process the transformed codeword such that the transmitted codeword is binary as well. The complexity of our proposed encoding algorithm is O[e(n-k)k] symbol operations for non-binary codes and O[ek(n-k)(log_2 e)] bit operations for binary codes. These complexities are much lower than their traditional counterpart O[(e^2)(n-k)k]. For example, our complexity of encoding a 64-ary (4095,2160) QC code is only 1.59% of that of traditional encoding, and our complexities of encoding the binary (4095, 2160) and (8176, 7154) QC codes are respectively 9.52% and 1.77% of those of traditional encoding. We also study the application of our low-complexity encoding algorithm to one of the most important subclasses of QC codes, namely QC-LDPC codes, especially when their parity-check matrices are rank deficient.
- Sep 04 2012 cs.SY arXiv:1209.0229v2 In this paper, we examine, in an abstract framework, how a tradeoff between efficiency and robustness arises in different dynamic oligopolistic market architectures. We consider a market in which there is a monopolistic resource provider and agents that enter and exit the market following a random process. Self-interested and fully rational agents dynamically update their resource consumption decisions over a finite time horizon, under the constraint that the total resource consumption requirements are met before each individual's deadline. We then compare the statistics of the stationary aggregate demand processes induced by the non-cooperative and cooperative load scheduling schemes. We show that although the non-cooperative load scheduling scheme leads to an efficiency loss (widely known as the "price of anarchy"), the stationary distribution of the corresponding aggregate demand process has a smaller tail. This tail, which corresponds to rare and undesirable demand spikes, is important in many applications of interest. On the other hand, when the agents can cooperate with each other in optimizing their total cost, a higher market efficiency is achieved at the cost of a higher probability of demand spikes. We thus posit that the origins of endogenous risk in such systems may lie in the market architecture, which is an inherent characteristic of the system.
- This paper is concerned with a general analysis of the rank and row-redundancy of an array of circulants whose null space defines a QC-LDPC code. Based on the Fourier transform and the properties of conjugacy classes and Hadamard products of matrices, we derive tight upper bounds on the rank and row-redundancy of general arrays of circulants, which make it possible to take row-redundancy into account in constructions of QC-LDPC codes to achieve better performance. We further investigate the rank of two types of QC-LDPC code constructions, those based on Vandermonde matrices and those based on Latin squares, and give combinatorial expressions for the exact rank in some specific cases, which demonstrate the tightness of the bounds we derive. Moreover, several new types of QC-LDPC code constructions with large row-redundancy are presented and analyzed.
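The quantities bounded in the entry above can be checked by brute force on small examples. The procedure has three steps:

1. Build an array of circulant permutation matrices (CPMs).
2. Compute its rank over GF(2) by Gaussian elimination.
3. Take the row-redundancy as the number of rows minus the rank.

The sketch assumes a J × L array of e × e CPMs specified by shift values. The names are hypothetical, and this is not the Fourier-transform analysis used in the paper.

```python
import numpy as np


def cpm(e, shift):
    """e x e circulant permutation matrix: row a has its 1 in column (a + shift) mod e."""
    return np.roll(np.eye(e, dtype=np.uint8), shift, axis=1)


def array_of_circulants(shifts, e):
    """J x L block array whose (r, c) block is cpm(e, shifts[r][c])."""
    return np.block([[cpm(e, s) for s in row] for row in shifts])


def gf2_rank(H):
    """Rank over GF(2) by Gauss-Jordan elimination with XOR row operations."""
    M = H.copy() % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]  # move the pivot row up
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]              # clear column c in every other row
        rank += 1
        if rank == rows:
            break
    return rank
```

Take e = 5 and shifts [[0, 0, 0], [0, 1, 2]]. The resulting 10 × 15 matrix has rank 9, so its row-redundancy is 1. The reason is that the rows of each block row sum to the all-ones vector, which gives one linear dependency.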
- Dec 20 2010 cs.LG arXiv:1012.3877v1 In this paper, we propose a two-timescale delay-optimal dynamic clustering and power allocation design for downlink network MIMO systems. The dynamic clustering control is adaptive to the global queue state information (GQSI) only and is computed at the base station controller (BSC) over a longer time scale. On the other hand, the power allocations of all the BSs in one cluster are adaptive to both intra-cluster channel state information (CCSI) and intra-cluster queue state information (CQSI), and are computed at the cluster manager (CM) over a shorter time scale. We show that the two-timescale delay-optimal control can be formulated as an infinite-horizon average cost Constrained Partially Observed Markov Decision Process (CPOMDP). By exploiting the special problem structure, we derive an equivalent Bellman equation in terms of the Pattern Selection Q-factor to solve the CPOMDP. To address the distributive requirement and the issues of exponential memory requirement and computational complexity, we approximate the Pattern Selection Q-factor by the sum of Per-cluster Potential functions and propose a novel distributive online learning algorithm to estimate the Per-cluster Potential functions (at each CM) as well as the Lagrange multipliers (LM) (at each BS). We show that the proposed distributive online learning algorithm converges almost surely (with probability 1). By exploiting the birth-death structure of the queue dynamics, we further decompose the Per-cluster Potential function into a sum of Per-cluster Per-user Potential functions and formulate the instantaneous power allocation as a Per-stage QSI-aware Interference Game played among all the CMs. We also propose a QSI-aware Simultaneous Iterative Water-filling Algorithm (QSIWFA) and show that it can achieve the Nash Equilibrium (NE).
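Each step of an iterative water-filling algorithm solves a single-user power allocation p_i = max(μ − 1/g_i, 0). The water level μ is chosen so that the powers sum to the budget. The sketch below implements that classic step, assuming the g_i are effective gain-to-interference-plus-noise ratios. It does not model the queue-state weighting that makes the QSIWFA in the entry above QSI-aware, and the function name is hypothetical.

```python
import numpy as np


def water_filling(gains, total_power):
    """Maximize sum(log(1 + g_i p_i)) subject to sum(p_i) = total_power, p_i >= 0.

    Solution: p_i = max(mu - 1/g_i, 0) for a water level mu.
    """
    gains = np.asarray(gains, dtype=float)
    floors = 1.0 / gains  # "ground level" of each channel
    sorted_floors = np.sort(floors)
    # Fill the m best channels, dropping the worst until the level clears its floor.
    for m in range(len(floors), 0, -1):
        mu = (total_power + sorted_floors[:m].sum()) / m
        if mu > sorted_floors[m - 1]:
            break
    return np.maximum(mu - floors, 0.0)
```

Take gains [1.0, 0.5, 0.1] and a total power of 4. The weakest channel is switched off, and the allocation is [2.5, 1.5, 0.0] with water level 3.5. In an iterative scheme, each player would repeat this step while treating the other players' current interference as noise.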
- This paper is concerned with the construction and structural analysis of both cyclic and quasi-cyclic codes, particularly LDPC codes. It consists of three parts. The first part shows that a cyclic code given by a parity-check matrix in circulant form can be decomposed into descendant cyclic and quasi-cyclic codes of various lengths and rates. Some fundamental structural properties of these descendant codes are developed, including the characterization of the roots of the generator polynomial of a cyclic descendant code. The second part of the paper shows that cyclic and quasi-cyclic descendant LDPC codes can be derived from cyclic finite geometry LDPC codes using the results developed in the first part of the paper. This enlarges the repertoire of cyclic LDPC codes. The third part of the paper analyzes the trapping sets of regular LDPC codes whose parity-check matrices satisfy a certain constraint on their rows and columns. Several classes of finite geometry and finite field cyclic and quasi-cyclic LDPC codes with large minimum weights are shown to have no harmful trapping sets of size smaller than their minimum weights. Consequently, their error floors are dominated by their minimum weights.