results for au:Wu_L in:cs

- Aug 22 2017 cs.CL arXiv:1708.06073v1 We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
- A signed tree-coloring of a signed graph $(G,\sigma)$ is a vertex coloring $c$ so that $G^{c}(i,\pm)$ is a forest for every $i\in c(u)$ and $u\in V(G)$, where $G^{c}(i,\pm)$ is the subgraph of $(G,\sigma)$ whose vertex set is the set of vertices colored by $i$ or $-i$ and edge set is the set of positive edges with two end-vertices colored both by $i$ or both by $-i$, along with the set of negative edges with one end-vertex colored by $i$ and the other colored by $-i$. If $c$ is a function from $V(G)$ to $M_n$, where $M_n$ is $\{\pm 1,\pm 2,\ldots,\pm k\}$ if $n=2k$, and $\{0,\pm 1,\pm 2,\ldots,\pm k\}$ if $n=2k+1$, then $c$ is a signed tree-$n$-coloring of $(G,\sigma)$. The minimum integer $n$ such that $(G,\sigma)$ admits a signed tree-$n$-coloring is the signed vertex arboricity of $(G,\sigma)$, denoted by $va(G,\sigma)$. In this paper, we first show that two switching equivalent signed graphs have the same signed vertex arboricity, and then prove that $va(G,\sigma)\leq 3$ for every balanced signed triangulation and for every edge-maximal $K_5$-minor-free graph with balanced signature. This generalizes the well-known result that the vertex arboricity of every planar graph is at most 3.
- Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering. It encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in a high-dimensional space, yielding a better graph partition than its single-view counterparts. In this paper we revisit it from a fundamentally different perspective by showing that LRR is essentially a latent clustered orthogonal projection based representation equipped with an optimized local graph structure for spectral clustering; each column of the representation is a cluster basis orthogonal to the others that indicates its members, which intuitively projects the view-specific feature representation onto the space spanned by all orthogonal bases to characterize the cluster structures. Building on this finding, we propose a technique with the following components: (1) We decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning an orthogonal clustered representation and an optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures have the same magnitude across views, so that the ideal multi-view consensus can be readily achieved. Experiments on multi-view datasets validate its superiority.
- Prior asymptotic performance analyses are based on the series expansion of the moment-generating function (MGF) or the probability density function (PDF) of channel coefficients. However, these techniques fail for lognormal fading channels because the Taylor series of the PDF of a lognormal random variable is zero at the origin and the MGF does not have an explicit form. Although the lognormal fading model has been widely applied in wireless communications and free-space optical communications, few analytical tools are available to provide elegant performance expressions for correlated lognormal channels. In this work, we propose a novel framework to analyze the asymptotic outage probabilities of selection combining (SC), equal-gain combining (EGC) and maximum-ratio combining (MRC) over equally correlated lognormal fading channels. Based on these closed-form results, we reveal the following: i) the outage probability of EGC or MRC becomes an infinitely small quantity compared to that of SC at large signal-to-noise ratio (SNR); ii) channel correlation can result in an infinite performance loss at large SNR. More importantly, the analyses reveal insights into the long-standing problem of performance analysis over correlated lognormal channels at high SNR, and circumvent time-consuming Monte Carlo simulation and numerical integration.
- Jul 25 2017 cs.CV arXiv:1707.07074v2 Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions within spatial manipulation to perform matching in local correspondence. However, they essentially extract fixed representations from pre-divided regions for each image and then perform matching based on the extracted representations. Models in this pipeline cannot capture the finer local patterns that are crucial to distinguish positive pairs from negative ones, which makes them underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat the spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network to impose spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03 and Market-1501.
- It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that do not. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. In the loss landscape of deep networks, the volume of the basin of attraction of good minima dominates over that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix with respect to model parameters. For deeper networks, extensive numerical evidence helps to support our arguments.
- Jun 13 2017 cs.CV arXiv:1706.03160v1 Person re-identification (re-id) aims to match pedestrians observed by disjoint camera views. It attracts increasing attention in computer vision due to its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches are proposed by learning a compact feature space from images such that the Euclidean distances correspond to their cross-view similarity metric. However, the global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space because features of pedestrian images exhibit unknown distributions due to large variations in poses, illumination and occlusion. Moreover, intra-personal training samples within a local range are robust guides for deep embedding against uncontrolled variations, but they cannot be captured by a global Euclidean distance. In this paper, we study the problem of person re-id by proposing a novel sampling scheme to mine suitable positives (i.e., intra-class samples) within a local range to improve the deep embedding in the context of large intra-class variations. Our method is capable of learning a deep similarity metric adaptive to local sample structure by minimizing each sample's local distances while propagating through the relationship between samples to attain the whole intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining and robust deep embedding. This yields local discriminations by selecting local-ranged positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show state-of-the-art results achieved by our method.
- In this paper, we study a new learning paradigm for Neural Machine Translation (NMT). Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between the human translation and the translation given by an NMT model. To achieve this goal, inspired by the recent success of generative adversarial networks (GANs), we employ an adversarial training architecture and name it Adversarial-NMT. In Adversarial-NMT, the training of the NMT model is assisted by an adversary, which is an elaborately designed Convolutional Neural Network (CNN). The goal of the adversary is to differentiate the translation generated by the NMT model from that produced by a human. The goal of the NMT model is to produce high quality translations so as to cheat the adversary. A policy gradient method is leveraged to co-train the NMT model and the adversary. Experimental results on English$\rightarrow$French and German$\rightarrow$English translation tasks show that Adversarial-NMT can achieve significantly better translation quality than several strong baselines.
- The proliferation of social media in communication and information dissemination has made it an ideal platform for spreading rumors. Automatically debunking rumors at their early stage of diffusion is known as early rumor detection, which refers to dealing with sequential posts regarding disputed factual claims with certain variations and highly duplicated text over time. Thus, identifying trending rumors demands an efficient yet flexible model that is able to capture long-range dependencies among postings and produce distinct representations for accurate early detection. However, it is challenging to apply conventional classification algorithms to early rumor detection since they rely on hand-crafted features, which require intensive manual effort when the number of posts is large. This paper presents a deep attention model on the basis of recurrent neural networks (RNN) to selectively learn temporal hidden representations of sequential posts for identifying rumors. The proposed model embeds soft attention into the recurrence to simultaneously pool out distinct features with particular focus and produce hidden representations that capture contextual variations of relevant posts over time. Extensive experiments on real datasets collected from social media websites demonstrate that (1) the deep attention based RNN model outperforms state-of-the-art methods that rely on hand-crafted features; (2) the introduction of a soft attention mechanism can effectively distill the parts relevant to rumors from original posts in advance; (3) the proposed method detects rumors more quickly and accurately than competitors.
- Apr 13 2017 cs.AI arXiv:1704.03612v1 In this paper, we develop a novel paradigm, namely hypergraph shift, to find robust graph modes by a probabilistic voting strategy, which are semantically sound besides meeting the self-cohesiveness requirement in forming graph modes. Unlike the existing techniques that seek graph modes by shifting vertices based on pair-wise edges (i.e., an edge with $2$ ends), our paradigm is based on shifting high-order edges (hyperedges) to deliver graph modes. Specifically, we convert the problem of seeking graph modes into the problem of seeking maximizers of a novel objective function with the aim of generating good graph modes based on shifting edges in hypergraphs. As a result, the generated graph modes based on dense subhypergraphs may more accurately capture the object semantics besides the self-cohesiveness requirement. We also formally prove that our technique is always convergent. Extensive empirical studies on synthetic and real-world data sets are conducted on clustering and graph matching. They demonstrate that our techniques significantly outperform the existing techniques.
- Feb 15 2017 cs.CV arXiv:1702.04179v2 Given a pedestrian image as a query, the purpose of person re-identification is to identify the correct match from a large collection of gallery images depicting the same person captured by disjoint camera views. The critical challenge is how to construct a robust yet discriminative feature representation to capture the compounded variations in pedestrian appearance. To this end, deep learning methods have been proposed to extract hierarchical features against extreme variability of appearance. However, existing methods in this category generally neglect efficiency in the matching stage, whereas the search speed of a re-identification system is crucial in real-world applications. In this paper, we present a novel deep hashing framework with Convolutional Neural Networks (CNNs) for fast person re-identification. Technically, we simultaneously learn both CNN features and hash functions/codes to get robust yet discriminative features and similarity-preserving hash codes. Thereby, person re-identification can be resolved by efficiently computing and ranking the Hamming distances between images. A structured loss function defined over positive pairs and hard negatives is proposed to formulate a novel optimization problem so that fast convergence and a more stable optimized solution can be obtained. Extensive experiments on two benchmarks, CUHK03 and Market-1501, show that the proposed deep architecture is effective compared with state-of-the-art methods.
- A considerable number of machine learning algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with equal or unequal lengths to a matrix format. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation. The learned feature representation is particularly suitable for the class of learning problems that are sensitive to data similarities. Given a set of $n$ time series, we first construct an $n\times n$ partially observed similarity matrix by randomly sampling $O(n \log n)$ pairs of time series and computing their pairwise similarities. We then propose an extremely efficient algorithm that solves a highly non-convex and NP-hard problem to learn new features based on the partially observed similarity matrix. We use the learned features to conduct experiments on both data classification and clustering tasks. Our extensive experimental results demonstrate that the proposed framework is both effective and efficient.
- Jan 19 2017 cs.CV arXiv:1701.05003v1 Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may subsequently yield very different results. In fact, dealing with landmarks with ill-shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. Firstly, we identify the top-$k$ photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible ill-shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by typical collaborative filtering methods, we propose to learn collaborative deep network based semantic, nonlinear and high-level features over the latent factor for landmark photos as the training set, which is formed by matrix factorization over the collaborative user-photo matrix regarding the multi-query set. The learned deep network is further applied to generate the features for all the other photos, while resulting in a compact multi-query set within such a space. Extensive experiments are conducted on real-world social media data with landmark photos together with their user information to show the superior performance over the existing methods.
- Dec 07 2016 cs.CV arXiv:1612.01655v1 Boundary incompleteness raises great challenges to automatic prostate segmentation in ultrasound images. Shape priors can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address those issues with a novel framework. The proposed framework can seamlessly integrate feature extraction and shape prior exploration, and estimate the complete boundary in a sequential manner. Our framework is composed of three key modules. Firstly, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with the biologically plausible Recurrent Neural Networks (RNNs). This module is shown to be effective in dealing with the boundary incompleteness. Secondly, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Thirdly, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation when compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks, where boundary incompleteness is one of the main challenges.
- Nov 18 2016 cs.LG arXiv:1611.05521v1 Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to the data objects that consistently characterize the neighborhood relationship across views. Traditional methods in this category inherently suffer from three limitations: 1) they commonly adopt a two-stage scheme where a similarity matrix is first constructed, followed by a subsequent hash function learning; 2) these methods are commonly developed on the assumption that data samples with multiple representations are noise-free, which is not practical in real-life applications; 3) they often incur a cumbersome training model caused by the neighborhood graph construction using all $N$ points in the database ($O(N)$). In this paper, we motivate the problem of jointly and efficiently training robust hash functions over data objects with multi-feature representations which may be noise corrupted. To achieve both robustness and training efficiency, we propose an approach to effectively and efficiently learn low-rank kernelized hash functions shared across views (we use the term kernelized similarity rather than kernel because the data-landmark affinity matrix is not a square symmetric matrix). Specifically, we utilize landmark graphs to construct tractable similarity matrices in multi-views to automatically discover neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct hash functions in order to accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization on multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions.
- In this paper we investigate the aesthetic image classification problem, that is, automatically classifying an image as being of low or high aesthetic quality, which is quite a challenging problem. Considering both the local and global information of images is quite important for image aesthetic quality assessment. Recently, a powerful inception module has been proposed that shows very high performance in object classification. We observe that the inception module is inherently able to consider both local and global features. Thus, in this paper, we propose a novel DCNN structure codenamed ILGNet for image aesthetics classification, which introduces the Inception module and connects intermediate Local layers to the Global layer for the output. In addition, the ILGNet is derived from part of the GoogLeNet. Thus, we can easily use a GoogLeNet image classification model pre-trained on the ImageNet dataset and fine-tune our connected local and global layers on the large-scale aesthetics assessment AVA dataset. The experimental results show that the proposed ILGNet outperforms the state-of-the-art results in image aesthetics assessment on the AVA benchmark. The time costs of both training and testing the ILGNet are significantly lower than those of the full GoogLeNet, with only a small reduction in classification accuracy. Our ILGNet can achieve classification accuracy similar to that of 2/3 GoogLeNet, whose computational cost is nearly twice ours. This makes the aesthetic assessment model easier to integrate into mobile and embedded systems.
- Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of complete robotic systems -- including perception and manipulation -- instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.
- Multi-view spectral clustering, which aims at yielding a consensus grouping of data objects across multiple views using their graph Laplacian matrices, is a fundamental clustering problem. Among the existing methods, the Low-Rank Representation (LRR) based method stands out in terms of its effectiveness, intuitiveness and robustness to noise corruptions. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data while ignoring the local manifold structure in each view, which is critically important to spectral clustering; worse still, the low-rank minimization is enforced to achieve data correlation consensus among all views, failing to flexibly preserve the local manifold structure for each view. In this paper, 1) we propose a multi-graph Laplacian regularized LRR with each graph Laplacian corresponding to one view to characterize its local manifold structure. 2) Instead of directly enforcing the low-rank minimization among all views for correlation consensus, we separately impose a low-rank constraint on each view, coupled with a mutual structural consensus constraint, which not only preserves the local manifold structure well but also serves as a constraint for that from other views, iteratively making the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.
- Aug 19 2016 cs.CL arXiv:1608.05129v1 Sentiment in social media is increasingly considered as an important resource for customer segmentation, market understanding, and tackling other socio-economic issues. However, sentiment in social media is difficult to measure since user-generated content is usually short and informal. Although many traditional sentiment analysis methods have been proposed, identifying slang sentiment words remains untackled. One of the reasons is that slang sentiment words are not available in existing dictionaries or sentiment lexicons. To this end, we propose to build the first sentiment dictionary of slang words to aid sentiment analysis of social media content. It is laborious and time-consuming to collect and label the sentiment polarity of a comprehensive list of slang words. We present an approach to leverage web resources to construct an extensive Slang Sentiment word Dictionary (SlangSD) that is easy to maintain and extend. SlangSD is publicly available for research purposes. We empirically show the advantages of using SlangSD, the newly-built slang sentiment word dictionary for sentiment classification, and provide examples demonstrating its ease of use with an existing sentiment system.
- The increasing number of applications requiring the solution of large scale singular value problems has rekindled interest in iterative methods for the SVD. Some promising recent advances in large scale iterative methods are still plagued by slow convergence and accuracy limitations for computing smallest singular triplets. Furthermore, their current implementations in MATLAB cannot address the required large problems. Recently, we presented a preconditioned, two-stage method to effectively and accurately compute a small number of extreme singular triplets. In this research, we present a high-performance software, PRIMME SVDS, that implements our hybrid method based on the state-of-the-art eigensolver package PRIMME for both largest and smallest singular values. PRIMME SVDS fills a gap in production level software for computing the partial SVD, especially with preconditioning. The numerical experiments demonstrate its superior performance compared to other state-of-the-art software and its good parallel performance under strong and weak scaling.
- Jun 07 2016 cs.CV arXiv:1606.01595v1 Person re-identification seeks a correct match for a person of interest across views among a large number of imposters. It typically involves two procedures: non-linear feature extraction against dramatic appearance changes, and subsequent discriminative analysis in order to reduce intra-personal variations while enlarging inter-personal differences. In this paper, we introduce a hybrid architecture which combines Fisher vectors and deep neural networks to learn non-linear representations of person images in a space where data can be linearly separable. We place a Linear Discriminant Analysis (LDA) on top of the deep neural network such that linearly separable latent representations can be learnt in an end-to-end fashion. By optimizing an objective function modified from LDA, the network is enforced to produce feature distributions which have a low variance within the same class and high variance between classes. The objective is essentially derived from the general LDA eigenvalue problem and allows the network to be trained with stochastic gradient descent, back-propagating LDA gradients to compute the gradients involved in Fisher vector encoding. For evaluation we test our approach on four benchmark data sets in person re-identification (VIPeR [1], CUHK03 [2], CUHK01 [3], and Market1501 [4]). Extensive experiments on these benchmarks show that our model can achieve state-of-the-art results.
- Jun 07 2016 cs.CV arXiv:1606.01609v2 In this paper, we present an end-to-end approach to simultaneously learn spatio-temporal features and a corresponding similarity metric for video-based person re-identification. Given the video sequence of a person, features from each frame that are extracted from all levels of a deep convolutional network can preserve a higher spatial resolution from which we can model finer motion patterns. These low-level visual percepts are fed into a variant of a recurrent model to characterize the temporal variation between time-steps. Features from all time-steps are then summarized using temporal pooling to produce an overall feature representation for the complete sequence. The deep convolutional network, recurrent layer, and the temporal pooling are jointly trained to extract comparable hidden-unit representations from an input pair of time series to compute their corresponding similarity value. The proposed framework combines time series modeling and metric learning to jointly learn relevant features and a good similarity measure between time sequences of persons. Experiments demonstrate that our approach achieves state-of-the-art performance for video-based person re-identification on iLIDS-VID and PRID 2011, the two primary public datasets for this purpose.
- With the widespread use of mobile computing devices in contemporary society, our trajectories in the physical space and virtual world are increasingly closely connected. Using the anonymous smartphone data of $1 \times 10^5$ users over 30 days, we constructed the mobility network and the attention network to study the correlations between online and offline human behaviours. In the mobility network, nodes are physical locations and edges represent the movements between locations, and in the attention network, nodes are websites and edges represent the switching of users between websites. We apply the box-covering method to renormalise the networks. The investigated network properties include the size of box $l_B$ and the number of boxes $N(l_B)$. We find two universal classes of behaviours: the mobility network features a small-world property, $N(l_B) \simeq e^{-l_B}$, whereas the attention network is characterised by a self-similar property $N(l_B) \simeq l_B^{-\gamma}$. In particular, as the box length $l_B$ increases, the degree correlation of the network changes from positive to negative, which indicates that there are two layers of structure in the mobility network. We use the results of network renormalisation to detect communities and map the structure of the mobility network. Further, we located the most relevant websites visited in these communities, and identified three typical location-based behaviours: shopping, dating, and taxi-calling. Finally, we offered a revised geometric network model to explain our findings from the perspective of spatially constrained attachment.
- Jan 28 2016 cs.CV arXiv:1601.07255v2In this paper, we propose a deep end-to-end neural network to simultaneously learn high-level features and a corresponding similarity metric for person re-identification. The network takes a pair of raw RGB images as input, and outputs a similarity value indicating whether the two input images depict the same person. A layer of computing neighborhood range differences across two input images is employed to capture local relationship between patches. This operation is to seek a robust feature from input images. By increasing the depth to 10 weight layers and using very small (3$\times$3) convolution filters, our architecture achieves a remarkable improvement on the prior-art configurations. Meanwhile, an adaptive Root-Mean-Square (RMSProp) gradient descent algorithm is integrated into our architecture, which is beneficial to deep nets. Our method consistently outperforms state-of-the-art on two large datasets (CUHK03 and Market-1501), and a medium-sized data set (CUHK01).
- Nov 30 2015 cs.CV arXiv:1511.08531v2Matching individuals across non-overlapping camera networks, known as person re-identification, is a fundamentally challenging problem due to the large visual appearance changes caused by variations of viewpoints, lighting, and occlusion. Approaches in the literature can be categorized into two streams: the first stream is to develop reliable features against realistic conditions by combining several visual features in a pre-defined way; the second stream is to learn a metric from training data to ensure strong inter-class differences and intra-class similarities. However, seeking an optimal combination of visual features which is generic yet adaptive to different benchmarks is an unsolved problem, and metric learning models easily get over-fitted due to the scarcity of training data in person re-identification. In this paper, we propose two effective structured learning based approaches which explore the adaptive effects of visual features in recognizing persons in different benchmark data sets. Our framework is built on the basis of multiple low-level visual features with an optimal ensemble of their metrics. We formulate two optimization algorithms, CMCtriplet and CMCstruct, which directly optimize evaluation measures commonly used in person re-identification, also known as the Cumulative Matching Characteristic (CMC) curve.
- Nov 25 2015 physics.soc-ph cs.SI arXiv:1511.07616v1To uncover the mechanisms underlying the collaborative production of knowledge, we investigate a very large online Question and Answer system that includes the question asking and answering activities of millions of users over five years. We created knowledge networks in which nodes are questions and edges are the successive answering activities of users. We find that these networks have two common properties: 1) the mitigation of degree inequality among nodes; and 2) the assortative mixing of nodes. This means that, while the system tends to reduce attention investment on old questions in order to supply sufficient attention to new questions, it is not easy for novel knowledge to be integrated into the existing body of knowledge. We propose a mixing model to combine preferential attachment and reversed preferential attachment processes to model the evolution of knowledge networks and successfully reproduce the observed patterns. Our mixing model is not only theoretically interesting but also provides insights into the management of online communities.
- Oct 13 2015 cs.CY arXiv:1510.03247v2The gerrymandering problem is a worldwide problem which poses a great threat to democracy and justice in district-based elections. Thanks to partisan redistricting commissions, district boundaries are often manipulated to benefit incumbents. Since an independent commission is hard to come by, the possibility of impartially generating districts with a computer is explored in this thesis. We have developed an algorithm to randomly produce legal redistricting schemes for Pennsylvania.
- Sep 18 2015 physics.soc-ph cs.SI arXiv:1509.05083v3Online communities are becoming increasingly important as platforms for large-scale human cooperation. These communities allow users seeking and sharing professional skills to solve problems collaboratively. To investigate how users cooperate to complete a large number of knowledge-producing tasks, we analyze StackExchange, one of the largest question and answer systems in the world. We construct attention networks to model the growth of 110 communities in the StackExchange system and quantify individual answering strategies using the linking dynamics of attention networks. We identify two types of users taking different strategies. One strategy (type A) aims at performing maintenance by doing simple tasks, while the other strategy (type B) aims at investing time in doing challenging tasks. We find that the number of type A users needs to be twice that of type B users for a sustainable growth of communities.
- A number of applications require the computation of the trace of a matrix that is implicitly available through a function. A common example of a function is the inverse of a large, sparse matrix, which is the focus of this paper. When the evaluation of the function is expensive, the task is computationally challenging because the standard approach is based on a Monte Carlo method which converges slowly. We present a different approach that exploits the pattern correlation, if present, between the diagonal of the inverse of the matrix and the diagonal of some approximate inverse that can be computed inexpensively. We leverage various sampling and fitting techniques to fit the diagonal of the approximation to the diagonal of the inverse. Depending on the quality of the approximate inverse, our method may serve as a standalone kernel for providing a fast trace estimate with a small number of samples. Furthermore, the method can be used as a variance reduction method for Monte Carlo in some cases. This is decided dynamically by our algorithm. An extensive set of experiments with various technique combinations on several matrices from some real applications demonstrate the potential of our method.
- This paper discusses several job scheduling algorithms for the Hadoop platform and, in view of the shortcomings of the algorithms in use, proposes a job scheduling optimization algorithm based on Bayes classification. The proposed algorithm can be summarized as follows. The jobs in the job queue are classified into bad jobs and good jobs by Bayes classification. When the JobTracker receives a task request, it selects a good job from the job queue and selects tasks from that job to allocate, and the execution result is then fed back to the JobTracker. The scheduling algorithm thus adjusts the job classification by learning from the feedback, so that the JobTracker selects the most appropriate job to execute on a TaskTracker every time. We need to consider both the resource usage of a job and the influence of TaskTracker resources on task execution. We call the former job features, for instance the average CPU usage rate and the average memory usage rate, and the latter node features, such as the CPU usage rate and the size of idle physical memory; together they are called feature variables. Results show a significant improvement in the execution efficiency and stability of job scheduling.
- A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended feature, and tracking movement of feature through overlapping in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.
- Potts model based on a Markov process computation solves the community structure problem effectively Mar 30 2015 physics.soc-ph cs.SI arXiv:1503.08035v1Potts model is a powerful tool to uncover community structure in complex networks. Here, we propose a new framework to reveal the optimal number of communities and stability of network structure by quantitatively analyzing the dynamics of Potts model. Specifically we model the community structure detection Potts procedure by a Markov process, which has a clear mathematical explanation. Then we show that the local uniform behavior of spin values across multiple timescales in the representation of the Markov variables could naturally reveal the network's hierarchical community structure. In addition, critical topological information regarding multivariate spin configuration could also be inferred from the spectral signatures of the Markov process. Finally an algorithm is developed to determine fuzzy communities based on the optimal number of communities and the stability across multiple timescales. The effectiveness and efficiency of our algorithm are theoretically analyzed as well as experimentally validated.
- We propose a class of statistical estimators $\hat H =(\hat H_1, \ldots, \hat H_d)$ for the Hurst parameters $H=(H_1, \ldots, H_d)$ of fractional Brownian field via multi-dimensional wavelet analysis and least squares, which are asymptotically normal. These estimators can be used to detect self-similarity and long-range dependence in multi-dimensional signals, which is important in texture classification and improvement of diffusion tensor imaging (DTI) of nuclear magnetic resonance (NMR). Some fractional Brownian sheets are simulated and the simulated data are used to validate these estimators. We find that when $H_i \geq 1/2$, the estimators are efficient, and when $H_i < 1/2$, there is some bias.
- The goal of software artifact retrieval is to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed through textual queries. In other words, information retrieval techniques are exploited based on the textual similarity between queries and the textual representation of software artifacts, which is generated by collecting words from comments, identifiers, and descriptions of programs. However, in addition to this semantic information, there is rich information embedded in the source code itself. The source code, if analyzed properly, can be a rich source for enhancing software artifact retrieval. To this end, in this paper, we develop a feature extraction method on source code. Specifically, this method can capture both the inherent information in the source code and the semantic information hidden in its comments, descriptions, and identifiers. Moreover, we design a heterogeneous metric learning approach, which allows code features and text features to be integrated into the same latent semantic space. This, in turn, can help to measure artifact similarity by exploiting the joint power of both code and text features. Finally, extensive experiments on real-world data show that the proposed method can improve the performance of software artifact retrieval by a significant margin.
- The computation of a few singular triplets of large, sparse matrices is a challenging task, especially when the smallest magnitude singular values are needed in high accuracy. Most recent efforts try to address this problem through variations of the Lanczos bidiagonalization method, but they are still challenged even for medium matrix sizes due to the difficulty of the problem. We propose a novel SVD approach that can take advantage of preconditioning and of any well designed eigensolver to compute both largest and smallest singular triplets. Accuracy and efficiency is achieved through a hybrid, two-stage meta-method, PHSVDS. In the first stage, PHSVDS solves the normal equations up to the best achievable accuracy. If further accuracy is required, the method switches automatically to an eigenvalue problem with the augmented matrix. Thus it combines the advantages of the two stages, faster convergence and accuracy, respectively. For the augmented matrix, solving the interior eigenvalue is facilitated by a proper use of the good initial guesses from the first stage and an efficient implementation of the refined projection method. We also discuss how to precondition PHSVDS and to cope with some issues that arise. Numerical experiments illustrate the efficiency and robustness of the method.
- Jul 17 2014 physics.soc-ph cs.SI arXiv:1407.4194v1Travel activities have been widely applied to quantify spatial interactions between places, regions and nations. In this paper, we model the spatial connectivities between 652 Traffic Analysis Zones (TAZs) in Beijing using a taxi OD dataset. First, we unveil the gravitational structure of intra-urban spatial connectivities of Beijing. Overall, the inter-TAZ interactions are well governed by the Gravity Model $G_{ij} = {\lambda}p_{i}p_{j}/d_{ij}$, where $p_{i}$, $p_{j}$ are degrees of TAZ $i$, $j$ and $d_{ij}$ the distance between them, with a goodness-of-fit around 0.8. Second, the network based analysis well reveals the polycentric form of Beijing. Last, we detect the semantics of inter-TAZ connectivities based on their spatiotemporal patterns. We further find that inter-TAZ connections deviating from the Gravity Model can be well explained by link semantics.
- Currently, connectomes (e.g., functional or structural brain graphs) can be estimated in humans at $\approx 1~mm^3$ scale using a combination of diffusion weighted magnetic resonance imaging, functional magnetic resonance imaging and structural magnetic resonance imaging scans. This manuscript summarizes a novel, scalable implementation of open-source algorithms to rapidly estimate magnetic resonance connectomes, using both anatomical regions of interest (ROIs) and voxel-size vertices. To assess the reliability of our pipeline, we develop a novel nonparametric non-Euclidean reliability metric. Here we provide an overview of the methods used, demonstrate our implementation, and discuss available user extensions. We conclude with results showing the efficacy and reliability of the pipeline over previous state-of-the-art.
- Recent studies uncovered important core/periphery network structures characterizing complex sets of cooperative and competitive interactions between network nodes, be they proteins, cells, species or humans. Better characterization of the structure, dynamics and function of core/periphery networks is a key step of our understanding cellular functions, species adaptation, social and market changes. Here we summarize the current knowledge of the structure and dynamics of "traditional" core/periphery networks, rich-clubs, nested, bow-tie and onion networks. Comparing core/periphery structures with network modules, we discriminate between global and local cores. The core/periphery network organization lies in the middle of several extreme properties, such as random/condensed structures, clique/star configurations, network symmetry/asymmetry, network assortativity/disassortativity, as well as network hierarchy/anti-hierarchy. These properties of high complexity together with the large degeneracy of core pathways ensuring cooperation and providing multiple options of network flow re-channelling greatly contribute to the high robustness of complex systems. Core processes enable a coordinated response to various stimuli, decrease noise, and evolve slowly. The integrative function of network cores is an important step in the development of a large variety of complex organisms and organizations. In addition to these important features and several decades of research interest, studies on core/periphery networks still have a number of unexplored areas.
- We view web forums as virtual living organisms feeding on users' attention and investigate how these organisms grow at the expense of collective attention. We find that the "body mass" ($PV$) and "energy consumption" ($UV$) of the studied forums exhibit the allometric growth property, i.e., $PV_t \sim UV_t ^ \theta$. This implies that within a forum, the network transporting attention flow between threads has a structure that is invariant over time, despite the continuous change of the nodes (threads) and edges (clickstreams). The observed time-invariant topology allows us to explain the dynamics of networks by the behavior of threads. In particular, we describe the clickstream dissipation on threads using the function $D_i \sim T_i ^ \gamma$, in which $T_i$ is the clickstreams to node $i$ and $D_i$ is the clickstream dissipated from $i$. It turns out that $\gamma$, an indicator for dissipation efficiency, is negatively correlated with $\theta$ and $1/\gamma$ sets the lower boundary for $\theta$. Our findings have practical consequences. For example, $\theta$ can be used as a measure of the "stickiness" of forums, because it quantifies the stable ability of forums to convert $UV$ into $PV$, i.e., to keep users "locked in" to the forum. Meanwhile, the correlation between $\gamma$ and $\theta$ provides a convenient method to evaluate the "stickiness" of forums. Finally, we discuss an optimized "body mass" of forums at around $10^5$ that minimizes $\gamma$ and maximizes $\theta$.
- In this paper, we investigate the possibility to use two tilings of the hyperbolic plane as basic frame for devising a way to input texts in Chinese characters into messages of cellphones, smartphones, ipads and tablets.
- Mar 12 2013 cs.CC cond-mat.dis-nn arXiv:1303.2413v1The random 3-satisfiability (3-SAT) problem is in the unsatisfiable (UNSAT) phase when the clause density $\alpha$ exceeds a critical value $\alpha_s \approx 4.267$. However, rigorously proving the unsatisfiability of a given large 3-SAT instance is extremely difficult. In this paper we apply the mean-field theory of statistical physics to the unsatisfiability problem, and show that a specific type of UNSAT witnesses (Feige-Kim-Ofek witnesses) can in principle be constructed when the clause density $\alpha > 19$. We then construct Feige-Kim-Ofek witnesses for single 3-SAT instances through a simple random sampling algorithm and a focused local search algorithm. The random sampling algorithm works only when $\alpha$ scales at least linearly with the variable number $N$, but the focused local search algorithm works for clause density $\alpha > c N^{b}$ with $b \approx 0.59$ and prefactor $c \approx 8$. The exponent $b$ can be further decreased by enlarging the single parameter $S$ of the focused local search algorithm.
- In this paper, a novel carrier frequency offset estimation approach, including preamble structure, carrier frequency offset estimation algorithm, is proposed for hexagonal multi-carrier transmission (HMCT) system. The closed-form Cramer-Rao lower bound of the proposed carrier frequency offset estimation scheme is given. Theoretical analyses and simulation results show that the proposed preamble structure and carrier frequency offset estimation algorithm for HMCT system obtains an approximation to the Cramer-Rao lower bound mean square error (MSE) performance over the doubly dispersive (DD) propagation channel.
- The discovery of new and interesting patterns in large datasets, known as data mining, draws more and more interest as the quantities of available data are exploding. Data mining techniques may be applied to different domains and fields such as computer science, the health sector, insurance, homeland security, banking and finance, etc. In this paper we are interested in the discovery of a specific category of patterns, known as rare and non-present patterns. We present a novel approach towards the discovery of non-present patterns using rare item-set mining.
- Feb 01 2012 cs.DB arXiv:1201.6564v1Computing the shortest path between two given locations in a road network is an important problem that finds applications in various map services and commercial navigation products. The state-of-the-art solutions for the problem can be divided into two categories: spatial-coherence-based methods and vertex-importance-based approaches. The two categories of techniques, however, have not been compared systematically under the same experimental framework, as they were developed from two independent lines of research that do not refer to each other. This renders it difficult for a practitioner to decide which technique should be adopted for a specific application. Furthermore, the experimental evaluation of the existing techniques, as presented in previous work, falls short in several aspects. Some methods were tested only on small road networks with up to one hundred thousand vertices; some approaches were evaluated using distance queries (instead of shortest path queries), namely, queries that ask only for the length of the shortest path; a state-of-the-art technique was examined based on a faulty implementation that led to incorrect query results. To address the above issues, this paper presents a comprehensive comparison of the most advanced spatial-coherence-based and vertex-importance-based approaches. Using a variety of real road networks with up to twenty million vertices, we evaluated each technique in terms of its preprocessing time, space consumption, and query efficiency (for both shortest path and distance queries). Our experimental results reveal the characteristics of different techniques, based on which we provide guidelines on selecting appropriate methods for various scenarios.
- The core of the Web is a hyperlink navigation system collaboratively set up by webmasters to help users find desired websites. But does this system really work as expected? We show that the answer seems to be negative: there is a substantial mismatch between hyperlinks and the pathways that users actually take. A closer look at empirical surfing activities reveals the reason of the mismatch: webmasters try to build a global virtual world without geographical or cultural boundaries, but users in fact prefer to navigate within more fragmented, language-based groups of websites. We call this type of behavior "preferential navigation" and find that it is driven by "local" search engines.
- Background: The collective browsing behavior of users gives rise to a flow network transporting attention between websites. By analyzing the structure of this network we uncovered a nontrivial scaling regularity concerning the impact of websites. Methodology: We constructed three clickstreams networks, whose nodes were websites and edges were formed by the users switching between sites. We developed an indicator Ci as a measure of the impact of site i and investigated its correlation with the traffic of the site Ai both on the three networks and across the language communities within the networks. Conclusions: We found that the impact of websites increased slower than their traffic. Specifically, there existed a scaling relationship between Ci and Ai with an exponent gamma smaller than 1. We suggested that this scaling relationship characterized the decentralized structure of the clickstream circulation: the World Wide Web is a system that favors small sites in reassigning the collective attention of users.
- Allometric growth is found in many tagging systems online. That is, the number of new tags (T) is a power law function of the active population (P), or T ~ P^gamma (gamma != 1). According to previous studies, it is the heterogeneity in individual tagging behavior that gives rise to allometric growth. These studies consider the power-law distribution model with an exponent beta, regarding 1/beta as an index for heterogeneity. However, they did not discuss whether power-law is the only distribution that leads to allometric growth, or equivalently, whether the positive correlation between heterogeneity and allometric growth holds in systems of distributions other than power-law. In this paper, the authors systematically examine the growth pattern of systems of six different distributions, and find that both power-law distribution and log-normal distribution lead to allometric growth. Furthermore, by introducing Shannon entropy as an indicator for heterogeneity instead of 1/beta, the authors confirm that the positive relationship between heterogeneity and allometric growth exists in both cases of power-law and log-normal distributions.
- Apr 06 2011 physics.soc-ph cs.SI arXiv:1104.0742v3Research on human online activities usually assumes that total activity $T$ increases linearly with active population $P$, that is, $T\propto P^{\gamma}(\gamma=1)$. However, we find examples of systems where total activity grows faster than active population. Our study shows that the power law relationship $T\propto P^{\gamma}(\gamma>1)$ is in fact ubiquitous in online activities such as micro-blogging, news voting and photo tagging. We call the pattern "accelerating growth" and find it relates to a type of distribution that changes with system size. We show both analytically and empirically how the growth rate $\gamma$ associates with a scaling parameter $b$ in the size-dependent distribution. As most previous studies explain accelerating growth by power law distribution, the model of size-dependent distribution is novel and worth further exploration.
- Research on the growth of online tagging systems not only is interesting in its own right, but also yields insights for website management and semantic web analysis. Traditional models describing the growth of online systems can be divided into linear and nonlinear versions. Linear models, including the BA model (Barabási and Albert, 1999), assume that the average activity of users is a constant independent of population. Hence the total activity is a linear function of population. On the contrary, nonlinear models suggest that the average activity is affected by the size of the population and the total activity is a nonlinear function of population. In the current study, supporting evidence for the nonlinear growth assumption is obtained from data on Internet users' tagging behavior. A power law relationship between the number of new tags (F) and the population (P), which can be expressed as F ~ P ^ gamma (gamma > 1), is found. I call this pattern accelerating growth and find it relates to the time-invariant heterogeneity in individual activities. I also show how a greater heterogeneity leads to a faster growth.
- Dec 20 2010 cs.CV arXiv:1012.3802v1This chapter presents a framework for detecting fake regions by using various methods including watermarking technique and blind approaches. In particular, we describe current categories on blind approaches which can be divided into five: pixel-based techniques, format-based techniques, camera-based techniques, physically-based techniques and geometric-based techniques. Then we take a second look on the geometric-based techniques and further categorize them in detail. In the following section, the state-of-the-art methods involved in the geometric technique are elaborated.
- Oct 15 2010 cs.DC arXiv:1010.2881v1In recent years, extensive research has been conducted in the area of Service Level Agreement (SLA) for utility computing systems. An SLA is a formal contract used to guarantee that consumers' service quality expectation can be achieved. In utility computing systems, the level of customer satisfaction is crucial, making SLAs significantly important in these environments. A fundamental issue is the management of SLAs, including SLA autonomy management or trade off among multiple Quality of Service (QoS) parameters. Many SLA languages and frameworks have been developed as solutions; however, there is no overall classification for these extensive works. Therefore, the aim of this chapter is to present a comprehensive survey of how SLAs are created, managed and used in utility computing environments. We discuss existing use cases from Grid and Cloud computing systems to identify the level of SLA realization in state-of-the-art systems and emerging challenges for future research.
- Apr 17 2006 cs.CV arXiv:cs/0604062v1Feature extraction and matching are among central problems of computer vision. It is inefficient to search features over all locations and scales. Neurophysiological evidence shows that to locate objects in a digital image the human visual system employs visual attention to a specific object while ignoring others. The brain also has a mechanism to search from coarse to fine. In this paper, we present a feature extractor and an associated hierarchical searching model to simulate such processes. With the hierarchical representation of the object, coarse scanning is done through the matching of the larger scale and precise localization is conducted through the matching of the smaller scale. Experimental results justify the proposed model in its effectiveness and efficiency to localize features.
- Folksonomy is an emerging technology that works to classify the information over WWW through tagging the bookmarks, photos or other web-based contents. It is understood to be organized by every user while not limited to the authors of the contents and the professional editors. This study surveyed the folksonomy as a complex network. The result indicates that the network, which is composed of the tags from the folksonomy, displays both properties of small world and scale-free. However, the statistics only shows a local and static slice of the vast body of folksonomy which is still evolving.
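
Several abstracts in this list report power-law scaling between an activity measure and a population, of the form $T \propto P^{\gamma}$. As a rough illustration only, and not the procedure used in any of the listed papers, the sketch below estimates such an exponent by ordinary least squares on log-transformed data. The function name `fit_power_law_exponent` and the synthetic data are assumptions made for this example.

```python
import math


def fit_power_law_exponent(populations, activities):
    """Estimate gamma and the prefactor c in T = c * P**gamma.

    Hypothetical helper: fits a straight line to (log P, log T) by
    ordinary least squares. All inputs must be strictly positive.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(t) for t in activities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    # Intercept in log space gives the logarithm of the prefactor.
    prefactor = math.exp(mean_y - gamma * mean_x)
    return gamma, prefactor


# Synthetic, noise-free data generated with gamma = 1.3 and c = 2.0
# (illustrative values, not taken from any abstract).
populations = [10, 100, 1000, 10000]
activities = [2.0 * p ** 1.3 for p in populations]
gamma, prefactor = fit_power_law_exponent(populations, activities)
```

With noisy empirical data, a log-log least-squares fit of this kind is only a first check; it says nothing about whether a power law describes the data better than alternatives such as a log-normal relationship.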