results for au:He_X in:cs

- May 16 2017 cs.SI arXiv:1705.04969v1Embedding network data into a low-dimensional vector space has shown promising performance for many real-world applications, such as node classification and entity retrieval. However, most existing methods focused only on leveraging network structure. For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles of friendship networks and textual content of citation networks. These rich attribute information of social actors reveal the homophily effect, exerting huge impacts on the formation of social networks. In this paper, we explore the rich evidence source of attributes in social networks to improve network embedding. We propose a generic Social Network Embedding framework (SNE), which learns representations for social actors (i.e., nodes) by preserving both the structural proximity and attribute proximity. While the structural proximity captures the global network structure, the attribute proximity accounts for the homophily effect. To justify our proposal, we conduct extensive experiments on four real-world social networks. Compared to the state-of-the-art network embedding approaches, SNE can learn more informative representations, achieving substantial gains on the tasks of link prediction and node classification. Specifically, SNE significantly outperforms node2vec with an 8.2% relative improvement on the link prediction task, and a 12.7% gain on the node classification task.
- May 11 2017 cs.OS arXiv:1705.03591v1Imagining a disk which provides baseline performance at a relatively low price during low-load periods, but when workloads demand more resources, the disk performance is automatically promoted in situ and in real time. In a hardware era, this is hardly achievable. However, this imagined disk is becoming reality due to the technical advances of software-defined storage, which enable volume performance to be adjusted on the fly. We propose IOTune, a resource management middleware which employs software-defined storage primitives to implement G-states of virtual block devices. G-states enable virtual block devices to serve at multiple performance gears, getting rid of conflicts between immutable resource reservation and dynamic resource demands, and always achieving resource right-provisioning for workloads. Accompanying G-states, we also propose a new block storage pricing policy for cloud providers. Our case study for applying G-states to cloud block storage verifies the effectiveness of the IOTune framework. Trace-replay based evaluations demonstrate that storage volumes with G-states adapt to workload fluctuations. For tenants, G-states enable volumes to provide much better QoS with a same cost of ownership, comparing with static IOPS provisioning and the I/O credit mechanism. G-states also reduce I/O tail latencies by one to two orders of magnitude. From the standpoint of cloud providers, G-states promote storage utilization, creating values and benefiting competitiveness. G-states supported by IOTune provide a new paradigm for storage resource management and pricing in multi-tenant clouds.
- Apr 21 2017 cs.CL arXiv:1704.06217v1This paper addresses the problem of predicting popularity of comments in an online discussion forum using reinforcement learning, particularly addressing two challenges that arise from having natural language state and action spaces. First, the state representation, which characterizes the history of comments tracked in a discussion at a particular point, is augmented to incorporate the global context represented by discussions on world events available in an external knowledge source. Second, a two-stage Q-learning framework is introduced, making it feasible to search the combinatorial action space while also accounting for redundancy among sub-actions. We experiment with five Reddit communities, showing that the two methods improve over previous reported results on this task.
- Apr 11 2017 cs.CV arXiv:1704.02792v2Fine-grained image classification is a challenging task due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy. Despite achieving promising results, these methods mainly have two limitations: (1) not all the parts which obtained through the part detection models are beneficial and indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions which could not be provided by the part locations or attribute annotations. For addressing the above two limitations, this paper proposes the two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via deep convolutional neural network. The language stream utilizes the natural language descriptions which could point out the discriminative parts or characteristics for each image, and provides a flexible and compact way of encoding the salient visual aspects for distinguishing sub-categories. Since the two streams are complementary, combining the two streams can further achieves better classification accuracy. Comparing with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate our CVL approach achieves the best performance.
- Apr 07 2017 cs.CV arXiv:1704.01740v1Fine-grained image classification is to recognize hundreds of subcategories belonging to the same basic-level category, such as 200 subcategories belonging to bird, and highly challenging due to large variance in same subcategory and small variance among different subcategories. Existing methods generally find where the object or its parts are and then discriminate which subcategory the image belongs to. However, they mainly have two limitations: (1) Relying on object or parts annotations which are heavily labor consuming. (2) Ignoring the spatial relationship between the object and its parts as well as among these parts, both of which are significantly helpful for finding discriminative parts. Therefore, this paper proposes the object-part attention driven discriminative localization (OPADDL) approach for weakly supervised fine-grained image classification, and the main novelties are: (1) Object-part attention model integrates two level attentions: object-level attention localizes objects of images, and part-level attention selects discriminative parts of object. Both are jointly employed to learn multi-view and multi-scale features to enhance their mutual promotion. (2) Object-part spatial model combines two spatial constraints: object spatial constraint ensures selected parts highly representative, and part spatial constraint eliminates redundancy and enhances discrimination of selected parts. Both are jointly employed to exploit the subtle and local differences for distinguishing the subcategories. Importantly, neither objects nor parts annotations are used, which avoids the heavy labor consuming of labeling. Comparing with more than 10 state-of-the-art methods on 3 widely used datasets, our OPADDL approach achieves the best performance.
- Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide \& Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
- Connecting different text attributes associated with the same entity (conflation) is important in business data analytics since it could help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity could be quite different from each other for reasons such as misspelling. It is therefore critical to develop a conflation model that is able to truly understand the semantic meaning of the strings and match them at the semantic level. To this end, we develop a character-level deep conflation model that encodes the input text strings from character level into finite dimension feature vectors, which are then used to compute the cosine similarity between the text strings. The model is trained in an end-to-end manner using back propagation and stochastic gradient descent to maximize the likelihood of the correct association. Specifically, we propose two variants of the deep conflation model, based on long-short-term memory (LSTM) recurrent neural network (RNN) and convolutional neural network (CNN), respectively. Both models perform well on a real-world business analytics dataset and significantly outperform the baseline bag-of-character (BoC) model.
- The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. An half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have efficient implementation, by exploiting the statistics of of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piece-wise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations.
- Private record linkage is the problem of identifying pairs of records that are similar as per an input matching rule from databases that are held by two parties that do not trust one another. We identify three key desiderata that a PRL solution must ensure: a proof of end-to-end privacy, communication and computational costs that scale sub-quadratically in the number of input records, perfect precision and high recall of matching pairs. We show that all of the existing solutions for PRL -- including secure 2-party computation (S2PC), and their variants that use non-private or differentially private (DP) blocking -- violate at least one of the three desiderata. In particular, S2PC techniques guarantee end-to-end privacy but have either low recall or high cost. We show that DP blocking based techniques do not provide an end-to-end privacy guarantee as DP does not permit the release of any exact answers (including matching records in PRL). In light of this deficiency, we propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of DP, but allows for the truthful release of the output of a certain function applied to the data. We apply this to PRL, and show that protocols satisfying this privacy model permit the disclosure of the true matching records, but their execution is insensitive to the presence or absence of a single non-matching record. We develop novel protocols that satisfy this end-to-end privacy guarantee and permit a tradeoff between recall, privacy and efficiency. Our empirical evaluations on real and synthetic datasets show that our protocols have high recall, scale near linearly in the size of the input databases (and the output set of matching pairs), and have communication and computational costs that are at least 2 orders of magnitude smaller than S2PC baselines.
- Dec 12 2016 cs.CV arXiv:1612.03129v2We address the problem of instance-level semantic segmentation, which aims at jointly detecting, segmenting and classifying every individual object in an image. In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal. As a consequence, they cannot recover from errors in the object candidate generation process, such as too small or shifted boxes. In this paper, we introduce a novel object segment representation based on the distance transform of the object masks. We then design an object mask network (OMN) with a new residual-deconvolution architecture that infers such a representation and decodes it into the final binary object mask. This allows us to predict masks that go beyond the scope of the bounding boxes and are thus robust to inaccurate object candidates. We integrate our OMN into a Multitask Network Cascade framework, and learn the resulting boundary-aware instance segmentation (BAIS) network in an end-to-end manner. Our experiments on the PASCAL VOC 2012 and the Cityscapes datasets demonstrate the benefits of our approach, which outperforms the state-of-the-art in both object proposal generation and instance segmentation.
- Nov 30 2016 cs.IR arXiv:1611.09496v1It is well known that learning customers' preference and making recommendations to them from today's information-exploded environment is critical and non-trivial in an on-line system. There are two different modes of recommendation systems, namely pull-mode and push-mode. The majority of the recommendation systems are pull-mode, which recommend items to users only when and after users enter Application Market. While push-mode works more actively to enhance or re-build connection between Application Market and users. As one of the most successful phone manufactures,both the number of users and apps increase dramatically in Huawei Application Store (also named Hispace Store), which has approximately 0.3 billion registered users and 1.2 million apps until 2016 and whose number of users is growing with high-speed. For the needs of real scenario, we establish a Push Service Platform (shortly, PSP) to discover the target user group automatically from web-scale user operation log data with an additional small set of labelled apps (usually around 10 apps),in Hispace Store. As presented in this work,PSP includes distributed storage layer, application layer and evaluation layer. In the application layer, we design a practical graph-based algorithm (named A-PARW) for user group discovery, which is an approximate version of partially absorbing random walk. Based on I mode of A-PARW, the effectiveness of our system is significantly improved, compared to the predecessor to presented system, which uses Personalized Pagerank in its application layer.
- A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
- We propose a new encoder-decoder approach to learn distributed sentence representations from unlabeled sentences. The word-to-vector representation is used, and convolutional neural networks are employed as sentence encoders, mapping an input sentence into a fixed-length vector. This representation is decoded using long short-term memory recurrent neural networks, considering several tasks, such as reconstructing the input sentence, or predicting the future sentence. We further describe a hierarchical encoder-decoder model to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.
- In recent years, interest in recommender research has shifted from explicit feedback towards implicit feedback data. A diversity of complex models has been proposed for a wide variety of applications. Despite this, learning from implicit feedback is still computationally challenging. So far, most work relies on stochastic gradient descent (SGD) solvers which are easy to derive, but in practice challenging to apply, especially for tasks with many items. For the simple matrix factorization model, an efficient coordinate descent (CD) solver has been previously proposed. However, efficient CD approaches have not been derived for more complex models. In this paper, we provide a new framework for deriving efficient CD algorithms for complex recommender models. We identify and introduce the property of k-separable models. We show that k-separability is a sufficient property to allow efficient optimization of implicit recommender problems with CD. We illustrate this framework on a variety of state-of-the-art models including factorization machines and Tucker decomposition. To summarize, our work provides the theory and building blocks to derive efficient implicit CD algorithms for complex recommender models.
- In an Internet of Things network, multiple sensors send information to a fusion center for it to infer a public hypothesis of interest. However, the same sensor information may be used by the fusion center to make inferences of a private nature that the sensors wish to protect. To model this, we adopt a decentralized hypothesis testing framework with binary public and private hypotheses. Each sensor makes a private observation and utilizes a local sensor decision rule or privacy mapping to summarize that observation before sending it to the fusion center. Without assuming knowledge of the joint distribution of the sensor observations and hypotheses, we adopt a nonparametric learning approach to design local privacy mappings. We introduce the concept of an empirical normalized risk, which provides a theoretical guarantee for the network to achieve information privacy for the private hypothesis with high probability when the number of training samples is large. We develop iterative optimization algorithms to determine an appropriate privacy threshold and the best sensor privacy mappings, and show that they converge. Numerical results on both synthetic and real data sets suggest that our proposed approach yields low error rates for inferring the public hypothesis, but high error rates for detecting the private hypothesis.
- We study the problem of learning influence functions under incomplete observations of node activations. Incomplete observations are a major concern as most (online and real-world) social networks are not fully observable. We establish both proper and improper PAC learnability of influence functions under randomly missing observations. Proper PAC learnability under the Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade (DIC) models is established by reducing incomplete observations to complete observations in a modified graph. Our improper PAC learnability result applies for the DLT and DIC models as well as the Continuous-Time Independent Cascade (CIC) model. It is based on a parametrization in terms of reachability features, and also gives rise to an efficient and practical heuristic. Experiments on synthetic and real-world datasets demonstrate the ability of our method to compensate even for a fairly large fraction of missing observations.
- Aug 12 2016 cs.CV arXiv:1608.03474v1With increasing demand for efficient image and video analysis, test-time cost of scene parsing becomes critical for many large-scale or time-sensitive vision applications. We propose a dynamic hierarchical model for anytime scene labeling that allows us to achieve flexible trade-offs between efficiency and accuracy in pixel-level prediction. In particular, our approach incorporates the cost of feature computation and model inference, and optimizes the model performance for any given test-time budget by learning a sequence of image-adaptive hierarchical models. We formulate this anytime representation learning as a Markov Decision Process with a discrete-continuous state-action space. A high-quality policy of feature and model selection is learned based on an approximate policy iteration method with action proposal mechanism. We demonstrate the advantages of our dynamic non-myopic anytime scene parsing on three semantic segmentation datasets, which achieves $90\%$ of the state-of-the-art performances by using $15\%$ of their overall costs.
- We develop a novel bi-directional attention model for dependency parsing, which learns to agree on headword predictions from the forward and backward parsing directions. The parsing procedure for each direction is formulated as sequentially querying the memory component that stores continuous headword embeddings. The proposed parser makes use of \it soft headword embeddings, allowing the model to implicitly capture high-order parsing history without dramatically increasing the computational complexity. We conduct experiments on English, Chinese, and 12 other languages from the CoNLL 2006 shared task, showing that the proposed model achieves state-of-the-art unlabeled attachment scores on 6 languages.
- Jul 28 2016 cs.CV arXiv:1607.08221v1In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide concrete measurement set, evaluation protocol, as well as training data. We also present in details our experiment setup and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
- Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great success in many real-world applications. However, for large-scale problems involving a huge number of samples and extremely high-dimensional features, solving sparse SVM remains challenging. By noting that sparse SVM induces sparsities in both feature and sample spaces, we propose a novel approach---that is based on accurate estimations of the primal and dual optimums of sparse SVM---to simultaneously identify the features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified samples and features from the training phase, which leads to substantial savings in both memory usage and computational cost without sacrificing accuracy. To the best of our knowledge, the proposed method is the \emphfirst \emphstatic feature and sample reduction method for sparse SVM. Experiments on both synthetic and real datasets (e.g., the kddb dataset with about 20 million of samples and 30 million of features) demonstrate that our approach significantly outperforms existing state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.
- Jun 16 2016 cs.LG arXiv:1606.04646v1Unsupervised learning is the most challenging problem in machine learning and especially in deep learning. Among many scenarios, we study an unsupervised learning problem of high economic value --- learning to predict without costly pairing of input data and corresponding labels. Part of the difficulty in this problem is a lack of solid evaluation measures. In this paper, we take a practical approach to grounding unsupervised learning by using the same success criterion as for supervised learning in prediction tasks but we do not require the presence of paired input-output training data. In particular, we propose an objective function that aims to make the predicted outputs fit well the structure of the output while preserving the correlation between the input and the predicted output. We experiment with a synthetic structural prediction problem and show that even with simple linear classifiers, the objective function is already highly non-convex. We further demonstrate the nature of this non-convex optimization problem as well as potential solutions. In particular, we show that with regularization via a generative model, learning with the proposed unsupervised objective function converges to an optimal solution.
- Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation. In contrast to most modern neural systems of translation, which discard the identity for rare words, in this paper we propose several architectures for learning word representations from character and morpheme level word decompositions. We incorporate these representations in a novel machine translation model which jointly learns word alignments and translations via a hard attention mechanism. Evaluating on translating from several morphologically rich languages into English, we show consistent improvements over strong baseline methods, of between 1 and 1.5 BLEU points.
- We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion threads predicted to be popular are recommended, chosen from a fixed window of recent comments to track. Novel deep reinforcement learning architectures are studied for effective modeling of the value function associated with actions comprised of interdependent sub-actions. The proposed model, which represents dependence between sub-actions through a bi-directional LSTM, gives the best performance across different experimental configurations and domains, and it also generalizes well with varying numbers of recommendation requests.
- Jun 08 2016 cs.CV arXiv:1606.02009v1Conventional change detection methods require a large number of images to learn background models. The few recent approaches that attempt change detection between two images either use handcrafted features or depend strongly on tedious pixel-level labeling by humans. In this paper, we present a weakly supervised approach that needs only image-level labels to simultaneously detect and localize changes in a pair of images. To this end, we employ a deep neural network with DAG topology to learn patterns of change from image-level labeled training data. On top of the initial CNN activations, we define a CRF model to incorporate the local differences and the dense connections between individual pixels. We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework. This enables imposing global constraints on the observed foreground probability mass function. Our evaluations on four large benchmark datasets demonstrate superior detection and localization performance.
- Training deep neural network is a high dimensional and a highly non-convex optimization problem. Stochastic gradient descent (SGD) algorithm and it's variations are the current state-of-the-art solvers for this task. However, due to non-covexity nature of the problem, it was observed that SGD slows down near saddle point. Recent empirical work claim that by detecting and escaping saddle point efficiently, it's more likely to improve training performance. With this objective, we revisit Hessian-free optimization method for deep networks. We also develop its distributed variant and demonstrate superior scaling potential to SGD, which allows more efficiently utilizing larger computing resources thus enabling large models and faster time to obtain desired solution. Furthermore, unlike truncated Newton method (Marten's HF) that ignores negative curvature information by using naïve conjugate gradient method and Gauss-Newton Hessian approximation information - we propose a novel algorithm to explore negative curvature direction by solving the sub-problem with stabilized bi-conjugate method involving possible indefinite stochastic Hessian information. We show that these techniques accelerate the training process for both the standard MNIST dataset and also the TIMIT speech recognition problem, demonstrating robust performance with upto an order of magnitude larger batch sizes. This increased scaling potential is illustrated with near linear speed-up on upto 16 CPU nodes for a simple 4-layer network.
- Jun 01 2016 cs.CV arXiv:1605.09546v1While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which can often not be acquired in practice due to the sensors' limitations. By contrast, here, we introduce an approach to performing depth super-resolution in more challenging conditions, such as in outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets.
- To obtain a better cycle-structure is still a challenge for the low-density parity-check (LDPC) code design. This paper formulates two metrics firstly so that the progressive edge-growth (PEG) algorithm and the approximate cycle extrinsic message degree (ACE) constrained PEG algorithm are unified into one integrated algorithm, called the metric-constrained PEG algorithm (M-PEGA). Then, as an improvement for the M-PEGA, the multi-edge metric-constrained PEG algorithm (MM-PEGA) is proposed based on two new concepts, the multi-edge local girth and the edge-trials. The MM-PEGA with the edge-trials, say a positive integer $r$, is called the $r$-edge M-PEGA, which constructs each edge of the non-quasi-cyclic (non-QC) LDPC code graph through selecting a check node whose $r$-edge local girth is optimal. In addition, to design the QC-LDPC codes with any predefined valid design parameters, as well as to detect and even to avoid generating the undetectable cycles in the QC-LDPC codes designed by the QC-PEG algorithm, the multi-edge metric constrained QC-PEG algorithm (MM-QC-PEGA) is proposed lastly. It is verified by the simulation results that increasing the edge-trials of the MM-PEGA/MM-QC-PEGA is expected to have a positive effect on the cycle-structures and the error performances of the LDPC codes designed by the MM-PEGA/MM-QC-PEGA.
- Apr 29 2016 cs.DB arXiv:1604.08407v3Considerable effort has been made to increase the scale of Linked Data. However, an inevitable problem when dealing with data integration from multiple sources is that multiple different sources often provide conflicting objects for a certain predicate of the same real-world entity, so-called object conflicts problem. Currently, the object conflicts problem has not received sufficient attention in the Linked Data community. In this paper, we first formalize the object conflicts resolution problem as computing the joint distribution of variables on a heterogeneous information network called the Source-Object Network, which successfully captures the all correlations from objects and Linked Data sources. Then, we introduce a novel approach based on network effects called ObResolution(Object Resolution), to identify a true object from multiple conflicting objects. ObResolution adopts a pairwise Markov Random Field (pMRF) to model all evidences under a unified framework. Extensive experimental results on six real-world datasets show that our method exhibits higher accuracy than existing approaches and it is robust and consistent in various domains. \keywordsLinked Data, Object Conflicts, Linked Data Quality, Truth Discovery
- We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The first release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 sequences, aligned to both descriptive (caption) and story language. We establish several strong baselines for the storytelling task, and motivate an automatic metric to benchmark progress. Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.
- Apr 14 2016 cs.DS arXiv:1604.03921v1A monotone drawing of a graph G is a straight-line drawing of G such that, for every pair of vertices u,w in G, there exists abpath P_uw in G that is monotone in some direction l_uw. (Namely, the order of the orthogonal projections of the vertices of P_uw on l_uw is the same as the order they appear in P_uw.) The problem of finding monotone drawings for trees has been studied in several recent papers. The main focus is to reduce the size of the drawing. Currently, the smallest drawing size is O(n^1.205) x O(n^1.205). In this paper, we present an algorithm for constructing monotone drawings of trees on a grid of size at most 12n x 12n. The smaller drawing size is achieved by a new simple Path Draw algorithm, and a procedure that carefully assigns primitive vectors to the paths of the input tree T. We also show that there exists a tree T_0 such that any monotone drawing of T_0 must use a grid of size Omega(n) x Omega(n). So the size of our monotone drawing of trees is asymptotically optimal.
- Representation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. This issue is particularly challenging for understanding casual and correlational relationships between events. While this topic has received a lot of interest in the NLP community, research has been hindered by the lack of a proper evaluation framework. This paper attempts to address this problem with a new framework for evaluating story understanding and script learning: the 'Story Cloze Test'. This test requires a system to choose the correct ending to a four-sentence story. We created a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation. This corpus is unique in two ways: (1) it captures a rich set of causal and temporal commonsense relations between daily events, and (2) it is a high quality collection of everyday life stories that can also be used for story generation. Experimental evaluation shows that a host of baselines and state-of-the-art models based on shallow language understanding struggle to achieve a high score on the Story Cloze Test. We discuss these implications for script and story learning, and offer suggestions for deeper language understanding.
- We show that a character-level encoder-decoder framework can be successfully applied to question answering with a structured knowledge base. We use our model for single-relation question answering and demonstrate the effectiveness of our approach on the SimpleQuestions dataset (Bordes et al., 2015), where we improve state-of-the-art accuracy from 63.9% to 70.9%, without use of ensembles. Importantly, our character-level model has 16x fewer parameters than an equivalent word-level model, can be learned with significantly less data compared to previous work, which relies on data augmentation, and is robust to new entities in testing.
- Mar 31 2016 cs.CV arXiv:1603.09016v2We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high quality caption quality with respect to human judgments, out-of-domain data handling, and low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine outperforms previous state-of-the-art systems significantly on both in-domain dataset (i.e. MS COCO) and out of-domain datasets.
- There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.
- Feb 18 2016 cs.SI physics.soc-ph arXiv:1602.05240v2Uncertainty about models and data is ubiquitous in the computational social sciences, and it creates a need for robust social network algorithms, which can simultaneously provide guarantees across a spectrum of models and parameter settings. We begin an investigation into this broad domain by studying robust algorithms for the Influence Maximization problem, in which the goal is to identify a set of k nodes in a social network whose joint influence on the network is maximized. We define a Robust Influence Maximization framework wherein an algorithm is presented with a set of influence functions, typically derived from different influence models or different parameter settings for the same model. The different parameter settings could be derived from observed cascades on different topics, under different conditions, or at different times. The algorithm's goal is to identify a set of k nodes who are simultaneously influential for all influence functions, compared to the (function-specific) optimum solutions. We show strong approximation hardness results for this problem unless the algorithm gets to select at least a logarithmic factor more seeds than the optimum solution. However, when enough extra seeds may be selected, we show that techniques of Krause et al. can be used to approximate the optimum robust influence to within a factor of 1 - 1/e. We evaluate this bicriteria approximation algorithm against natural heuristics on several real-world data sets. Our experiments indicate that the worst-case hardness does not necessarily translate into bad performance on real-world data sets; all algorithms perform fairly well.
- Jan 13 2016 cs.AI arXiv:1601.02745v1In this paper we present the initial development of a general theory for mapping inference in predicate logic to computation over Tensor Product Representations (TPRs; Smolensky (1990), Smolensky & Legendre (2006)). After an initial brief synopsis of TPRs (Section 0), we begin with particular examples of inference with TPRs in the 'bAbI' question-answering task of Weston et al. (2015) (Section 1). We then present a simplification of the general analysis that suffices for the bAbI task (Section 2). Finally, we lay out the general treatment of inference over TPRs (Section 3). We also show the simplification in Section 2 derives the inference methods described in Lee et al. (2016); this shows how the simple methods of Lee et al. (2016) can be formally extended to more general reasoning tasks.
- Nov 23 2015 cs.CL arXiv:1511.06426v4Question answering tasks have shown remarkable progress with distributed vector representation. In this paper, we investigate the recently proposed Facebook bAbI tasks which consist of twenty different categories of questions that require complex reasoning. Because the previous work on bAbI are all end-to-end models, errors could come from either an imperfect understanding of semantics or in certain steps of the reasoning. For clearer analysis, we propose two vector space models inspired by Tensor Product Representation (TPR) to perform knowledge encoding and logical reasoning based on common-sense inference. They together achieve near-perfect accuracy on all categories including positional reasoning and path finding that have proved difficult for most of the previous approaches. We hypothesize that the difficulties in these categories are due to the multi-relations in contrast to uni-relational characteristic of other categories. Our exploration sheds light on designing more sophisticated dataset and moving one step toward integrating transparent and interpretable formalism of TPR into existing learning paradigms.
- Nov 20 2015 cs.CV arXiv:1511.06070v1In this paper, we tackle the problem of estimating the depth of a scene from a monocular video sequence. In particular, we handle challenging scenarios, such as non-translational camera motion and dynamic scenes, where traditional structure from motion and motion stereo methods do not apply. To this end, we first study the problem of depth estimation from a single image. In this context, we exploit the availability of a pool of images for which the depth is known, and formulate monocular depth estimation as a discrete-continuous optimization problem, where the continuous variables encode the depth of the superpixels in the input image, and the discrete ones represent relationships between neighboring superpixels. The solution to this discrete-continuous optimization problem is obtained by performing inference in a graphical model using particle belief propagation. To handle video sequences, we then extend our single image model to a two-frame one that naturally encodes short-range temporal consistency and inherently handles dynamic objects. Based on the prediction of this model, we then introduce a fully-connected pairwise CRF that accounts for longer range spatio-temporal interactions throughout a video. We demonstrate the effectiveness of our model in both the indoor and outdoor scenarios.
- Nov 20 2015 cs.NA arXiv:1511.06227v2Significant inaccuracy often occurs during the process of mathematical calculation due to the digit limitation of floating point, which may lead to catastrophic loss. Normally, people believe that adjustment of floating-point precision is an effective way to solve this problem, since high-precision floating-point has more digits to store information. Thus, it is a prevalent method to reduce the inaccuracy in much floating-point related research, that performing all the operations with higher precision. However, we discover that some operations may lead to larger error in higher precision. In this paper, we define this kind of operation that generates large error due to precision adjustment a precision-specific operation. Furthermore, we propose a light-weight searching algorithm for detecting precision-specific operations and figure out an automatic processing method to fixing them. In addition, we conducted an experiment on the scientific mathematical library of GLIBC. The result shows that there are many precision-specific operations, and our fixing approach can significantly reduce the inaccuracy.
- This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based games. Termed a deep reinforcement relevance network (DRRN), the architecture represents action and state spaces with separate embedding vectors, which are combined with an interaction function to approximate the Q-function in reinforcement learning. We evaluate the DRRN on two popular text games, showing superior performance over other deep Q-learning architectures. Experiments with paraphrased action descriptions show that the model is extracting meaning rather than simply memorizing strings of text.
- This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use semantic representation of a question as query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. The visualization of the attention layers illustrates the progress that the SAN locates the relevant visual clues that lead to the answer of the question layer-by-layer.
- In this paper we develop dual free mini-batch SDCA with adaptive probabilities for regularized empirical risk minimization. This work is motivated by recent work of Shai Shalev-Shwartz on dual free SDCA method, however, we allow a non-uniform selection of "dual" coordinates in SDCA. Moreover, the probability can change over time, making it more efficient than fix uniform or non-uniform selection. We also propose an efficient procedure to generate a random non-uniform mini-batch through iterative process. The work is concluded with multiple numerical experiments to show the efficiency of proposed algorithms.
- The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such style is descriptions with emotions, which is commonplace in everyday communication, and influences decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.
- Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combines the strength of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion: The SL component can be a recurrent neural networks (RNN) or its long short-term memory (LSTM) version, which is equipped with the desired property of being able to capture long-term dependency on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.
- Aug 17 2015 cs.LG arXiv:1508.03398v2We develop a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document. Different from traditional variational learning or Gibbs sampling approaches, the proposed learning method applies (i) the mirror descent algorithm for maximum a posterior inference and (ii) back propagation over a deep architecture together with stochastic gradient/mirror descent for model parameter estimation, leading to scalable and end-to-end discriminative learning of the model. As a byproduct, we also apply this technique to develop a new learning method for the traditional unsupervised LDA model (i.e., BP-LDA). Experimental results on three real-world regression and classification tasks show that the proposed methods significantly outperform the previous supervised topic models, neural networks, and is on par with deep neural networks.
- Constructions of quantum MDS codes have been studied by many authors. We refer to the table in page 1482 of [3] for known constructions. However there are only few $q$-ary quantum MDS $[[n,n-2d+2,d]]_q$ codes with minimum distances $d>\frac{q}{2}$ for sparse lengths $n>q+1$. In the case $n=\frac{q^2-1}{m}$ where $m|q+1$ or $m|q-1$ there are complete results. In the case $n=\frac{q^2-1}{m}$ where $m|q^2-1$ is not a factor of $q-1$ or $q+1$, there is no $q$-ary quantum MDS code with $d> \frac{q}{2}$ has been constructed. In this paper we propose a direct approch to construct Hermitian self-orthogonal codes over ${\bf F}_{q^2}$. Thus we give some new $q$-ary quantum codes in this case. Moreover we present many new $q$-ary quantum MDS codes with lengths of the form $\frac{w(q^2-1)}{u}$ and minimum distances $d > \frac{q}{2}$.
- Jul 01 2015 cs.CV arXiv:1506.09075v3This letter presents a novel approach to extract reliable dense and long-range motion trajectories of articulated human in a video sequence. Compared with existing approaches that emphasize temporal consistency of each tracked point, we also consider the spatial structure of tracked points on the articulated human. We treat points as a set of vertices, and build a triangle mesh to join them in image space. The problem of extracting long-range motion trajectories is changed to the issue of consistency of mesh evolution over time. First, self-occlusion is detected by a novel mesh-based method and an adaptive motion estimation method is proposed to initialize mesh between successive frames. Furthermore, we propose an iterative algorithm to efficiently adjust vertices of mesh for a physically plausible deformation, which can meet the local rigidity of mesh and silhouette constraints. Finally, we compare the proposed method with the state-of-the-art methods on a set of challenging sequences. Evaluations demonstrate that our method achieves favorable performance in terms of both accuracy and integrity of extracted trajectories.
- May 27 2015 cs.DB arXiv:1505.06872v1Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent diversity of big data analytics workloads? Big data dwarfs are abstractions of extracting frequently appearing operations in big data computing. One dwarf represents one unit of computation, and big data workloads are decomposed into one or more dwarfs. Furthermore, dwarfs workloads rather than vast real workloads are more cost-efficient and representative to evaluate big data systems. In this paper, we extensively investigate six most important or emerging application domains i.e. search engine, social network, e-commerce, multimedia, bioinformatics and astronomy. After analyzing forty representative algorithms, we single out eight dwarfs workloads in big data analytics other than OLAP, which are linear algebra, sampling, logic operations, transform operations, set operations, graph operations, statistic operations and sort.
- May 08 2015 cs.LG arXiv:1505.01576v1In many naturally occurring optimization problems one needs to ensure that the definition of the optimization problem lends itself to solutions that are tractable to compute. In cases where exact solutions cannot be computed tractably, it is beneficial to have strong guarantees on the tractable approximate solutions. In order operate under these criterion most optimization problems are cast under the umbrella of convexity or submodularity. In this report we will study design and optimization over a common class of functions called submodular functions. Set functions, and specifically submodular set functions, characterize a wide variety of naturally occurring optimization problems, and the property of submodularity of set functions has deep theoretical consequences with wide ranging applications. Informally, the property of submodularity of set functions concerns the intuitive "principle of diminishing returns. This property states that adding an element to a smaller set has more value than adding it to a larger set. Common examples of submodular monotone functions are entropies, concave functions of cardinality, and matroid rank functions; non-monotone examples include graph cuts, network flows, and mutual information. In this paper we will review the formal definition of submodularity; the optimization of submodular functions, both maximization and minimization; and finally discuss some applications in relation to learning and reasoning using submodular functions.
- Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
- Apr 14 2015 cs.LG arXiv:1504.02824v2Co-occurrence Data is a common and important information source in many areas, such as the word co-occurrence in the sentences, friends co-occurrence in social networks and products co-occurrence in commercial transaction data, etc, which contains rich correlation and clustering information about the items. In this paper, we study co-occurrence data using a general energy-based probabilistic model, and we analyze three different categories of energy-based model, namely, the $L_1$, $L_2$ and $L_k$ models, which are able to capture different levels of dependency in the co-occurrence data. We also discuss how several typical existing models are related to these three types of energy models, including the Fully Visible Boltzmann Machine (FVBM) ($L_2$), Matrix Factorization ($L_2$), Log-BiLinear (LBL) models ($L_2$), and the Restricted Boltzmann Machine (RBM) model ($L_k$). Then, we propose a Deep Embedding Model (DEM) (an $L_k$ model) from the energy model in a \emphprincipled manner. Furthermore, motivated by the observation that the partition function in the energy model is intractable and the fact that the major objective of modeling the co-occurrence data is to predict using the conditional probability, we apply the \emphmaximum pseudo-likelihood method to learn DEM. In consequence, the developed model and its learning method naturally avoid the above difficulties and can be easily used to compute the conditional probability in prediction. Interestingly, our method is equivalent to learning a special structured deep neural network using back-propagation and a special sampling strategy, which makes it scalable on large-scale datasets. Finally, in the experiments, we show that the DEM can achieve comparable or better results than state-of-the-art methods on datasets across several application domains.
- Apr 14 2015 cs.CV arXiv:1504.03083v2This technical report provides extra details of the deep multimodal similarity model (DMSM) which was proposed in (Fang et al. 2015, arXiv:1411.4952). The model is trained via maximizing global semantic similarity between images and their captions in natural language using the public Microsoft COCO database, which consists of a large set of images and their corresponding captions. The learned representations attempt to capture the combination of various visual concepts and cues.
- This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detects the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state of the art methods. We emphasize that the proposed model generates sentence embedding vectors that are specially useful for web document retrieval tasks. A comparison with a well known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method in this paper significantly outperforms it for web document retrieval task.
- Power systems are developing very fast nowadays, both in size and in complexity; this situation is a challenge for Early Event Detection (EED). This paper proposes a data- driven unsupervised learning method to handle this challenge. Specifically, the random matrix theories (RMTs) are introduced as the statistical foundations for random matrix models (RMMs); based on the RMMs, linear eigenvalue statistics (LESs) are defined via the test functions as the system indicators. By comparing the values of the LES between the experimental and the theoretical ones, the anomaly detection is conducted. Furthermore, we develop 3D power-map to visualize the LES; it provides a robust auxiliary decision-making mechanism to the operators. In this sense, the proposed method conducts EED with a pure statistical procedure, requiring no knowledge of system topologies, unit operation/control models, etc. The LES, as a key ingredient during this procedure, is a high dimensional indictor derived directly from raw data. As an unsupervised learning indicator, the LES is much more sensitive than the low dimensional indictors obtained from supervised learning. With the statistical procedure, the proposed method is universal and fast; moreover, it is robust against traditional EED challenges (such as error accumulations, spurious correlations, and even bad data in core area). Case studies, with both simulated data and real ones, validate the proposed method. To manage large-scale distributed systems, data fusion is mentioned as another data processing ingredient.
- Jan 20 2015 cs.SI arXiv:1501.04579v2The present article serves as an erratum to our paper of the same title, which was presented and published in the KDD 2014 conference. In that article, we claimed falsely that the objective function defined in Section 1.4 is non-monotone submodular. We are deeply indebted to Debmalya Mandal, Jean Pouget-Abadie and Yaron Singer for bringing to our attention a counter-example to that claim. Subsequent to becoming aware of the counter-example, we have shown that the objective function is in fact NP-hard to approximate to within a factor of $O(n^{1-\epsilon})$ for any $\epsilon > 0$. In an attempt to fix the record, the present article combines the problem motivation, models, and experimental results sections from the original incorrect article with the new hardness result. We would like readers to only cite and use this version (which will remain an unpublished note) instead of the incorrect conference version.
- Dec 23 2014 cs.IR arXiv:1412.6629v3In this paper we address the following problem in web document and information retrieval (IR): How can we use long-term context information to gain better IR performance? Unlike common IR methods that use bag of words representation for queries and documents, we treat them as a sequence of words and use long short term memory (LSTM) to capture contextual dependencies. To the best of our knowledge, this is the first time that LSTM is applied to information retrieval tasks. Unlike training traditional LSTMs, the training strategy is different due to the special nature of information retrieval problem. Experimental evaluation on an IR task derived from the Bing web search demonstrates the ability of the proposed method in addressing both lexical mismatch and long-term context modelling issues, thereby, significantly outperforming existing state of the art methods for web document retrieval task.
- Dec 23 2014 cs.CL arXiv:1412.6575v4We consider learning representations of entities and relations in KBs using the neural-embedding approach. We show that most existing models, including NTN (Socher et al., 2013) and TransE (Bordes et al., 2013b), can be generalized under a unified learning framework, where entities are low-dimensional vectors learned from a neural network and relations are bilinear and/or linear mapping functions. Under this framework, we compare a variety of embedding models on the link prediction task. We show that a simple bilinear formulation achieves new state-of-the-art results for the task (achieving a top-10 accuracy of 73.2% vs. 54.7% by TransE on Freebase). Furthermore, we introduce a novel approach that utilizes the learned relation embeddings to mine logical rules such as "BornInCity(a,b) and CityInCountry(b,c) => Nationality(a,c)". We find that embeddings learned from the bilinear objective are particularly good at capturing relational semantics and that the composition of relations is characterized by matrix multiplication. More interestingly, we demonstrate that our embedding-based rule extraction approach successfully outperforms a state-of-the-art confidence-based rule mining approach in mining Horn rules that involve compositional reasoning.
- This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
- In this paper we present a unified framework for modeling multi-relational representations, scoring, and learning, and conduct an empirical study of several recent multi-relational embedding models under the framework. We investigate the different choices of relation operators based on linear and bilinear transformations, and also the effects of entity representations by incorporating unsupervised vectors pre-trained on extra textual resources. Our results show several interesting findings, enabling the design of a simple embedding model that achieves the new state-of-the-art performance on a popular knowledge base completion task evaluated on Freebase.
- Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following recent advances in topic model inference via tensor decomposition, we make a first attempt to provide theoretical analysis on model selection in latent Dirichlet allocation. Under mild conditions, we derive the upper bound and lower bound on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are accurate and tight. Furthermore, using Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis for other latent models.
- Traditional anomaly detection on social media mostly focuses on individual point anomalies while anomalous phenomena usually occur in groups. Therefore it is valuable to study the collective behavior of individuals and detect group anomalies. Existing group anomaly detection approaches rely on the assumption that the groups are known, which can hardly be true in real world social media applications. In this paper, we take a generative approach by proposing a hierarchical Bayes model: Group Latent Anomaly Detection (GLAD) model. GLAD takes both pair-wise and point-wise data as input, automatically infers the groups and detects group anomalies simultaneously. To account for the dynamic properties of the social media data, we further generalize GLAD to its dynamic extension d-GLAD. We conduct extensive experiments to evaluate our models on both synthetic and real world datasets. The empirical results demonstrate that our approach is effective and robust in discovering latent groups and detecting group anomalies.
- Sep 18 2014 cs.CV arXiv:1409.4995v1Not all tags are relevant to an image, and the number of relevant tags is image-dependent. Although many methods have been proposed for image auto-annotation, the question of how to determine the number of tags to be selected per image remains open. The main challenge is that for a large tag vocabulary, there is often a lack of ground truth data for acquiring optimal cutoff thresholds per tag. In contrast to previous works that pre-specify the number of tags to be selected, we propose in this paper adaptive tag selection. The key insight is to divide the vocabulary into two disjoint subsets, namely a seen set consisting of tags having ground truth available for optimizing their thresholds and a novel set consisting of tags without any ground truth. Such a division allows us to estimate how many tags shall be selected from the novel set according to the tags that have been selected from the seen set. The effectiveness of the proposed method is justified by our participation in the ImageCLEF 2014 image annotation task. On a set of 2,065 test images with ground truth available for 207 tags, the benchmark evaluation shows that compared to the popular top-$k$ strategy which obtains an F-score of 0.122, adaptive tag selection achieves a higher F-score of 0.223. Moreover, by treating the underlying image annotation system as a black box, the new method can be used as an easy plug-in to boost the performance of existing systems.
- Multiobjective design optimization problems require multiobjective optimization techniques to solve, and it is often very challenging to obtain high-quality Pareto fronts accurately. In this paper, the recently developed flower pollination algorithm (FPA) is extended to solve multiobjective optimization problems. The proposed method is used to solve a set of multobjective test functions and two bi-objective design benchmarks, and a comparison of the proposed algorithm with other algorithms has been made, which shows that FPA is efficient with a good convergence rate. Finally, the importance for further parametric studies and theoretical analysis are highlighted and discussed.
- Learning a distance function or metric on a given data manifold is of great importance in machine learning and pattern recognition. Many of the previous works first embed the manifold to Euclidean space and then learn the distance function. However, such a scheme might not faithfully preserve the distance function if the original manifold is not Euclidean. Note that the distance function on a manifold can always be well-defined. In this paper, we propose to learn the distance function directly on the manifold without embedding. We first provide a theoretical characterization of the distance function by its gradient field. Based on our theoretical analysis, we propose to first learn the gradient field of the distance function and then learn the distance function itself. Specifically, we set the gradient field of a local distance function as an initial vector field. Then we transport it to the whole manifold via heat flow on vector fields. Finally, the geodesic distance function can be obtained by requiring its gradient field to be close to the normalized vector field. Experimental results on both synthetic and real data demonstrate the effectiveness of our proposed algorithm.
- Flower pollination algorithm is a new nature-inspired algorithm, based on the characteristics of flowering plants. In this paper, we extend this flower algorithm to solve multi-objective optimization problems in engineering. By using the weighted sum method with random weights, we show that the proposed multi-objective flower algorithm can accurately find the Pareto fronts for a set of test functions. We then solve a bi-objective disc brake design problem, which indeed converges quickly.
- Dec 16 2013 cs.DB arXiv:1312.3913v5Privacy definitions provide ways for trading-off the privacy of individuals in a statistical database for the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. In particular, we allow data publishers to extend differential privacy using a policy, which specifies (a) secrets, or information that must be kept secret, and (b) constraints that may be known about the data. While the secret specification allows increased utility by lessening protection for certain individual properties, the constraint specification provides added protection against an adversary who knows correlations in the data (arising from constraints). We formalize policies and present novel algorithms that can handle general specifications of sensitive information and certain count constraints. We show that there are reasonable policies under which our privacy mechanisms for k-means clustering, histograms and range queries introduce significantly lesser noise than their differentially private counterparts. We quantify the privacy-utility trade-offs for various policies analytically and empirically on real datasets.
- Nov 29 2013 cs.CL arXiv:1312.0482v1This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent semantic space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a multi-layer neural network whose weights are learned on parallel training data. The learning is aimed to directly optimize the quality of end-to-end machine translation results. Experimental evaluation has been performed on two Europarl translation tasks, English-French and German-English. The results show that the new semantic-based phrase translation model significantly improves the performance of a state-of-the-art phrase-based statistical machine translation sys-tem, leading to a gain of 0.7-1.0 BLEU points.
- We study a multi-antenna broadcast channel with two legitimate receivers and an external eavesdropper. We assume that the channel matrix of the eavesdropper is unknown to the legitimate terminals but satisfies a maximum rank constraint. As our main result we characterize the associated secrecy degrees of freedom for the broadcast channel with common and private messages. We show that a direct extension of the single-user wiretap codebook does not achieve the secrecy degrees of freedom. Our proposed optimal scheme involves decomposing the signal space into a common subspace, which can be observed by both receivers, and private subspaces which can be observed by only one of the receivers, and carefully transmitting a subset of messages in each subspace. We also consider the case when each user's private message must additionally remain confidential from the other legitimate receiver and characterize the s.d.o.f.\ region in this case.
- Nature-inspired metaheuristic algorithms, especially those based on swarm intelligence, have attracted much attention in the last ten years. Firefly algorithm appeared in about five years ago, its literature has expanded dramatically with diverse applications. In this paper, we will briefly review the fundamentals of firefly algorithm together with a selection of recent publications. Then, we discuss the optimality associated with balancing exploration and exploitation, which is essential for all metaheuristic algorithms. By comparing with intermittent search strategy, we conclude that metaheuristics such as firefly algorithm are better than the optimal intermittent search strategy. We also analyse algorithms and their implications for higher-dimensional optimization problems.
- Apr 03 2013 cs.LG arXiv:1304.0740v1Traditional algorithms for stochastic optimization require projecting the solution at each iteration into a given domain to ensure its feasibility. When facing complex domains, such as positive semi-definite cones, the projection operation can be expensive, leading to a high computational cost per iteration. In this paper, we present a novel algorithm that aims to reduce the number of projections for stochastic optimization. The proposed algorithm combines the strength of several recent developments in stochastic optimization, including mini-batch, extra-gradient, and epoch gradient descent, in order to effectively explore the smoothness and strong convexity. We show, both in expectation and with a high probability, that when the objective function is both smooth and strongly convex, the proposed algorithm achieves the optimal $O(1/T)$ rate of convergence with only $O(\log T)$ projections. Our empirical study verifies the theoretical result.