Existing work on extracting navigation objects from webpages focuses on navigation menus, in order to reveal the information architecture of a site. However, Web 2.0 sites such as social networks and e-commerce portals are making the content structure of a website increasingly difficult to understand. Dynamic and personalized elements in a webpage, such as top stories and recommended lists, are vital to understanding the dynamic nature of Web 2.0 sites. To better understand the content structure of Web 2.0 sites, in this paper we propose a new method for extracting navigation objects from a webpage. Our method extracts not only static navigation menus but also dynamic and personalized page-specific navigation lists. Since the navigation objects in a webpage naturally come in blocks, we first cluster hyperlinks into blocks by exploiting the spatial locations of hyperlinks, the hierarchical structure of the DOM tree, and the hyperlink density. We then identify navigation objects among those blocks using an SVM classifier with novel features such as anchor text length. Experiments on real-world data sets containing webpages from various domains and styles verified the effectiveness of our method.
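The two-stage pipeline above (cluster hyperlinks into blocks, then classify blocks) can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Link` record, the "same parent DOM node and small vertical gap" grouping rule, the `max_gap` threshold, and the exact feature definitions are all hypothetical simplifications. The resulting feature vectors could then be fed to any off-the-shelf SVM, such as scikit-learn's `sklearn.svm.SVC`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    dom_path: str  # hypothetical slash-separated DOM path of the <a> element
    anchor: str    # anchor text
    y: float       # vertical position on the rendered page, in pixels


def container(path: str) -> str:
    """Drop the last path step: a crude stand-in for the link's enclosing DOM node."""
    return path.rsplit("/", 1)[0]


def cluster_links(links: List[Link], max_gap: float = 40.0) -> List[List[Link]]:
    """Greedy top-to-bottom grouping (assumed rule): a link joins the current block
    if it shares the previous link's DOM container and lies at most `max_gap`
    pixels below it; otherwise it starts a new block."""
    blocks: List[List[Link]] = []
    for link in sorted(links, key=lambda lk: lk.y):
        if blocks:
            prev = blocks[-1][-1]
            if (container(prev.dom_path) == container(link.dom_path)
                    and link.y - prev.y <= max_gap):
                blocks[-1].append(link)
                continue
        blocks.append([link])
    return blocks


def block_features(block: List[Link], block_text_chars: int) -> List[float]:
    """Hypothetical features: [mean anchor length in words, hyperlink density, link count]."""
    n = len(block)
    mean_words = sum(len(lk.anchor.split()) for lk in block) / n
    anchor_chars = sum(len(lk.anchor) for lk in block)
    # Hyperlink density: share of the block's visible text that is anchor text
    density = anchor_chars / max(block_text_chars, anchor_chars, 1)
    return [mean_words, density, float(n)]
```

Short, link-dense blocks (such as a menu of one-word anchors) and long-anchor, text-heavy blocks end up far apart in this feature space, which is the kind of separation a linear or kernel SVM can exploit.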
Aug 29 2017 cs.AI
Humans develop a common sense of style compatibility between items based on their attributes. We seek to automatically answer questions like "Does this shirt go well with that pair of jeans?" To answer such questions, in this paper we attempt to model the human sense of style compatibility. The basic assumption of our approach is that most of the important attributes of a product in an online store are included in its title description. Therefore, it is feasible to learn style compatibility from these descriptions. We design a Siamese Convolutional Neural Network architecture and feed it pairs of item titles that are either compatible or incompatible. These pairs are mapped from the original space of symbolic words into an embedded style space. Our approach takes only words as input, with little preprocessing, and requires no laborious or expensive feature engineering.
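The Siamese idea, one convolutional encoder whose weights are shared by both titles in a pair, can be sketched as an untrained forward pass plus a contrastive loss. Everything concrete here is an assumption rather than the paper's architecture: the tiny sizes (`EMB_DIM`, `N_FILTERS`, `WIDTH`), the randomly initialised embeddings, the single convolution layer, and the choice of a margin-based contrastive loss. Training (backpropagating the loss into the filters and embeddings) is omitted.

```python
import math
import random
from typing import Dict, List

EMB_DIM, N_FILTERS, WIDTH = 8, 4, 2  # hypothetical sizes, far smaller than a real model


class TitleEncoder:
    """Text-CNN branch: word embeddings -> 1-D convolution -> ReLU -> max-pool.

    A Siamese network applies this same encoder (shared weights) to both titles.
    """

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.emb: Dict[str, List[float]] = {}
        # filters[f][k][d]: weight of filter f at window offset k, embedding dim d
        self.filters = [[[self.rng.gauss(0.0, 0.1) for _ in range(EMB_DIM)]
                         for _ in range(WIDTH)] for _ in range(N_FILTERS)]

    def _embed(self, word: str) -> List[float]:
        # Randomly initialised embedding, created on first sight and then reused
        if word not in self.emb:
            self.emb[word] = [self.rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
        return self.emb[word]

    def encode(self, title: str) -> List[float]:
        vecs = [self._embed(w) for w in title.lower().split()]
        while len(vecs) < WIDTH:  # zero-pad very short titles
            vecs.append([0.0] * EMB_DIM)
        out = []
        for f in self.filters:
            acts = [max(0.0, sum(f[k][d] * vecs[i + k][d]
                                 for k in range(WIDTH) for d in range(EMB_DIM)))
                    for i in range(len(vecs) - WIDTH + 1)]
            out.append(max(acts))  # max-pool over window positions
        return out


def contrastive_loss(a: List[float], b: List[float], compatible: bool,
                     margin: float = 1.0) -> float:
    """Pull compatible pairs together; push incompatible pairs at least `margin` apart."""
    d = math.dist(a, b)
    return d ** 2 if compatible else max(0.0, margin - d) ** 2


# Usage: both titles go through the same encoder instance (shared weights)
enc = TitleEncoder()
shirt = enc.encode("white cotton oxford shirt")
jeans = enc.encode("slim fit blue denim jeans")
loss = contrastive_loss(shirt, jeans, compatible=True)
```

Sharing one `TitleEncoder` instance between the two branches is what makes the network Siamese: distances in the output space are comparable because both titles are embedded by identical parameters.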
Nov 18 2016 cs.LG
Co-clustering aims to group the samples (e.g., documents, users) and the features (e.g., words, ratings) simultaneously. It exploits the dual relation and the bilateral information between samples and features. In many real-world applications, data usually reside on a submanifold of the ambient Euclidean space, but estimating the intrinsic manifold of the data space in a principled way is nontrivial. In this study, we focus on improving co-clustering performance via manifold ensemble learning, which can maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel co-clustering algorithm called Relational Multi-manifold Co-clustering (RMC), based on symmetric nonnegative matrix tri-factorization, which decomposes the relational data matrix into three submatrices. This method considers the inter-type relationship revealed by the relational data matrix as well as the intra-type information reflected by the affinity matrices encoded on the sample and feature data distributions. Specifically, we assume that the intrinsic manifold of the sample or feature space lies in the convex hull of a set of predefined candidate manifolds, and we learn a convex combination of them that maximally approaches the desired intrinsic manifold. To optimize the objective function, multiplicative update rules are used to update the submatrices alternately. In addition, both the entropic mirror descent algorithm and the coordinate descent algorithm are used to learn the manifold coefficient vector. Extensive experiments on document, image, and gene expression data sets demonstrate the superiority of the proposed algorithm over other well-established methods.
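Learning a convex combination of candidate manifolds means optimizing a coefficient vector constrained to the probability simplex, which is exactly where entropic mirror descent fits: each step multiplies the weights by `exp(-eta * gradient)` and renormalises, so the iterate stays nonnegative and sums to one. The sketch below is generic and assumes a stand-in objective f(mu) = sum_j mu_j * c_j + lam * ||mu||^2, where `c_j` plays the role of some per-candidate cost; the actual RMC objective and its gradient with respect to the manifold coefficients are not given in the abstract.

```python
import math
from typing import List


def entropic_mirror_descent(costs: List[float], lam: float = 1.0,
                            eta: float = 0.5, iters: int = 1000) -> List[float]:
    """Minimise the assumed objective sum_j mu_j * c_j + lam * ||mu||^2 over the simplex."""
    m = len(costs)
    mu = [1.0 / m] * m  # start at the centre of the simplex
    for _ in range(iters):
        # Gradient of the assumed objective
        grad = [c + 2.0 * lam * u for c, u in zip(costs, mu)]
        gmin = min(grad)  # shift so every exponent is <= 0 (no overflow)
        # Multiplicative (exponentiated-gradient) step, then renormalise onto the simplex
        w = [u * math.exp(-eta * (g - gmin)) for u, g in zip(mu, grad)]
        total = sum(w)
        mu = [x / total for x in w]
    return mu


# Three hypothetical candidate manifolds with per-candidate costs 3, 1 and 2
weights = entropic_mirror_descent([3.0, 1.0, 2.0])
```

With these costs and `lam=1.0`, the constrained minimiser is (0, 0.75, 0.25), so the iterate drives the weight of the costliest candidate towards zero while splitting the mass between the other two. The quadratic term is what keeps the solution from collapsing onto a single candidate.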
Jun 04 2015 cs.NI
Performance characterization is a fundamental issue in wireless networks, underpinning real-time routing, wireless network simulation, and other tasks. Four basic wireless operations need to be modeled: unicast, anycast, broadcast, and multicast. As observed in many recent works, the temporal and spatial distributions of packet receptions can have a significant impact on wireless performance involving multiple links (anycast, broadcast, multicast). However, existing performance models and simulations overlook these two wireless behaviors, leading to biased performance estimates and simulation results. In this paper, we first explicitly identify the "3-Dimension" information necessary for wireless performance modeling: packet reception rate (PRR), the spatial distribution of PRR, and the temporal distribution of packet receptions. We then propose a comprehensive modeling approach that takes this 3-Dimension Wireless information into account (the 3DW model). Further, we demonstrate the generality and wide applicability of the 3DW model through two case studies: 3DW-based network simulation and a 3DW-based real-time routing protocol. Extensive simulation and testbed experiments have been conducted. The results show that the 3DW model achieves much more accurate performance estimation for both anycast and broadcast/multicast. 3DW-based simulation effectively preserves the end-to-end performance metrics of the input empirical traces, and 3DW-based routing selects more efficient senders, achieving better transmission efficiency.
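Why per-link PRR alone can bias broadcast estimates is easy to see with reception traces. The sketch below compares the "independent links" estimate of P(all receivers get a packet), which is the product of PRRs, with the empirical fraction computed from the traces themselves, and adds one simple temporal statistic. The traces are hand-made for illustration, not data from the paper, and the function names are hypothetical; this is not the 3DW model itself.

```python
from typing import List

Trace = List[int]  # 1 = packet k received, 0 = lost; one list per receiver


def prr(trace: Trace) -> float:
    """Packet reception rate: fraction of packets received."""
    return sum(trace) / len(trace)


def all_received_independent(traces: List[Trace]) -> float:
    """P(every receiver gets a packet) if receptions were independent: product of PRRs."""
    p = 1.0
    for t in traces:
        p *= prr(t)
    return p


def all_received_empirical(traces: List[Trace]) -> float:
    """Fraction of packets every receiver actually got (keeps spatial correlation)."""
    n = len(traces[0])
    return sum(all(t[k] for t in traces) for k in range(n)) / n


def prr_after_success(trace: Trace) -> float:
    """P(packet k+1 received | packet k received): a simple temporal (burstiness) statistic."""
    nxt = [b for a, b in zip(trace, trace[1:]) if a == 1]
    return sum(nxt) / len(nxt) if nxt else 0.0


# Hand-made traces: both receivers lose the same two packets (perfectly correlated losses)
r1 = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
r2 = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(all_received_independent([r1, r2]))  # ≈ 0.64
print(all_received_empirical([r1, r2]))    # 0.8
print(prr_after_success(r1))               # ≈ 0.857 (6/7), versus an unconditional PRR of 0.8
```

Both receivers have PRR 0.8, yet the independence assumption underestimates broadcast success (0.64 versus 0.8), and the bursty losses show up as a conditional PRR that differs from the unconditional one.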
Feb 10 2015 cs.CV
For a long time, person re-identification and image search have been studied as two separate tasks. However, the effectiveness of local features and the "query-search" mode of person re-identification make it well suited to image search techniques. In light of recent advances in image search, this paper proposes to treat person re-identification as an image search problem. Specifically, this paper makes two major contributions. 1) By designing an unsupervised Bag-of-Words representation, we bridge the gap between the two tasks by integrating image search techniques into person re-identification. We show that our system sets up an effective yet efficient baseline that is amenable to further supervised and unsupervised improvements. 2) We contribute a new high-quality dataset that uses the DPM detector and includes a number of distractor images. Our dataset is closer to realistic settings and offers new perspectives. Compared with approaches that rely on feature-to-feature matching, our method is faster by over two orders of magnitude. Moreover, on three datasets, we report results competitive with state-of-the-art methods.
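Treating re-identification as image search with a Bag-of-Words model follows the standard retrieval recipe: quantise each local descriptor to its nearest visual word, build a tf-idf weighted histogram per image, and rank gallery images by cosine similarity to the query. The sketch below shows that generic recipe only; the codebook (normally learned, e.g. with k-means on training descriptors), the toy 2-D descriptors, and the plain `log(N / df)` idf are assumptions, not the paper's exact descriptor or weighting scheme.

```python
import math
from collections import Counter
from typing import List, Sequence

Desc = Sequence[float]  # one local descriptor (toy 2-D vectors in the usage below)


def quantize(desc: Desc, codebook: List[Desc]) -> int:
    """Index of the nearest visual word (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))


def bag_of_words(descs: List[Desc], codebook: List[Desc]) -> Counter:
    """Term frequencies of visual words in one image."""
    return Counter(quantize(d, codebook) for d in descs)


def idf_weights(gallery: List[Counter], n_words: int) -> List[float]:
    """idf_w = log(N / df_w); words absent from the gallery get weight 0."""
    n = len(gallery)
    df = [sum(1 for h in gallery if w in h) for w in range(n_words)]
    return [math.log(n / d) if d else 0.0 for d in df]


def cosine(h1: Counter, h2: Counter, idf: List[float]) -> float:
    """Cosine similarity of tf-idf weighted histograms."""
    v1 = {w: c * idf[w] for w, c in h1.items()}
    v2 = {w: c * idf[w] for w, c in h2.items()}
    dot = sum(x * v2.get(w, 0.0) for w, x in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def rank_gallery(query: Counter, gallery: List[Counter], idf: List[float]) -> List[int]:
    """Gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(gallery)), key=lambda i: -cosine(query, gallery[i], idf))
```

Because each image collapses to a sparse histogram, comparing a query against the gallery costs one sparse dot product per image rather than matching every pair of local features, which is the general source of the speed advantage of BoW retrieval over feature-to-feature matching.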
Complex networks provide a powerful mathematical representation of complex systems in nature and society. To understand complex networks, it is crucial to explore their internal structures, also called structural regularities. The task of network structure exploration is to determine how many groups a complex network contains and how to assign its nodes to those groups. Most existing structure exploration methods require either the number of groups or a particular type of structure to be specified when they are applied to a network. In the real world, however, both the number of groups and the type of structure a network has are usually unknown in advance. To explore structural regularities in complex networks automatically, without any prior knowledge of the number of groups or the type of structure, we use Bayesian nonparametric theory to extend a probabilistic mixture model that can handle networks with any type of structure but requires the number of groups to be specified, and thereby propose a novel model called the Bayesian nonparametric mixture (BNPM) model. Experiments on a large number of networks with different structures show that the BNPM model can automatically explore structural regularities in networks with stable, state-of-the-art performance.
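The abstract does not spell out BNPM's construction, but a standard way Bayesian nonparametric models avoid fixing the number of groups is a Dirichlet process prior, whose partition of items is described by the Chinese restaurant process (CRP). The generic sketch below draws node-to-group assignments from a CRP prior with concentration `alpha` (an assumed parameter name); the number of groups emerges from the draw instead of being specified. This illustrates the general mechanism only and is not the BNPM model or its inference procedure.

```python
import random
from typing import List


def crp_partition(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw group labels for n nodes from a Chinese restaurant process prior.

    Node i joins existing group k with probability counts[k] / (i + alpha)
    and opens a new group with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments: List[int] = []
    counts: List[int] = []  # current size of each group
    for i in range(n):
        r = rng.random() * (i + alpha)  # sum(counts) == i at this point
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                assignments.append(k)
                counts[k] += 1
                break
        else:  # r fell in the final alpha-sized slice: open a new group
            assignments.append(len(counts))
            counts.append(1)
    return assignments


def expected_num_groups(n: int, alpha: float) -> float:
    """Prior expected number of groups under a CRP: sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

The expected number of groups grows only logarithmically with `n`, so the prior favours a modest number of groups while still letting the data add more when the network's structure requires them; in a full model, this prior would be combined with a likelihood over the network's edges during inference.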