Social and Information Networks (cs.SI)

  • PDF
    Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select the value of k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.
  • PDF
    Genealogical networks, also known as family trees or population pedigrees, are commonly studied by genealogists wanting to know about their ancestry, but they also provide a valuable resource for disciplines such as digital demography, genetics, and computational social science. These networks are typically constructed by hand through a very time-consuming process, which requires comparing large numbers of historical records manually. We develop computational methods for automatically inferring large-scale genealogical networks. A comparison with human-constructed networks attests to the accuracy of the proposed methods. To demonstrate the applicability of the inferred large-scale genealogical networks, we present a longitudinal analysis of the mating patterns observed in a network. This analysis shows a consistent tendency of people to choose a spouse with a similar socioeconomic status, a phenomenon known as assortative mating. Interestingly, we do not observe this tendency consistently decreasing (or increasing) over our study period of 150 years.
  • PDF
    We study transitivity in directed acyclic graphs and its usefulness in capturing nodes that act as bridges between more densely interconnected parts of this type of network. In transitively reduced citation networks, degree centrality could be used as a measure of interdisciplinarity or diversity. We study the measure's ability to capture "diverse" nodes in random directed acyclic graphs and citation networks. We show that transitively reduced degree centrality is capable of capturing "diverse" nodes; thus, this measure could be a timely alternative to text analysis techniques for retrieving papers that are influential in a variety of research fields.
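
The measure named in the last abstract, degree centrality on a transitively reduced DAG, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`descendants`, `transitive_reduction`, `reduced_degree`) and the toy citation edges are assumptions made for illustration, and the degree here is a plain unnormalized count of in-edges plus out-edges. An edge u -> v is dropped when v is still reachable from u through another successor of u, which is valid only for acyclic input.

```python
from collections import defaultdict


def descendants(succ, start):
    """All nodes reachable from `start` via one or more edges (iterative DFS)."""
    seen = set()
    stack = list(succ.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, ()))
    return seen


def transitive_reduction(edges):
    """Return the edge set of the transitive reduction of a DAG.

    Assumes `edges` is an iterable of (u, v) pairs with no cycles.
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    desc = {u: descendants(succ, u) for u in list(succ)}
    reduced = set()
    for u, targets in succ.items():
        for v in targets:
            # u -> v is redundant if another successor w of u already reaches v.
            if not any(v in desc.get(w, set()) for w in targets if w != v):
                reduced.add((u, v))
    return reduced


def reduced_degree(edges):
    """Unnormalized degree (in + out) of each node in the reduced graph."""
    degree = defaultdict(int)
    for u, v in transitive_reduction(edges):
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Hypothetical toy citation graph: an edge (x, y) means "x cites y".
edges = [("D", "C"), ("C", "B"), ("B", "A"), ("D", "A"), ("D", "B"), ("E", "A")]
print(sorted(transitive_reduction(edges)))
print(reduced_degree(edges))
```

In the toy graph, the edges D -> A and D -> B are removed because both targets remain reachable through C, so only citations that are not implied by a longer chain contribute to a node's degree.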

Recent comments

Piotr Migdał Jun 07 2014 09:08 UTC

[Carl Linnaeus](http://en.wikipedia.org/wiki/Carl_Linnaeus) appears to benefit a lot from this particular algorithm (and perhaps any other taking all links with the same value). Just look at [inbound links](http://en.wikipedia.org/wiki/Special:WhatLinksHere/Carl_Linnaeus) - vast majority of them ref

...(continued)
Jaiden Mispy May 31 2014 08:12 UTC

It'd be interesting to see if the results change at all by targeting groups based around subjects other than software development. I'd expect developers to have non-representative knowledge of and interactions with bots.