Oct 17 2017 cs.SI
Why is a given node in a time-evolving graph ($t$-graph) marked as an anomaly by an off-the-shelf detection algorithm? Is it because of the number of its outgoing or incoming edges, or their timings? How can we best convince a human analyst that the node is anomalous? Our work aims to provide succinct, interpretable, and simple explanations of anomalous behavior in $t$-graphs (communications, IP-IP interactions, etc.) while respecting the limited attention of human analysts. Specifically, we extract key features from such graphs, and propose to output a few pair (scatter) plots from this feature space which "best" explain known anomalies. To this end, our work has four main contributions: (a) problem formulation: we introduce an "analyst-friendly" problem formulation for explaining anomalies via pair plots, (b) explanation algorithm: we propose a plot-selection objective and the LookOut algorithm to approximate it with optimality guarantees, (c) generality: our explanation algorithm is both domain- and detector-agnostic, and (d) scalability: we show that LookOut scales linearly with the number of edges of the input graph. Our experiments show that LookOut performs near-ideally in terms of maximizing the explanation objective on several real datasets, including Enron e-mail and DBLP coauthorship. Furthermore, LookOut produces fast, visually interpretable, and intuitive results in explaining "ground-truth" anomalies from Enron, DBLP, and LBNL (computer network) data.
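The plot-selection step described above can be illustrated with a minimal sketch. The abstract does not spell out the objective, so the code assumes one plausible form: each anomaly is credited with its best score among the selected plots, and plots are picked greedily under a budget. The names `coverage`, `greedy_select`, and the score matrix are hypothetical, and this is not presented as the LookOut implementation.

```python
# Illustrative sketch only. Assumed objective: the sum, over anomalies, of the
# best score that any selected plot gives to that anomaly.
# scores[i][p] is a hypothetical "how well plot p explains anomaly i" value.

def coverage(scores, chosen):
    """Total explanation value of the chosen plots under the assumed objective."""
    if not chosen:
        return 0.0
    return sum(max(row[p] for p in chosen) for row in scores)


def greedy_select(scores, budget):
    """Pick up to `budget` plots, each time adding the plot with the largest gain."""
    n_plots = len(scores[0])
    chosen = []
    for _ in range(min(budget, n_plots)):
        current = coverage(scores, chosen)
        best_plot, best_gain = None, 0.0
        for p in range(n_plots):
            if p in chosen:
                continue
            gain = coverage(scores, chosen + [p]) - current
            if gain > best_gain:
                best_plot, best_gain = p, gain
        if best_plot is None:  # no remaining plot improves the objective
            break
        chosen.append(best_plot)
    return chosen


# Three anomalies (rows) scored on three candidate plots (columns).
example_scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.6],
]
selected = greedy_select(example_scores, budget=2)
```

Under this assumed objective, adding a plot can only help anomalies that the current selection explains poorly, which is why a greedy pass is a natural way to approximate it.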
The network dismantling problem asks only for a minimal set of vertices whose removal breaks a graph into connected components of sub-extensive size. We argue that the efficiency of the intermediate states during the entire dismantling process should also be considered, and we measure it by the general performance R. To improve the general performance of the belief-propagation decimation (BPD) algorithm, we introduce a compound algorithm (CA) that combines BPD with the node explosive percolation (NEP) algorithm. In the CA, NEP rearranges and optimizes the head of the dismantling sequence produced by BPD. The two parent algorithms are connected at a joint point chosen so that the general performance can be optimized. The CA dismantles a graph into small pieces as quickly as BPD while retaining the efficiency of NEP throughout the dismantling process. We find that a good joint point is where BPD has broken the original graph into subgraphs no larger than 1% of the original one. We refer to the CA with this fixed joint point as the fast CA, which is in the same complexity class as the BPD algorithm. Computations on several real-world instances also show that using the fast CA to optimize the intermediate process of a dismantling algorithm is an effective approach.
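To make the idea of scoring intermediate states concrete, here is a small sketch. The abstract does not define R, so the code assumes a common robustness-style measure: the relative size of the largest connected component after each removal, averaged over the number of nodes. All function names are hypothetical, and neither BPD nor NEP is implemented here.

```python
from collections import deque

# Illustrative sketch only. The score below is an assumed stand-in for the
# "general performance R", whose exact definition the abstract does not give.

def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def dismantling_curve(adj, sequence):
    """Relative largest-component size after each removal in `sequence`."""
    n = len(adj)
    removed = set()
    curve = []
    for node in sequence:
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / n)
    return curve


def sequence_score(adj, sequence):
    """Assumed score: curve values summed and divided by the node count (lower is better)."""
    return sum(dismantling_curve(adj, sequence)) / len(adj)


# Path graph 0-1-2-3-4: removing the middle node first splits it fastest.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
middle_first = dismantling_curve(path, [2, 1, 3])
end_first = dismantling_curve(path, [0, 1, 2])
```

Two sequences that remove the same set of nodes can produce very different curves, which is the sense in which reordering the head of a dismantling sequence can improve the intermediate process without changing the final result.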
Social network analysis provides meaningful information about the behavior of network members that can be used in diverse applications such as classification and link prediction. However, network analysis is computationally expensive because features must be learned for each application. In recent years, many studies have focused on feature learning methods for social networks. Network embedding maps a network into a lower-dimensional space that preserves its properties, yielding a compressed representation of the input network. In this paper, we introduce a novel network embedding algorithm named "CARE" that can be applied to different types of networks, including weighted, directed, and complex ones. While current methods try to preserve the local neighborhood information of nodes, we use both local neighborhood and community information to cover the local and global structure of social networks. CARE builds customized paths, which combine the local and global structure of network nodes, as a basis for network embedding, and uses the skip-gram model to learn the representation vectors of nodes. Stochastic gradient descent is then used to optimize our objective function and learn the final node representations. Our method remains scalable when new nodes are appended to the network, without information loss. Parallel generation of the customized random walks is also used to speed up CARE. We evaluate the performance of CARE on multi-label classification and link prediction tasks. Experimental results on different networks indicate that the proposed method outperforms others in both Micro-F1 and Macro-F1 measures for different training-set sizes.
Predicting users' fine-grained interests from their temporal behavior is important for personalization and information filtering applications. However, existing interest prediction methods cannot capture subtle degrees of user interest in particular items, and the time-varying drift of an individual's attention has not yet been studied. Moreover, the prediction process can also be affected by inter-personal influence, known as behavioral mutual infectivity. Inspired by the use of temporal point processes to model event data, in this paper we present a deep prediction method based on two recurrent neural networks (RNNs) that jointly model each user's continuous browsing history and asynchronous event sequences in the context of inter-user behavioral mutual infectivity. Our model is able to predict a user's fine-grained interest in a particular item and the corresponding timestamps at which events occur. The proposed approach captures the dynamic characteristics of event sequences more flexibly by using a temporal point process to model event data and by updating its intensity function in a timely manner with RNNs. Furthermore, to improve the interpretability of the model, an attention mechanism is introduced to emphasize both intra-personal and inter-personal behavioral influence over time. Experiments on real datasets demonstrate that our model outperforms state-of-the-art methods in fine-grained user interest prediction.
Oct 17 2017 cs.SI
In this paper, we describe quantitative graph theory and argue that it is a new graph-theoretical branch of network science, albeit with features that differ significantly from those of classical graph theory. The main goal of quantitative graph theory is the structural quantification of information contained in complex networks by employing a measurement approach based on numerical invariants and comparisons. Furthermore, neither the methods nor the networks need to be deterministic; they can be statistical. As such, it complements the field of classical graph theory, which is descriptive and deterministic in nature. We provide examples of how quantitative graph theory can be used for novel applications in the context of the overarching concept of network science.
Oct 17 2017 cs.SI
With a steadily growing human population and rapid advancements in technology, the global human network is increasing in size and connection density. This growth exacerbates networked global threats and can lead to unexpected consequences such as global epidemics mediated by air travel, threats in cyberspace, global governance, etc. A quantitative understanding of the mechanisms guiding this global network is necessary for proper operation and maintenance of the global infrastructure. Each year the World Economic Forum publishes an authoritative report on global risks. Applying this data to a CARP model, we answer critical questions such as how the network evolves over time. In studying this evolution, we compare not the current states of the global risk network at different time points, but its steady states at those points, which would be reached if the risks were left unabated. Looking at the steady states shows more starkly the differences in the challenges to the global economy and stability that the world community faced at each point in time. Finally, we investigate the influence between risks in the global network, using a method that has been successful in distinguishing between correlation and causation. All results presented in the paper were obtained using detailed mathematical analysis, with simulations to support our findings.
Spreading is a ubiquitous process in social, biological, and technological systems. Identifying influential spreaders, which is important for preventing epidemic spreading and for establishing effective vaccination strategies, is therefore of great theoretical and practical significance. In this paper, a weighted h-index centrality based on virtual node extension is proposed to quantify the spreading influence of nodes in complex networks. Simulation results on real-world networks reveal that the proposed method provides more accurate and more consistent rankings than five classical methods. Moreover, we observe that our measure also performs well in terms of monotonicity and computational complexity.
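For readers unfamiliar with h-index centrality, the following minimal sketch computes the classical, unweighted version: a node has h-index h if it has at least h neighbors of degree at least h. The weighted variant with virtual nodes proposed in the abstract is not specified there and is not reproduced here; the function names and the example graph are hypothetical.

```python
# Illustrative sketch of the classical (unweighted) h-index centrality only.
# It is NOT the weighted, virtual-node method that the abstract proposes.

def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def h_index_centrality(adj):
    """h-index of every node, computed from the degrees of its neighbors."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    return {u: h_index(degree[v] for v in nbrs) for u, nbrs in adj.items()}


# Small undirected example: a triangle a-b-c with a pendant node d attached to a.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
centrality = h_index_centrality(graph)
```

In this example the pendant node `d` scores lower than the triangle nodes even though it is attached to the highest-degree node, because a single well-connected neighbor cannot raise the h-index above 1.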