Centrality measures and analyzing dot-product graphs
In this thesis we investigate two topics in data mining on graphs; in the first part we investigate the notion of centrality in graphs, in the second part we look at reconstructing graphs from aggregate information. In many graph related problems the goal is to rank nodes based on an importance score. This score is in general referred to as node centrality. In Part I. we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task. We generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem; find a group of k nodes with largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various instantiations of centrality that find paths in the graph. We prove that this problem is NP-hard for specific centrality definitions and we provide a universal algorithm for this problem that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications; one for minimizing redundancy in information propagation networks and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information -- the number of common neighbors among each pair of nodes -- is given. First, we observe that the given data is equivalent to the dot-product of the adjacency vectors of each node. Based on this knowledge we develop an algorithm that is based on SVD-decomposition, that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in the versions the dot-product of nodes with themselves, e.g. the node degrees, are either known or hidden.