Dense subgraph discovery: novel algorithms and applications
OA Version
Citation
Abstract
Dense subgraph discovery (DSD) is an important technique for analyzing and extracting valuable insights from large-scale graph data across various fields, including social sciences, biology, and chemistry. Lying at the core of DSD is the densest subgraph problem (DSP), which finds the subgraph that maximizes induced edge density given a graph. While the DSD problems, especially DSP, have been extensively studied in the last several decades, the major focus was simple undirected graphs. Nowadays, the graph data we collect and store carries more complex information beyond connections between nodes. For example, edges in transaction networks are associated with weights representing the transaction amounts, and social networks have users with attributes encoded as feature vectors. These complex graph data and the emergence of new use cases create challenges in DSD, necessitating novel formulations and solutions. How can we detect anomalous subgraphs in transaction networks using DSD? Given a social network, are dense clusters gender-balanced, and how can we detect them? Even seemingly unrelated questions, such as measuring the statistical significance of a motif, can be tackled using DSD. In this dissertation, we study a set of problems revolving around DSD on graphs that are associated with edge and node attributes, and possibly multiple edge layers. We propose novel mathematical formulations for these problems, study their hardness, and propose efficient algorithms with performance guarantees. Our methods are evaluated on synthetic data to verify their correctness and efficiency. We further deploy them on real-world networks, where their abilities are exhibited in applications, including anomaly detection, social network mining, and community detection.
Description
2024