Methods, algorithms and impossibility results for machine learning on graphs

Sotiropoulos, Konstantinos

Methods, algorithms and impossibility results for machine learning on graphs

Files

Sotiropoulos_bu_0017E_18234.pdf(5.53 MB)

Date

2023

Authors

Sotiropoulos, Konstantinos

URI

https://hdl.handle.net/2144/49699

Abstract

In recent years, there has been a remarkable increase in the use of machine learning techniques for analyzing graphs and their associated applications such as node classification, link prediction, community detection, and generating new graph instances with desired characteristics. This motivates the desire to create innovative and effective algorithms, as well as explore the potential and constraints of modern deep learning techniques, which have garnered considerable attention. This dissertation makes contributions in both of these areas. First, we propose innovative and scalable methods that rely solely on local node information for both unsupervised and supervised graph learning tasks. Specifically, we emphasize the significance of local triangle counts in community detection and introduce a novel triangle-aware spectral sparsification algorithm that enhances the efficiency of this task. Secondly, we analyze a Twitter dataset and create a supervised learning framework that leverages the multiple layers of interaction among Twitter users, resulting in a more precise prediction of new links among them. The emergence of deep learning has sparked interest in the use of unsupervised node embeddings, which are low-dimensional vector representations of nodes, and have become the primary tool in many graph-based machine learning tasks. A fundamental question arises: Can real-world networks be accurately represented in a low-dimensional space? We contribute to the understanding of node embeddings in two significant ways. Firstly, we prove that any graph with bounded maximum degree can be embedded in low dimensions, and we offer an algorithm that accurately embeds real-world networks in a few dimensions, typically in the order of tenths. Secondly, we explore contemporary embedding techniques and find that their embeddings are not always precise, as different graphs can have similar low-dimensional representations. However, despite the lack of exactness, these methods successfully encode sufficient information for high performance on node classification tasks. Finally, we study graph generative models under a unique novel criterion: their ability to generate graphs that are simultaneously edge-diverse and rich in small-sized dense subgraphs. We show the limitations of edge independent graph generative models and develop a hierarchy of models that are progressively more powerful in terms of mimicking better real-world networks. We complement our analysis with simple baseline methods relying on dense subgraph detection that perform competitively against more complex methods.

Description

2023

Collections

Boston University Theses & Dissertations

Full item page