Abstract:
Graphs are versatile data structures that represent individual data points and the
intricate relationships between them, making them invaluable for modelling complex
real-world systems. Their flexibility allows them to be applied to a wide range of problems, and, combined with appropriate algorithms, they have become crucial in many domains.
In particular, unsupervised machine learning – where patterns are discovered without
labelled data – has greatly benefited from incorporating graph structure. This thesis presents an
introduction to unsupervised machine learning on graphs, focusing on clustering and
representation learning. We explore the strengths and challenges of applying graphs
to machine learning, emphasizing these two specific tasks. Our contributions span two
distinct perspectives within unsupervised learning on graphs. In the first part, we propose a novel clustering method inspired by the concept of tangles from mathematical
graph theory. Tangles are used to detect densely connected subsets within a graph,
and we adapt this idea to cluster data across various domains. Our approach yields a
soft clustering hierarchy that is often inherently interpretable. We provide theoretical
performance guarantees across different data types and validate the method’s effectiveness through extensive empirical evaluations. The second part of this thesis delves into
representation learning on graphs. We investigate the inductive biases of graph auto-encoders, connecting their non-linear architecture to a linear model. Our empirical
findings show that linear models can match or surpass the performance of non-linear
graph auto-encoders when node features are effectively leveraged. Additionally, we
analyse the constraints imposed by features on the solution space, offering theoretical
and empirical evidence that these features are the key to achieving strong performance
in this setting.