Graphs in Unsupervised Machine Learning

DSpace Repository



URI: http://hdl.handle.net/10900/161963
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1619638
http://dx.doi.org/10.15496/publikation-103295
Document type: PhD thesis
Date: 2025-02-14
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Luxburg, Ulrike von (Prof. Dr.)
Day of Oral Examination: 2025-01-21
DDC Classification: 004 - Data processing and computer science
Keywords: Graph, Machine Learning
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Graphs are versatile data structures that represent individual data points and the intricate relationships between them, making them invaluable for modelling complex real-world systems. Their flexibility allows them to be applied to a wide range of problems, and with appropriate algorithms, they have become crucial in many domains. In particular, unsupervised machine learning – where patterns are discovered without labelled data – has greatly benefited from integrating graphs. This thesis presents an introduction to unsupervised machine learning on graphs, focusing on clustering and representation learning. We explore the strengths and challenges of applying graphs to machine learning, emphasizing these two specific tasks. Our contributions span two distinct perspectives within unsupervised learning on graphs.

In the first part, we propose a novel clustering method inspired by the concept of tangles from mathematical graph theory. Tangles are used to detect densely connected subsets within a graph, and we adapt this idea to cluster data across various domains. Our approach yields a soft clustering hierarchy that is often inherently interpretable. We provide theoretical performance guarantees across different data types and validate the method's effectiveness through extensive empirical evaluations.

The second part of this thesis delves into representation learning on graphs. We investigate the inductive biases of graph auto-encoders, connecting their non-linear architecture to a linear model. Our empirical findings show that linear models can match or surpass the performance of non-linear graph auto-encoders when node features are effectively leveraged. Additionally, we analyze the constraints imposed by features on the solution space, offering theoretical and empirical evidence that these features are the key to achieving strong performance in this setting.
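To make the linear-versus-non-linear comparison in the second part concrete, the following is a minimal toy sketch, not the thesis's implementation: it contrasts a one-layer non-linear graph auto-encoder-style embedding with its linear counterpart, assuming a GCN-style symmetric normalisation and an inner-product decoder (common choices for graph auto-encoders). All names, the toy graph, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy adjacency matrix of a 4-node undirected graph and random node features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))

# Symmetrically normalised adjacency with self-loops (GCN-style propagation).
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

# A single random weight matrix, shared by both models for the sketch.
W = rng.normal(size=(3, 2))

Z_linear = A_norm @ X @ W                   # linear propagation model
Z_nonlin = np.maximum(A_norm @ X @ W, 0.0)  # same layer with a ReLU

def decode(Z):
    # Inner-product decoder: sigmoid of pairwise embedding similarities,
    # interpreted as reconstructed edge probabilities.
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

P_lin = decode(Z_linear)
P_non = decode(Z_nonlin)
```

Both variants produce a symmetric matrix of edge probabilities; the point of the comparison in the thesis is that, once node features are leveraged through the propagation `A_norm @ X`, dropping the non-linearity can leave reconstruction quality largely intact.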
