Learning to Generalize Across Distribution Shifts

Citable link (URI): http://hdl.handle.net/10900/163498
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1634988
http://dx.doi.org/10.15496/publikation-104828
Document type: Dissertation
Publication date: 2025-03-31
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Advisor: Schölkopf, Bernhard (Prof. Dr.)
Date of oral examination: 2025-03-13
DDC classification: 004 - Computer Science
Keywords: Deep Learning, Machine Learning
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Biological intelligence demonstrates an impressive ability to learn and adapt to new situations. In contrast, the success of machine learning (ML) often relies on the assumption that data are independent and identically distributed (i.i.d.) during training and testing. When data deviate from this assumption, ML models face so-called out-of-distribution (OOD) shifts: changes in the data's underlying patterns to which ML models adapt far less readily than humans do. Yet real-world data and applications are fundamentally characterized by a variety of such OOD shifts.

In this dissertation, we focus on improving the generalization of ML models, especially those based on deep neural networks, across a wide range of realistic out-of-distribution scenarios. A central approach is to learn abstract representations that robustly infer the underlying generative factors of observations and their associated causal mechanisms. Learning such factorizing representations, ideally without knowing the factors a priori, is the primary goal of disentanglement learning and a stepping stone towards more adaptable ML models. Another prerequisite for better generalization is that ML models remain adaptable and resilient in dynamic settings, a challenge that mainly concerns reinforcement learning (RL) and continual learning. This work examines in depth how such methods behave under distribution shifts and presents new approaches that address several of their weaknesses. We primarily study generalization problems in computer vision and robotics, using empirical, experimental, and theoretical methods.

In the first part of this dissertation, we deal with distribution shifts in settings whose underlying factors are known. We show the limitations of current unsupervised methods, especially in disentanglement learning, when the underlying factors are correlated during training (a toy sampler illustrating such correlated factors is sketched below). Although correlations are a central characteristic of real-world data, they have largely been neglected in the development and evaluation of disentanglement methods. Subsequently, we introduce CausalWorld, a new benchmark environment for learning causal structures in robotics. CausalWorld allows systematically assessing the adaptability of RL agents under varying degrees of environmental change by supporting interventions on any underlying variable (see the second sketch below). We then investigate the effectiveness of representation learning for OOD generalization in this complex and challenging robotics and RL setting. Beyond systematically studying distribution shifts within the simulated environment, we also study the transfer of representations from simulation to reality. These results are underpinned by large-scale empirical studies analyzing how several representation metrics relate to different forms of OOD generalization.
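To make the correlated-factors setup concrete, here is a minimal sketch: two ground-truth factors are sampled with a tunable dependence at training time and independently at test time. The sampler, its linear coupling, and the `strength` parameter are illustrative assumptions, not the dissertation's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_factors(n, strength):
    """Sample two ground-truth factors with a tunable linear dependence.
    Illustrative of a correlated-factors setup, not the dissertation's code."""
    f1 = rng.uniform(0.0, 1.0, n)
    noise = rng.uniform(0.0, 1.0, n)
    # strength=0 gives independent factors; strength=1 makes f2 a copy of f1.
    f2 = strength * f1 + (1.0 - strength) * noise
    return np.stack([f1, f2], axis=1)

train = sample_factors(10_000, strength=0.9)  # correlated during training
test = sample_factors(10_000, strength=0.0)   # independent at test time
print(np.corrcoef(train.T)[0, 1])  # close to 1: factors are entangled in the data
print(np.corrcoef(test.T)[0, 1])   # close to 0: test breaks the correlation
```

A disentanglement method trained on `train` and evaluated on `test` faces exactly the kind of shift described above: the training data never shows the factors varying independently.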
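The following sketch conveys the style of intervention interface such a benchmark exposes. The `InterventionEnv` class, its variable names, and the `do_intervention` method are hypothetical stand-ins for illustration, not the actual CausalWorld API.

```python
import numpy as np

class InterventionEnv:
    """Toy environment whose observations are generated from explicit
    underlying variables; an illustrative stand-in, not the CausalWorld API."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        # Underlying generative variables the agent never observes directly.
        self.variables = {"block_mass": 0.5, "block_size": 0.06, "goal_x": 0.1}

    def do_intervention(self, assignments):
        # An intervention sets underlying variables to chosen values,
        # e.g. to create a controlled train/test distribution shift.
        for name, value in assignments.items():
            if name not in self.variables:
                raise KeyError(f"unknown variable: {name}")
            self.variables[name] = value

    def observe(self):
        # Observations are a noisy function of the underlying variables.
        v = self.variables
        means = np.array([v["block_mass"], v["block_size"], v["goal_x"]])
        return means + self.rng.normal(0.0, 0.01, size=3)

# Evaluate an agent in-distribution, then under an OOD intervention.
env = InterventionEnv()
obs_train = env.observe()
env.do_intervention({"block_mass": 2.0})  # a shift never seen during training
obs_ood = env.observe()
```

Because every generative variable is explicit, the degree of change between training and evaluation can be dialed in systematically, which is the property the benchmark exploits.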
In the second part of this dissertation, we turn to learning scenarios that deviate from standard distribution shifts. We address challenges inherent in continual learning, focusing mainly on adaptive learning and on mitigating catastrophic forgetting by introducing a new information bottleneck mechanism, the so-called Discrete Key-Value Bottleneck. The proposed bottleneck mechanism allows for sparse, localized, and context-dependent model updates, thereby enabling favorable adaptability and generalization under strong distribution shifts at training time (a schematic sketch is given below). Finally, we explore a particular challenge posed by label shifts of classification models: the Prediction Update Problem, which arises when ML predictions are to be updated by supposedly better models. To reduce the effect of negative flips, i.e., predictions that were previously correct but become incorrect after an update, we present a probabilistic Bayesian approach that efficiently addresses backward compatibility and adaptation under the inevitable shifts in the distribution of predictions that model updates cause (the negative flip rate is sketched below).
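As a rough illustration of the key-value bottleneck idea, the following is a minimal single-head sketch under simplifying assumptions (random frozen keys, arbitrary sizes), not the dissertation's implementation: encoder features are quantized to their nearest key in a frozen codebook and the corresponding learnable value is retrieved, so gradients reach only the values that were actually selected, keeping updates sparse and localized.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Minimal single-head sketch of a discrete key-value bottleneck.
    Simplified for illustration; sizes and initialization are arbitrary."""

    def __init__(self, dim=32, num_pairs=256):
        super().__init__()
        # Keys: a fixed (frozen) codebook that discretizes encoder outputs.
        self.keys = nn.Parameter(torch.randn(num_pairs, dim), requires_grad=False)
        # Values: learnable vectors, updated only when their key is selected.
        self.values = nn.Parameter(torch.zeros(num_pairs, dim))

    def forward(self, z):
        # z: (batch, dim) features, assumed to come from a frozen encoder.
        dists = torch.cdist(z, self.keys)  # (batch, num_pairs) distances
        idx = dists.argmin(dim=1)          # nearest key per input
        return self.values[idx]            # only retrieved values get gradients

bottleneck = DiscreteKeyValueBottleneck()
z = torch.randn(8, 32)                     # stand-in encoder features
out = bottleneck(z)
out.sum().backward()
# Gradients are nonzero only for the (at most 8) selected value rows:
print((bottleneck.values.grad.abs().sum(dim=1) > 0).sum().item())
```

The sparsity visible in the final print is what localizes updates: inputs from a new context select different keys, so learning on them leaves previously written values untouched.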
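The quantity at the heart of the Prediction Update Problem can be made concrete as the negative flip rate: the fraction of examples the old model classified correctly but the updated model gets wrong. The helper below is an illustrative sketch, not the dissertation's code.

```python
import numpy as np

def negative_flip_rate(y_true, pred_old, pred_new):
    """Fraction of examples correct under the old model but wrong under
    the new one -- the regressions a backward-compatible update must avoid."""
    y_true, pred_old, pred_new = map(np.asarray, (y_true, pred_old, pred_new))
    flips = (pred_old == y_true) & (pred_new != y_true)
    return flips.mean()

# Toy example: the new model is more accurate overall, yet still regresses.
y   = [0, 1, 2, 0, 1]
old = [0, 1, 1, 2, 0]   # accuracy 2/5
new = [1, 1, 2, 0, 0]   # accuracy 3/5, but it breaks example 0
print(negative_flip_rate(y, old, new))  # 0.2
```

The toy case shows why raw accuracy is not enough: an update can improve aggregate performance while still contradicting predictions users already relied on, which is the tension the probabilistic approach above targets.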
