Abstract:
Machine learning can help meet the rising demand for medical diagnosis and improve patient outcomes, but medicine is a high-risk domain in which uncertainty is pervasive and single-point predictions are insufficient. Despite this, the common practice in medical AI is to develop models for isolated tasks. This paradigm is fundamentally flawed: it ignores how uncertainty from one step in a clinical workflow cascades and compounds, potentially leading to overconfident and unreliable downstream decisions. This thesis argues for a paradigm shift from isolated models to a holistic, pipeline-aware framework. For machine learning to be deployed safely and effectively, uncertainty must be formally represented, propagated between tasks, made expressive through calibration, and ultimately leveraged to guide clinical decisions.

To this end, we develop and validate a suite of methods that implement this vision. First, we establish a method for propagating uncertainty from upstream data acquisition (accelerated MRI) to downstream analysis (segmentation). Second, we demonstrate how leveraging this propagated uncertainty, by marginalizing over a distribution of plausible segmentations, significantly improves the performance and robustness of a final clinical decision (glaucoma diagnosis). Third, we make uncertainties more expressive and trustworthy through a novel method for distribution-free, subgroup-specific calibration, enabling reliable error control for dose estimation in radiation therapy. Finally, we integrate these principles into a closed-loop system in which calibrated uncertainty in clinical metrics dynamically guides the MRI acquisition process to optimize scan time.

Together, these contributions provide a methodological foundation for integrated, uncertainty-aware systems. By treating diagnosis as a sequence of dependent tasks, we show how expressive uncertainties can be propagated and acted upon end-to-end. Through demonstrations in accelerated MRI, glaucoma diagnosis, and radiation therapy, this work charts a pathway toward safer, more efficient, and clinically trustworthy ML-assisted diagnostics.