Abstract:
Geoscientists use computer simulations to understand and model a myriad of processes taking place on our planet. Bayesian inference is a statistical framework that identifies parameter configurations that make the simulator's predictions consistent with experimental observations. Traditional Bayesian inference approaches make use of the likelihood function, which quantifies the probability of an observation given a particular parameter configuration. However, this likelihood function cannot be evaluated directly for many computer simulators due to their complexity. This limitation necessitates new algorithms for performing inference. One recently proposed solution to this problem is Simulation-Based Inference (SBI).
This thesis consists of three publications that develop and apply new SBI approaches, enabling statistical inference for larger, more complex geoscientific problems for which inference was previously infeasible. In the first publication, we introduce a particular inference problem in the field of glaciology: inferring the surface accumulation and basal melt rates of Antarctic ice shelves from radar measurements of their internal layering structure. We present a statistical framework to describe this inference problem, and apply an existing SBI method to provide an uncertainty-aware solution for the first time. In the second publication, we develop a new approach to simulation-based inference of function-valued parameters. Current SBI methods require a very large number of simulations to solve geoscientific inference problems, which can be computationally demanding or even infeasible. This is because geoscientific parameters are function-valued: they describe quantities that vary in space and/or time, resulting in many values to infer. Our method exploits the spatial and temporal correlations in geoscientific parameters to perform inference using a much smaller number of simulations than existing SBI methods. We apply this approach to the Antarctic ice shelf case study, and show that it reduces the computational cost of inference by two orders of magnitude. In the third publication, we tackle a distinct type of inference problem known as source distribution estimation. The aim is to identify a distribution over the parameters that is consistent with a dataset of observations, as opposed to a single measurement or repeated measurements. This inference paradigm is also ubiquitous in geoscientific applications, for instance in extreme event modeling. We develop a new simulation-based approach to estimating source distributions and demonstrate its applicability to challenging scientific tasks.
Overall, this thesis develops new statistical inference methods and applies them to solve challenging problems in various geoscience domains. It thus provides a vital connection between machine learning methodology and scientific practice, enabling statistical inference for complex and high-dimensional simulators in geoscience.