Abstract:
Tree search algorithms form a fundamental class of optimization algorithms for discrete sequential decision-making problems and of hierarchical approaches to continuous optimization. Their goal is to find a path from the root of a tree to a leaf node that maximizes an objective function over the leaf nodes. However, the number of leaves typically grows exponentially with the depth of the tree. As a result, efficient search requires balancing the exploitation of promising paths with the exploration of new ones.
Resolving this trade-off between exploitation and exploration requires regularization, i.e., prior assumptions about the objective function or about the optimal policy for traversing the tree.
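To fix notation for this summary (the symbols here are illustrative and not taken from the thesis itself), the search problem can be written as
\[
    x^{\star} \in \operatorname*{arg\,max}_{x \in \mathcal{L}} f(x),
\]
where $f$ is the objective function and $\mathcal{L}$ is the set of leaf nodes of the tree; for a branching factor $b$ and depth $D$, $|\mathcal{L}|$ grows on the order of $b^{D}$, which is what makes exhaustive search infeasible and prior assumptions necessary.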
The main focus of this thesis is the design of probabilistic models and algorithms that can incorporate and exploit such assumptions, e.g., by placing a Gaussian process prior over the objective function, as is common in Bayesian optimization. We begin by formally connecting Bayesian optimization to Optimistic optimization, a class of non-probabilistic tree search algorithms, by mapping the kernel function of a Gaussian process prior to the semi-metric that Optimistic optimization algorithms use to regularize the search problem. This connection leads to a runtime-efficient tree search algorithm for the Bayesian optimization setting, suitable for objective functions with low evaluation costs.
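One canonical way to obtain such a semi-metric from a kernel, given here as an illustrative sketch rather than the exact construction used in the thesis, is through the prior variance of function differences under the Gaussian process:
\[
    \ell(x, y)^{2} = k(x, x) - 2\,k(x, y) + k(y, y) = \operatorname{Var}\!\left[ f(x) - f(y) \right],
\]
so that the smoothness assumptions encoded in the kernel $k$ translate into a semi-metric $\ell$ that bounds how quickly the objective can change between nearby nodes.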
In scenarios where evaluating leaf nodes is expensive, reducing the overall number of such evaluations becomes more important than the runtime efficiency of deciding which path to follow down the tree. We develop a Bayesian tree search algorithm that exploits correlations between nodes across the search tree to reduce the number of evaluations required.
While the above approaches assume generic Gaussian priors over the objective function, there are also interesting non-Gaussian settings. An increasingly important example is the decoding process of Large Language Models (LLMs), where paths in the tree represent token sequences and the objective function is their likelihood. We place a prior on the stepwise, categorical likelihood of the next token and, under independence and regularity assumptions, develop an efficient probabilistic decoding method for LLMs.
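As background for this part (a standard fact about autoregressive language models rather than a contribution of the thesis), the likelihood of a token sequence factorizes stepwise:
\[
    \log p(x_{1:T}) = \sum_{t=1}^{T} \log p(x_t \mid x_{<t}),
\]
so each edge of the search tree contributes one categorical factor $p(\cdot \mid x_{<t})$ over the vocabulary, and it is these stepwise factors that receive the prior.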
Finally, we shift our focus from probabilistic models for tree search to a particular tree search problem that arises as a subroutine in other probabilistic methods, such as Bayesian quadrature. The problem consists of finding sets of k items (or points) that maximize the log-determinant of a kernel Gram matrix. We consider a regularized, stochastic version of this search problem, which coincides with sampling from k-Determinantal Point Processes (k-DPPs). We propose a greedy, approximate search strategy and show that it is a (1-1/e)-approximation to the optimal strategy.
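In symbols (again with illustrative notation), the underlying problem and its stochastic counterpart are
\[
    S^{\star} \in \operatorname*{arg\,max}_{S:\,|S| = k} \log\det(K_S)
    \qquad\text{and}\qquad
    \Pr(S) \propto \det(K_S) \ \text{for } |S| = k,
\]
where $K_S$ denotes the $k \times k$ submatrix of the kernel Gram matrix $K$ indexed by the chosen items $S$; the latter distribution is exactly a $k$-DPP.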
Taken together, the contributions of this dissertation deepen the understanding of regularization in tree search methods through both theoretical analysis and empirical validation, and they introduce novel techniques that improve the data efficiency and computational performance of tree search algorithms.