Abstract:
Rapid climate change threatens both food security and biodiversity, and biodiversity ultimately reflects genetic change. Studying genetic variation in wild populations therefore provides insight into how genomes change during evolution and may help explain how species adapt to different environments. While most population genetic studies have focused on the nuclear genome, organellar genomes, particularly mitochondrial genomes, have received much less attention, partly because they are difficult to assemble. Importantly, organellar genomes in land plants evolve more slowly than nuclear genomes and therefore tend to preserve signals of divergence and evolutionary history over deeper time scales. At the same time, mutations are the ultimate source of genetic diversity, and without them there would be no complex life, yet how they arise and accumulate remains poorly understood.
Advances in long-read sequencing, including PacBio high-fidelity (HiFi) reads and reads from the latest versions of Oxford Nanopore Technologies (ONT) platforms, have made it possible to assemble complete organellar genomes, but making use of these data requires dedicated bioinformatics tools. Complete and accurate assemblies allow a more comprehensive investigation of organellar genome variation and its evolutionary significance. In parallel, long-read data also improve the quality of nuclear genome assemblies, which in turn enhances the detection of somatic mutations and deepens our understanding of their patterns and biological context.
This thesis addresses these gaps by developing a tool for assembling plant organellar genomes, applying it to more than one hundred accessions of Arabidopsis thaliana to investigate organellar genetic variation, and refining the mutation detection approach for a highly heterozygous oak genome so that somatic mutations can be detected more accurately in such challenging genomes.
In the first chapter, I introduce TIPPo, a reference-free tool I developed for assembling plant organellar genomes from HiFi data. TIPPo addresses challenges specific to organellar genomes, including their high copy number, repetitive content, and the presence of organelle-derived sequences in the nuclear genome. By combining read classification and filtering strategies, TIPPo produces high-quality assemblies that outperform those generated by existing methods. Because the nuclear and organellar genomes were assembled from the same sample, nuclear insertions of plastid DNA (NUPTs) and mitochondrial DNA (NUMTs), as well as their substitution patterns, can be identified more reliably.
Building on this, the second chapter presents a population-scale analysis of organellar genomes from 143 A. thaliana accessions. While chloroplast genomes are conserved in structure and size, mitochondrial genomes are more variable and can be grouped into two major classes based on their repeat content. I found that the number of large repeats in the mitochondrial genome correlated with sampling latitude, suggesting that geographic or historical factors may have influenced mitochondrial genome structure. In addition, I identified previously unannotated open reading frames (ORFs) with supporting expression evidence. I also found two putative horizontally transferred ORFs, one of which is associated with cytoplasmic male sterility (CMS). These findings suggest that mitochondrial genomes can acquire novel functional elements from related species, enhancing their evolutionary plasticity.
The third chapter shifts the focus to the origin of new genetic variation, using a long-lived oak tree to study somatic mutations. By reassembling the oak genome with HiFi data and applying a hybrid alignment strategy tailored to its high heterozygosity, I recovered substantially more high-confidence somatic mutations than a previous study. Most of these mutations were C:G > T:A transitions, consistent with results from mutation accumulation studies. These results provide a clearer view of how mutations accumulate over time and show that high-quality assemblies and careful alignment strategies are key to detecting somatic mutations in complex plant genomes.
Together, these chapters explore how genetic variation arises, accumulates, and is maintained in plant genomes. By combining the development of tools and methods with population-scale and individual-level analyses, this work provides new insights into both the structure of organellar genomes and the processes shaping somatic mutation patterns. The methods developed here may be broadly useful for studying genome dynamics in diverse species and provide a foundation for future investigations into the evolutionary and functional consequences of genetic variation.