Latest publication: Predicting natural variation in the yeast phenotypic landscape with machine learning – Mol Syst Biol (Sept. 2025)

Authors: Sakshi Khaiwal, Matteo De Chiara, Benjamin P Barré, Inigo Barrio-Hernandez, Simon Stenberg, Pedro Beltrao, Jonas Warringer, and Gianni Liti

Abstract: Most organismal traits result from the complex interplay of many genetic and environmental factors, making their prediction difficult. Here, we used machine learning (ML) models to explore phenotype predictions for 223 traits measured across 1011 genome-sequenced Saccharomyces cerevisiae strains isolated worldwide. We benchmarked a ML pipeline with multiple linear and non-linear models to predict phenotypes from genotypes and gene expression, and determined gradient boosting machines as the best-performing model. Gene function disruption scores and gene presence/absence emerged as best predictors, suggesting a considerable contribution of the accessory genome in controlling phenotypes. The prediction accuracy broadly varied among phenotypes, with stress resistance being easier to predict compared to growth across nutrients. ML identified relevant genomic features linked to phenotypes, including high-impact variants with established relationships to phenotypes, despite these being rare in the population. Near-perfect accuracies were achieved when other phenomics data mostly in similar conditions were used, suggesting that useful information can be conveyed across phenotypes. Overall, our study underscores the power of ML to interpret the functional outcome of genetic variants.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 847581, the Region Sud and the UCA J.E.D.I