Benchmarking Model Performance on Pandemic-Threat Viruses with Sarah Gurev
[HPP] Debora Marks · October 29, 2025 · 1h 1min
Challenges in Viral Prediction
- 🦠 Viruses pose a significant threat due to rapid evolution and adaptability, making accurate mutation effect prediction crucial.
- 🔬 While machine learning and sequence data offer promise, viruses present unique biological and informational constraints that challenge existing models.
Evolutionary & Alignment-Based Models
- 💡 Alignment-based models learn from evolutionary sequences (e.g., pre-2020 coronaviruses) to predict mutation effects, considering site-independent, pairwise, and higher-order interactions.
- 🧬 The EVEscape model combines evolutionary sequences with structural and biophysical information to predict antibody escape mutations, and it forecast future SARS-CoV-2 variants more effectively than some experimental methods.
- 💉 These models can aid in vaccine design by predicting likely future variants, helping to create vaccines that offer long-term protection against evolving viruses like SARS-CoV-2 and influenza.
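As a rough illustration of the simplest case mentioned above, the site-independent model, the sketch below scores a mutation by the log-odds of mutant versus wild-type residue frequencies in an alignment column. The toy alignment, the pseudocount, and the function names are assumptions made for illustration; this is not the EVEscape method, and pairwise or higher-order models need considerably more machinery.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(alignment, pseudocount=1.0):
    """Per-column amino acid frequencies with a pseudocount; gaps are ignored."""
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs

def mutation_score(freqs, wildtype, pos, mutant):
    """Log-odds of the mutant vs. the wild-type residue at 0-based position pos."""
    wt = wildtype[pos]
    return math.log(freqs[pos][mutant]) - math.log(freqs[pos][wt])

# Hypothetical toy alignment: four aligned sequences of length 4.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKVL"]
freqs = site_frequencies(toy_alignment)

# V->I at position 2 is observed in the alignment; V->W is not.
score_seen = mutation_score(freqs, "MKVL", 2, "I")
score_unseen = mutation_score(freqs, "MKVL", 2, "W")
print(score_seen, score_unseen)
```

A substitution already seen among homologs scores higher (less negative) than one never observed, which is the basic signal that the richer models refine by accounting for interactions between sites.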
Protein Language Models & Data Constraints
- 🧠 Protein language models (PLMs), while state-of-the-art for many protein tasks, often perform poorly for viruses due to underrepresentation of viral sequences in training datasets like UniRef.
- 📊 Clustering methods (e.g., UniRef90, UniRef50) disproportionately reduce the number of viral sequences, leading to less effective training data for viral prediction.
- 📈 For viruses, larger PLMs (more parameters) continue to improve performance, suggesting they compensate for the lack of specific training data by generalizing from non-viral information.
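To make the clustering point concrete, here is a toy sketch of greedy identity-based clustering. It assumes equal-length, pre-aligned sequences and a naive identity measure; it is not how UniRef is actually built, and the sequences are invented. It only illustrates why a densely sampled family of near-identical sequences, as is common for viruses, collapses to very few representatives once an identity threshold is applied.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Return cluster representatives.

    A sequence becomes a new representative only if its identity to every
    existing representative is below the threshold.
    """
    reps = []
    for s in seqs:
        if all(identity(s, r) < threshold for r in reps):
            reps.append(s)
    return reps

# Hypothetical data: closely related variants vs. unrelated sequences.
viral_like = ["MKVLAAGGTT", "MKVLAAGGTS", "MKVLAAGGSS", "MKVLAVGGTT"]
diverse = ["MKVLAAGGTT", "QWERTYIPAS", "DFGHKLCVNM"]

for name, seqs in [("viral_like", viral_like), ("diverse", diverse)]:
    for threshold in (0.9, 0.5):
        kept = len(greedy_cluster(seqs, threshold))
        print(f"{name} at {threshold:.0%}: {kept} of {len(seqs)} kept")
```

In this toy setting the closely related set shrinks to a single representative at the 50% threshold while the diverse set is untouched, mirroring the disproportionate reduction described above.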
Improving Model Performance
- 🏗️ The EVEREST framework (Evolutionary Variant Effect prediction with Reliability ESTimation) was introduced to systematically assess model performance and reliability for viral mutational fitness prediction.
- 🧬 Structural information can significantly enhance PLM performance for viruses, particularly for stability assays, by allowing models to learn from remote homologs with high structural similarity despite low sequence identity.
- ✅ Alignment relevance, focusing on sequences with high identity to the query, is a more effective strategy for selecting alignments than simply using deeper alignments, which can dilute signal with irrelevant sequences.
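The alignment-relevance idea can be sketched as a simple filter that keeps only sequences above a minimum identity to the query. The 0.7 cutoff, the gap handling, and the function names are assumptions chosen for illustration; the talk summary does not state which threshold or identity definition was used.

```python
def identity_to_query(query, seq):
    """Fraction of query positions matched by seq; gaps count as mismatches."""
    matches = sum(q == s and q != "-" for q, s in zip(query, seq))
    return matches / len(query)

def filter_by_relevance(query, alignment, min_identity):
    """Keep aligned sequences whose identity to the query meets the cutoff."""
    return [s for s in alignment if identity_to_query(query, s) >= min_identity]

# Hypothetical aligned sequences, all the same length as the query.
query = "MKVLAAGG"
alignment = ["MKVLAAGG", "MKVLSAGG", "MRVL-AGG", "QQQLAAQQ", "--------"]

# 0.7 is an illustrative cutoff, not a recommended value.
relevant = filter_by_relevance(query, alignment, 0.7)
print(relevant)
```

A deeper alignment would retain the last two sequences as well, but they share little with the query and would mostly add noise to the frequencies a model learns from.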
Reliability and Applications
- 🔑 Reliability metrics, such as sequence diversity for alignment models and pseudo-perplexity for PLMs, can predict model performance and indicate when a model's predictions for a new virus are trustworthy.
- ⚠️ Evaluation of 40 WHO-prioritized pandemic-threat viruses revealed that current models fail to reliably predict mutation effects for over half of them, highlighting critical gaps.
- 🚀 The findings offer actionable recommendations for improving viral mutation effect prediction, supporting pandemic preparedness, vaccine design, and objective assessment of biosecurity risks.
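For the PLM reliability metric named above, pseudo-perplexity is commonly defined as the exponential of the negative mean log-probability a masked language model assigns to each true residue when that position is masked. The sketch below assumes those per-position log-probabilities have already been obtained from some model; the input values are invented, and no specific PLM API is implied.

```python
import math

def pseudo_perplexity(masked_log_probs):
    """exp of the negative mean log-probability of the true residues.

    masked_log_probs[i] is assumed to be the natural-log probability the
    model assigns to the true residue at position i when it is masked.
    """
    if not masked_log_probs:
        raise ValueError("need at least one position")
    return math.exp(-sum(masked_log_probs) / len(masked_log_probs))

# Hypothetical inputs: a model guessing uniformly over 20 amino acids,
# and a model that is confident at every position.
uniform = [math.log(1 / 20)] * 50
confident = [math.log(0.9)] * 50

ppl_uniform = pseudo_perplexity(uniform)
ppl_confident = pseudo_perplexity(confident)
print(ppl_uniform, ppl_confident)
```

Lower values mean the model finds the sequence more predictable; a value near 20 means it is doing no better than guessing uniformly among amino acids, which would be a warning sign for trusting its mutation effect predictions on that virus.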
Topics Discussed
Viral Evolution, Machine Learning, Deep Mutational Scanning, Protein Language Models, Antibody Escape, Vaccine Design, SARS-CoV-2, Influenza, Sequence Alignment, Structural Information, Reliability Estimation, Pandemic Preparedness, Biosecurity Risk, WHO Priority Viruses, Variational Autoencoders