Benchmarking Model Performance on Pandemic-Threat Viruses with Sarah Gurev

[HPP] Debora Marks · October 29, 2025 · 1h 1min

Challenges in Viral Prediction

🦠 Viruses pose a significant threat due to rapid evolution and adaptability, making accurate mutation effect prediction crucial.
🔬 While machine learning and sequence data offer promise, viruses present unique biological and informational constraints that challenge existing models.

Evolutionary & Alignment-Based Models

💡 Alignment-based models learn from evolutionary sequences (e.g., pre-2020 coronaviruses) to predict mutation effects, considering site-independent, pairwise, and higher-order interactions.
🧬 The EVEscape model combines evolutionary sequences with structural and biophysical information to predict antibody escape mutations, demonstrating its ability to forecast future SARS-CoV-2 variants more effectively than some experimental methods.
💉 These models can aid in vaccine design by predicting likely future variants, helping to create vaccines that offer long-term protection against evolving viruses like SARS-CoV-2 and influenza.
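
As a rough illustration of the site-independent case mentioned above, the sketch below scores a mutation as the log-odds of the mutant versus wild-type residue in one alignment column. It is a toy under stated assumptions, not the model from the talk: the four-sequence alignment, the pseudocount of 1.0, and the function names `column_frequencies` and `mutation_score` are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies with a pseudocount; gaps are ignored."""
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts.get(aa, 0) + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return freqs

def mutation_score(freqs, wild_type, pos, mutant_aa):
    """Log-odds of the mutant vs. wild-type residue at a 0-based position."""
    col = freqs[pos]
    return math.log(col[mutant_aa]) - math.log(col[wild_type[pos]])

# Hypothetical toy alignment; the first sequence is treated as the query.
alignment = ["MKTAY", "MKSAY", "MKTAF", "MRTAY"]
freqs = column_frequencies(alignment)
wild_type = alignment[0]

score_seen = mutation_score(freqs, wild_type, 2, "S")    # S is observed in this column
score_unseen = mutation_score(freqs, wild_type, 2, "W")  # W is never observed
```

A substitution already seen in related sequences scores higher than one that never appears. Pairwise and higher-order models extend this idea by letting the score depend on residues at other positions.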

Protein Language Models & Data Constraints

🧠 Protein language models (PLMs), while state-of-the-art for many protein tasks, often perform poorly for viruses due to underrepresentation of viral sequences in training datasets like UniRef.
📊 Clustering methods (e.g., UniRef90, UniRef50) disproportionately reduce the number of viral sequences, leading to less effective training data for viral prediction.
📈 For viruses, larger PLMs (more parameters) continue to improve performance, suggesting they compensate for the lack of specific training data by generalizing from non-viral information.
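
To make the clustering effect concrete, here is a minimal sketch of greedy identity clustering. It is not the UniRef pipeline: the sequences are invented, `identity` and `greedy_cluster` are hypothetical helpers, and the assumption that viral entries are densely sampled near-identical variants is made purely for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Keep a sequence as a new representative only if its identity to every
    representative kept so far is below the threshold."""
    reps = []
    for s in seqs:
        if all(identity(s, r) < threshold for r in reps):
            reps.append(s)
    return reps

# Hypothetical data: four near-identical "viral" variants and four
# unrelated "non-viral" sequences.
viral = ["MKTAYIAKQR", "MKTAYIAKQL", "MKTAYIAKHR", "MKSAYIAKQR"]
other = ["GSWLPEDNVC", "HHFTRCMQEG", "LPDNVWGSCE", "QEGMRCHFTH"]

viral_reps = greedy_cluster(viral, threshold=0.9)
other_reps = greedy_cluster(other, threshold=0.9)
```

In this toy, the four viral variants collapse to a single representative at a 90% threshold while all four unrelated sequences survive, so the viral share of the clustered set shrinks.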

Improving Model Performance

🏗️ The EVEREST framework (Evolutionary Variant Effect prediction with Reliability ESTimation) was introduced to systematically assess model performance and reliability for viral mutational fitness prediction.
🧬 Structural information can significantly enhance PLM performance for viruses, particularly for stability assays, by allowing models to learn from remote homologs with high structural similarity despite low sequence identity.
✅ Alignment relevance, focusing on sequences with high identity to the query, is a more effective strategy for selecting alignments than simply using deeper alignments, which can dilute signal with irrelevant sequences.
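
The alignment-relevance idea can be sketched as a simple filter that keeps only sequences above a minimum identity to the query. The 0.8 cutoff, the toy alignment, and the function names are assumptions for illustration, and the talk's actual selection procedure may differ.

```python
def identity_to_query(query, seq):
    """Fraction of non-gap query positions where seq matches the query.
    A gap in seq counts as a mismatch."""
    pairs = [(q, s) for q, s in zip(query, seq) if q != "-"]
    return sum(q == s for q, s in pairs) / len(pairs)

def filter_alignment(query, alignment, min_identity):
    """Keep aligned sequences with at least min_identity to the query."""
    return [s for s in alignment if identity_to_query(query, s) >= min_identity]

# Hypothetical aligned sequences; the query is the first entry.
query = "MKTAYIAKQR"
alignment = ["MKTAYIAKQR", "MKTAYIAKQL", "MKSAYLAKHR", "GSWLPEDNVC", "M-TAYIAKQR"]

relevant = filter_alignment(query, alignment, min_identity=0.8)
```

The filtered alignment is shallower but consists only of close relatives of the query, which is the trade-off the bullet above describes.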

Reliability and Applications

🔑 Reliability metrics, such as sequence diversity for alignment models and pseudo-perplexity for PLMs, can predict model performance and indicate when a model's predictions for a new virus are trustworthy.
⚠️ Evaluation of 40 WHO-prioritized pandemic-threat viruses revealed that current models fail to reliably predict mutations for over half of them, highlighting critical gaps.
🚀 The findings offer actionable recommendations for improving viral mutation effect prediction, supporting pandemic preparedness, vaccine design, and objective assessment of biosecurity risks.
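
For reference, pseudo-perplexity is commonly defined as the exponential of the mean negative log-probability a masked language model assigns to each true residue when that position is masked. The sketch below assumes those per-position probabilities have already been obtained from some PLM; the function name and the input values are hypothetical.

```python
import math

def pseudo_perplexity(masked_probs):
    """exp(mean negative log-probability) over positions, where each entry is
    the model's probability for the true residue with that position masked."""
    if not masked_probs:
        raise ValueError("need at least one position")
    mean_nll = -sum(math.log(p) for p in masked_probs) / len(masked_probs)
    return math.exp(mean_nll)

# Hypothetical per-position probabilities for two sequences.
confident = pseudo_perplexity([0.5, 0.5, 0.5])
uniform_guess = pseudo_perplexity([1 / 20] * 10)  # no better than chance over 20 residues
```

Lower values mean the model finds the sequence more predictable. A value near 20 corresponds to guessing uniformly among the amino acids, which would suggest the model's predictions for that virus deserve little trust.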

What’s Discussed

Viral Evolution, Machine Learning, Deep Mutational Scanning, Protein Language Models, Antibody Escape, Vaccine Design, SARS-CoV-2, Influenza, Sequence Alignment, Structural Information, Reliability Estimation, Pandemic Preparedness, Biosecurity Risk, WHO Priority Viruses, Variational Autoencoders
