
Jason Corso on Voxel51, Data-Centric AI, and Verified Auto-Labeling

Super Data Science: ML & AI Podcast with Jon Krohn · July 18, 2025 · 29 min · 506 views

The Data-Centric Approach in Computer Vision

  • 💡 Professor Jason Corso argues that data quality often matters more than algorithmic advances in achieving high-performance computer vision models.
  • 🚀 This realization led to the founding of Voxel51, a company dedicated to building better tools for visual AI development, driven by the mantra "better data, better models."
  • 🧠 The increasing scale of datasets, from hundreds of images to billions, makes manual data analysis and intuition-building nearly impossible for individual practitioners.

Voxel51's Evolution and Mission

  • πŸ› οΈ Voxel51 initially focused on providing a flexible, open-source tool for data scientists and computer vision scientists to analyze and work with their visual data.
  • πŸ“ˆ The company's open-source tool has seen significant adoption, with over three million installs and a highly-starred GitHub repository, indicating a strong market need.
  • 🎯 Voxel51 strategically avoided being an annotation company, focusing instead on tooling that supports the entire data lifecycle.

Verified Auto-Labeling: The Future of Data Annotation

  • 🤖 Voxel51's new product, Verified Auto-Labeling, uses foundation models to generate labels for raw media automatically.
  • ✅ The system ranks these auto-generated labels so that users can accept around 70% of them automatically, significantly reducing labeling time and cost.
  • 🧩 Human reviewers then focus only on the remaining 30%: the challenging corner cases that require verification.
  • 🌟 This approach is described as "curation is the new annotation," moving beyond manual labeling to intelligent data management.
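The ranking-and-triage step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Voxel51's implementation: the `AutoLabel` record, its `confidence` score, the `triage` function, and the `accept_fraction` parameter are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoLabel:
    """Hypothetical record for one model-generated label (illustration only)."""
    sample_id: str
    label: str
    confidence: float  # assumed score in [0, 1]; higher means more trustworthy


def triage(labels, accept_fraction=0.7):
    """Rank auto-labels by confidence and split them into two queues.

    Returns (auto_accepted, needs_review). The top `accept_fraction` of the
    ranked labels are accepted without review; the rest go to human reviewers.
    The fixed-fraction cutoff is an assumption made for this sketch.
    """
    if not 0.0 <= accept_fraction <= 1.0:
        raise ValueError("accept_fraction must be between 0 and 1")
    # Highest-confidence labels first.
    ranked = sorted(labels, key=lambda lab: lab.confidence, reverse=True)
    n_accept = round(len(ranked) * accept_fraction)
    return ranked[:n_accept], ranked[n_accept:]


if __name__ == "__main__":
    # Toy batch of ten auto-labels with made-up confidence scores.
    scores = [0.99, 0.42, 0.95, 0.88, 0.61, 0.97, 0.91, 0.35, 0.83, 0.79]
    batch = [AutoLabel(f"img_{i}", "car", s) for i, s in enumerate(scores)]

    accepted, review = triage(batch)
    print(f"auto-accepted: {len(accepted)}, sent to review: {len(review)}")
    # prints "auto-accepted: 7, sent to review: 3"
    for item in review:
        print("review:", item.sample_id, item.confidence)
```

The fixed 70% fraction simply mirrors the split quoted in the episode. A real system might instead accept every label above a calibrated confidence threshold, letting the accepted share vary from batch to batch.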

The Next Frontier: Annotation 2.0

  • πŸ—£οΈ Looking ahead, Corso envisions Annotation 2.0, where AI agents become more autonomous, asking humans questions only when necessary to refine labels.
  • 🀝 This future state promises even less human involvement, driven by AI agents that can better understand and query data based on problem statements.
  • πŸ“Š The ultimate goal is to enable the creation of high-performance computer vision models with significantly reduced effort and cost through advanced automation.

What’s Discussed

Computer Vision, Voxel51, Data-Centric AI, Machine Learning, Artificial Intelligence, Verified Auto-Labeling, Foundation Models, Data Annotation, Open Source, Robotics, University of Michigan, Visual AI