Jason Corso on Voxel51, Data-Centric AI, and Verified Auto-Labeling
Super Data Science: ML & AI Podcast with Jon Krohn · July 18, 2025 · 29 min · 506 views
The Data-Centric Approach in Computer Vision
- Professor Jason Corso argues that data quality often matters more than algorithmic advances in achieving high-performance computer vision models.
- This realization led to the founding of Voxel51, a company dedicated to building better tools for visual AI development, driven by the mantra "better data, better models."
- As datasets grow from hundreds of images to billions, manual data analysis and intuition-building become nearly impossible for individual practitioners.
Voxel51's Evolution and Mission
- Voxel51 initially focused on providing a flexible, open-source tool for data scientists and computer vision scientists to analyze and work with their visual data.
- The company's open-source tool has seen significant adoption, with over three million installs and a highly starred GitHub repository, indicating a strong market need.
- Voxel51 strategically avoided becoming an annotation company, focusing instead on tooling that supports the entire data lifecycle.
Verified Auto-Labeling: The Future of Data Annotation
- Voxel51's new product, Verified Auto-Labeling, leverages foundation models to automatically generate labels for raw media.
- The system ranks these auto-generated labels, allowing users to accept around 70% automatically, significantly reducing time and cost.
- Human reviewers are then directed to focus only on the remaining 30%: the challenging, corner-case scenarios that require verification.
- Corso sums up this approach as "curation is the new annotation," moving beyond manual labeling to intelligent data management.
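The rank-then-split workflow described above can be illustrated with a minimal sketch. This is not Voxel51's implementation or API: the `AutoLabel` class, its `confidence` field, and the `split_for_review` helper are hypothetical names, and ranking by a single model confidence score is an assumption made for illustration. The 0.7 default mirrors the roughly 70% auto-accept figure mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass
class AutoLabel:
    """A machine-generated label (hypothetical structure for illustration)."""
    sample_id: str
    label: str
    confidence: float  # assumed ranking score in [0, 1]


def split_for_review(labels, accept_fraction=0.7):
    """Rank labels by confidence, auto-accept the top fraction,
    and return the remainder as a human review queue."""
    if not 0.0 <= accept_fraction <= 1.0:
        raise ValueError("accept_fraction must be between 0 and 1")
    # Highest-confidence labels first.
    ranked = sorted(labels, key=lambda item: item.confidence, reverse=True)
    cutoff = round(len(ranked) * accept_fraction)
    return ranked[:cutoff], ranked[cutoff:]


if __name__ == "__main__":
    # Ten toy labels with confidences 0.0, 0.1, ..., 0.9.
    demo = [AutoLabel(f"img_{i}", "car", i / 10) for i in range(10)]
    accepted, review_queue = split_for_review(demo)
    print(len(accepted), "auto-accepted;", len(review_queue), "sent to human review")
```

In practice the split point would more likely come from a calibrated confidence threshold than from a fixed fraction; the fixed fraction here only echoes the 70/30 figure from the conversation.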
The Next Frontier: Annotation 2.0
- Looking ahead, Corso envisions Annotation 2.0, where AI agents become more autonomous, asking humans questions only when necessary to refine labels.
- This future state promises even less human involvement, driven by AI agents that can better understand and query data based on problem statements.
- The ultimate goal is to enable the creation of high-performance computer vision models with significantly reduced effort and cost through advanced automation.
Topics
Computer Vision, Voxel51, Data-Centric AI, Machine Learning, Artificial Intelligence, Verified Auto-Labeling, Foundation Models, Data Annotation, Open Source, Robotics, University of Michigan, Visual AI