Amazon Found Suspected Child Sex Abuse Material in AI Training Data
CBS News · February 3, 2026 · 3 min · 19,514 views
AI Training Data Concerns
- Companies are racing to acquire massive datasets to train AI models, often scraping data from the internet or licensing it from external sources.
- Amazon reportedly discovered hundreds of thousands of cases of suspected child sex abuse material within its AI training data.
- Amazon removed the content before training, but it did not share the sources of this material with authorities.
Reporting and Industry Response
- The National Center for Missing and Exploited Children (NCMEC) called Amazon an outlier for not providing details about the data's origin.
- Other major tech companies, including Meta, OpenAI, and Google, reported seeing only a few instances of such material, far fewer than Amazon's reported numbers.
- Child safety experts urge companies to ensure their data is clean from the outset and to scrub it thoroughly before it is ingested into AI models.
Potential Repercussions of Compromised Data
- There is concern that if harmful data is included in AI training sets, the models could replicate such material or perpetuate its distribution.
- Experts worry that a primary focus on building the best AI models might lead to a reactive rather than proactive approach to data safety, potentially making this a recurring issue.
Topics
Artificial Intelligence, AI Training Data, Child Sex Abuse Material, Data Scraping, Data Licensing, National Center for Missing and Exploited Children, Data Scrubbing, Data Security, Tech Industry, AI Ethics