Apple and other firms might have secretly used YouTube content to train AI

HIGHLIGHTS

AI companies typically maintain secrecy about where they source their training data.

Top firms have reportedly used content from YouTube to train their AI.

Creators claim their videos were used without their knowledge.

Imagine your favourite YouTube videos helping to teach Apple and other big companies’ AI systems. A recent report reveals that top firms have used content from YouTube to train their artificial intelligence without consent. Let’s delve into the details.

AI companies typically maintain secrecy about where they source their training data. However, an investigation by Proof News uncovered that some of the wealthiest AI firms globally have utilised content from thousands of YouTube videos to train their AI systems. This occurred despite YouTube’s policies prohibiting the use of materials from the platform without proper permission.



According to the report, subtitles extracted from 173,536 YouTube videos, originating from over 48,000 channels, were utilised by prominent Silicon Valley companies such as Anthropic, Nvidia, Apple, and Salesforce.

The dataset, known as YouTube Subtitles, includes video transcripts from educational and online learning channels such as Khan Academy, MIT, and Harvard, alongside content from major media outlets such as The Wall Street Journal, NPR, and the BBC.

Notably, shows like “The Late Show With Stephen Colbert,” “Last Week Tonight With John Oliver,” and “Jimmy Kimmel Live” also contributed to this dataset. Material from YouTube celebrities such as MrBeast, Marques Brownlee, Jacksepticeye, and PewDiePie was also used to train AI models. Some of the material even promoted conspiracy theories, such as the belief that the Earth is flat.


“No one came to me and said, ‘We would like to use this,’ ” said David Pakman, host of “The David Pakman Show.” Nearly 160 of his videos were included in the YouTube Subtitles training dataset.

“It’s theft,” said Dave Wiskus, CEO of Nebula, a streaming service partly owned by creators whose work has been taken from YouTube for AI training. Wiskus called it disrespectful to use creators’ content without consent, particularly as studios might use “generative AI to replace as many of the artists along the way as they can.”

Ayushi Jain


Tech news writer by day, BGMI player by night. Combining my passion for tech and gaming to bring you the latest in both worlds.

Digit.in