Meta CEO Mark Zuckerberg defends use of pirated e-books for AI training amid legal battles
Zuckerberg likened Meta's use of pirated e-books for AI training to YouTube's handling of copyrighted content.
Plaintiffs allege Meta used LibGen and Z-Library to train Llama models without proper authorization.
Court documents reveal internal concerns about the legality of using pirated e-books for AI training.
Meta CEO Mark Zuckerberg has taken center stage in the company’s legal battles, defending the use of pirated e-books to train AI models. According to reports citing excerpts revealed in a court filing, Zuckerberg compared Meta’s actions to YouTube’s efforts to remove pirated content, arguing that using such data sets, while potentially controversial, is not unreasonable. For context, Meta is facing a copyright lawsuit over its AI training data, one of many copyright cases over the training of AI models.
The case has raised concerns among authors, publishers, and intellectual property owners about AI models being trained on copyrighted content without authorization. Prominent writers including Ta-Nehisi Coates and Sarah Silverman have accused Meta of training its Llama AI models on data from LibGen, a repository of pirated e-books.
Zuckerberg’s response centers on fair use. The CEO drew a comparison between Meta’s use of illegally downloaded e-books and YouTube’s practice of hosting potentially illegal content while working to remove it. He maintained that it would be unjust and impractical to prohibit the use of data from these sources.
“Do I want to establish a policy prohibiting individuals from using YouTube because some of the content may be copyrighted? No,” Zuckerberg replied during the deposition. However, he recognized that Meta should use caution when dealing with content that may violate copyright rules.
Despite Zuckerberg’s remarks, court documents indicate that Meta staff members were unsure whether using LibGen was legal. In the deposition, Zuckerberg said he knew very little about LibGen, stating, “I haven’t really heard of it,” even though there was evidence that Meta had used it to train at least one of its Llama AI models.
Plaintiffs allege that Meta cross-referenced pirated books on LibGen with copyrighted works to assess the viability of pursuing licensing deals with publishers. Furthermore, according to the amended complaint, Meta trained the latest version of the Llama model, Llama 3, using pirated e-books from another illicit source, Z-Library. The plaintiffs also claim that Meta intends to use similar data sets for its upcoming Llama 4 model.
Ashish Singh
Ashish Singh is the Chief Copy Editor at Digit. Previously, he worked as a Senior Sub-Editor with Jagran English from 2022, and has been a journalist since 2020, with experience at Times Internet. Ashish specializes in Technology. In his free time, you can find him exploring new gadgets, gaming, and discovering new places.