Accident or cover-up? OpenAI allegedly deletes potential evidence in copyright case

By Ayushi Jain | Updated on 22-Nov-2024

HIGHLIGHTS

OpenAI is facing new accusations from The New York Times and the Daily News, which are suing the company for allegedly using their copyrighted content without consent.

The publishers’ lawyers have claimed that OpenAI engineers accidentally deleted critical data that could have played a key role in their legal battle.

While OpenAI tried to recover the lost information, it only partially succeeded.

OpenAI is facing new accusations from The New York Times and the Daily News, which are suing the company for allegedly using their copyrighted content to train its AI models without consent. In a recent turn of events, the publishers’ lawyers have claimed that OpenAI engineers accidentally deleted critical data that could have played a key role in their legal battle.

According to the publishers’ legal team, the deleted data was stored on a virtual machine, a software-based computer system set up specifically for the case, which OpenAI provided so that the plaintiffs could search for their content in its AI training sets.

Since November 1, the publishers’ lawyers and experts have spent over 150 hours combing through OpenAI’s training data. But on November 14, engineers at OpenAI erased all the search data stored on one of the virtual machines, TechCrunch reports. OpenAI tried to recover the lost information but only partially succeeded. The recovered data was missing the original folder structures and file names, making it useless for determining whether articles from The Times and the Daily News had been used to train OpenAI’s models.

In a letter filed in the U.S. District Court for the Southern District of New York, lawyers for the publishers expressed their frustration. “News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” the letter stated. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”

While the letter emphasised there was no reason to believe the deletion was intentional, the plaintiffs’ counsel said the incident underscores that OpenAI “is in the best position to search its own datasets” for potentially infringing content using its own tools.

The company continues to defend its practice of using publicly available content to train its AI as fair use. Even so, OpenAI has struck licensing deals with some publishers, including the Associated Press and News Corp., reportedly paying millions for such agreements. OpenAI has neither confirmed nor denied whether it used copyrighted works without permission in this case.

Ayushi Jain

Ayushi works as Chief Copy Editor at Digit, covering everything from breaking tech news to in-depth smartphone reviews. Prior to Digit, she was part of the editorial team at IANS.