Accident or cover-up? OpenAI allegedly deletes potential evidence in copyright case

Updated on 22-Nov-2024

OpenAI is facing new accusations from The New York Times and Daily News, which are suing the company for allegedly using their copyrighted content to train its AI models without consent. In a recent turn of events, the publishers’ lawyers have claimed that OpenAI engineers accidentally deleted critical data that could have played a key role in their legal battle.

According to the publishers’ legal team, the deleted data was stored on a virtual machine, a software-based computer system set up specifically for the case, that OpenAI provided to the plaintiffs so they could search for their content in its AI training sets.

Since November 1, the publishers’ lawyers and experts have spent over 150 hours combing through OpenAI’s training data. But on November 14, engineers at OpenAI erased all the search data stored on one of the virtual machines, TechCrunch reports. OpenAI tried to recover the lost information but only partially succeeded. The recovered data was missing the original folder structures and file names, making it useless for determining whether articles from The Times and Daily News had been used to train OpenAI’s models.

In a letter filed in the U.S. District Court for the Southern District of New York, lawyers for the publishers expressed their frustration. “News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” the letter stated. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”

While the letter emphasised there was no reason to believe the deletion was intentional, the plaintiffs’ counsel argued that the incident shows the burden of searching should fall on OpenAI. The company “is in the best position to search its own datasets” for potentially infringing content using its own tools, they said.

The company continues to defend its practice of using publicly available content to train its AI as fair use. Even so, OpenAI has struck licensing deals with some publishers, including the Associated Press and News Corp., reportedly paying millions for such agreements. However, OpenAI has neither confirmed nor denied whether it used copyrighted works without permission in this case.

Ayushi Jain
