Court orders OpenAI to hand over 20 million ChatGPT logs as The New York Times copyright lawsuit enters a critical stage
A federal magistrate judge in New York recently ruled that OpenAI must turn over approximately 20 million anonymized ChatGPT user logs to The New York Times and other plaintiffs, a significant development in the high-profile AI copyright lawsuit. The order denied OpenAI’s request to narrow the scope of discovery, holding that the logs are central to assessing whether ChatGPT reproduces copyrighted content.
The New York Times sued OpenAI in December 2023, accusing it of using the paper’s news content to train its models without authorization; OpenAI pushed back publicly in January, saying the outlet “did not tell the whole truth.” The latest ruling further intensifies the dispute between the two sides over copyright, data governance, and the legality of AI training data.
The decision was issued by Magistrate Judge Ona T. Wang, who held that user privacy, while important, is not by itself sufficient under the proportionality standard governing discovery to block access to relevant evidence. The court found the 20 million samples “clearly relevant” to determining whether ChatGPT’s outputs resemble New York Times content, and concluded that producing the anonymized samples would not impose an undue burden on OpenAI.
OpenAI opposed the ruling and argued in its latest filings that the order is “clearly erroneous” and “disproportionate,” as the large-scale disclosure of user conversations could increase privacy risks. In June, the court had already ordered OpenAI to preserve large amounts of user data, including chats that users had deleted, sparking further controversy.
As scrutiny of AI training-data transparency intensifies, the lawsuit is widely seen as a bellwether for the AI industry’s future legal framework. Similar cases have been filed in the US and Europe, with publishers, music rights holders, and code repository maintainers all seeking to clarify whether AI models infringe intellectual property rights.
The ruling affects not only OpenAI: it could also reach other major AI companies such as Anthropic and Perplexity, with far-reaching implications for how the industry collects, uses, and discloses training data.