What is 'copyright infringement' in the context of AI?

Copyright infringement occurs when someone reproduces, distributes, or adapts copyrighted material without the copyright holder's permission and without a legal defense such as fair use. The New York Times alleges that Microsoft and OpenAI used its articles to train AI models without authorization, and that the models' outputs sometimes reproduce or closely paraphrase Times content, thereby infringing its intellectual property.

What is 'fair use' and why is it relevant here?

'Fair use' is a legal doctrine that allows limited use of copyrighted material without permission for purposes such as criticism, commentary, news reporting, teaching, scholarship, or research. AI companies often argue that training their models on copyrighted data falls under fair use because it's a transformative process, creating new capabilities rather than directly reproducing the original work. The lawsuit will test whether courts agree with this interpretation for AI training.

Why is The New York Times targeting Microsoft specifically?

The amended lawsuit suggests the Times is trying to establish a more direct chain of responsibility. By alleging that Microsoft built a 'tailor-made' supercomputer explicitly to enable infringement, the Times portrays Microsoft as an active enabler rather than a passive provider of cloud services. This could strengthen its case by tying a major corporation with deep pockets directly to the alleged infringing activity.

Image: courtesy of Ars Technica

Tech | June 27, 2026 | By Veridact Editorial | Updated Jun 27

NYT Sharpens Attack: Microsoft Accused of Building 'Tailor-Made' Supercomputer for OpenAI Copyright Infringement

The New York Times has escalated its legal battle against Microsoft and OpenAI, filing an amended complaint that specifically accuses Microsoft of constructing a custom-built supercomputer with the explicit purpose of enabling OpenAI to train its artificial intelligence models on copyrighted Times articles without permission. This shift in legal strategy moves beyond portraying Microsoft as a generic cloud-services provider, aiming to establish a direct link between Microsoft's infrastructure and the alleged infringement.

Outlook

The amended lawsuit signals a more aggressive stance from The New York Times, focusing its legal arguments on Microsoft's active role in facilitating OpenAI's AI training. The refined complaint suggests the Times aims to prove intent and direct enablement, not merely that Microsoft supplied computing resources. Expect a vigorous defense from Microsoft and OpenAI, who have consistently denied the claims. The proceedings will likely involve extensive discovery, expert testimony on AI training methods, and detailed arguments over what 'fair use' means in the digital age. The result could be a protracted court battle that sets a significant precedent for how AI companies use copyrighted content.

Background

The New York Times first filed its lawsuit against Microsoft and OpenAI in December 2023, alleging that their AI models were trained on millions of its copyrighted articles and now compete directly with it by generating content that replicates or paraphrases Times journalism. The core issue is whether using copyrighted material to train large language models (LLMs) constitutes 'fair use' under copyright law.

The amendment, filed on June 26, 2026, zeroes in on Microsoft's alleged involvement. Instead of presenting Microsoft's supercomputing systems as standard cloud services, the Times now claims these systems were 'tailor-made' to help OpenAI infringe, implying a deliberate partnership to exploit copyrighted works. OpenAI CEO Sam Altman acknowledged in 2020 that Microsoft had built OpenAI's 'dream system' for AI work, a statement that could become a point of contention in court. Since 2016, Microsoft has publicly committed to building Azure into an 'AI supercomputer for the world,' and it has announced multiple AI supercomputing systems built as part of its partnership with OpenAI.

The Times, in an emailed statement, clarified its position: while it acknowledges the 'power and potential of GenAI for the public and for journalism,' it maintains that using journalistic material for commercial gain demands permission and proper compensation for the original source. This is not a rejection of AI, but a demand for a framework that respects intellectual property.

Precedents

The legal battle between content creators and new technologies is a well-worn path. Historically, industries grappling with disruptive innovation, from radio to photocopiers to the internet, have faced similar questions about copyright and fair use. The music industry's struggles with Napster in the early 2000s, which redefined digital distribution and intellectual property, offer a compelling parallel. Similarly, Google's digitization of books for its Google Books project faced a decade-long legal challenge from authors and publishers; the publishers settled, and the authors' suit ended with courts ruling that the project was a fair use.

What differentiates this case is the scale and nature of AI. Unlike simply making copies or facilitating peer-to-peer sharing, AI 'learns' from data, creating new outputs. This introduces a complex question: Is AI training a transformative use that falls under fair use, or is it a derivative use that requires licensing? This case could be the Napster moment for AI, defining how content is valued and licensed in an era of machine learning. Other publishers, such as Axel Springer, have opted for licensing agreements with AI companies, suggesting a path for collaboration rather than confrontation. The Times' decision to litigate, however, indicates a belief that the current use goes beyond what fair use permits and warrants judicial intervention.

Analysis

The outcome of this lawsuit carries immense implications for both the future of artificial intelligence development and the economic viability of content creation. For AI companies like OpenAI and Microsoft, a ruling against them could force a fundamental rethinking of their training data acquisition strategies, potentially requiring costly licensing deals for vast datasets. This could slow innovation, increase development costs, and create significant barriers to entry for new AI players.

For news organizations and other content creators, the lawsuit represents a critical fight for control over their intellectual property in the digital age. If AI companies are allowed to use copyrighted material for training without compensation, it could severely undermine the economic models of journalism and creative industries, which are already under pressure. This could devalue original content, making it harder for creators to fund the production of high-quality material. The legal precedent set here will likely influence how AI developers approach all forms of copyrighted text, images, audio, and video, shaping the compensation structures for creators for decades to come. It's a foundational dispute over who benefits from the vast new economy AI is creating.

Scenarios

One possible outcome is that the parties could eventually reach a settlement outside of court. Given the complexity of the legal questions and the potential for a lengthy, expensive trial, a negotiated agreement might allow both sides to gain some clarity and avoid an all-or-nothing ruling. Such a settlement could involve licensing fees, data usage agreements, or even revenue-sharing models, setting a template for future interactions between AI developers and content owners.

Another scenario involves the case proceeding through the courts, potentially reaching the Supreme Court. A definitive court ruling could either uphold the AI companies' 'fair use' arguments, granting them broad permission to use publicly available data for training, or it could affirm the Times' copyright claims, establishing a strong requirement for licensing. The latter could dramatically reshape the AI industry, forcing a shift towards curated, licensed datasets. This would likely lead to higher costs for AI development but could also create new revenue streams for content creators.

A third possibility is a mixed ruling, where some claims are upheld and others are dismissed, or where the court issues a narrow interpretation of fair use specifically for AI training. This would leave significant ambiguity, likely prompting further litigation and a patchwork of agreements across different industries.

Timeline

2016

Microsoft's AI Supercomputer Commitment

Microsoft announces its commitment to building Azure into an AI supercomputer, laying the groundwork for its future AI infrastructure investments.

2020

OpenAI CEO Remarks on Microsoft Collaboration

OpenAI CEO Sam Altman states that Microsoft was able to build their 'dream system' for AI work, highlighting the close collaboration on supercomputing infrastructure.

2023-12-27

The New York Times Files Initial Lawsuit

The New York Times files a lawsuit against Microsoft and OpenAI, alleging copyright infringement based on the use of its articles to train AI models.

2026-06-26

NYT Amends Lawsuit

The New York Times files an amended complaint, specifically accusing Microsoft of building a 'tailor-made supercomputer' to enable OpenAI's alleged copyright infringement, dropping certain claims against OpenAI and modifying others.

2028

Projected Stargate Launch

Microsoft and OpenAI are reportedly planning to launch a $100 billion 'Stargate' AI supercomputer as soon as 2028, underscoring their ongoing commitment to advanced AI infrastructure.

Frequently Asked Questions

What is a 'supercomputer' in this context?

In this context, a supercomputer refers to a highly specialized and powerful computing system designed to handle extremely complex and data-intensive tasks, far beyond what a typical server can do. For AI, these systems are crucial for training large language models (LLMs), which require processing massive amounts of data to learn patterns and generate human-like text.
