The oldest reference publishers in the English-speaking world are now fighting the most powerful AI company on Earth. On March 13, 2026, Encyclopedia Britannica and its subsidiary Merriam-Webster filed a copyright and trademark lawsuit against OpenAI in federal court in Manhattan. The case — filed just six months after the same publishers sued AI search engine Perplexity on nearly identical grounds — marks a significant escalation in what has become the defining legal conflict of the AI era.
This article is for anyone trying to understand what this lawsuit actually argues, what the legal landscape looks like right now, and what it means for the future of AI and the open web. We evaluate the case on four dimensions: the strength of the legal claims, OpenAI's defense, the broader industry context, and what could actually change. One thing is already clear: this case is harder for OpenAI to dismiss than most people realize.
Quick Verdict
The Britannica vs. OpenAI lawsuit is the strongest AI copyright case filed to date against a major LLM provider — not because the law is settled, but because the facts are unusually clean. Britannica has documented specific verbatim reproduction, attacked three distinct stages of infringement, and added trademark claims most publishers haven't tried.
- Use Britannica's complaint as a reference if you want to understand how the legal strategy against AI companies is evolving — it is the most complete and precise filing yet.
- Watch this case if you care about RAG liability specifically — it may be the first to force a court ruling on whether real-time retrieval-augmented generation constitutes copyright infringement.
- Do not assume OpenAI's "fair use" defense is a slam dunk. Two courts have split on this issue in the past year, and the legal framework remains unstable.
What Britannica and Merriam-Webster Are Actually Claiming
Three Stages of Alleged Infringement
The lawsuit is built around two legal pillars, both mirroring the framework the same plaintiffs used when they sued Perplexity in September 2025. The first is copyright infringement under the Copyright Act of 1976, which gives authors the exclusive right to reproduce and distribute their works. Britannica argues OpenAI violated those rights at multiple stages: by scraping its websites to create training inputs, by feeding that content into its models during training, and then by generating outputs that reproduce or closely summarize the originals when users query ChatGPT on topics covered by Britannica's editorial catalog.
That three-stage framing is significant. Most AI copyright cases focus on the training data question alone. Britannica is arguing infringement happens at input, during training, and at output — each independently actionable.
Britannica also accuses OpenAI of copyright violations when ChatGPT generates outputs containing full or partial verbatim reproductions of its content, and when OpenAI uses its articles in ChatGPT's retrieval-augmented generation (RAG) workflow. RAG is the mechanism by which the model fetches newly updated information from the web or other databases when responding to a query.
The RAG claim is the most novel part of the complaint. It argues that every time a user asks ChatGPT something and the system fetches updated information from Britannica's website in real time, that constitutes a fresh act of unauthorized copying — even after any training-data questions are resolved.
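To make the mechanics concrete, here is a minimal sketch of the retrieval step the complaint targets. Everything in it is illustrative: the function names, the toy corpus, and the keyword-overlap ranking are assumptions for the sketch, not OpenAI's actual implementation (real systems fetch live pages and use learned retrievers). The point it shows is that retrieved text is copied verbatim into the model's context on every query.

```python
# Minimal RAG sketch: rank documents against a query, then copy the
# best match verbatim into the prompt sent to the model.
# All names and data here are hypothetical, for illustration only.

def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Rank corpus documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(terms & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, corpus: dict[str, str]) -> str:
    """Paste retrieved text verbatim into the model's context window.
    This per-query copying is what the RAG claim calls a fresh act
    of reproduction."""
    hits = retrieve(query, corpus)
    context = "\n".join(f"[{url}] {text}" for url, text in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy stand-in for pages a live system would fetch over the network.
corpus = {
    "example-reference.com/plagiarism": "to plagiarize means to steal and pass off the ideas or words of another",
    "example.com/weather": "tomorrow will be sunny with light winds",
}
prompt = build_prompt("what does it mean to plagiarize", corpus)
```

Under this framing, training happens once, but `build_prompt` runs on every relevant query, which is why the complaint treats each retrieval as a separate act of copying.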
The Trademark Twist
The suit also accuses the AI giant of trademark infringement when ChatGPT hallucinates false information and attributes it to Britannica or Merriam-Webster.
This is a clever legal move. The Lanham Act claim covers a scenario that copyright alone cannot: when ChatGPT makes up an answer and presents it as if Britannica or Merriam-Webster endorsed it, the publishers argue their brand is being used to lend false credibility to fabricated content. It attacks both the accuracy and the attribution problems in one claim.
The Plagiarize Moment
The most vivid example in the complaint is almost too on-the-nose. The complaint describes a prompt asking "How does Merriam-Webster define plagiarize?", to which the model reportedly responded with a definition identical to the one found in Merriam-Webster's own copyrighted entry. The case is the latest in a series accusing AI firms of data theft, raising questions about what counts as public knowledge and what information online should be off-limits for AI use. A model reproducing a dictionary's verbatim definition of plagiarism, without credit or licensing, may be the most fitting emblem of that entire conflict.
Scale of the Alleged Harm
The complaint, filed in Manhattan federal court, alleges that OpenAI scraped Britannica's encyclopedia entries and Merriam-Webster dictionary definitions without authorization to teach its flagship chatbot ChatGPT to respond to human queries. The publishers allege that ChatGPT generates outputs containing full or near-verbatim reproductions of their content, diverting users who would otherwise visit their websites and depriving them of the subscription and advertising revenue that funds their content creation.
The lawsuit argues that OpenAI's use of their content could trigger a self-reinforcing downward spiral: declining advertising and subscription revenue leads to lower-quality content, which in turn further reduces revenue. "Less content of poorer quality will further result in reduced revenue, and thus less spending on content creation," the complaint alleges, "spawning even less content of even poorer quality and even less revenue, and so on in a downward spiral for content creators."
OpenAI's Defense — and Why It's Shakier Than It Sounds
OpenAI's response has been consistent across every copyright lawsuit it faces. A spokesperson said the company's models "empower innovation, and are trained on publicly available data and grounded in fair use."
That defense rests on fair use — the legal doctrine that allows limited use of copyrighted material without permission, particularly when the use is transformative. OpenAI has pointed to two favorable rulings in 2025 as validation.
In two separate 2025 cases, Bartz v. Anthropic and Kadrey v. Meta, federal judges independently held that training AI models on copyrighted works is highly transformative, a key factor in satisfying the fair use doctrine. Those rulings gave AI companies genuine relief, but they came with important caveats. The question is far from settled, and the pair of decisions shows that courts, even those sitting in the same district, can reach different conclusions when confronted with similar facts.
More critically, those rulings addressed training on lawfully obtained data. Bartz ultimately settled for $1.5 billion precisely because the court found that storing pirated copies was not protected, even though training itself was fair use. That distinction matters here: Britannica's complaints question whether Perplexity's and OpenAI's scraping, which allegedly bypassed robots.txt and access controls, constitutes lawful acquisition at all.
At least one court — in Thomson Reuters v. Ross Intelligence — has held that fair use might not protect use of non-licensed material to generate a directly competitive market substitute. Britannica's complaints suggest the plaintiffs view Perplexity's and OpenAI's use of their content in exactly those terms: as an attempt to provide a competitive market substitute.
That is the key battleground: if a court decides ChatGPT is a direct commercial substitute for Britannica — not a transformative new product, but a replacement — the fair use defense collapses.
The Perplexity Precedent: Why Britannica Filed Twice
This is not Britannica's first such case. In September 2025, the same plaintiffs filed an essentially parallel complaint against Perplexity, the AI-powered answer engine. That complaint alleged that Perplexity's system scraped Britannica's content to build its responses in real time, bypassing robots.txt protections and presenting verbatim or near-verbatim reproductions under the guise of AI-generated summaries. The Perplexity case is still proceeding.
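The robots.txt allegation is worth unpacking, because the check a compliant crawler performs is simple and well standardized. The sketch below shows it using Python's standard-library parser; the agent names and rules are hypothetical, not Britannica's actual robots.txt, and the complaint's claim is that this check was skipped or circumvented, not that it failed.

```python
# Sketch of the robots.txt check a compliant crawler performs before
# fetching a page. The user-agent names and rules are illustrative.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one bot is barred site-wide, everyone else allowed.
rules = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

page = "https://www.britannica.com/topic/plagiarism"

# A crawler honoring the file would skip this fetch entirely.
blocked_for_bot = not parser.can_fetch("ExampleAIBot", page)
allowed_for_others = parser.can_fetch("SomeOtherAgent", page)
```

Because robots.txt is a voluntary convention with no technical enforcement, the legal question is whether ignoring it turns otherwise-public scraping into unlawful acquisition, which is the "lawfully obtained data" issue the Bartz ruling left open.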
The decision to sue Perplexity first was strategic. Perplexity's RAG-based model makes the copying argument simpler — its system fetches content from the live web on every query, so the chain from Britannica's server to the user's screen is short and direct. With OpenAI, the training-data question is more complex, but the output reproduction and RAG claims follow the same template.
Both suits claim the AI companies infringe in three different ways: mass-scale copying of protected content to train large language models; retrieving, copying, and using content upon a user prompt; and generating infringing outputs. The substantively overlapping complaints were both filed by Susman Godfrey LLP, which also represents news organizations and authors in other AI copyright suits.
Using the same law firm and nearly identical complaints is itself a signal. Britannica is building a coordinated legal campaign, not a one-off reaction.
The Bigger Battlefield: 91 Cases and Counting
The Britannica filing did not occur in a vacuum. It is lawsuit number 91 against AI companies in the United States — more than triple the number that existed at the start of 2024.
| Plaintiff | Defendant | Filed | Status (Mar 2026) |
|---|---|---|---|
| NY Times | OpenAI / Microsoft | Dec 2023 | Active — discovery ongoing |
| Bartz et al. (authors) | Anthropic | 2024 | Settled — $1.5B (2025) |
| Kadrey et al. (authors) | Meta | 2023 | Partial fair use ruling; seeding claims active |
| Dow Jones | Perplexity | Oct 2024 | Proceeding |
| Disney / Universal | Midjourney | Jun 2025 | Active |
| Britannica / M-W | Perplexity | Sep 2025 | Proceeding |
| UMG / Concord | Anthropic | Oct 2023 | Active ($3.1B claimed, Jan 2026) |
| Britannica / M-W | OpenAI | Mar 13, 2026 | Just filed |
The landscape has split into two camps, and not all publishers have pursued litigation. News Corp signed a licensing deal with Meta worth up to $50 million annually in March 2026. UK publisher Reach agreed to a usage-based deal with Amazon for its Nova AI model the same month.
That divergence is the most important dynamic in AI copyright right now: some publishers are suing while others are licensing. The outcome of cases like Britannica vs. OpenAI will determine which path becomes dominant.
The Surprising Finding: RAG May Be More Vulnerable Than Training Data
The conventional wisdom in AI copyright circles has focused almost entirely on the training data question — did AI companies have the right to scrape the web to build their models? Two fair use rulings in 2025 tilted that debate toward AI companies, at least for lawfully obtained content.
But Britannica's complaint shifts the attack to RAG — the live retrieval system that lets ChatGPT access current web content in real time. This is legally distinct from training. When ChatGPT fetches a Britannica article in real time to answer a user query, it is not doing something that happened once in 2021. It is doing it continuously, on every relevant query, today.
The Perplexity case may be the first to decide whether the use of copyrighted material to ground LLMs infringes copyright and whether that grounding can constitute fair use. Cases such as Associated Press v. Meltwater are often cited for the proposition that mere scraping and re-delivery of online content is not sufficiently transformative to warrant fair use protection.
If courts find that real-time RAG copying is not fair use, that ruling would affect the entire AI industry instantly — not just historical training runs, but the live systems running right now. That makes the RAG claim potentially more consequential than any training-data verdict.
The Licensing Alternative — and Why OpenAI Said No
The lawsuit comes after the plaintiffs approached OpenAI in November 2024 to discuss a potential licensing agreement, an overture that OpenAI rebuffed, according to the complaint.
This detail matters legally. It undermines any argument that Britannica lacked interest in licensing. The publishers tried to negotiate. OpenAI declined. The lawsuit followed.
Licensing is already happening at scale elsewhere in the industry. News Corp signed with Meta. Disney licensed to OpenAI for video. Suno reached a deal with UMG. The pattern is clear: AI companies are willing to license visual and music content, but have been more resistant to licensing the factual reference content — encyclopedias, dictionaries, news archives — that forms the foundation of their models' knowledge.
That resistance may now be tested in court rather than at a negotiating table.
What the Courts Have Said So Far — and What Comes Next
| Case | Court | Ruling on Fair Use | Result |
|---|---|---|---|
| Bartz v. Anthropic | N.D. Cal. | Training on lawfully acquired books = fair use. Storing pirated copies ≠ fair use. | $1.5B settlement |
| Kadrey v. Meta | N.D. Cal. | Training fair use; seeding pirated copies — still active | Partial summary judgment for Meta |
| Thomson Reuters v. ROSS | D. Del. | Fair use does NOT protect a direct competitive substitute | Finding for Reuters |
| NYT v. OpenAI | S.D.N.Y. | Still in discovery | 20M ChatGPT logs ordered disclosed |
| Britannica v. Perplexity | S.D.N.Y. | No ruling yet | Proceeding |
| Britannica v. OpenAI | S.D.N.Y. | No ruling yet | Just filed |
No court is expected to rule on fair use in AI training before summer 2026 at the earliest; Judges Lee and Saylor will have the next opportunity.
The Britannica case will not move quickly. Discovery alone will take months. OpenAI will file motions to dismiss. Some claims may be narrowed. But the RAG question — if it reaches a ruling — could produce the most broadly applicable legal standard in the entire AI copyright wave.
Who Should Care About This Case
If you are a publisher or content creator, this lawsuit represents the most complete attempt yet to define what AI companies owe for using your work. The three-stage infringement argument — training, retrieval, output — is a template others will follow.
If you are an AI developer or product manager, the RAG liability claim is the one to track. Training happened in the past and cannot be undone. RAG is happening right now, in production, at scale.
If you are a researcher or student relying on AI-generated answers, this case directly addresses whether the information you receive is ethically and legally sourced. Britannica also alleges that ChatGPT's hallucinations jeopardize the public's continued access to high-quality and trustworthy online information.
If you are an investor or business leader, the Anthropic settlement provides the clearest financial benchmark so far. The settlement could be "huge in shaping" current and future litigation against AI companies, according to a Syracuse University law professor who specializes in IP law. A similar settlement in this case — covering one of the world's most recognized reference publishers — would set an entirely new standard.
Conclusion
Britannica and Merriam-Webster vs. OpenAI is not just another publisher suing an AI company. It is the most carefully constructed AI copyright complaint filed to date, targeting three distinct stages of alleged infringement and adding a trademark claim that most plaintiffs have overlooked. OpenAI's fair use defense has held up in two courts so far — but neither of those cases involved real-time RAG retrieval, and neither involved a publisher that tried to license its content first and was refused.
The single most important thing to watch is not the training data fight — that legal question may take years to fully resolve. Watch the RAG ruling. If a court decides that live retrieval of copyrighted content for AI answers is not fair use, the entire architecture of AI-powered search and reference tools changes overnight.
The next major development will be OpenAI's motion to dismiss, expected in mid-2026. If the RAG claims survive that motion, this becomes one of the most consequential technology cases of the decade. If you work in AI, publishing, or information law, start reading the complaint now. Case no. 1:2026cv02097, Southern District of New York.