Guide April 30, 2026 10 min read

AI Training and Fair Use: What the Law Actually Says in 2026

Is it legal for AI companies to train on copyrighted data? A deep dive into fair use doctrine, the Copyright Office Part 3 report, and landmark rulings from NYT v. OpenAI to Authors v. Anthropic.

Can AI companies legally scrape the internet and use copyrighted books, articles, images, and music to train their models? This is the multibillion-dollar question at the heart of AI copyright law.

The Short Answer

It depends. The U.S. Copyright Office concluded in May 2025 that some uses of copyrighted works for AI training will qualify as fair use, and some will not. The key factor: whether the AI generates content that competes with the original works.

The Copyright Office Part 3 Report (May 2025)

The most detailed agency guidance came on May 9, 2025, when the Copyright Office released a pre-publication version of Part 3 of its AI report, specifically addressing whether unauthorized use of copyrighted materials to train generative AI systems is defensible as fair use. The report does not bind courts, but judges may treat it as persuasive.

Key Conclusion

According to the report, making commercial use of large volumes of copyrighted works to produce expressive content that competes with those works in existing markets goes beyond the scope of fair use, especially where the works were obtained through illegal access.

The Four Fair Use Factors Applied to AI Training

Factor 1: Purpose and Character of Use

Training AI for research or non-commercial purposes: favors fair use
Training AI to generate competing commercial content: weighs against fair use
The Supreme Court's decision in Andy Warhol Foundation v. Goldsmith (2023) raised the bar for transformative use claims

Factor 2: Nature of the Copyrighted Work

Using factual works (news, data): slightly favors fair use
Using highly creative works (novels, art, music): weighs against fair use

Factor 3: Amount Used

AI training typically uses entire works: weighs against fair use
But courts have sometimes allowed copying entire works for transformative purposes (as in Authors Guild v. Google, the Google Books case)

Factor 4: Market Effect

If AI outputs substitute for or compete with originals: weighs strongly against fair use
This is often the most important factor in AI cases

Major Court Rulings

NYT v. OpenAI (Active - 2026)

The New York Times alleges that OpenAI used millions of its articles to train ChatGPT, creating a product that directly competes with NYT journalism. The judge has expressed skepticism about the fair use defense.

Authors v. Meta (June 2025)

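Judge Chhabria granted summary judgment for Meta, but on narrow grounds: the authors had not presented enough evidence that Meta's use harmed the market for their books. He nonetheless expressed strong doubts about AI training more broadly: "You have companies using copyright-protected material to create a product capable of producing an infinite number of competing products."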

Authors v. Anthropic (2025)

Judge Alsup ruled that training on the books was fair use, and that digitizing lawfully purchased print books was also fair use, but that downloading pirated copies from shadow libraries and keeping them in a central library was NOT fair use. Anthropic later agreed to a $1.5 billion settlement.

Thomson Reuters v. Ross Intelligence (2025)

The court rejected Ross Intelligence's fair use defense and found that using Westlaw headnotes to train an AI legal research tool constituted copyright infringement. Ross built its tool to compete directly with Westlaw, and the court noted that the tool was not generative AI.

What This Means for AI Companies

1. Licensing is becoming necessary - The era of free training data may be ending

2. Opt-out mechanisms matter - Respecting robots.txt and creator preferences may strengthen fair use arguments

3. Output similarity is key - If your AI can reproduce or closely imitate training data, you have a problem

4. Documentation helps - Showing you took steps to minimize harm supports fair use
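As a rough illustration of points 2 and 4 together, the sketch below checks a site's robots.txt rules before a URL is ingested and records each decision in an audit log. It uses Python's standard urllib.robotparser module. The crawler name ExampleTrainingBot, the sample rules, and the check_and_log helper are all hypothetical, and a log like this is only one possible form of documentation, not a guarantee of a successful fair use defense.

```python
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser


def check_and_log(robots_lines, user_agent, url, audit_log):
    """Check robots.txt rules for `url` and record the decision in `audit_log`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules already downloaded from the site
    allowed = parser.can_fetch(user_agent, url)
    audit_log.append({
        "url": url,
        "user_agent": user_agent,
        "allowed": allowed,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    })
    return allowed


# Hypothetical robots.txt a publisher might serve: the training crawler is
# blocked everywhere, other crawlers only from /private/.
ROBOTS_LINES = [
    "User-agent: ExampleTrainingBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /private/",
]

audit_log = []
urls = ["https://example.com/article", "https://example.com/private/draft"]
# Only URLs the publisher allows would be passed on to ingestion.
to_ingest = [u for u in urls
             if check_and_log(ROBOTS_LINES, "ExampleTrainingBot", u, audit_log)]
```

Here to_ingest ends up empty because the publisher has blocked the crawler, while audit_log keeps a timestamped record showing that the preference was checked and honored.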

What This Means for Creators

1. You have rights - AI developers cannot assume that every use of your copyrighted work for training is fair use

2. Opt-out tools exist - Use robots.txt, meta tags, and formal opt-out requests

3. Class actions are viable - Cases such as the authors' class action against Anthropic, which ended in a settlement, show creators can fight back

4. Licensing revenue is possible - Some AI companies are now paying for training data

How to Protect Your Work from AI Training

Technical Measures

Add AI-blocking rules to robots.txt (GPTBot, CCBot, ClaudeBot, etc.)
Use meta tags: noai, noimageai
Register with opt-out programs offered by AI companies
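The first two measures can be sketched in a few lines of Python. The example below generates robots.txt rules for the crawler names mentioned above, builds the noai/noimageai meta tag, and then checks the rules with the standard urllib.robotparser module. The helper names build_robots_txt and build_meta_tag are illustrative, the crawler list is not exhaustive, and both mechanisms are voluntary signals: the noai and noimageai values are not part of any formal standard, so whether a given crawler honors them is an assumption to verify with each operator.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the article; illustrative, not exhaustive.
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]


def build_robots_txt(crawlers):
    """Return robots.txt text that disallows the whole site for each crawler."""
    blocks = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    # All other crawlers keep full access (an empty Disallow allows everything).
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"


def build_meta_tag():
    """Return a robots meta tag carrying the non-standard noai directives."""
    return '<meta name="robots" content="noai, noimageai">'


robots_txt = build_robots_txt(AI_CRAWLERS)

# Sanity-check the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/essay.html")
search_allowed = parser.can_fetch("SomeSearchBot", "https://example.com/essay.html")
```

The generated text belongs in the robots.txt file at the site root, and the meta tag belongs in the head of each page you want to mark.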

Legal Measures

Register your copyrights (registration is required before filing an infringement suit over a U.S. work)
Document your original works and publication dates
Consider joining class action lawsuits if your work was used without permission
Send DMCA notices if AI outputs reproduce your work
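One possible way to document original works and publication dates is to keep a manifest of content hashes. The sketch below is an assumption about how such a record might look, not a legal requirement and not a substitute for registration. The fingerprint and manifest_entry helpers and the sample work are hypothetical.

```python
import hashlib
import json
from datetime import date


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of a work's exact bytes."""
    return hashlib.sha256(content).hexdigest()


def manifest_entry(title: str, content: bytes, published: date) -> dict:
    """Build one record tying a title and publication date to a content hash."""
    return {
        "title": title,
        "published": published.isoformat(),
        "bytes": len(content),
        "sha256": fingerprint(content),
    }


# Hypothetical work; in practice `content` would be read from the file on disk.
entry = manifest_entry("My Essay", b"Full text of the essay.", date(2024, 3, 1))
manifest_json = json.dumps([entry], indent=2)
```

Storing manifest_json alongside dated copies of the works gives you a consistent record to point to if you later need to show what you published and when.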

Key Takeaways

1. AI training on copyrighted data is NOT automatically fair use

2. The key test: does the AI generate content competing with originals?

3. Courts are increasingly skeptical of blanket fair use claims by AI companies

4. Creators have real legal options to protect their work

5. The landscape is shifting toward licensing and compensation

This article is for informational purposes only. Last updated: April 2026
