AI Training and Fair Use: What the Law Actually Says in 2026

Is it legal for AI companies to train on copyrighted data? A deep dive into fair use doctrine, the Copyright Office Part 3 report, and landmark rulings from NYT v. OpenAI to Authors v. Anthropic.

Can AI companies legally scrape the internet and use copyrighted books, articles, images, and music to train their models? This is the multi-billion dollar question at the heart of AI copyright law.

The Short Answer

It depends. The U.S. Copyright Office concluded in May 2025 that some uses of copyrighted works for AI training will qualify as fair use, and some will not. The key factor: whether the AI generates content that competes with the original works.

The Copyright Office Part 3 Report (May 2025)

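The most authoritative agency guidance came on May 9, 2025, when the Copyright Office released a pre-publication version of Part 3 of its AI report, which addresses whether the unauthorized use of copyrighted materials to train generative AI systems can be defended as fair use. The report is advisory: it does not bind courts, though judges may find its analysis persuasive.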

Key Conclusion

AI developers who make commercial use of copyrighted works to train models that generate expressive content competing with the originals go beyond the scope of fair use.

The Four Fair Use Factors Applied to AI Training

Factor 1: Purpose and Character of Use

  • Training AI for research or non-commercial purposes: favors fair use
  • Training AI to generate competing commercial content: weighs against fair use
  • The Supreme Court's decision in Andy Warhol Foundation v. Goldsmith (2023) raised the bar for transformative use claims

Factor 2: Nature of the Copyrighted Work

  • Using factual works (news, data): slightly favors fair use
  • Using highly creative works (novels, art, music): weighs against fair use

Factor 3: Amount Used

  • AI training typically uses entire works: weighs against fair use
  • But courts have sometimes allowed copying entire works for transformative purposes (Authors Guild v. Google, the Google Books case)

Factor 4: Market Effect

  • If AI outputs substitute for or compete with originals: strongly against fair use
  • This is often the most important factor in AI cases

Major Court Rulings

NYT v. OpenAI (Active - 2026)

The New York Times alleges that OpenAI used millions of its articles to train ChatGPT, creating a product that directly competes with NYT journalism. The judge has expressed skepticism about the fair use defense.

Authors v. Meta (June 2025)

Judge Chhabria granted summary judgment for Meta on fair use, largely because the plaintiffs failed to present evidence of market harm, but he expressed strong doubts about the practice in general: "You have companies using copyright-protected material to create a product capable of producing an infinite number of competing products."

Authors v. Anthropic (2025)

Judge Alsup ruled that Anthropic's use of lawfully purchased books for training was fair use, but that downloading and keeping pirated copies of books from shadow libraries was NOT fair use. Anthropic later agreed to a 1.5 billion dollar settlement.

Thomson Reuters v. Ross Intelligence (2025)

The court rejected the fair use defense and found that using Westlaw headnotes to train an AI legal research tool constituted copyright infringement. The tool was not generative AI; what weighed most heavily was that it was built as a direct market substitute for Westlaw.

What This Means for AI Companies

  1. Licensing is becoming necessary - The era of free training data may be ending
  2. Opt-out mechanisms matter - Respecting robots.txt and creator preferences helps fair use arguments
  3. Output similarity is key - If your AI can reproduce or closely imitate training data, you have a problem
  4. Documentation helps - Showing you took steps to minimize harm supports fair use
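Point 3 can be made concrete with a simple overlap check. The sketch below is one rough way to flag outputs that repeat long word sequences from a known source text. The function names, the 5-word window, and the sample strings are illustrative assumptions, and a high score is a signal for human review, not a legal test for infringement.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also appear verbatim in source."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


# Hypothetical sample texts, for illustration only.
source = "the quick brown fox jumps over the lazy dog near the quiet river bank"
copied = "the quick brown fox jumps over the lazy dog"
original = "a slow red fox walks under a sleepy cat by the loud city road"

print(overlap_ratio(copied, source))    # every 5-gram also appears in source
print(overlap_ratio(original, source))  # no 5-gram is shared with source
```

A check like this only catches verbatim reuse. Close paraphrase or stylistic imitation needs other methods, and the threshold that should trigger review is a policy decision.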

What This Means for Creators

  1. You have rights - Your copyrighted work cannot be freely used for AI training in all cases
  2. Opt-out tools exist - Use robots.txt, meta tags, and formal opt-out requests
  3. Class actions are viable - Multiple successful class actions show creators can fight back
  4. Licensing revenue is possible - Some AI companies are now paying for training data

How to Protect Your Work from AI Training

Technical Measures

  • Add AI-blocking rules to robots.txt (GPTBot, CCBot, ClaudeBot, etc.)
  • Use robots meta tag values such as noai and noimageai (a non-standard convention that only some crawlers honor)
  • Register with opt-out programs offered by AI companies
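As a minimal sketch of the robots.txt measure, the Python below builds blocking rules for the crawlers named above and then uses the standard library's urllib.robotparser to confirm how a compliant crawler would read them. The helper names and the example.com URL are hypothetical, the crawler tokens are taken from the list above and should be checked against each vendor's documentation, and robots.txt is a voluntary convention that a crawler can ignore.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the list above; verify them with each vendor.
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]

# Non-standard meta tag convention; only some crawlers honor it.
NOAI_META_TAG = '<meta name="robots" content="noai, noimageai">'


def build_robots_txt(crawlers: list[str]) -> str:
    """Build robots.txt text that disallows the whole site for each crawler."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    # All other crawlers remain allowed.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)
print(is_allowed(robots_txt, "GPTBot", "https://example.com/essay.html"))
print(is_allowed(robots_txt, "Googlebot", "https://example.com/essay.html"))
```

The generated text goes in a file named robots.txt at the site root, and the meta tag goes in each page's head section. Keeping the crawler list in one place makes it easier to add new tokens as vendors publish them.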

Legal Measures

  • Register your copyrights (for U.S. works, registration is required before filing an infringement suit)
  • Document your original works and publication dates
  • Consider joining class action lawsuits if your work was used without permission
  • Send DMCA notices if AI outputs reproduce your work

Key Takeaways

  1. AI training on copyrighted data is NOT automatically fair use
  2. The key test: does the AI generate content competing with originals?
  3. Courts are increasingly skeptical of blanket fair use claims by AI companies
  4. Creators have real legal options to protect their work
  5. The landscape is shifting toward licensing and compensation

This article is for informational purposes only. Last updated: April 2026