Copyright Office Part 3: AI Training Is Not Always Fair Use

The Copyright Office concludes some AI training goes beyond fair use.

On May 9, 2025, the U.S. Copyright Office released a pre-publication version of Part 3 of its report on Copyright and Artificial Intelligence. It is the most significant government analysis of AI and copyright to date, and it addresses one question in particular: whether using copyrighted works to train generative AI systems qualifies as fair use.

The Bottom Line

Some uses of copyrighted works for AI training will qualify as fair use, and some will not.

The critical factor: whether the AI is trained to generate "expressive content that competes with" the original copyrighted works.

Key Findings

1. No Blanket Fair Use for AI Training

The Copyright Office explicitly rejected the argument that all AI training is automatically fair use. This was a setback for AI companies that had argued training is inherently transformative.

2. The Competition Test

The report is guidance and does not bind courts, but its central conclusion is clear: commercial use of large volumes of copyrighted works to train models that generate expressive content competing with those works in existing markets goes beyond established fair use boundaries.

In plain English: if you train an AI on romance novels and it generates romance novels, that is likely not fair use. If you train an AI on medical papers to build a diagnostic tool, that is more likely to be fair use.

3. The Andy Warhol Factor

The Copyright Office relied heavily on the Supreme Court's 2023 decision in Andy Warhol Foundation v. Goldsmith, which raised the bar for transformative use claims. Under Warhol, even a use that adds new expression or meaning may not qualify as fair use if it serves substantially the same purpose as the original, especially when the use is commercial.

The Four Factors Analysis

Factor 1: Purpose and Character

  • Favors fair use: Non-commercial research, building non-expressive tools (search engines, medical diagnostics)
  • Against fair use: Commercial AI that generates content in the same category as training data
  • Key question: Does the AI serve the same purpose as the original works?

Factor 2: Nature of the Copyrighted Work

  • Favors fair use: Training on factual works, data, public records
  • Against fair use: Training on highly creative works (fiction, art, music, poetry)
  • Key question: How creative and original are the training materials?

Factor 3: Amount and Substantiality

  • Reality: AI training typically requires copying entire works
  • Nuance: Courts have allowed full copying for transformative purposes (for example, the Google Books case, Authors Guild v. Google)
  • Key question: Was copying the entirety necessary for the purpose?

Factor 4: Market Effect (Most Important)

  • Strongly against fair use: When AI outputs substitute for or compete with originals
  • Favors fair use: When AI serves a completely different market
  • Key question: Does the AI reduce demand for the original works?

What This Means in Practice

Likely Fair Use

  • Training AI for scientific research tools
  • Building search and retrieval systems
  • Creating translation or accessibility tools
  • Training AI for non-expressive analysis (sentiment analysis, classification)

Likely NOT Fair Use

  • Training image generators on artists' portfolios
  • Training writing AI on copyrighted books to generate similar books
  • Training music AI on copyrighted songs to generate competing music
  • Training code AI on proprietary codebases

Gray Area

  • Training general-purpose AI (like ChatGPT) on diverse copyrighted content
  • Using copyrighted works for AI that generates content in different formats
  • Training on publicly available but copyrighted web content

Impact on Pending Lawsuits

The report does not bind courts, but its reasoning supports the competition arguments raised against AI developers in several major cases:

  • NYT v. OpenAI: ChatGPT allegedly generates news-like content that competes with NYT journalism
  • Getty v. Stability AI: Stable Diffusion allegedly generates images that compete with Getty's library
  • Authors v. Meta: LLaMA allegedly generates text that competes with the authors' books

Recommendations from the Copyright Office

The report stopped short of recommending new legislation but suggested:

  1. Transparency: AI companies should disclose what copyrighted works they use for training
  2. Licensing markets: The development of licensing frameworks for AI training data
  3. Technical measures: Respect for robots.txt and opt-out mechanisms
  4. Case-by-case analysis: Courts should evaluate each situation individually
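
To make the robots.txt point concrete, here is a minimal sketch of how a crawler could check a site's robots.txt before collecting a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the "ExampleTrainingBot" and "SomeSearchBot" names, and the example.com URLs are all hypothetical, and the report does not prescribe any particular implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The bot name and
# paths are invented for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = ("https://example.com/essays/1", "https://example.com/private/x")
    for agent in ("ExampleTrainingBot", "SomeSearchBot"):
        for url in urls:
            print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

A real crawler would download the live file (RobotFileParser can do this with set_url and read) and re-check it periodically. Keep in mind that robots.txt is a voluntary convention, so it signals a preference without enforcing it.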

What Creators Should Do

  1. Register your copyrights: Registration is generally required before filing an infringement suit in U.S. court
  2. Document your works: Establish clear publication dates and ownership
  3. Use opt-out tools: Block AI crawlers and submit formal opt-out requests
  4. Monitor AI outputs: Check whether AI systems reproduce your work
  5. Consider legal action: The report gives creators stronger arguments, though outcomes still depend on the facts of each case
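
For the opt-out step, one common approach is a robots.txt file that disallows AI crawlers. The sketch below builds such a file and checks it with Python's standard urllib.robotparser. The crawler tokens listed are ones their operators have published, but the list is illustrative and may be out of date, so confirm each token against the operator's current documentation. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list only: crawler tokens change over time, so verify each
# one against the operator's own documentation before relying on it.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Build robots.txt text that disallows each listed agent site-wide
    and leaves all other crawlers unrestricted."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    blocks.append("User-agent: *\nDisallow:\n")  # empty Disallow = allow all
    return "\n".join(blocks)


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLER_AGENTS)
    print(text)
    print(is_blocked(text, "GPTBot", "https://example.com/portfolio/"))
```

The generated text would be saved as robots.txt at the root of the site. It only affects crawlers that choose to honor it, which is why the list above pairs it with formal opt-out requests and monitoring.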

What AI Companies Should Do

  1. Audit training data: Know what copyrighted works you are using
  2. Pursue licenses: Proactively license content where possible
  3. Implement guardrails: Prevent AI from reproducing training data verbatim
  4. Respect opt-outs: Honor robots.txt and creator preferences
  5. Prepare for litigation: Budget for potential licensing costs or settlements
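
As a rough illustration of the guardrail item, one simple way to flag verbatim reproduction is to measure how many word sequences in a model's output also appear in a protected reference text. The sketch below is a toy version of that idea. The function names, the 8-word window, and the 0.5 threshold are arbitrary assumptions, not legal standards or any vendor's actual filter.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase n-word sequences in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of n-word sequences in output that also occur in reference.

    Returns 0.0 when output has fewer than n words.
    """
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    shared = out_grams & word_ngrams(reference, n)
    return len(shared) / len(out_grams)


def should_block(output: str, reference: str,
                 n: int = 8, threshold: float = 0.5) -> bool:
    """Flag output when its overlap with reference meets the threshold.

    Both n and threshold are illustrative values chosen for this sketch.
    """
    return verbatim_overlap(output, reference, n) >= threshold


if __name__ == "__main__":
    reference = ("the quick brown fox jumps over the lazy dog "
                 "near the quiet river bank today")
    print(should_block(reference, reference, n=5))
    print(should_block("an entirely unrelated sentence about fair use "
                       "and model training practices", reference, n=5))
```

A production system would compare against an index of many works instead of a single string, and exact word matching misses paraphrase, so this only catches the most direct form of copying.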

This article is for informational purposes only and does not constitute legal advice. Last updated: April 2026