
As Generative AI Training Lawsuits Mount, Some Providers Offer New Assurances

By Peter Cramer on July 20, 2023

In April, we wrote about how OpenAI had eased the procedure by which ChatGPT users can opt out of their inputs being used for model training purposes. While neither web scraping nor the collection of user data to improve services is a new concept, AI did not attract much attention for these practices in prior years. However, with the runaway success of generative AI (“GAI”) tools like ChatGPT, customers ranging from individual consumers to large businesses are starting to take notice of GAI developers’ data supply.[1]

In 2023, we’ve seen several lawsuits against OpenAI and other GAI providers alleging unauthorized and improper use of plaintiffs’ proprietary data as GAI model training material, with claims variously based on copyright, contract, and privacy law. And lawsuits aren’t the only way that GAI providers have lately faced increased scrutiny over how and where they obtain training data to develop their GAI products. For example, Reddit recently announced a plan to begin charging for access to its API, which is generally how GAI providers import its data into their models (i.e., Reddit has decided that user posts shouldn’t be given away for free to GAI providers whose products might undermine the popularity of its platform). On top of these new hurdles, the FTC is reportedly looking into OpenAI’s collection of user data (among other issues, such as publication of false information and potentially anti-competitive practices surrounding GAI).

In light of these recent events, it is perhaps not surprising that some GAI providers have revised certain provisions in the terms and conditions for their tools, in an apparent attempt to reassure their users about how user data may – or more precisely, may not – be used. For example, Microsoft has updated its default commercial terms for its Azure OpenAI service (which provides licensed access to OpenAI’s GPT models) to explicitly state that user inputs are not used for training, and GitHub has done the same for its GAI coding tool, Copilot. OpenAI has made a similar update to its template Enterprise Agreement. Even Anthropic (provider of ChatGPT competitor Claude), the newest player on the scene, whose terms assert a broad right to use user data to develop new products and services, explicitly excludes model training from that right. Other providers may follow suit.

Although a pattern is emerging on this topic – and certainly, GAI providers’ default positions have also congealed in other areas, such as accuracy and bias disclaimers – there are plenty of other areas where default terms can vary significantly (e.g., terms surrounding ownership of outputs and IP infringement). We will see if some of these currently varied default terms begin to homogenize over time, as GAI providers compete to offer not only the best tools, but also the best terms, to business customers.[2]


[1] Note that generally, GAI models are “trained” first, and then made available to users in a relatively static form (at least until the next model is released). However, models can also be “fine-tuned” with additional data after receiving initial training. Thus, data rightsholders have two overlapping but distinct areas of concern. We use “training” to refer to both initial training and any subsequent fine-tuning.

[2] Note that even seemingly congealed terms of use may change over time, especially in an evolving space like GAI, and users should never assume they know what position a given set of terms takes on an issue – or assume terms in place today are the same tomorrow – without confirming.

 

Peter Cramer is an associate in the Corporate Department and a member of the Technology, Media & Telecommunications Group.

Peter earned his J.D. from Columbia Law School in 2021, where he was honored as a James Kent Scholar and received the Michael D. Remer Memorial Prize for Excellence in Copyright and Art Law. At Columbia, Peter served as co-President of the Entertainment, Art and Sports Law Society; as coach of AIPLA, Columbia’s intellectual property moot court team; and as a staffer for the Columbia Journal of Law and the Arts.

Peter received his B.A. from Wesleyan University in 2014, where his senior thesis documentary film earned him Departmental Honors and the Best Documentary Award from the Film Studies Department. After college, he was nominated for an Emmy for his work on the documentary film American Experience: The Mine Wars. Peter was born and raised in Massachusetts.

  • Posted in:
    Technology and AI
  • Blog:
    New Media and Technology Law Blog
  • Organization:
    Proskauer Rose LLP
