Look for Voluntary Open Data Practices to Follow Other Open IP Trends

By Suzanne Bell & Lee Tiedrich on February 22, 2021

A foundation of intellectual property rights (IPR) is that authors and inventors are entitled to some level of exclusivity over their works in the form of copyrights and patents to incentivize innovation; that’s written into the Constitution. However, various voluntary open innovation practices have emerged, highlighting that developers also can benefit by choosing to widely share certain intellectual property in ways that also can help foster innovation.

While there is no “one size fits all” approach, with the growth of artificial intelligence (AI), there has been a trend to similarly facilitate more voluntary data sharing. Especially considering how AI is being used to address the COVID-19 pandemic and other important needs, voluntary open access to data could have a significant impact in the immediate future. However, practices for voluntarily sharing or providing open access to data are still developing and vary widely (in part because of the state of IPR protection for data). These evolving practices create some challenges for data contributors and users alike. Those challenges often can be overcome by carefully selecting contract terms to govern the data sharing arrangement that factor in the goals and needs of the participants and relevant legal principles.

The Trend Towards Open Innovation Broadly

The concept of openly and voluntarily sharing innovation has existed for decades, and the following are examples of some longstanding initiatives and practices.

Open Source Software: Founded in the late 1990s, this movement is centered around the belief that developers should be free to exchange, study, and modify software. The earliest open source licenses were so-called copyleft licenses, under which source code is made available to users and any distributions of the original or modified licensed code carry the same copyleft license terms as the original code. Copyleft licenses sometimes are referred to as “viral” licenses. From copyleft licenses evolved permissive licenses, which allow users to relicense software (or modifications of it) under less restrictive terms. The open source movement initially faced some trepidation, arising in part from concerns about its potential impact on proprietary licensing models. Today, however, both models remain in use. Examples of business models that have been adapted to embrace open source software include software-as-a-service-based businesses, dual-licensing models, and the open-core model. Copyrights and, to some extent, patents are implicated in open source software licenses, and today there are many different versions of copyleft and permissive open source licenses.
Creative Commons: Creative Commons is a non-profit organization founded in 2001 that designed a menu of licenses that creators can adopt when distributing their creative content to the public. These licenses grant permissions under copyright and similar laws (e.g., the sui generis database right in the EU) and are intended for works other than software. As with open source licenses, the Creative Commons menu includes both copyleft and more permissive licenses, providing creators with a host of options.
Standards Setting Organizations (SSOs): As telecommunications, internet and other emerging technologies continue to scale, the demand for standards that enable organizations to leverage common protocols continues to escalate. Because standards often are protected by numerous patents that are privately owned, practices have emerged for broadly licensing these patents to any entity that wants to implement the standard. In many instances, the standards are set by standards setting organizations (SSOs) that also impose requirements for the licensing of patents that are essential to the standards (i.e., standard essential patents (SEPs)). For example, many SSOs require SEPs to be licensed on fair, reasonable and non-discriminatory (FRAND) terms. As an alternative to SSOs, patent pools have formed for some technologies and provide for the licensing of patents, and in some cases other IPR, on a bundled basis, thereby streamlining the transaction process and enabling developers to compete based on their standards-compliant products.
Patent Pledges: Some patent owners have made public declarations that they will not enforce at least some of their patent rights offensively against would-be infringers in certain situations, paving the way for other inventors, and even competitors, to practice an invention for certain purposes. Pledges typically reserve the option for owners to terminate the non-assertion promise as to any entity that asserts its patents against the owner or against a third party based on the owner’s products or services (known as defensive termination). Such pledges have been used in a variety of industries from software to hybrid and electric vehicles. Similarly, some companies are agreeing in their employment agreements not to assert their patent rights on employees’ inventions offensively without the employee’s permission.
Non-Practicing Entity Counter-Measures: In recent years, organizations have been formed to acquire patent rights for the benefit of their members as a means of keeping those patents from non-practicing entities. Examples include Rational Patent Exchange and the Allied Security Trust, which acquire patents and license the rights to their members. Similarly, members of the License on Transfer Network license patents to each other if the patents are to be transferred to a non-practicing entity.

Open Innovation as a Strategy to Address COVID

With the need for rapid advancement, open innovation practices are being utilized to help combat the COVID-19 pandemic. These initiatives are helping to facilitate access to information and intellectual property that might otherwise be out of reach of developers, and remove transaction costs that would otherwise impede innovation. A few examples include:

The Open Covid Pledge: A pledge made by companies to license their IPR (patent and/or copyright rights) on a non-exclusive, royalty-free basis for the purpose of ending the COVID-19 pandemic or minimizing its impact. The licenses last either until the World Health Organization (WHO) declares that the pandemic is over or until the beginning of the year 2023, whichever occurs first.
COVID-19 Technology Access Pool: A centralized location, sponsored by WHO, for companies to “voluntarily share COVID-19 health technology related to knowledge, intellectual property, and data,” in order to pool knowledge to address the pandemic.
AbbVie’s Kaletra Pledge: AbbVie, a pharmaceutical company, had pledged not to enforce its patent rights in Kaletra, an HIV drug that was being used as a possible treatment for COVID.

Some Key Considerations for Open Data Practices

Building upon the voluntary open innovation practices described above, many organizations, both public and private, are exploring ways to expand voluntary data sharing to help meet the ever-growing demand for high quality datasets and data for AI training and other purposes. A critical step in implementing these efforts is deciding upon an appropriate data sharing framework that protects the data contributors’ rights, imposes appropriate limitations on data usage, incentivizes others to use the shared data, and otherwise aligns with the organizers’ objectives. (Of course, if personal data is involved, all applicable privacy laws have to be considered as well, in addition to the IPR and contractual matters covered here.) Several standard form contracts currently are used for data sharing, including copyright licenses (to the extent copyright applies to the data) (e.g., Open Covid License, Creative Commons Licenses) and data use agreements (e.g., Open Data Commons Licenses, Community Data License Agreements, Microsoft Data Use Agreements). In other situations, data is shared without contractual terms or pursuant to bespoke terms. Which contractual approach works best for a particular data sharing arrangement depends on a variety of factors, such as the goals of the arrangement, the data being shared, and how IPR laws might protect the data. Understanding the various contractual options can help data users and contributors navigate these different questions and converge on a framework that is suitable for them, as well as for the broader community.

Some Data and Databases May Not Be Protected by Copyright or Similar Laws

While Creative Commons and the Open Covid Licenses often are used in connection with data, their license terms apply only to copyrighted materials, which, under certain circumstances, could be a material limitation for this approach. Copyright generally applies only to creative works (e.g., written works, images, artwork, videos), not factual content. In the United States, databases themselves, separate from the data within them, may be protected by copyright, but only as to the creative or original aspects of the selection, arrangement, and coordination of the contents. Therefore, a license that grants permissions and imposes restrictions only under copyright may not have the force that the publisher of the data intended. For example, this could result in the licensee’s rights in non-copyrightable data defaulting to a patchwork of laws, which may have the unintended consequence of providing a licensee greater rights than the publisher intended.

Outside of the U.S., there are jurisdictional variations. For example, the EU recognizes, in addition to copyright protection for databases that meet an originality threshold, a sui generis right for database creators if they can show substantial investment in obtaining, verifying, and presenting the contents of a database, and if they have sufficient nexus to the EU. Directive 96/9/EC, of the European Parliament and of the Council of 11 March 1996 on the Legal Protection of Databases, 1996 O.J. (L. 77) 20, 25-26. The EU recently enacted a directive requiring Member States to provide for an exception that benefits commercial entities that wish to make copies of copyright-protected materials or take extracts of content from databases protected by the sui generis database right “for the purposes of text and data mining.” This exception is subject to certain conditions, however, including that the owner has not “expressly reserved” their rights “in an appropriate manner.” As a practical matter, these conditions may limit the usefulness of this exception for those hoping to conduct text and data mining activities with a view to training AI. Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on Copyright and Related Rights in the Digital Single Market and Amending Directives 96/9/EC and 2001/29/EC, 2019 O.J. (L. 130) 92, 113. Further legislative initiatives may be on the way.

Some Uses of Copyright-Protected Data and Databases May Constitute Fair Use

Data and databases that are protected under copyright, and released under a license based on copyright rights, may still be used in ways unintended by the data provider, for example, under the fair use exception in U.S. copyright law. Fair use is a defense to copyright infringement and is evaluated by considering these four factors: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and (4) the effect of the use upon the potential market for or value of the copyrighted work. 17 U.S.C. § 107. A finding of fair use is more likely in instances where the use of the copied material is “transformative,” or “adds something new, with a further purpose or different character.” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994). Whether and how using data to train an AI or ML model qualifies as fair use is an open question. Circumstances that may be relevant to these considerations are the type of the data being used, how the model being trained operates, how it uses the data, and the task the model is being trained to do.

Fair use may apply to a copyright-protected database if the data within it is not copyright protected. Copying the copyright-protected aspects of a database to the extent necessary to access the non-copyright protected data may be an “intermediate use” that may tip the scales in favor of finding fair use. See Sega Enterprises Ltd. v. Accolade, 977 F.2d 1510, 1522-23, 1526 (9th Cir. 1992), as amended (Jan. 6, 1993).

Contracts and Licenses

Because of the inherent limitations and uncertainties regarding how copyright can preserve a data contributor’s intentions to restrict uses of its data, an alternative practice is to release the data under a data use agreement. In addition to the standard form data use agreements referenced above, a variety of other agreements exist, with terms that include prohibitions on commercial use. While these arrangements have the advantage of tailoring terms to the data publisher’s specific needs, they can become burdensome for data users to manage, especially when multiple datasets are combined, potentially forcing users to navigate a diverse set of terms and agreements to ensure compliance, or risk breach of contract.

Key Considerations

Data contributors must carefully decide how best to release their data. Likewise, data users must be equally thoughtful when deciding whether to use a specific dataset subject to the specified terms. Here are some considerations that may inform these decisions.

Data contributors who are planning to provide open access should consider:

Is the data or database copyright-protected (or protected by the EU sui generis database right)? If so, then existing models such as Creative Commons licenses may be appropriate, absent the need for special restrictions or concerns about copyright fair use exceptions.
Is it important to impose restrictions on use of the data? If so, then the rights and restrictions must be stated explicitly and unambiguously, both to indicate restrictions and to indicate permitted uses. For example, a data contributor may want to allow broad use of data internally by commercial entities, including in their product development, but not allow republication of the dataset itself as part of a commercial product. These are nuances that need to be carefully considered and addressed.
What are the risks of use of the data? Given concerns about potentially unsavory uses of AI, data contributors should consider prohibiting uses that the data contributor would find objectionable.
What are the standard practices and conventions used by others in the field? A key part of providing open data is to facilitate its use and uptake by others. Terms that are too onerous or restrictive, relative to the field, may effectively prevent the community from using the published data, even if that is not the intent.

Potential users of open datasets should likewise consider:

What are the terms of use or license terms provided, if any? It is important to identify what the terms of a license are, whether they apply only to copyright-protected aspects of the data and database, and whether the terms form an agreement that makes them enforceable as contractual obligations.
What aspects of the data and database are likely to be protected by copyright or other intellectual property rights? If data is published with no terms of use, or if the terms only apply to copyright, it is important to ascertain whether the contents might be protected under copyright or the EU database right. Data users will need to consider the jurisdiction in which the database originates and where it is used in order to understand what laws apply.
Is it possible that fair use applies? Even if copyright-protected, fair use may permit intended uses of data and datasets. Data users, however, should tread lightly. The uncertainty and fact-specific nature of a fair use analysis may result in a finding against fair use, thus leaving users liable for infringement.
How does the data user plan to use the data? Determining whether the intended uses conflict with the scope or restrictions enumerated in the applicable terms may be dispositive as to whether a dataset should be used. When the terms are ambiguous or susceptible to interpretation, the risk tolerance of the organization will have to be taken into account in deciding whether to go forward.

Developing Standard Open Data Practices Can Have a Significant Impact

As organizations adapt to utilize other open innovation practices generally, they may come to understand the extent to which their data can be voluntarily shared without harming their businesses and the benefits they may reap from carefully considered external contributions. It is likely that in the short term, open data practices will continue to evolve to balance the needs of data contributors and users. As communities converge more on standard terms, as has happened with other forms of voluntary intellectual property sharing, open data practices may become more commonplace and help to pave the way for greater innovation. For the time being, in the absence of broadly implemented data sharing terms, data users will have to continue to evaluate their rights to use a particular public dataset on a case-by-case basis.