Linux Foundation publishes new Community Data License Agreement

The Linux Foundation recently released a new version of its permissive Community Data License Agreement, a licensing option designed to make sharing data easier for machine learning and artificial intelligence projects. More precisely referred to as the Community Data License Agreement – Permissive, Version 2.0, the agreement has been given the formal identifier of CDLA-Permissive-2.0 and should be available in SPDX metadata as soon as Steve Winslow’s PR is merged.

Substantively, this agreement is an enhancement of the first version of the license, making the license easier to understand and removing the previous attribution requirement. Under the original CDLA-Permissive-1.0, data had to be properly attributed to its author:

3.1(c) If You Publish Data You Receive, You must preserve all credit or attribution to the Data Provider(s). Such retained credit or attribution includes any of the following to the extent they exist in Data as You have Received it: legal notices or metadata; identification of the Data Provider(s); or hyperlinks to Data to the extent it is practical to do so.

This requirement posed unforeseen problems with how some datasets might be used or distributed, and therefore led to an unintended disincentive to use the license. Datasets tend to be combined and split apart as it suits the programmer, which can make it very difficult to determine which pieces of data were from which authors. This logistical issue lead the Linux Foundation to the conclusion that removing this burden was more important than properly attributing the authors of data. The only restriction on version 2 of the license is that the text of the CDLA be included with the shared data – a very typical restriction of permissive licenses.

Other than that, the license permits users to use, share, and modify to their heart’s content!

Are you wondering how the CDLA compares to other options for licensing data? We’ll specifically cover data licensing in an upcoming post, covering how the Creative Commons, Open Knowledge Foundation, and Linux Foundation licenses compare or interact.