The Linux Foundation’s Community Data License Agreement

The Linux Foundation released version 2.0 of its permissive Community Data License Agreement (CDLA-Permissive-2.0), a licensing option designed to make sharing data easier for machine learning and artificial intelligence projects.

Why Data Licensing Matters

Software licensing is well established: developers choose from MIT, Apache 2.0, GPL, and dozens of other options with decades of legal precedent. Data licensing is far less mature. Creative Commons licenses were designed for creative works, not datasets. The Open Knowledge Foundation’s Open Data Commons licenses addressed some gaps but predate the current era of large-scale ML training data.

The CDLA was created specifically for data sharing in technical contexts, recognizing that datasets have different practical requirements than code or creative works.

What Changed in Version 2.0

The CDLA-Permissive-2.0 is a significant simplification of the original CDLA-Permissive-1.0. The key change: removing the attribution requirement.

Under the original CDLA-Permissive-1.0, data had to be attributed to its source:

    "3.1(c) If You Publish Data You Receive, You must preserve all credit or attribution to the Data Provider(s)."

This requirement proved hard to satisfy in practice. Datasets get combined, split, filtered, and transformed constantly during ML workflows. Tracking which data points came from which sources, and carrying that attribution metadata through every transformation, created a logistical burden that discouraged adoption.
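To make that burden concrete, here is a minimal Python sketch of what per-record attribution tracking can look like. The `Record` type, the `merge` and `required_credits` helpers, and the provider names are all hypothetical and are not drawn from the CDLA text or any particular tool. The point is only that every merge and filter has to carry the provider set along with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One data point plus every provider whose credit must be preserved (hypothetical)."""
    text: str
    providers: frozenset


def merge(a, b):
    """Combine two records; the result has to credit the providers of both inputs."""
    return Record(a.text + " " + b.text, a.providers | b.providers)


def required_credits(records):
    """Collect every provider that would need credit if these records were published."""
    credits = set()
    for record in records:
        credits |= record.providers
    return sorted(credits)


# Two illustrative source corpora with made-up provider names.
corpus_a = [Record("alpha", frozenset({"Provider A"}))]
corpus_b = [
    Record("beta", frozenset({"Provider B"})),
    Record("gamma", frozenset({"Provider C"})),
]

combined = corpus_a + corpus_b
merged = merge(combined[0], combined[1])
filtered = [r for r in combined if r.text != "gamma"]

print(required_credits(combined))
print(required_credits(filtered))
```

Even in this toy case the credit list changes after a single filter step, so the bookkeeping has to be repeated at every stage of a pipeline.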

Version 2.0 reduces the condition to a single step: include the license text with the data you share. Beyond that, you can use, share, and modify the data freely. This mirrors permissive software licenses like MIT, which ask only that the license notice travel with the code.
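In practice, including the license text can be as simple as writing it next to the data files. The following Python sketch assumes a CSV dataset and a `LICENSE.txt` file name. Both are illustrative choices, `package_dataset` is a hypothetical helper, and the license body is a placeholder that should be replaced with the actual agreement text.

```python
import csv
import tempfile
from pathlib import Path


def package_dataset(rows, license_text, out_dir):
    """Write rows to data.csv and place the license text alongside it (hypothetical helper)."""
    if not license_text.strip():
        raise ValueError("license text must not be empty")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "data.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    (out / "LICENSE.txt").write_text(license_text, encoding="utf-8")
    # Return the file names so a caller can confirm the license shipped with the data.
    return sorted(p.name for p in out.iterdir())


# Placeholder only: substitute the full CDLA-Permissive-2.0 text here.
PLACEHOLDER_LICENSE = "<full text of CDLA-Permissive-2.0 goes here>"

with tempfile.TemporaryDirectory() as tmp:
    files = package_dataset([["id", "label"], [1, "cat"]], PLACEHOLDER_LICENSE, tmp)
    print(files)
```

Unlike per-record attribution, this step happens once per distribution, no matter how many times the data has been merged or filtered upstream.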

CDLA in Context

The CDLA-Permissive-2.0 fills an important gap in the data licensing landscape:

Creative Commons (CC-BY, CC0): Designed for creative works. CC0 is used for some datasets but lacks provisions specific to data combination and enhancement.
Open Data Commons (ODC-By, ODbL, PDDL): Purpose-built for databases, but predates modern ML data pipelines and can be ambiguous about derived datasets.
CDLA-Permissive-2.0: Purpose-built for data sharing in AI/ML contexts, with clear terms for combining and modifying datasets.

For organizations building or distributing training data for machine learning, understanding these licensing options — and their implications for data provenance — is increasingly important as AI training data copyright questions reach courts and regulators worldwide.

If you’re evaluating licensing options for your data assets or assessing the provenance of training data in your ML pipeline, our AI & Data team can help you understand the risks and options.

