On October 28, 2024, the Open Source Initiative finally released the Open Source AI Definition 1.0. That sounds like a niche standards update. It is not. It is a market reset.
For the last two years, the word open has been doing far too much work in AI. Vendors have used it to mean open weights, open-ish access, community licenses, source-available code, and sometimes just “please do not ask follow-up questions.” The OSI definition cuts through that fog with a fairly simple proposition: if you want to call an AI system open source, you need the data information, the complete source code, and the model parameters. Not one out of three. Not two out of three. All three.
That matters because AI procurement is already crowded with confident language and thin substantiation. The new definition gives buyers, auditors, investors, and boards something better than vibes.
What OSI Actually Defines
The OSI’s definition is built around the same basic freedoms people expect from open source software: the freedom to use, study, modify, and share.
For AI, though, those freedoms are not meaningful unless you can actually make use of the system in a way that supports modification and review. That is why the definition insists on the preferred form to make modifications. In plain English: if a vendor hands you a model but withholds the critical ingredients needed to understand and change it, the system is not truly open in the OSI sense.
The required elements are straightforward:
- Data Information: enough detail about the training data to let a skilled person build a substantially equivalent system
- Code: the complete source code used to train and run the system
- Parameters: the model weights or other configuration settings
That last element is where many vendors stop, and that is the trap. A lot of companies like to say they have “open source AI” because they publish weights. But weights alone are not the whole story. Open weights are not the same thing as open source AI. A model can be downloadable and still be deeply opaque.
In other words, the definition is not asking for a marketing brochure. It is asking for the thing.
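If it helps to see how conjunctive the test is, here is a minimal sketch in Python. The `ModelRelease` record and its field names are hypothetical, invented for illustration; only the all-three requirement comes from the definition itself:

```python
from dataclasses import dataclass

# Hypothetical intake record for a model release. The field names are
# illustrative, not part of the OSI definition.
@dataclass
class ModelRelease:
    has_data_information: bool  # training-data detail sufficient for a skilled person to rebuild
    has_complete_code: bool     # the full source code used to train and run the system
    has_parameters: bool        # weights or other configuration settings

def meets_osaid_baseline(release: ModelRelease) -> bool:
    """All three elements are required; any single gap fails the definition."""
    return (
        release.has_data_information
        and release.has_complete_code
        and release.has_parameters
    )

# A weights-only release ("open weights") does not qualify.
weights_only = ModelRelease(has_data_information=False,
                            has_complete_code=False,
                            has_parameters=True)
assert not meets_osaid_baseline(weights_only)
```

The point of the `and` chain is the whole point of the definition: there is no partial credit.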
Why Most “Open” Models Do Not Qualify
The market gets uncomfortable here.
OSI’s own validation work is already pointing to a fairly short list of systems that meet the bar. The models that pass include Pythia, OLMo, Amber, CrystalCoder, and T5. A few others might pass with license changes. Then there are the better-known systems that do not make it: Llama 2, Grok, Phi-2, and Mixtral.
That is not an edge case. That is the story.
If the models most people casually describe as “open” do not qualify under the actual definition of open source AI, then the industry has a terminology problem. Or, more accurately, an open-washing problem.
And the issue is not just legal pedantry. It is operational. A model that is open in name only can still create real procurement risk:
- You cannot verify what data trained it
- You cannot assess whether the training stack respects licensing constraints
- You cannot reliably evaluate what parts are reproducible
- You cannot know whether the claimed freedoms are real or conditional
That is a bad place to be if you are buying the model, investing in the company, or letting it touch regulated workflows.
The phrase “open source AI” has now been given a formal meaning. That is useful precisely because it lets us stop pretending that every “open” model means the same thing.
Why This Matters for Buyers
This is where the conversation leaves philosophy and enters diligence.
If you are evaluating a vendor, the question is no longer “Do they say open?” The question is: open what, exactly, and under what terms?
That is where AI training data compliance and licensing risk assessment stop being back-office chores and become core procurement issues. If the vendor cannot explain the provenance of the training data, the rights attached to it, and the code path that produced the model, then the “open” label is doing too much work.
For enterprise buyers, this is not an abstract licensing debate. It is a risk conversation. It affects:
- purchasing decisions
- indemnity terms
- model governance
- downstream deployment rights
- board-level AI education
- technology diligence in M&A and financing
If you are doing diligence on an AI product, the OSI definition gives you a sharper question set. Can the seller provide the data information? Can they provide the full code? Can they provide the parameters under terms that actually preserve the freedoms the label implies?
If the answer is “sort of,” then the answer is no.
That is especially relevant in software and AI transactions, where buyers often need to distinguish between a system that is genuinely reusable and one that is simply accessible through a license with a lot of fine print. The difference may look subtle in a demo. It is not subtle when the deal closes.
The End of Lazy Language
There is a dry irony here. The open source movement built its reputation on precision. Licenses mattered. Rights mattered. Distribution terms mattered. The AI market, by contrast, has spent much of 2024 treating the word open like a decorative garnish.
That era is getting harder to sustain.
The OSI definition does not solve every AI governance problem. It does not tell you whether a model is safe, fair, accurate, or useful in a particular workflow. It does not tell you whether a vendor is well-managed, well-capitalized, or capable of supporting customers. But it does something more basic and more important: it draws a line between real openness and branding.
That helps everyone.
It helps builders because they now know what the market means when it says open source AI. It helps buyers because they can separate genuine openness from commercial theater. It helps boards because they can ask better questions before a product claim becomes a policy commitment. And it helps compliance teams because they can finally stop debating whether an API, a model card, and a friendly license header somehow add up to open source.
They do not.
What to Do With It
If your organization is using or evaluating AI systems, this is a good time to tighten the definitions in your own playbook.
At minimum, your procurement and governance language should distinguish between:
- open source AI
- open weights
- source-available models
- closed models accessed through APIs
Those are not interchangeable categories. Treating them as interchangeable is how companies end up with surprises in diligence, licensing, or downstream deployment.
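One way to keep those categories from blurring in procurement language is to make the triage explicit. The sketch below is a rough, illustrative Python classifier; the inputs and thresholds are assumptions for discussion, not a substitute for license review:

```python
from enum import Enum

class Category(Enum):
    OPEN_SOURCE_AI = "open source AI"      # data information + code + parameters, terms preserve the freedoms
    OPEN_WEIGHTS = "open weights"          # parameters published; data information and/or code withheld
    SOURCE_AVAILABLE = "source-available"  # artifacts visible, but the license restricts use, modification, or sharing
    CLOSED_API = "closed (API access)"     # no artifacts; access mediated entirely by the vendor

def classify(has_data_info: bool, has_code: bool, has_params: bool,
             license_preserves_freedoms: bool, api_only: bool) -> Category:
    """Rough triage for playbook language; real review still needs counsel and documentation."""
    if api_only:
        return Category.CLOSED_API
    if has_data_info and has_code and has_params and license_preserves_freedoms:
        return Category.OPEN_SOURCE_AI
    if has_params and not license_preserves_freedoms:
        return Category.SOURCE_AVAILABLE
    if has_params:
        return Category.OPEN_WEIGHTS
    return Category.CLOSED_API

# Example: a downloadable model under a restrictive license reads as
# source-available, not open source AI.
print(classify(has_data_info=False, has_code=False, has_params=True,
               license_preserves_freedoms=False, api_only=False))
# Category.SOURCE_AVAILABLE
```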
If you are building policy, this is also a useful moment to revisit your AI governance framework. A clean definition of openness makes it easier to write meaningful rules around model intake, vendor review, documentation, and board reporting. And if your team needs help mapping model claims to actual rights and risk, this is exactly the kind of work that sits inside Data Strategy & AI-Enabled Services and AI Governance & Compliance. The important part is not the label on the model. It is whether the model can survive scrutiny.
The market is going to keep saying “open” for a while. That is fine. Markets always lag standards, and standards always lag marketing. But now there is at least a reference point that says what open source AI actually means.
That is a useful development. Slightly overdue, but useful.
And for a market that has been running on enthusiasm, inference, and a lot of loose language, a little precision is a good thing.