The Exponentially Zero Valuation of IPython —

or, Why Valuing Software is so Hard.

A CPA and a recovering financial engineer sit down at a bar. The CPA looks over and says…well, just kidding. We haven’t been out to a bar for a long time now, so I can’t imagine what Jill would really say.

But as a CPA and financial engineer focusing on valuing the “new economy,” we do spend a lot of time thinking and talking about how to value software, data, or machine learning models.

If you’ve ever looked at formal guidance in accounting standards or academic literature, well, first: I’m sorry. And second, you probably know that most methods for software either focus on the direct expense (e.g., replacement cost) or conjure up a price from the number of lines of code (e.g., COCOMO).
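For readers who haven’t met COCOMO, here is a minimal sketch of its simplest form (“Basic COCOMO,” organic project class), which turns thousands of lines of code into person-months of effort. The coefficients 2.4 and 1.05 are the published Basic COCOMO values for organic projects; the monthly rate is a purely hypothetical input you would have to choose yourself, and the function names are mine.

```python
# Basic COCOMO, "organic" project class: effort = a * KLOC ** b.
# a and b are the published Basic COCOMO coefficients for organic projects.
ORGANIC_A = 2.4
ORGANIC_B = 1.05


def cocomo_effort_person_months(kloc: float) -> float:
    """Estimate effort in person-months from size in thousands of lines."""
    return ORGANIC_A * kloc ** ORGANIC_B


def cocomo_cost(kloc: float, monthly_rate: float) -> float:
    """Convert effort to a price. monthly_rate is an assumption you supply."""
    return cocomo_effort_person_months(kloc) * monthly_rate
```

Note how much of the answer hangs on two judgment calls: which project class you pick and what you think a person-month costs.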

Valuation obviously means many things to many people in many contexts. But I think everyone can agree that there are situations where the “true” value of some unit of software is much greater than either approach would suggest.

Take, for example, the IPython project. You might know it as Python notebooks, IPython notebooks, or Jupyter notebooks. In fact, you might know it without even realizing you’re using it, embedded in something like ArcGIS, papermill, or PyCharm. Regardless of what name you know them by, they’ve been an incredible tool for education, documentation, and iteration, especially in data science workflows.

So how would you value IPython? Well, a CPA or MBA would normally start with something like this:

“How much does a license cost?”

And the answer is $0. It’s BSD-3-Clause in SPDX speak. No real commercial re-use restrictions or other obligations to enforce or waive for consideration.

The valuation agent might then ask:

“OK, but do they have another commercial model with associated cashflows?”

And the answer is nope. Not really. There are grants and donations, but no ARR or support contracts and no custom implementation sales pipeline.

OK. So we’re giving up on the usual financial metrics. You then start to go down the replacement cost or COCOMO valuation rabbit hole…

Let’s say you start by looking at the 5.0.0 release for IPython, which first came out in July 2016. You could pip install the wheel and use something like wc to check the number of lines in the installed package source files. Great — something around 60 to 70K LOC. Set a cost per line of code and you’ve got a number.
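If you’d rather stay in Python than reach for wc, the same back-of-the-envelope exercise might look like the sketch below. It counts raw lines (blank lines and comments included, roughly what wc -l reports) in every .py file under a directory, then applies a flat cost per line. Both function names and the cost-per-line figure are hypothetical placeholders, not anything from a standard.

```python
from pathlib import Path


def count_lines(package_dir, suffix=".py"):
    """Count raw lines in all files ending in `suffix` under package_dir."""
    total = 0
    for path in Path(package_dir).rglob(f"*{suffix}"):
        # Read as bytes so odd encodings cannot break the count.
        with path.open("rb") as handle:
            total += sum(1 for _ in handle)
    return total


def flat_cost(loc, cost_per_line):
    """Naive valuation: every line costs the same (an assumed rate)."""
    return loc * cost_per_line
```

With IPython installed, you could point it at the package with `count_lines(Path(IPython.__file__).parent)` after `import IPython`, then feed the result to `flat_cost` with whatever rate you are willing to defend.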

Oops. You forgot that there might be other source files for other platforms or distribution methods. And how can you fairly scale the cost per line of code — both across projects and across files or methods within a project? Back to the drawing board.

OK, so next you might look at something like “real” lines of code across all distributions and add an AST-based representation of cost or complexity. That’s what we do for every package, including old Python 2.x releases, in our data platform. Maybe the results look like this:

Great. We’ve got another, more realistic number for the replacement cost of IPython. For fun, let’s just say that replacing the core ipython package would have “cost” $1.75M in 2016. I think that’s almost insulting, but it’s probably as “fair” a number as many traditional valuation methods would produce.
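To give a flavor of what an AST-based measure can mean, here is a minimal sketch using Python’s built-in ast module. It is not our platform’s method; the choice of “branch” node types and the metric names are my own assumptions for illustration. It also parses with whatever interpreter runs it, so Python 2-only source would need a different parser.

```python
import ast

# Node types treated as decision points (an illustrative choice, not a standard).
BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.Try,
    ast.ExceptHandler, ast.BoolOp, ast.IfExp,
)


def ast_metrics(source: str) -> dict:
    """Return crude size and complexity counts for one module's source."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "nodes": len(nodes),
        "functions": sum(
            isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
        ),
        "branches": sum(isinstance(n, BRANCH_NODES) for n in nodes),
    }
```

Counts like these let you weight a dense, branching function more heavily than a file of constants, which a flat cost per line cannot do.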

Now, look at the figures below.

Figure 1: Number of unique packages using official IPython or Jupyter packages over time.

Figure 2: Number of unique packages using IPython and Jupyter “ecosystem” packages, as of Jan 1, 2021. Note: includes “non-official” packages or extensions.

Imagine you were an investor being shown a pitch deck for a startup with usage statistics like this. Even if the business were B2C, you’d get a valuation well above $1.75M! Then, remember that many of these “users” are themselves popular packages with many users too…
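That “users of users” effect is easy to sketch. The snippet below walks a reverse-dependency map breadth-first to count everything that depends on a package, directly or indirectly. The graph and its package names are entirely made up for illustration; they are not data from the figures.

```python
from collections import deque

# Hypothetical reverse-dependency map: package -> packages that depend on it.
TOY_DEPENDENTS = {
    "core": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "d": ["e"],
}


def transitive_dependents(dependents, package):
    """Return every package that depends on `package`, at any depth."""
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            # Skip the root itself and anything already visited (handles cycles).
            if user != package and user not in seen:
                seen.add(user)
                queue.append(user)
    return seen
```

In this toy graph, "core" has two direct dependents but five once you follow the chain, and that gap is the part a license price or a line count never sees.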

If you went to market as a pre-revenue B2B platform play with over 2,000 active customers in 2020, you might have already been a Unicorn at your Series A.

The true impact of software is exponential, both for value creation and for risk. This is even more true given the economics and use cases for open source, as we’ve seen with recent events like the Log4j vulnerability. Even the White House and Congress agree.

And so many people, like Fernando Pérez and the many other IPython contributors over the years, have created immense value. They may not have captured that value in a traditional economic sense through the license or sale of assets, but the value was created nonetheless, through both direct and indirect, “network” use.

Next time you think about what something is “worth,” think about IPython. The value you write down in a purchase agreement is whatever someone is willing to pay, but the real worth is probably much, much more than that. Only by looking at the whole picture will you see it.