Licens.io Publishes Groundbreaking Python Research

MICHIGAN, JULY 26, 2019 - Licensio, LLC has released the largest, most detailed empirical study of the Python programming language ever published. This research, published on arXiv, the world’s foremost research preprint service, includes an analysis of more than 178,000 packages, 1.7 million releases, nearly 77,000 contributors, and more than 156 million import statements.

This analysis forms the foundation of Licens.io’s proprietary platform and its risk management and valuation services, such as compliance and due diligence. The full abstract of the article is below:

In this research, we provide a comprehensive empirical summary of the Python Package Repository, PyPI, including both package metadata and source code covering 178,592 packages, 1,745,744 releases, 76,997 contributors, and 156,816,750 import statements. We provide counts and trends for packages, releases, dependencies, category classifications, licenses, and package imports, as well as authors, maintainers, and organizations. As one of the largest and oldest software repositories as of publication, PyPI provides insight not just into the Python ecosystem today, but also trends in software development and licensing more broadly over time. Within PyPI, we find that the growth of the repository has been robust under all measures, with a compound annual growth rate of 47% for active packages, 39% for new authors, and 61% for new import statements over the last 15 years. As with many similar social systems, we find a number of highly right-skewed distributions, including the distribution of releases per package, packages and releases per author, imports per package, and size per package and release. However, we also find that most packages are contributed by single individuals, not multiple individuals or organizations. The data, methods, and calculations herein provide an anchor for public discourse on PyPI and serve as a foundation for future research on the Python software ecosystem.

An Empirical Analysis of the Python Package Index (PyPI). Ethan Bommarito, Michael Bommarito. Dated: 2019-07-26.

To learn more about this Python research or how Licensio’s technology and team can help you solve similar problems, please contact us today.


Licensio, LLC is a privately owned technology and consulting firm that provides risk management, valuation, and escrow solutions for software and data. Our solutions apply proprietary data, technology, training, and advisory services to manage risk and maximize enterprise value for builders, buyers, sellers, and insurers of software and data. For press inquiries, please contact press@licens.io.