MICHIGAN, FEBRUARY 21, 2021 – Licensio has published the largest, most comprehensive empirical analysis of the R programming language ever conducted. This research, published on arXiv, the world’s foremost research preprint service, includes an analysis of over 25,000 packages, 150,000 releases, and 15 million files over two decades. The data collection and analysis platform behind this study is integrated into Licensio’s commercial systems, expanding coverage for solutions and services such as risk management and valuation for R software projects. The full abstract of the research article is below:
In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, including not just CRAN, but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing comprehensive counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for active packages, 28% for new releases, and 26% for active maintainers. As with many similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and release. For example, the top five packages are imported by nearly 25% of all packages, and the top ten maintainers support packages that are imported by over half of all packages. We also highlight the dynamic nature of the ecosystem, recording both dramatic acceleration and notable deceleration in the growth of R. From a licensing perspective, we find a notable majority of packages are distributed under copyleft licensing or omit licensing information entirely. The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, serving as a foundation for future research on the R software ecosystem and “data science” more broadly.
An Empirical Analysis of the R Package Ecosystem. Ethan Bommarito, Michael Bommarito. Dated: 2021-02-19.
To learn more about this research or how Licensio’s technology, data, and team can help you solve similar problems, please contact us today.
Licensio, LLC is a privately owned technology and consulting firm that provides risk management, valuation, and escrow solutions for software and data. Our solutions apply proprietary data, technology, training, and advisory services to manage risk and maximize enterprise value for builders, buyers, sellers, and insurers of software and data. For press inquiries, please contact email@example.com.