DataOnTools #1: A comprehensive data analysis of bioinformatics tools

Ever wondered how many tools are released each year? Which country or institute produces the most tools? Or which programming languages are the most used? To answer these burning questions, and a lot more, we are launching a new series of articles, DataOnTools, providing a wide range of data on the evolution and current status of bioinformatics tools. Most of these data representations will be freely accessible and reusable.


DataOnTools, everything you always wanted to know about bioinformatics tools


Just like the data they are built to analyze, bioinformatics tools are growing at an exponential pace. The origin of bioinformatics tools is well documented and can be traced back to the late 1960s. However, documenting the evolution of tools in recent years is challenging, because the information is scattered all over the internet and buried under gigabytes of data.


omicX gathers more than 29,000 bioinformatics tools and their associated metadata, such as year of publication, country and institution of development, number of citations, usage, and specifications. These metadata can be mined and analyzed to shed a historical light on software development.


Today, we are starting this new series of articles by exploring the evolution of the production of software tools from 1990 to 2017.


Evolution of the number of bioinformatics tools


Since the 2000s and the completion of the Human Genome Project, the number of tools being developed has grown exponentially, and it currently doubles in under four years (Figure 1). Of note, the total number of tools (software and databases) indexed in PubMed has recently been estimated at approximately 100,000 (Callahan et al.).


Figure 1. Cumulative production of tools among 23,201 tools classified in the omicX repository
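For readers who want to check a doubling time on their own data, here is a minimal sketch of one common way to estimate it: fit a straight line to the logarithm of the cumulative count and divide ln(2) by the slope. This is not necessarily how the figure above was computed. The function names and the synthetic counts are illustrative assumptions, not omicX data or an omicX API.

```python
import math


def cumulative_totals(counts_by_year):
    """Return sorted years and the running total of tools up to each year."""
    years = sorted(counts_by_year)
    totals, running = [], 0.0
    for year in years:
        running += counts_by_year[year]
        totals.append(running)
    return years, totals


def doubling_time(counts_by_year):
    """Estimate the doubling time (in years) of the cumulative total.

    Assumes roughly exponential growth: fits ln(total) = a + r * year by
    ordinary least squares and returns ln(2) / r.
    """
    years, totals = cumulative_totals(counts_by_year)
    logs = [math.log(t) for t in totals]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return math.log(2) / (sxy / sxx)


# Synthetic, made-up counts (NOT omicX data): built so that the
# cumulative total doubles exactly every 4 years.
cumulative = [100 * 2 ** (i / 4) for i in range(11)]
synthetic_counts = {2000: cumulative[0]}
for i in range(1, 11):
    synthetic_counts[2000 + i] = cumulative[i] - cumulative[i - 1]

print(round(doubling_time(synthetic_counts), 2))
```

By construction, the synthetic series should give back a doubling time of about four years. With real yearly counts, the estimate depends on the range of years included in the fit.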


The 2000s were marked by the first publication of a complete human genome. In the past two decades, this success and the high demand for low-cost sequencing have fostered the development of “next-generation” sequencing, which applies not only to genome sequencing but also to transcriptome profiling (RNA-sequencing), proteomics, metabolomics, and epigenomics. The number of tools dedicated to these fields has been constantly increasing (Figure 2).


Figure 2. Evolution of the number of tools produced annually in each main omics field


Evaluating the global and specialized production of bioinformatics tools offers interesting insights into emerging trends and the history of a field. For example, there has been a boom in RNA-sequencing tools in the last 10 years (Figure 3).


Figure 3. Evolution of the number of tools produced annually in the transcriptomics field by technology.


These figures, and many more, are interactive and freely accessible on omicX. More articles will follow in the next few weeks, so be prepared for some more data on tools!



Callahan et al. U-Index, a dataset and an impact metric for informatics tools and databases. Scientific Data. 2018.