DataOnTools #4: Evolution of tool specifications

When developing a bioinformatics software tool, developers can choose from many programming languages. They must also decide which of the most widely used operating systems the tool should run on. Finally, the target audience of the software (personal use, free distribution, or commercial distribution) may influence how it is delivered (web interface, desktop application, etc.).

Users can run bioinformatics software tools on the web, locally on a desktop or server, or both. Web-based tools might be expected to be more common, since they are more accessible to less experienced users. In fact, we found that more than 69% of the 20,918 tools registered as software are developed as desktop applications only (Figure 1a). Similarly, we speculated that the majority of tools could be run on the mainstream operating systems (OS) Windows and Mac OS. Instead, we found that more than half of the 15,736 tools with a known OS can be run on UNIX/Linux exclusively (Figure 1b). A further 42.6% of tools are usable on more than one OS, including 34.2% that run on all three.
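
As a rough illustration of how proportions like those in Figure 1b could be tallied, here is a minimal Python sketch. It assumes each tool is stored as a record listing the operating systems it supports. The record layout, the tool names, and the `os_breakdown` helper are all hypothetical, and the toy data is not taken from the OMICtools registry. As in the text, tools with no known OS are left out of the denominator.

```python
# Hypothetical toy records, for illustration only (not real registry data).
# Each tool lists the operating systems it is known to support.
TOOLS = [
    {"name": "tool_a", "os": {"Linux"}},
    {"name": "tool_b", "os": {"Linux"}},
    {"name": "tool_c", "os": {"Linux", "Windows", "Mac"}},
    {"name": "tool_d", "os": {"Windows"}},
    {"name": "tool_e", "os": {"Linux", "Mac"}},
    {"name": "tool_f", "os": set()},  # OS unknown: excluded from the denominator
]

ALL_THREE = {"Linux", "Windows", "Mac"}


def os_breakdown(records):
    """Return percentages computed over tools with a known OS only."""
    known = [r for r in records if r["os"]]
    n = len(known)
    if n == 0:
        raise ValueError("no tools with a known operating system")

    linux_only = sum(1 for r in known if r["os"] == {"Linux"})
    multi_os = sum(1 for r in known if len(r["os"]) > 1)
    all_three = sum(1 for r in known if r["os"] == ALL_THREE)

    return {
        "n_known": n,
        "linux_only_pct": 100 * linux_only / n,
        "multi_os_pct": 100 * multi_os / n,
        "all_three_pct": 100 * all_three / n,
    }


if __name__ == "__main__":
    for key, value in os_breakdown(TOOLS).items():
        print(key, value)
```

Note that the "all three" share is a subset of the "more than one OS" share, which is why the two percentages in the text should not be added together.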

Figure 1. (a) Proportion of tools by interface, among all tools except datasets (n = 20,918). (b) Proportion of tools by operating system (n = 15,736).

Figure 2 shows the number of tools produced each year in each of the top 6 most-used programming languages, plus Fortran and JavaScript. Interestingly, while Java was the most used programming language each year from 2000 to 2010, it has now been surpassed by R, Python, and C++.

Figure 2. Number of tools produced each year by programming language. Only the top 7 most-used programming languages and Fortran are represented.

One need, one software?

Overall, these results likely reflect the fact that software tools are usually developed to address a need, pipeline, or problem that is specific to their developer, and are not necessarily designed to be easy to use for the average biologist. Accordingly, Figure 3 (interactive and available online) shows the number of tools by specific usage (packages, pipelines, applications…) and indicates that the majority of software tools are developed as packages/modules.

These data also indicate that tool development remains a field that requires specific skill sets, and they highlight the importance of interaction between the biologists who use these tools and the bioinformaticians who create them.