DataOnTools #5: Tools in the literature
Scientific communications and publications are the best way to promote your work, whether you are a tool developer or a bioinformatician conducting analyses. In this article, we analyze tool publications to explore a new aspect of the evolution of bioinformatics.
Tool publications and citation
In the omicX database, nearly eight out of ten tools have been published in a peer-reviewed journal: 41.7% of all tools have been cited at least once, while 36.0% have never been cited in the literature (Figure 1a). In other words, among tools with a PMID, about half have never been cited and the other half have been cited at least once. This matters because citations are the best way to get your article read.
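The shares quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the two groups (41.7% cited, 36.0% never cited) together make up the published tools, which is how the text reads but is not stated explicitly; it then computes each group's share within the published subset.

```python
# Percentages of all tools in the omicX database, as reported in the text.
cited_at_least_once = 41.7  # % of all tools: published and cited at least once
never_cited = 36.0          # % of all tools: published but never cited

# Assumption: published tools (those with a PMID) are exactly these two groups.
published = cited_at_least_once + never_cited  # "nearly eight out of ten"

# Shares *within* the published subset.
share_cited = cited_at_least_once / published
share_never = never_cited / published

print(f"published: {published:.1f}% of all tools")
print(f"cited at least once: {share_cited:.1%} of published tools")
print(f"never cited: {share_never:.1%} of published tools")
```

Under that assumption, published tools make up 77.7% of the database, of which about 53.7% have been cited at least once and about 46.3% never, consistent with the "about half" figure.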
However, the number of citations a tool receives is not (always) related to the quality of the work; it may instead be a matter of good timing. To test this hypothesis, we analyzed tool citations in relation to when a new technology emerged, taking into account each tool's publication age (Figure 1b).
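The article does not say exactly how publication age was taken into account. One simple option, sketched below as an assumption, is to divide each tool's citation count by the number of years since its paper appeared, so that older papers are not favored merely for having had more time to collect citations. The ToolRecord class, the census year, and the records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    name: str       # hypothetical tool name
    pub_year: int   # year the tool paper was published
    citations: int  # total citations counted so far


def citations_per_year(record: ToolRecord, census_year: int) -> float:
    """Average citations per year since publication.

    The publication year itself is counted, so a paper published in the
    census year has an age of 1 rather than 0 (avoids division by zero).
    """
    age = census_year - record.pub_year + 1
    if age < 1:
        raise ValueError("publication year is after the census year")
    return record.citations / age


# Hypothetical records, for illustration only (not omicX data).
tools = [
    ToolRecord("tool_a", 2009, 1200),
    ToolRecord("tool_b", 2012, 300),
    ToolRecord("tool_c", 2015, 40),
]

for t in tools:
    print(t.name, round(citations_per_year(t, census_year=2016), 1))
```

With a census year of 2016, the three hypothetical tools come out at 150, 60 and 20 citations per year. Counting the publication year itself avoids dividing by zero for papers published in the census year; other conventions are equally defensible.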
A clear trend emerged, as the example of RNA-sequencing illustrates. The first RNA-sequencing papers were published in 2008, and tools for analyzing RNA-sequencing data that were published in 2009 are, on average, significantly more cited than tools published in subsequent years (P < 0.05), irrespective of the impact factor of the journal of publication (data not shown).
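The statistical test behind the P < 0.05 result is not named in the text. As an illustration only, the sketch below uses a one-sided permutation test on the difference in mean citation rates between an early cohort (for example, RNA-sequencing tools published in 2009) and later cohorts. The function name and the citation rates are hypothetical; a Mann-Whitney U test would be a common alternative for skewed citation counts.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test: is mean(group_a) greater than mean(group_b)?

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical age-normalized citation rates (citations per year), illustration only.
early_2009 = [150.0, 95.0, 120.0, 80.0, 110.0]
later = [20.0, 35.0, 60.0, 15.0, 40.0, 25.0, 30.0]

diff, p = permutation_test(early_2009, later)
print(f"difference in means: {diff:.1f}, p = {p:.4f}")
```

Because every value in the hypothetical early group exceeds every value in the later group, only the original split (1 of 792 possible splits) is as extreme as the observed one, so the estimated p-value is far below 0.05.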
This trend was also observed for other technologies, including WGS, ChIP-seq and CLIP-seq, suggesting that the first tools to solve a problem are more likely to become gold standards or default methods and, consequently, to accumulate more citations over time.
The rise of tool pipelines
A bioinformatics tool publication can be cited in several ways: for example, when a new version of the tool is released, or in the Materials and Methods section of an analysis paper.
For a long time, biological data consisted of a handful of sequences to be analyzed and compared, which could be done in a few computational steps using a single program. Today, however, biological datasets are so complex and large that obtaining meaningful results requires multiple analysis steps, often involving a series of different programs that must be run in a specific order.
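A pipeline of this kind can be described as a set of steps, each depending on the outputs of earlier ones; finding a valid run order is then a topological sort. The sketch below uses Python's standard-library graphlib module (Python 3.9+). The step names are hypothetical placeholders for an RNA-seq-style workflow, not specific tools.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
# Step names are illustrative placeholders, not specific tools.
pipeline = {
    "trimming": {"quality_control"},
    "alignment": {"trimming"},
    "quantification": {"alignment"},
    "differential_expression": {"quantification"},
    "report": {"quality_control", "differential_expression"},
}


def run_order(steps):
    """Return an execution order in which every step comes after its dependencies."""
    return list(TopologicalSorter(steps).static_order())


order = run_order(pipeline)
print(" -> ".join(order))
```

Here the order is fully determined: quality_control -> trimming -> alignment -> quantification -> differential_expression -> report. If the dependencies contained a cycle, TopologicalSorter would raise graphlib.CycleError instead of returning an order.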
To verify this, we followed the evolution of co-citations (the number of tools cited per publication) and observed that the number of tools jointly cited in scientific publications has increased over time (Figure 2). Whereas publications in the early 2000s cited one to five tools at most, the number of tools cited per publication has increased continuously since 2005, with 20% of publications in 2015 citing more than six tools.
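Co-citation counts of the kind described above can be tallied per year from a list of publications and the tools each one cites. The sketch below assumes such a list is available (the input format and tool identifiers are hypothetical) and reports, for each year, the mean number of distinct tools cited and the share of publications citing more than six tools.

```python
from collections import defaultdict


def cocitation_summary(publications, threshold=6):
    """Summarize tools cited per publication, grouped by publication year.

    `publications` is an iterable of (year, cited_tools) pairs, where
    cited_tools is a collection of tool identifiers cited by that paper.
    Returns {year: (mean tools per paper, share of papers citing > threshold tools)}.
    """
    counts_by_year = defaultdict(list)
    for year, cited_tools in publications:
        # De-duplicate so a tool cited twice in one paper counts once.
        counts_by_year[year].append(len(set(cited_tools)))
    summary = {}
    for year in sorted(counts_by_year):
        counts = counts_by_year[year]
        mean_count = sum(counts) / len(counts)
        share_above = sum(c > threshold for c in counts) / len(counts)
        summary[year] = (mean_count, share_above)
    return summary


# Hypothetical publications, for illustration only (not the omicX dataset).
pubs = [
    (2003, ["tool_a"]),
    (2003, ["tool_a", "tool_b"]),
    (2015, ["t1", "t2", "t3"]),
    (2015, ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]),
]

for year, (mean_count, share) in cocitation_summary(pubs).items():
    print(year, f"mean={mean_count:.1f}", f"share>6={share:.0%}")
```

For the toy input, 2003 averages 1.5 tools per paper with no paper above six, while 2015 averages 5.0 with half of the papers citing more than six tools.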
These results indicate a shift in the complexity of biological data, which now requires pipelines of tools for effective and productive analysis.