How to identify the right tools for biological data analysis

In the era of big data in biology, managing the volume and diversity of data that experiments produce is challenging. Depending on the scope of your research, you probably spend much of your time searching for the right bioinformatics tools.


Like most researchers, you probably have a general idea of how to analyze your data and have used more than one tool. If you work in a computational biology laboratory, you’ve probably heard questions such as “What is the best software for genome sequence alignment?” or “What algorithm is the standard for sequence alignment in genetics?” While BLAST is probably the most popular tool for this, there are many other tools for mining and aligning biological data, so choosing the right solution for a given project is difficult even for experts.


In this practical guide, you’ll get some tips on how to set up a reliable strategy using the omicX search engine to uncover the most accurate tools for interpreting your data.

1. Type in the right combination of keywords to get the top features


How it’s useful: Build simple requests to find the right volume of search results.


To make your life easier, the new front-end search engine looks at each piece of content (tool description, tool specification, title, reviews, etc.) to return the appropriate tools. It also uses a query-term interpretation algorithm to run facets and retrieve the best results.


With the previous search engine, you had to use Boolean operators and build complex requests such as “Bayes* theor*” and (medical or health) and (diagnos* or treatment) to obtain optimal results, and you could use double quotes or negative keywords to limit your request. With the new omicX search engine, simply search the way you would on Google: type a few keywords and let the search engine do the rest. To narrow down your results, add a word or two to your search terms. If you find significantly fewer results than you expect, reconsider the terms you are using (the MeSH ontology or another controlled vocabulary may help).
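
To make the difference concrete, here is a minimal Python sketch of what the legacy Boolean request above expresses, next to a plain keyword match. It is an illustration only, not omicX's implementation: the function names (`matches`, `legacy_query`, `keyword_score`) are hypothetical, and a real search engine does far more than count shared words.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a term such as 'diagnos*' into a case-insensitive whole-word regex."""
    # Escape the term, then let each '*' stand for any run of word characters.
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


def matches(term: str, text: str) -> bool:
    """True if the (possibly wildcarded) term or phrase occurs in the text."""
    return wildcard_to_regex(term).search(text) is not None


def legacy_query(text: str) -> bool:
    """Hand-coded form of:
    "Bayes* theor*" and (medical or health) and (diagnos* or treatment)
    """
    return (
        matches("Bayes* theor*", text)
        and (matches("medical", text) or matches("health", text))
        and (matches("diagnos*", text) or matches("treatment", text))
    )


def keyword_score(query: str, text: str) -> int:
    """Count how many query keywords appear as whole words in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return sum(1 for keyword in query.lower().split() if keyword in words)


doc = "Applying Bayes theorem to medical diagnosis"
print(legacy_query(doc))                      # Boolean request: all clauses must hold
print(keyword_score("bayes diagnosis", doc))  # keyword request: just type the words
```

The Boolean form forces you to anticipate every word variant yourself; the keyword form leaves that interpretation to the engine.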


With the right results, you’ll be in a position to make confident tool decisions, but there is more you can do to improve your search results. Here are three further tips.


2. Use filters to broaden or narrow the results

How it’s useful: Streamline your research to find the most accurate biological answer for any topic, for free.


Filters are a great way to discover better results on omicX. Even if you have missed a tool in the literature, there’s a good chance it is available on the platform. You can now access software and databases in separate lists, each with its own filters. If there’s too much ‘noise’ in the results, you can use technology filters (operating system, programming language, interface) to find software that matches your bioinformatics skills.




For databases, specific filters have been implemented to help you sort results according to taxonomy (if you work on targeted species), disease (if your work focuses on a specific disease), data access (the way you can use the data), the data management system on which the database is built (MySQL, NoSQL, PostgreSQL, etc.), and the license. In addition, each result list is ranked by relevance.
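
Conceptually, each filter keeps only the entries whose metadata matches the value you select. The sketch below shows that idea in Python under stated assumptions: the `DatabaseEntry` fields mirror the filters named above, but the field names, the `apply_filters` helper and the sample entries are all hypothetical and do not describe omicX's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseEntry:
    # Hypothetical metadata fields, one per filter described in the text.
    name: str
    taxonomy: str
    disease: str
    data_access: str
    dbms: str
    license: str


def apply_filters(entries, **filters):
    """Keep entries whose fields equal every given filter value (case-insensitive)."""
    def keep(entry):
        return all(
            getattr(entry, field).lower() == value.lower()
            for field, value in filters.items()
        )
    return [entry for entry in entries if keep(entry)]


# Made-up entries, for illustration only.
catalog = [
    DatabaseEntry("ExampleDB-A", "Homo sapiens", "diabetes", "download", "MySQL", "CC BY"),
    DatabaseEntry("ExampleDB-B", "Mus musculus", "diabetes", "API", "PostgreSQL", "CC BY"),
    DatabaseEntry("ExampleDB-C", "Homo sapiens", "asthma", "download", "PostgreSQL", "CC0"),
]

# Each added filter narrows the list; removing one broadens it again.
human = apply_filters(catalog, taxonomy="Homo sapiens")
human_postgres = apply_filters(catalog, taxonomy="homo sapiens", dbms="postgresql")
print([e.name for e in human])
print([e.name for e in human_postgres])
```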




Our methodology calculates a “tool score” based not only on query-term proximity but also on several practical factors, such as available documentation, maintenance, publications and reviews. In addition, omicX shows you the tools currently cited in the scientific literature, the contributors associated with the projects, and the institutions engaged in the funding (‘related users’), so you have the complete picture of what’s going on in your area of bioinformatics.
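
The exact formula behind the tool score is not given here, so the following Python sketch is only one plausible way to combine such signals: a weighted sum of query relevance and the practical factors listed above. The weights, the cap on counts and the `tool_score` function itself are assumptions made for illustration, not omicX's actual method.

```python
# Hypothetical weights; they sum to 1 so the score stays between 0 and 1.
WEIGHTS = {
    "relevance": 0.50,
    "documentation": 0.15,
    "maintenance": 0.15,
    "publications": 0.10,
    "reviews": 0.10,
}
COUNT_CAP = 10  # assumed saturation point for publication and review counts


def tool_score(relevance, has_documentation, is_maintained,
               n_publications, n_reviews):
    """Combine query relevance (0..1) with practical signals into one score."""
    signals = {
        "relevance": relevance,
        "documentation": 1.0 if has_documentation else 0.0,
        "maintenance": 1.0 if is_maintained else 0.0,
        # Counts are capped so a handful of extra papers cannot dominate.
        "publications": min(n_publications, COUNT_CAP) / COUNT_CAP,
        "reviews": min(n_reviews, COUNT_CAP) / COUNT_CAP,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


# A documented, maintained tool versus a closer keyword match with no support.
well_supported = tool_score(0.8, True, True, n_publications=5, n_reviews=2)
bare_match = tool_score(0.9, False, False, n_publications=0, n_reviews=0)
print(well_supported > bare_match)
```

With weights like these, a well-documented and maintained tool can outrank one that matches the query terms slightly better, which is the behaviour the paragraph above describes.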


3. Navigate to the analytical steps that are closely linked to the tools you’ve found


How it’s useful: The search results give you a complete framework, or summary overview, with the right steps to perform your analyses. As Leipzig reports, NGS analyses “involve steps such as sequence alignment and genomic annotation that are both time-intensive and parameter-heavy” (A review of bioinformatic pipeline frameworks. Briefings in Bioinformatics, 2017).


omicX has also implemented a very simple technology that analyzes the step you’re focusing on and suggests tools that let you run a complete analysis from start to end. Your search results now include a section with related steps (‘related software’) that lists the most recent or most popular tools, giving you a complete chain of processes.


For example, if you work on ChIP-seq analysis and you land on a page covering your task, you can browse through the related steps to find tools for any step involved in the ChIP-seq workflow.
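
The idea of related steps can be pictured as an ordered workflow in which the step you are viewing points to what comes before and after it. The Python sketch below assumes a simplified, linear ChIP-seq workflow; the step list and the `related_steps` helper are illustrative assumptions, not the way omicX models workflows.

```python
# A simplified, linear view of a ChIP-seq workflow (assumed for illustration).
CHIP_SEQ_STEPS = [
    "quality control",
    "read alignment",
    "peak calling",
    "peak annotation",
    "motif discovery",
]


def related_steps(current, workflow=CHIP_SEQ_STEPS):
    """Return the steps that come before and after the current one."""
    if current not in workflow:
        raise ValueError(f"unknown step: {current!r}")
    position = workflow.index(current)
    return {
        "upstream": workflow[:position],
        "downstream": workflow[position + 1:],
    }


# Landing on a peak-calling page, these are the neighbouring steps to browse.
neighbours = related_steps("peak calling")
print(neighbours["upstream"])
print(neighbours["downstream"])
```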


4. Make the most of input from the top scientific community


How it’s useful: Leverage the benefits of a highly-qualified community to quickly access the must-have tools.


A smart way to find the best answers to your biological questions is to target experts who are interested in what you have to say in your field of research. The new omicX front-end search engine allows you to identify, view input from and interact with contributors who are active on omicX as tool category experts or who are involved in tool development. In the ‘users’ tab of your results, you can filter contributors by their current position, fields of interest or location.




As a member, you can also add value to each data analysis step by upvoting and reviewing tools, or by posting new tools as they appear in the scientific literature.