Provides rapid data mining on taxonomy and metabolic function across a large number of metagenome datasets. Parallel-META is useful for: (1) vector-graph-based visualization and parallel computing; (2) interaction network construction; (3) bio-marker selection; (4) diversity statistics; (5) 16S rRNA based functional prediction; (6) 16S rRNA copy number calibration; and (7) 16S rRNA extraction for shotgun sequences.
Identifies eukaryotic sequences and can be applied to environmental samples. EukRep enables genome recovery, genome completeness evaluation and prediction of metabolic potential. This classifier uses the k-mer composition of assembled sequences to detect eukaryotic genome fragments prior to gene prediction. It can also flag scaffolds whose analysis would benefit from a eukaryotic gene prediction algorithm.
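As an illustration of what "k-mer composition" means as a classifier input, the sketch below turns a DNA sequence into a fixed-length vector of relative k-mer frequencies. It is not EukRep's code: the function name `kmer_frequencies`, the default `k=3` and the handling of ambiguous bases are assumptions made for this example only.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration. K-mers containing characters
    other than A, C, G or T (for example N) are ignored.
    """
    sequence = sequence.upper()
    # All 4**k possible k-mers, in a fixed lexicographic order.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Normalise only over k-mers made of unambiguous bases.
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTAC", k=2)
    print(profile["AC"])
```

A vector like this can be computed per contig or scaffold and passed to any standard classifier; which classifier and which value of k a given tool uses is outside the scope of this sketch.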
Identifies and characterizes microbes through read analysis. MICRA employs read mapping methods to make use of the increasing number of sequenced microbial genomes. The workflow consists of four parts: (1) pre-processing, (2) sequence identification, (3) identification of the closest reference genome and plasmids by the core part and (4) post-analysis. This pipeline is available as a downloadable version and as a web interface.
Aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. mothur builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. Extensive community-supported documentation and support are available through a MediaWiki-based wiki and a discussion forum.
A user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database. AmrPlusPlus not only increases the accessibility of resistome analysis, but also provides users with 3 integrated tools (ResistomeAnalyzer, RarefactionAnalyzer and SNPFinder) which will help to bridge the gap between the bioinformatics and the statistical analysis of metagenomics data.
Serves for the automated, reference-independent binning and visualization of metagenomic data in the form of assembled contigs or long reads. The BusyBee web server supports population-level resolved analyses of metagenomic data. This tool helps the user to build confidence in the individual bins while simultaneously facilitating the identification of sequence groups requiring special attention.
Houses tools for researchers to process and analyze their own functional gene sequencing data. FGP offers a pipeline where researchers can assemble a set of analysis tools to process a nucleotide sequence file, filter chimeric sequences, translate the nucleotide sequences, align, and cluster the protein sequences and additionally run the optional cluster file analysis tools. FGP allows libraries of sequence reads to be analyzed through either reference-based or unsupervised approaches after common initial processing steps. Reference-based approaches, such as the FrameBot frameshift correction and nearest neighbor tool offered by FGP, require a set of representative sequences, which can be compiled using the FunGene Repository (FGR).
Classifies metagenomic datasets. MetaMeta lets users obtain more precise or more sensitive results by exposing a single parameter with a default value. It executes and integrates results from metagenome analysis tools. The tool facilitates execution in many computational environments using Snakemake and BioConda. It can handle multiple large samples at the same time, with options to delete intermediate files and keep only necessary ones. MetaMeta is well suited to large scale projects.
Provides a lightweight back end pipeline that supports multiple dynamically loaded plugin extensions. PluMA intends to offer a solution for the lack of a standardized framework for developing, testing and integrating plugins that are heterogeneous with respect to programming language. This software can assemble pipelines where stages can be plugged in and out.
Integrates different steps for better estimation of the taxonomic assignment. MetaABC is an integrated metagenomics platform for data adjustment, binning and clustering. This method incorporates (i) two means for removing artifacts, (ii) five tools for taxonomic binning, (iii) an approach to reanalyze unassigned reads using conserved gene adjacency, and (iv) an option to control sampling biases via genome length normalization.
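Genome length normalization corrects for the fact that, at equal cell abundance, a longer genome yields more reads. A minimal sketch of the idea follows; it is not MetaABC's implementation, and the function name, the dictionary inputs and the example numbers are hypothetical.

```python
def length_normalized_abundance(read_counts, genome_lengths):
    """Convert per-taxon read counts into length-normalized relative abundances.

    Hypothetical helper for illustration. Both arguments are dicts keyed by
    taxon name; genome lengths are in base pairs.
    """
    # Reads per base pair removes the advantage of long genomes.
    rates = {
        taxon: read_counts[taxon] / genome_lengths[taxon]
        for taxon in read_counts
    }
    total = sum(rates.values())
    if total == 0:
        return {taxon: 0.0 for taxon in rates}
    # Rescale so the abundances sum to 1.
    return {taxon: rate / total for taxon, rate in rates.items()}


if __name__ == "__main__":
    counts = {"taxon_A": 100, "taxon_B": 100}
    lengths = {"taxon_A": 1_000_000, "taxon_B": 4_000_000}
    print(length_normalized_abundance(counts, lengths))
```

In this example both taxa receive the same number of reads, but the taxon with the four-fold smaller genome ends up with the larger normalized abundance.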
Performs rarefaction analysis of large count matrices, as well as estimation and visualization of diversity, richness and evenness. RTK computes estimates of ecological diversity and provides appropriate visualizations of the results. It rarefies large high count datasets quickly and returns diversity measures. The tool can be applied to state of the art microbiomics applications and scales better than presently available tools.
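The concepts named above (rarefaction, richness, diversity, evenness) can be sketched in a few lines. This is a naive illustration under stated assumptions, not RTK's algorithm: it expands the counts into an explicit pool of individuals, which would not scale to the large matrices RTK targets. All function names are hypothetical, and the evenness measure shown is Pielou's index, chosen here only as a common example.

```python
import math
import random


def rarefy(counts, depth, seed=None):
    """Subsample `depth` individuals without replacement from a count vector."""
    rng = random.Random(seed)
    # Expand counts into one entry per observed individual (memory-hungry).
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the total count")
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def richness(counts):
    """Number of taxa with at least one observation."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over observed taxa."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


def pielou_evenness(counts):
    """Pielou's evenness H / ln(S); returns 0.0 by convention when S < 2."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


if __name__ == "__main__":
    sample = {"otu1": 50, "otu2": 30, "otu3": 20}
    sub = rarefy(sample, depth=40, seed=1)
    print(sub, richness(sub), shannon(sub), pielou_evenness(sub))
```

Rarefying every sample to the same depth before computing these measures makes richness and diversity comparable across samples with different sequencing effort.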
Allows metagenomic sequence data to be analyzed with the fast, accurate RNA-Seq abundance estimator kallisto. Metakallisto contains Python scripts and offers functions that compare the output of a range of metagenomic analysis tools such as kallisto to the ground truth of the Illumina 100 metagenomic dataset. Both taxa identification and abundance estimation can be performed at the exact-genome level.
Scans viral metagenomes from hundreds of next generation sequencing (NGS) samples. ViraPipe employs a data-parallel computation strategy. It is able to process genomic data in partitions at many levels. This tool can avoid the false mappings that occur when the sample reads are merged before the alignment. It is based on existing tools such as the BWA aligner, the MegaHit de novo assembler, BLAST and HMMER3.
Allows rapid viral read identification, genus-level read partition, read normalization, de novo assembly, sequence annotation and coverage profiling. In drVM, the first two procedures and sequence annotation rely on known viral genomes as a reference database. drVM has been tested on over 300 previously published sequencing runs, to provide complete viral genome assemblies for a variety of virus types including DNA viruses, RNA viruses and retroviruses. drVM is available for free download and is also assembled as a Docker container, an Amazon machine image and a virtual machine to facilitate seamless deployment.
Identifies species listed by CITES (the Convention on International Trade in Endangered Species of Wild Fauna and Flora) using Illumina paired-end sequencing technology. CITESspeciesDetect is a pipeline composed of five linked tools. It consists of three phases: (1) preprocessing of paired-end Illumina data involving quality trimming and filtering of reads, followed by sorting by DNA barcode, (2) Operational Taxonomic Unit (OTU) clustering by barcode, and (3) taxonomy prediction and CITES identification. The web interface allows stakeholders to perform the next-generation sequencing (NGS) data analysis of their own samples.
Assembles short Illumina reads into full-length COI barcode sequences. SOAPBarcode is a sequencing pipeline that transforms raw Illumina reads into full-length COI barcode sequences. Coupled with the HiSeq 2000, it achieves a high recovery rate, assembles full COI barcodes and, consequently, delivers reliable and taxonomically informative metabarcoding outcomes for environmental bulk samples.
Facilitates fast, accurate functional profiling of metagenomic samples. ShortBRED consists of two components: (i) a method that reduces reference proteins of interest to short, highly representative amino acid sequences (“markers”) and (ii) a search step that maps reads to these markers to quantify the relative abundance of their associated proteins. Its markers are applicable to other homology-based search tasks and can be applied to profile a wide variety of protein families of interest.
A package dedicated to the object-oriented representation and analysis of microbiome census data in R. Phyloseq supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. It simplifies many of the common data management and preprocessing tasks required during analysis of phylogenetic sequencing data. The phyloseq package also provides a set of powerful analysis and graphics functions, building upon related packages available in R and Bioconductor. It includes or supports some of the most commonly-needed ecology and phylogenetic tools, including a consistent interface for calculating ecological distances and performing dimensional reduction.
Supports analysis of metagenomic studies based on phylogenetic marker genes. BMPOS is effective for sequence processing, sequence clustering, alignment, taxonomic annotation, statistical analysis, and plotting of metagenomic data. It aims to help researchers handle the most used bioinformatics packages dedicated to the study of microbial ecology. The tool can be used as a starting point for every researcher interested in performing microbiome studies based on Next Generation Sequencing (NGS) data.
A workflow for processing non-overlapping reads while retaining maximal information content. IM-TORNADO is used for carrying out common microbiome analyses leveraging the information of the paired reads provided by the Illumina sequencers to relate reads belonging to the same amplicon, making the use of these non-overlapping reads accessible to a broader base of users.
Generates an all-against-all comparison dataset between the reads and the reference database and then uses these results to generate cumulative statistics from combined local and global alignment. MetaGeniE is a pipeline designed for accurate, sensitive and specific detection of taxa in complex microbial samples and to address the limitations of typical metagenomic analyses. It also incorporates features such as comprehensive human read filtration and scalability to search large reference databases such as the microbial RefSeq database.
Provides experimental design and analysis of viral metagenomes. MetLab helps to design the experiment and calculates the coverage needed. It enables researchers to carefully prepare their experiments depending on the sequencing technology used. The tool can be used to test, validate and select external analysis tools to optimize experiments. It can also create a statistical profile from real world sequencing data.
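The kind of coverage calculation involved in planning a sequencing experiment can be sketched with the textbook relation C = N × L / G (reads × read length / genome length) and the Lander-Waterman estimate 1 − e^(−C) for the fraction of the genome covered. This is an assumption about the general approach, not MetLab's actual model; the function names and the `target_fraction` parameter (the share of reads that come from the genome of interest) are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, genome_length):
    """Mean depth of coverage C = N * L / G."""
    return n_reads * read_length / genome_length


def reads_needed(coverage, genome_length, read_length, target_fraction=1.0):
    """Reads required to reach `coverage` on a genome of `genome_length` bp.

    `target_fraction` is the assumed share of reads in the sample that
    originate from the target genome (1.0 means a pure isolate).
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return math.ceil(coverage * genome_length / (read_length * target_fraction))


def fraction_covered(coverage):
    """Lander-Waterman expectation of the genome fraction hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # 30x on a 10 kb viral genome with 150 bp reads,
    # assuming half of the reads are viral.
    print(reads_needed(30, 10_000, 150, target_fraction=0.5))
    print(fraction_covered(1.0))
```

In a metagenomic sample the target genome is usually a small minority of the reads, so the required sequencing effort is driven mostly by `target_fraction`.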
Offers both read matching and assembly-based annotation pipelines. MetaStorm is an online metagenomic analysis server. It enables customization of reference databases and allows users to upload databases containing curated genes of interest. It provides enhanced visualization of annotation results, and permits users to explore and manipulate taxonomic and functional annotations.
A stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data. FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines.
Visualizes genome bins. ICoVeR supports the curation of bin assignments based on multiple binning algorithms. It was tested on the refinement of disparate genome bins automatically generated by other binning algorithms for an anaerobic digestion metagenomic dataset. The tool makes the bin refinement process faster and more reproducible. It also captures the provenance of changes made in the course of an exploratory task.
Simulates amplicon-based microbiome experiments and tests classification software. DECARD generates realistic synthetic datasets for which the source of each sequence is known, so that they can be used as a gold standard when evaluating microbiome analysis software. For each classification pipeline considered, the software has modules that convert the pipeline output to a common table mapping each sequence to an operational taxonomic unit (OTU) and classification.
Brings together many aspects of today’s cutting-edge genomic, metagenomic, and metatranscriptomic analysis practices to address a wide array of needs. Anvi’o is an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. It empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.
Detects pathogen sequences from metagenome data derived from specimen material from patients. MePIC will trim low quality data, remove sequence data from hosts (i.e. host patients) and then perform megablast search against NCBI blast database, or BWASW or BWA aln search against a single sequence specified by user. The result can be further analyzed by third party software such as MEGAN, Tablet, or GenomeJack. The use of the MePIC pipeline will promote metagenomic pathogen identification and improve the understanding of infectious diseases.
Contains several resources to help researchers working with microbial sequencing data. Microbiome Helper is an assortment of scripts that help process and automate various microbiome and metagenomic bioinformatic tools. It also provides workflows or standard operating procedures (SOPs) for analyzing 16S/18S rRNA and metagenomic data, tutorials (with test data, example output, and questions for different microbiome analyses) and a VirtualBox image that can be used to run the workflows and tutorials with little or no configuration.
Allows users to analyze Oxford Nanopore MinION runs and to detect bacterial infections from DNA. CRuMPIT is an analysis workflow able to differentiate predicted infection species and background contamination and can be run during sequencing for real-time species detection.
Performs common tasks in metagenomic data analysis from raw read quality control to bin extraction and analysis. MetaWRAP provides a collection of modules, each being a standalone program addressing one aspect of whole-metagenome (WMG) data processing or analysis, including read quality control (QC), assembly, visualization, taxonomic profiling, and binning. Users can follow the intuitive workflow or use only specific functions. Its modularity gives the investigator flexibility in their analysis approach.
Accelerates the processing of large numbers of query sequences. Fungal internal transcribed spacer (ITS) Pipeline is a package developed to obtain extended functionality helpful for complementary, in-depth analyses. It assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. This pipeline is easily modified to operate on other molecular regions and organism groups.
Provides tools and workflows dedicated to microbiota analyses with an extensive documentation. ASaiM is an Open-Source opinionated Galaxy-based framework based on four pillars: 1) easy and stable dissemination via Galaxy, Docker and conda, 2) a comprehensive set of metagenomics related tools, 3) a set of predefined and tested workflows, and 4) extensive documentation and training to help scientists in their analyses.
Facilitates management and bioinformatics analysis of metagenomics data samples. MetaGenSense integrates the capacity for large-scale genomic analysis and technical expertise in sequencing. It helps biologists to quickly obtain analysis results from High Throughput Sequencing (HTS) projects. The tool covers data processing up to presentation of data and results in a genome browser compatible data format.
Enables reproducible metagenomics analyses. YAMP is a package that facilitates collaborative projects. It is composed of three analysis blocks: (i) quality control, (ii) assessment and visualization of data quality, and (iii) community characterisation. It also assists researchers with limited computational experience who are new to the field of metagenomics.
Aims to facilitate the interpretation of microbial community composition data. theseus is an R package that allows users to visualize, analyze, and interpret (microbial) community composition data, specifically those originating from amplicon sequencing. The software can also assist researchers in the selection of read trimming by quality scores and the preprocessing/denoising of datasets.
Serves as a quantitative analysis tool for real-time species identification. WIMP can be used in combination with the portable MinION sequencer and identifies bacteria, archaea, viruses and fungi. The software takes sequence data generated from an unprocessed sample and classifies it down to the subspecies level. It is suited to monitoring the health and well-being of dairy animals and to performing quality control of dairy produce.
Covers all steps of metagenomic/metatranscriptomic investigations automatically. SqueezeM integrates multi-metagenome support permitting the co-assembly of related metagenomes and the discovery of individual genomes via binning procedures. It can compare open reading frame (ORF) sequences against several taxonomic and functional databases. This tool finds possible chimeric contigs and bins.
Aims to identify changes in community composition that are related to environmental factors. SIAMCAT analyses the relationship between microbial communities and host phenotypes. It supports data pre-processing, statistical association testing and statistical modelling. This tool provides functions for evaluation and interpretation of statistical models, such as cross validation, parameter selection, ROC analysis and diagnostic model plots.
Serves to obtain a detailed overview at any taxonomic level of microbiomes of different origins (human, agricultural and environmental). GAIA exploits BWA to map reads and pairs from any platform against one of the three custom-made databases created via NCBI as the main source. It organizes reads and pairs according to their most specific taxonomic level by using an in-house lowest common ancestor (LCA) algorithm.
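A lowest common ancestor (LCA) assignment places a read at the deepest taxonomic rank shared by all of its candidate hits. GAIA's LCA algorithm is described as in-house and is not reproduced here; the sketch below shows only the generic idea, with a hypothetical function name and lineages represented as root-to-leaf lists.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None if none is shared.

    Hypothetical helper for illustration. Each lineage is a list of taxon
    names ordered from the root (e.g. domain) to the most specific rank.
    """
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel; zip stops at the shortest lineage.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break  # Lineages diverge here; keep the last shared taxon.
    return lca


if __name__ == "__main__":
    hits = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"],
    ]
    print(lowest_common_ancestor(hits))
```

A read that maps equally well to two genera is thus reported at the class level they share rather than being forced onto one of them.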
Delivers comprehensive results on the composition of microbial communities and their associated metabolic pathways, genes and genomes. Real Time Genomics’ metagenomic pipeline was developed to estimate the abundance or frequency of a particular genome (typically a bacterial species) in a complex metagenomic sample. The calculations are performed on standard SAM files after reads have been mapped to a reference genome set containing thousands of genomes, many of which are highly related at the nucleotide level.
Allows for marine metagenomics analysis. META-pipe offers preprocessing, assembly, taxonomic classification and functional analysis. To reduce the effort needed to develop and deploy it, it has been integrated with existing biological analysis frameworks and with compute and storage infrastructure resources. The META-pipe web service provides integration with identity provider services, distributed storage, computation on a supercomputer, Galaxy workflows, and interactive data visualizations. The Galaxy based META-pipe is a powerful analysis pipeline for metagenomic samples which is intuitive and easy to use for biologists without extensive programming competence. META-pipe is flexible and modular, and it is integrated with the large-scale computer systems and identity providers needed to operate a service with a large user base.
Aims to address a major challenge facing researchers today — namely, analyzing, transferring, and storing biomedical "big data" — through the use of cloud-based resources. Nephele is a project from the National Institutes of Health (NIH) that brings together microbiome data and analysis tools in a cloud computing environment. Nephele's advanced analysis pipelines include multiple stages of data processing, many of which can be configured by modifying parameters provided in the submission form.
A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching, and classification. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities.
Permits analyses of environmental diversity combined with numerous built-in functionalities like sampling saturation curves, rank abundance plots, and others. JAGUC is useful for ecological interpretations. It enables comparisons among different runs of the same sample with different user-defined parameters or sample subsets. This tool searches for sequences with common prefixes. It can provide useful basic statistical information on the data sets.
Automates the analysis of fungal internal transcribed spacer (ITS) sequences generated either by Sanger or Next Generation Sequencing (NGS) platforms. ITScan is an architectural model that can be used with third-party bioinformatics programs. The pipeline can process a single dataset or as many as three datasets to compare distinct biological samples.
Gene fusion detection in Plants
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions are well known in humans, but in plants this phenomenon has not yet been explored. We are planning to discover fusion transcripts/gene fusions in different types of plants by using RNA-Seq datasets. Further, we are planning to understand the mechanism of gene fusion formation and the significance of fusions in plants.
Whole genome and transcriptome sequencing data analysis of Plants
In this era of Next Generation Sequencing (NGS), a huge amount of sequencing data is available in the public domain. Extracting novel findings from these datasets is a major challenge for a computational biologist. We are interested in analyzing the whole genome and transcriptome sequencing data of different plants, with the help of bioinformatics tools, to extract useful information from those datasets. Currently, we are planning to study the gene clusters of secondary metabolite pathways in different plants.
Development of webservers, databases and computational pipelines for plant research
Databases are necessary to compile information and share it with the scientific community. We are dedicated to developing useful databases and webservers for plant research.
Another area of interest is the development of automated pipelines and tools for the analysis of high throughput genomics data generated by NGS technologies.
Professional & Academic Background
Staff Scientist II (May 2017- present): National Institute of Plant Genome Research (NIPGR), New Delhi, India
Postdoctoral Research Associate (2015-2017): University of Virginia, Charlottesville, VA, USA
Research Scientist (2014-2015): Sir Ganga Ram Hospital, New Delhi, India
PhD Bioinformatics (2009-2014): Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh under Jawaharlal Nehru University (JNU), New Delhi, India
M.Sc. Life Sciences (2007-2009): Jawaharlal Nehru University (JNU), New Delhi, India
B.Sc. Biotechnology (2004-2007): Jamia Millia Islamia (JMI), New Delhi, India
Awards and Fellowships
Junior and Senior Research Fellowship (2009-2014): Council of Scientific and Industrial Research (CSIR), New Delhi, India
GATE (Graduate Aptitude Test in Engineering): Qualified in years 2008 and 2009
Scientific Contributions/ Recognitions
Associate editor: Journal of Translational Medicine.
Editorial Board Member of Journal: Theoretical Biology and Medical Modelling.
Reviewer: PloS One, BMC Genomics, BMC Bioinformatics, BMC Biology, BMC Biotechnology, Frontiers in Physiology and several other journals.
Web Resources/ Databases (Developed/ Contributed)
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer (http://www.imtech.res.in/raghava/cancertope/)
GenomeABC: A webserver for benchmarking of genome assemblers. (http://crdd.osdd.net/raghava/genomeabc/).
Genomics web portal page. (http://crdd.osdd.net/raghava/genomesrs/).
Map/Alignment module of CancerDr: Cancer Drug Resistance Database. (http://crdd.osdd.net/raghava/cancerdr/).
Short reads and contigs alignment module of PCMDB: Pancreatic cancer methylation database. (http://crdd.osdd.net/raghava/pcmdb/).
Burkholderia sp. SJ98 database. (http://crdd.osdd.net/raghava/genomesrs/burkholderia/).
Rhodococcus imtechensis RKJ300 database. (http://crdd.osdd.net/raghava/genomesrs/rkj300/).
Genotrick: A pipeline for whole genome assembly and annotation of Genomes (http://crdd.osdd.net/raghava/genomesrs/genotrick/)
Development of Debian packages in OSDDlinux: A Customized Operating System for Drug Discovery. (http://osddlinux.osdd.net/).
A Web-Based Platform for Designing Vaccines against Existing and Emerging Strains of Mycobacterium tuberculosis. (http://crdd.osdd.net/raghava/mtbveb/).